# 🧠 Student Performance Data Analysis
This notebook analyzes the **student-mat.csv** dataset to explore student performance, study habits, and factors influencing grades.

## 🛠 Objectives
1. Load Dataset
2. Explore & Clean Data
3. Answer Key Questions
4. Create Visualizations
5. Document Findings

## Step 1: Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

print('✅ Libraries imported successfully!')

## Step 2: Load Dataset

In [None]:
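# The original UCI release of student-mat.csv is semicolon-delimited, while some
# re-uploads use commas; sep=None lets pandas detect the delimiter.
df = pd.read_csv('student-mat.csv', sep=None, engine='python')
print('✅ Dataset loaded successfully!')
df.head()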

## Step 3: Explore & Clean Data

In [None]:
print('Dataset Info:')
df.info()  # info() prints its report directly and returns None, so no print() wrapper

print('\nDataset Shape:', df.shape)
print('\nMissing Values:\n', df.isnull().sum())
print('\nDuplicated Rows:', df.duplicated().sum())

# Remove duplicates
df = df.drop_duplicates()
print('✅ Duplicates removed. New shape:', df.shape)
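
Besides missing values and duplicates, a quick range check can flag implausible grades. The sketch below assumes `G3` uses the 0 to 20 scale given in the UCI data description; the helper name `out_of_range` and the toy values are hypothetical and for illustration only.

```python
import pandas as pd

def out_of_range(df, column="G3", low=0, high=20):
    """Return the rows whose `column` is missing or outside [low, high]."""
    values = df[column]
    # between() is inclusive on both ends; missing values are flagged explicitly.
    mask = values.isna() | ~values.between(low, high)
    return df[mask]

# Toy data for illustration only, not taken from student-mat.csv.
toy = pd.DataFrame({"G3": [0, 11, 20, 25, -3]})
bad = out_of_range(toy)
print(f"Rows with out-of-range G3: {len(bad)}")
```

In the notebook this would be called as `out_of_range(df)`. An empty result means every final grade is present and within the expected scale.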

## Step 4: Descriptive Statistics

In [None]:
df.describe()

## Step 5: Analysis Questions

In [None]:
# 1️⃣ Average final grade (G3)
avg_grade = df['G3'].mean()
print(f'Average Final Grade (G3): {avg_grade:.2f}')

# 2️⃣ How many students scored above 15?
above_15 = (df['G3'] > 15).sum()
print(f'Students scoring above 15: {above_15}')

# 3️⃣ Is study time correlated with performance?
corr = df['studytime'].corr(df['G3'])
print(f'Correlation between Study Time and Final Grade (G3): {corr:.2f}')

# 4️⃣ Which gender performs better on average?
gender_perf = df.groupby('sex')['G3'].mean()
print('\nAverage Grade by Gender:\n', gender_perf)
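
`Series.corr` defaults to the Pearson coefficient, which treats `studytime` as an interval-scaled number. If `studytime` is coded as ordered categories (1 to 4, as in the UCI data description), a rank-based Spearman coefficient may be a reasonable cross-check. The sketch below computes it as the Pearson correlation of the ranks; the helper name `spearman_corr` and the toy values are hypothetical.

```python
import pandas as pd

def spearman_corr(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    # rank() assigns average ranks to ties, the usual convention for Spearman.
    return x.rank().corr(y.rank())

# Toy data for illustration only, not taken from student-mat.csv.
toy = pd.DataFrame({"studytime": [1, 2, 3, 4], "G3": [5, 9, 10, 18]})
rho = spearman_corr(toy["studytime"], toy["G3"])
print(f"Spearman correlation (toy data): {rho:.2f}")
```

In the notebook this would be called as `spearman_corr(df['studytime'], df['G3'])` and compared with the Pearson value printed above.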

## Step 6: Visualizations

In [None]:
plt.figure(figsize=(14, 10))

# Histogram of grades
plt.subplot(2, 2, 1)
plt.hist(df['G3'], bins=10, color='skyblue', edgecolor='black')
plt.title('Distribution of Final Grades (G3)')
plt.xlabel('Final Grade (G3)')
plt.ylabel('Count')

# Scatterplot: Study Time vs Grades
plt.subplot(2, 2, 2)
sns.scatterplot(x='studytime', y='G3', data=df, color='green')
plt.title('Study Time vs Final Grade')
plt.xlabel('Study Time')
plt.ylabel('Final Grade')

# Bar chart: Male vs Female Average Score
plt.subplot(2, 2, 3)
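# errorbar=None replaces the deprecated ci=None (seaborn >= 0.12, which estimator='mean' already requires)
sns.barplot(x='sex', y='G3', data=df, estimator='mean', errorbar=None, palette='coolwarm')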
plt.title('Average Final Grade by Gender')
plt.xlabel('Gender')
plt.ylabel('Average Final Grade')

plt.tight_layout()
plt.show()
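
Because `studytime` takes only a few distinct values, points in the scatterplot stack on top of each other. One possible alternative is a box plot of `G3` per study-time level, sketched below with plain matplotlib. The helper name `boxplot_grade_by_studytime` and the toy values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

def boxplot_grade_by_studytime(df, ax=None):
    """Draw one box of G3 values per studytime level and return the Axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))
    levels = sorted(df["studytime"].unique())
    # One array of final grades per study-time level, in ascending level order.
    groups = [df.loc[df["studytime"] == lvl, "G3"].to_numpy() for lvl in levels]
    ax.boxplot(groups)
    ax.set_xticklabels([str(lvl) for lvl in levels])
    ax.set_title("Final Grade by Study Time Level")
    ax.set_xlabel("Study Time")
    ax.set_ylabel("Final Grade (G3)")
    return ax

# Toy data for illustration only, not taken from student-mat.csv.
toy = pd.DataFrame({
    "studytime": [1, 1, 2, 2, 2, 3, 3],
    "G3": [8, 10, 9, 12, 14, 13, 16],
})
ax = boxplot_grade_by_studytime(toy)
```

In the notebook, `boxplot_grade_by_studytime(df)` followed by `plt.show()` would render the figure. Passing an existing Axes through `ax` places it inside a subplot grid.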

## Step 7: Insights
- Average final grade is around **(computed value)**
- Number of students scoring above 15: **(computed value)**
- Study time has a **weak/moderate/strong** correlation with performance
- On average, **Male/Female** students perform slightly better
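
The placeholders above can be filled from the values computed in Step 5. The following is a minimal sketch of one way to do that. The helper names and the cut-offs for "weak", "moderate", and "strong" (0.3 and 0.7) are assumptions chosen for illustration, not a fixed standard, and the toy values are not taken from the dataset.

```python
import pandas as pd

def correlation_strength(r, weak=0.3, strong=0.7):
    """Label |r| as weak, moderate, or strong (thresholds are a convention, not a rule)."""
    size = abs(r)
    if size < weak:
        return "weak"
    if size < strong:
        return "moderate"
    return "strong"

def summarize_findings(df):
    """Return the insight bullets with computed values filled in."""
    avg = df["G3"].mean()
    above_15 = int((df["G3"] > 15).sum())
    r = df["studytime"].corr(df["G3"])
    by_sex = df.groupby("sex")["G3"].mean()
    return [
        f"- Average final grade is around **{avg:.2f}**",
        f"- Number of students scoring above 15: **{above_15}**",
        f"- Study time has a **{correlation_strength(r)}** correlation "
        f"with performance (r = {r:.2f})",
        f"- On average, **{by_sex.idxmax()}** students have the higher "
        f"mean grade ({by_sex.max():.2f})",
    ]

# Toy data for illustration only, not taken from student-mat.csv.
toy = pd.DataFrame({
    "G3": [10, 16, 12, 18],
    "studytime": [1, 3, 2, 4],
    "sex": ["F", "M", "F", "M"],
})
lines = summarize_findings(toy)
print("\n".join(lines))
```

In the notebook, `print("\n".join(summarize_findings(df)))` would print the bullets for the real data, ready to paste into this section.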

✅ *End of Analysis*