
# 🎓 Student Performance Predictor

## 📖 Introduction
This project predicts students’ **math performance** using demographic and academic factors (gender, race/ethnicity, parental education, lunch type, test preparation, reading/writing scores).  
You’ll see how to load the data, check it for missing values, encode the categorical features, train a linear regression model, evaluate it, and extract insights from its coefficients.


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

print("Libraries loaded ✅")


## 📂 Load the Dataset

In [None]:

# Make sure StudentsPerformance.csv is in the same folder as this notebook
df = pd.read_csv("StudentsPerformance.csv")
display(df.head())
print(df.shape)


## 🧼 Data Overview & Missing Values

In [None]:

df.info()
print("\nMissing values per column:")
print(df.isnull().sum())
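
If the check above reports missing values, one simple option is to impute them before encoding. The following is a minimal sketch on a made-up toy frame (the values and the `toy`/`cleaned` names are illustrative, not taken from `StudentsPerformance.csv`): numeric gaps get the column median, categorical gaps get the most frequent level.

```python
import numpy as np
import pandas as pd

# Toy data with deliberate gaps; values are invented for illustration only.
toy = pd.DataFrame({
    "lunch": ["standard", None, "free/reduced", "standard"],
    "reading score": [72.0, 90.0, np.nan, 65.0],
})

cleaned = toy.copy()

# Numeric columns: fill gaps with the column median.
num_cols = cleaned.select_dtypes(include="number").columns
cleaned[num_cols] = cleaned[num_cols].fillna(cleaned[num_cols].median())

# Remaining (categorical) columns: fill gaps with the most frequent level.
for col in cleaned.columns.difference(num_cols):
    cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

print(cleaned)
print("Remaining missing values:", int(cleaned.isnull().sum().sum()))
```

In a real workflow the median and mode would normally be computed on the training split only, so that no information from the test set leaks into the model.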


## 🔢 Encode Categorical Features

In [None]:

# One-hot encode all categorical columns
df_encoded = pd.get_dummies(df, drop_first=True)

display(df_encoded.head())
print("Encoded shape:", df_encoded.shape)
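
To see what `drop_first=True` does, here is a small sketch on a toy frame. The column names mirror the style of the dataset, but the rows and the category levels shown are assumptions made for illustration. `dtype=int` is added only so the dummy columns print as 0/1.

```python
import pandas as pd

# Toy frame with invented rows; levels are assumed for illustration.
toy = pd.DataFrame({
    "lunch": ["standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none"],
    "reading score": [72, 90, 65],
})

# drop_first=True drops the alphabetically first level of each categorical
# column, so "free/reduced" and "completed" become the implicit baselines.
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each remaining dummy column is read relative to its dropped baseline: a coefficient on `lunch_standard` compares standard lunch against the dropped level, not against zero.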


## 🎯 Define Features (X) and Target (y)

In [None]:

X = df_encoded.drop('math score', axis=1)
y = df_encoded['math score']

print("Feature matrix shape:", X.shape)
print("Target vector shape:", y.shape)


## ✂️ Train/Test Split

In [None]:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Training set:", X_train.shape)
print("Testing set:", X_test.shape)


## ⚙️ Train Linear Regression Model

In [None]:

model = LinearRegression()
model.fit(X_train, y_train)
print("✅ Model training complete!")


## 🧪 Evaluate Model Performance

In [None]:

y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("📊 Model Evaluation Results:")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"R² Score: {r2:.2f}")
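
For readers who want to see what the two metrics measure, here is a small worked sketch that computes them by hand with NumPy. The four scores below are invented for illustration; `mean_absolute_error` and `r2_score` should give the same values on the same inputs.

```python
import numpy as np

# Invented example scores (not from the dataset).
y_true = np.array([50.0, 60.0, 70.0, 80.0])
y_hat = np.array([55.0, 58.0, 73.0, 76.0])

# MAE: average size of the prediction error, in score points.
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (squared error of the model / squared error of always
# predicting the mean of y_true).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE: {mae_manual:.2f}")  # 3.50
print(f"R²: {r2_manual:.3f}")    # 0.892
```

An MAE of 3.5 means predictions are off by 3.5 points on average; an R² of 0.892 means the model removes about 89% of the squared error left by the predict-the-mean baseline.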


## 📈 Actual vs Predicted Plot

In [None]:

plt.scatter(y_test, y_pred)
plt.xlabel("Actual Math Scores")
plt.ylabel("Predicted Math Scores")
plt.title("Actual vs Predicted Student Math Scores")
plt.show()
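
An actual-vs-predicted scatter is easier to read with a diagonal reference line, since points on that line are perfect predictions. The sketch below is one way to add it. The helper name `plot_actual_vs_predicted` is hypothetical, and the data is synthetic so the block runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(actual, predicted):
    """Scatter actual vs predicted values with a y = x reference line."""
    fig, ax = plt.subplots()
    ax.scatter(actual, predicted, alpha=0.6)

    # Diagonal spanning the combined range of both axes.
    lo = min(np.min(actual), np.min(predicted))
    hi = max(np.max(actual), np.max(predicted))
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray",
            label="Perfect prediction")

    ax.set_xlabel("Actual Math Scores")
    ax.set_ylabel("Predicted Math Scores")
    ax.set_title("Actual vs Predicted Student Math Scores")
    ax.legend()
    return ax


# Synthetic stand-in data: predictions are the truth plus some noise.
rng = np.random.default_rng(0)
actual_demo = rng.uniform(30, 100, size=50)
predicted_demo = actual_demo + rng.normal(0, 5, size=50)

ax = plot_actual_vs_predicted(actual_demo, predicted_demo)
```

With the notebook's own variables the call would be `plot_actual_vs_predicted(y_test, y_pred)`. Points above the line are over-predictions and points below it are under-predictions.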


## 🔍 Feature Importance (Coefficients)

In [None]:

importance = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': model.coef_
}).sort_values(by='Coefficient', ascending=False)

importance
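
Raw coefficients are measured per unit of each feature, so a 0/1 dummy column and a 0–100 score column are not directly comparable. One common adjustment, sketched here on synthetic data (the frame, its two columns, and the true effects 0.5 and 5.0 are all made up), is to multiply each coefficient by its feature's standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data: one score-like feature and one 0/1 dummy feature.
rng = np.random.default_rng(0)
n = 500
X_demo = pd.DataFrame({
    "reading score": rng.uniform(0, 100, size=n),
    "lunch_standard": rng.integers(0, 2, size=n),
})
y_demo = (0.5 * X_demo["reading score"]
          + 5.0 * X_demo["lunch_standard"]
          + rng.normal(0, 1, size=n))

demo_model = LinearRegression().fit(X_demo, y_demo)

comparison = pd.DataFrame({
    "Feature": X_demo.columns,
    # Change in prediction per one unit of the feature.
    "Raw coefficient": demo_model.coef_,
    # Change in prediction per one standard deviation of the feature.
    "Per-std coefficient": demo_model.coef_ * X_demo.std().to_numpy(),
})
print(comparison)
```

In this toy setup the dummy has the larger raw coefficient, but the score feature has the larger per-standard-deviation effect, so the ranking depends on the scale chosen.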



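## 🧠 Key Findings
- **Gender (male)** and **race/ethnicity group E** had large positive coefficients, meaning higher predicted math scores when the other features (including reading and writing scores) are held constant. A coefficient is a conditional association, not a simple correlation.
- Students with **standard lunch** and those who **completed test preparation** tend to perform better. Because of `drop_first=True`, each dummy coefficient is relative to the dropped baseline category, so check which level a column encodes before reading its sign.
- Some parental education levels showed smaller or mixed influence.
- The model achieved a strong fit on the test set (high R², low MAE; see the values printed in the evaluation step). Note that reading and writing scores are among the predictors.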
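
A single train/test split gives one estimate of R² and MAE. As a rough check on how stable such numbers are, the sketch below runs 5-fold cross-validation with scikit-learn's `cross_val_score`. The data is synthetic and the names `X_demo` and `y_demo` are placeholders. With the notebook's data, `X` and `y` would take their place.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem standing in for the real features and target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0]) + rng.normal(0, 1.0, size=200)

# R² on each of 5 held-out folds.
r2_scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                            cv=5, scoring="r2")

# scikit-learn reports MAE negated (higher is better), so flip the sign.
mae_scores = -cross_val_score(LinearRegression(), X_demo, y_demo,
                              cv=5, scoring="neg_mean_absolute_error")

print("R² per fold:", np.round(r2_scores, 3))
print("MAE per fold:", np.round(mae_scores, 3))
print(f"Mean R²: {r2_scores.mean():.3f}, mean MAE: {mae_scores.mean():.3f}")
```

If the per-fold values vary widely, the single-split result should be read with more caution.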

## 🏁 Conclusion
We built a linear regression model to predict math performance and identified the features most strongly associated with it.  
This approach can help educators spot at-risk students and focus interventions.

---

### 💼 Resume Snippet
**Student Performance Predictor | Python, Pandas, Scikit-learn**  
- Built a regression model to predict math scores; performed data cleaning and one-hot encoding.  
- Achieved strong performance (e.g., high R², low MAE) and visualized results with Matplotlib.  
- Extracted insights on factors affecting performance (gender, lunch, test preparation).  
