# 🎓 Graduate Admission Predictor

This notebook explores how academic profile features like GRE, TOEFL, and CGPA can be used to predict a student's chance of admission to graduate school using machine learning.

## 📊 Dataset Overview

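We use the Kaggle Graduate Admissions dataset (`Admission_Predict_Ver1.1.csv`). For each applicant it records GRE and TOEFL scores, a university rating, statement of purpose (SOP) and letter of recommendation (LOR) strength, undergraduate CGPA, research experience, and the target, `Chance of Admit`. We load and clean it, then run exploratory data analysis (EDA).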

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Load dataset
df = kagglehub.load_dataset(
    KaggleDatasetAdapter.PANDAS,
    "mohansacharya/graduate-admissions",
    "Admission_Predict_Ver1.1.csv"
)
# Strip stray whitespace from the raw headers (e.g. 'Chance of Admit ', 'LOR ')
df.columns = df.columns.str.strip()
# The serial number is only a row ID, not a predictor
df = df.drop(columns='Serial No.')
df.head()
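
Before the EDA, a quick sanity check can catch missing values, duplicate rows, and out-of-range entries. The sketch below is a hypothetical helper, `data_quality_report`; the bounds in `EXPECTED_RANGES` are assumptions based on the usual scales (GRE 260 to 340, TOEFL iBT 0 to 120, ratings 1 to 5, CGPA out of 10, `Research` as 0/1, and `Chance of Admit` between 0 and 1), so adjust them if your copy of the data differs.

```python
import pandas as pd

# Assumed plausible ranges per column (inclusive); not taken from the dataset docs
EXPECTED_RANGES = {
    "GRE Score": (260, 340),
    "TOEFL Score": (0, 120),
    "University Rating": (1, 5),
    "SOP": (1, 5),
    "LOR": (1, 5),
    "CGPA": (0, 10),
    "Research": (0, 1),
    "Chance of Admit": (0, 1),
}

def data_quality_report(frame: pd.DataFrame) -> dict:
    """Count missing cells, duplicate rows, and non-null out-of-range values per column."""
    frame = frame.rename(columns=str.strip)  # tolerate raw headers such as 'LOR '
    out_of_range = {}
    for col, (lo, hi) in EXPECTED_RANGES.items():
        if col in frame.columns:
            values = frame[col].dropna()  # missing values are counted separately
            out_of_range[col] = int((~values.between(lo, hi)).sum())
    return {
        "missing": int(frame.isna().sum().sum()),
        "duplicates": int(frame.duplicated().sum()),
        "out_of_range": out_of_range,
    }
```

On the notebook's data this would be called as `data_quality_report(df)`; every count should be zero before moving on.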

In [None]:
corr = df.corr()
# Hide the upper triangle (including the diagonal), which mirrors the lower one
mask = np.triu(np.ones_like(corr, dtype=bool))
with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(9, 7))
    sns.heatmap(corr, mask=mask, square=True, annot=True, fmt='.2f',
                linewidths=.8, cmap="coolwarm", ax=ax)
    ax.set_title("Correlation matrix")
plt.show()
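
The heatmap is easier to act on when the target's column is pulled out and sorted. A minimal sketch using a hypothetical helper, `rank_by_target_correlation`, shown on illustrative toy data rather than `df`:

```python
import numpy as np
import pandas as pd

def rank_by_target_correlation(frame: pd.DataFrame, target: str = "Chance of Admit") -> pd.Series:
    """Pearson correlation of each column with `target`, ordered by absolute strength."""
    corr = frame.corr()[target].drop(target)
    # Order by |r| so strong negative correlations are not buried at the bottom
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Illustrative toy data: the target depends strongly on `a`, negatively on `b`, not on `c`
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 200))
toy = pd.DataFrame({"a": a, "b": b, "c": c,
                    "Chance of Admit": 2 * a - b + 0.1 * rng.normal(size=200)})
ranking = rank_by_target_correlation(toy)
print(ranking)
```

On the notebook's data, `rank_by_target_correlation(df)` gives the same numbers as the heatmap's `Chance of Admit` row, sorted by absolute strength.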

In [None]:
sns.regplot(x="GRE Score", y="TOEFL Score", data=df)
plt.title("GRE Score vs TOEFL Score")
plt.show()

sns.regplot(x="GRE Score", y="CGPA", data=df)
plt.title("GRE Score vs CGPA")
plt.show()

In [None]:
for col in ['GRE Score', 'TOEFL Score', 'University Rating', 'SOP', 'CGPA']:
    sns.histplot(df[col], kde=False)
    plt.title(f"Distribution of {col}")
    plt.show()

## 🤖 Model Training and Evaluation

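We scale each applicant's feature vector (each row) to unit L2 norm with scikit-learn's `normalize`, hold out 20% of the rows as a test set, train several regressors, and compare them using RMSE, MAE, and R². Note that `normalize` rescales rows, not feature columns; for per-feature scaling, `StandardScaler` or `MinMaxScaler` is the usual choice.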
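
For reference, the three metrics are defined below. Here $y_i$ is the true `Chance of Admit` for test row $i$, $\hat{y}_i$ is the model's prediction, $\bar{y}$ is the mean of the test targets, and $n$ is the number of test rows. These match what scikit-learn's `mean_squared_error` (under a square root), `mean_absolute_error`, and `r2_score` compute:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

Lower RMSE and MAE are better, and RMSE penalizes large errors more heavily than MAE. $R^2 = 1$ is a perfect fit, while $R^2 \le 0$ means the model does no better than always predicting $\bar{y}$.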

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor, ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

X = df.drop('Chance of Admit', axis=1)
y = df['Chance of Admit']
X_norm = normalize(X)
X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.2, random_state=42)

regressors = [
    ("Linear Regression", LinearRegression()),
    ("Decision Tree", DecisionTreeRegressor()),
    ("Random Forest", RandomForestRegressor()),
    ("Gradient Boosting", GradientBoostingRegressor()),
    ("Ada Boosting", AdaBoostRegressor()),
    ("Extra Trees", ExtraTreesRegressor()),
    ("K-Neighbors", KNeighborsRegressor()),
    ("Support Vector", SVR())
]

results = []
for name, model in regressors:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    mae = mean_absolute_error(y_test, preds)
    r2 = r2_score(y_test, preds)
    results.append((name, rmse, mae, r2))
    print(f"{name}: RMSE={rmse:.4f}, MAE={mae:.4f}, R²={r2:.4f}")
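
A single 80/20 split can rank models differently depending on which rows land in the test set. Below is a sketch of a K-fold comparison. It assumes you want feature-wise standardization (`StandardScaler`, refit on each training fold inside a pipeline) instead of the row-wise `normalize` used above. `cv_rmse` is a hypothetical helper, and the data comes from `make_regression` as a synthetic stand-in with seven features like `X`:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cv_rmse(model, X, y, n_splits=5, seed=42):
    """Mean and std of RMSE over K folds; the scaler is fit only on each training fold."""
    pipe = make_pipeline(StandardScaler(), model)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn maximizes scores, so RMSE is exposed as its negative
    scores = -cross_val_score(pipe, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    return scores.mean(), scores.std()

# Synthetic stand-in for the admissions data (7 features)
X_demo, y_demo = make_regression(n_samples=400, n_features=7, noise=5.0, random_state=0)
for name, est in [("Linear Regression", LinearRegression()),
                  ("Extra Trees", ExtraTreesRegressor(n_estimators=100, random_state=42))]:
    mean, std = cv_rmse(est, X_demo, y_demo)
    print(f"{name}: RMSE={mean:.3f} ± {std:.3f}")
```

On the notebook's data you would call `cv_rmse(model, X, y)` with the unnormalized `X`. Reporting the spread across folds shows whether the gap between two models is larger than the split-to-split noise.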

## 🎯 Feature Importance

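We estimate feature contributions in two ways: impurity-based importances from an Extra Trees model, and the absolute coefficients of a linear regression. Coefficient magnitudes are only comparable across features measured on a common scale.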

In [None]:
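from sklearn.preprocessing import StandardScaler

model = ExtraTreesRegressor(random_state=42).fit(X_norm, y)
importances = model.feature_importances_
sns.barplot(x=importances, y=X.columns)
plt.title("Feature Importances (Extra Trees)")
plt.show()

# Coefficient magnitudes depend on feature scale, so fit the linear model
# on standardized (zero-mean, unit-variance) columns before comparing them
X_std = StandardScaler().fit_transform(X)
lr = LinearRegression().fit(X_std, y)
coef_importance = np.abs(lr.coef_)
sns.barplot(x=coef_importance, y=X.columns)
plt.title("|Standardized coefficients| (Linear Regression)")
plt.show()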
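
Impurity-based importances from tree ensembles are computed on training data and can favor features with many distinct values, and linear coefficients depend on feature scale. Permutation importance (`sklearn.inspection.permutation_importance`) is a model-agnostic alternative: shuffle one column of held-out data and measure how much the score drops. The sketch below uses synthetic data in which only feature 0 matters. On the notebook's data you would pass a fitted model together with `X_test` and `y_test`:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: only feature 0 drives the target; features 1 and 2 are noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
est = LinearRegression().fit(X_tr, y_tr)

# With scoring=None the estimator's own score (R² for regressors) is used;
# each column is shuffled n_repeats times and the mean drop in R² is reported
result = permutation_importance(est, X_te, y_te, n_repeats=10, random_state=42)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} ± {result.importances_std[i]:.3f}")
```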

## 🧪 Try Your Own Input

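Fit an Extra Trees model with fixed hyperparameters on the training split, then predict the chance of admission for a sample applicant.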

In [None]:
final_model = ExtraTreesRegressor(n_estimators=200, max_depth=20, min_samples_split=2, min_samples_leaf=2, random_state=42)
final_model.fit(X_train, y_train)

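# Feature order must match X.columns:
# GRE Score, TOEFL Score, University Rating, SOP, LOR, CGPA, Research
sample = np.array([[320, 110, 4, 4, 4, 9.2, 1]])
# Apply the same row-wise L2 normalization used on the training data
sample = normalize(sample)
final_model.predict(sample)  # predicted Chance of Admit (0 to 1)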
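
Typing the array by hand makes it easy to get the column order wrong or to skip the preprocessing. Below is a small sketch of two hypothetical helpers. `make_sample` builds the row in the order of `X.columns`, rejects values outside assumed score ranges, and applies the same row-wise `normalize` used in training. `predict_chance` clips the prediction to [0, 1], the range of `Chance of Admit`:

```python
import numpy as np
from sklearn.preprocessing import normalize

def make_sample(gre, toefl, rating, sop, lor, cgpa, research):
    """One row in training-column order, range-checked, then row-normalized like X_norm."""
    # Bounds are assumptions based on the usual score scales, not on the dataset docs
    checks = [(260 <= gre <= 340, "GRE Score"), (0 <= toefl <= 120, "TOEFL Score"),
              (1 <= rating <= 5, "University Rating"), (1 <= sop <= 5, "SOP"),
              (1 <= lor <= 5, "LOR"), (0 <= cgpa <= 10, "CGPA"),
              (research in (0, 1), "Research")]
    for ok, name in checks:
        if not ok:
            raise ValueError(f"{name} is outside the expected range")
    row = np.array([[gre, toefl, rating, sop, lor, cgpa, research]], dtype=float)
    return normalize(row)  # same row-wise L2 scaling as the training data

def predict_chance(model, **features):
    """Predict for one applicant and clip to [0, 1]."""
    return float(np.clip(model.predict(make_sample(**features))[0], 0.0, 1.0))
```

For example, `predict_chance(final_model, gre=320, toefl=110, rating=4, sop=4, lor=4, cgpa=9.2, research=1)`. Tree ensembles already predict within the range of the training targets, so the clip mainly matters if you swap in a model such as linear regression.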

## ✅ Conclusion

Extra Trees Regressor outperformed other models. CGPA, GRE, and TOEFL were the most influential features. This notebook demonstrates EDA, model evaluation, feature importance, and practical prediction workflows in Python.