# <center>Graduate Admission Chance Prediction 🎓</center>

<center><img width="800px" src="https://images.unsplash.com/photo-1607013407627-6ee814329547?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=664&q=80"></center>

# About the Dataset

* **GRE Scores** (out of 340) - *Input Variable*
* **TOEFL Scores** (out of 120) - *Input Variable*
* **University Rating** (out of 5) - *Input Variable*
* **Statement of Purpose Strength** (out of 5) - *Input Variable*
* **Letter of Recommendation Strength** (out of 5) - *Input Variable*
* **Undergraduate GPA** (out of 10) - *Input Variable*
* **Research Experience** (either 0 or 1) - *Input Variable*
* **Chance of Admit** (ranging from 0 to 1) - *Output Variable*

# Importing the Essential Libraries, Metrics, Tools and Models

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.preprocessing import PolynomialFeatures
from lightgbm import LGBMRegressor

# Loading the Data

In [1]:
df = pd.read_csv("../input/graduate-admissions/Admission_Predict_Ver1.1.csv")

# Exploratory Data Analysis

***Taking a look at the first 5 rows of the dataset.***

In [1]:
df.head()

***Checking the shape (number of rows and columns) of the data.***

In [1]:
df.shape

***Learning the dtypes of the columns and how many non-null values each column has.***

In [1]:
df.info()

***Getting the statistical summary of the dataset.***

In [1]:
df.describe().T

# Checking for Missing Values and Duplicates

In [1]:
df.isna().sum()

In [1]:
df.duplicated().sum()

***There are no missing values or duplicate rows, so the data looks clean so far.***

<h2>Putting the Data into a More Suitable Form</h2>

***Dropping redundant columns***

In [1]:
df.drop("Serial No.", axis=1, inplace=True)

***Renaming columns (removing the trailing whitespace from the "LOR " and "Chance of Admit " column names)***

In [1]:
df.rename(columns={"LOR ": "LOR", "Chance of Admit ": "Chance of Admit"}, inplace=True)

# Data Visualization

***Visualizing the pairwise relationships between the numerical variables with a pairplot.***

In [1]:
sns.set_theme()

sns.pairplot(df)

<h2>Distribution of Each Variable</h2>

In [1]:
for col in df.columns:
    plt.figure(figsize=(10,8))
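    # sns.distplot is deprecated; histplot with kde=True draws the same histogram plus density curve
    sns.histplot(df[col], kde=True)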
    plt.title(f"{col}", size=15)
    plt.show()

<h2>Relationship Between Each Variable and Target Variable (Chance of Admit)</h2>

In [1]:
sns.set(style="whitegrid")

num_cols = df.drop(["University Rating", "Research", "Chance of Admit"], axis=1).columns
cat_cols = df[["University Rating", "Research"]].columns

for col in num_cols:
    # jointplot is figure-level and creates its own figure, so its size is set with height
    sns.jointplot(x=df[col], y=df["Chance of Admit"], kind="kde", cmap="Blues", fill=True, height=8)
    plt.show()

for col in cat_cols:
    plt.figure(figsize=(10,8))
    sns.barplot(x=df[col], y=df["Chance of Admit"])
    plt.show()

***Visualizing the linear correlations between variables with a heatmap. The measure used for the linear correlation between each pair of variables is the Pearson correlation coefficient.***

In [1]:
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(), annot=True, cmap="Blues")
plt.title("Correlations Between Features", size=16)
plt.show()
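
As a rough illustration of what each heatmap cell contains, the sketch below computes the Pearson correlation coefficient by hand and compares it with `DataFrame.corr()`, which uses Pearson by default. The `toy` frame and the `pearson_r` helper are hypothetical and only stand in for the real columns.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson r: sum of centered products over the product of centered norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return (x_centered * y_centered).sum() / denominator

# Hypothetical toy values, not rows from the admissions data
toy = pd.DataFrame({"gre": [300, 310, 320, 330, 340],
                    "chance": [0.50, 0.62, 0.70, 0.85, 0.93]})

manual = pearson_r(toy["gre"], toy["chance"])
from_pandas = toy.corr().loc["gre", "chance"]

print("manual:", manual)
print("pandas:", from_pandas)
```

Both values should agree up to floating-point rounding. A value close to 1 indicates a strong positive linear relationship.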

# X, y Split

In [1]:
X = df.drop("Chance of Admit", axis=1)
y = df["Chance of Admit"]

# Data Standardization

***Standardizing the numerical columns of X. StandardScaler() rescales each feature to a mean of 0 and a standard deviation of 1. The formula that StandardScaler() uses is as follows:***

<center><img width="250px" src="https://www.thoughtco.com/thmb/gItmqGd5HlnhyPIiLm1YHXOlTnw=/330x242/filters:fill(auto,1)/zscore-56a8fa785f9b58b7d0f6e87b.GIF"></center>

In [1]:
scaler = StandardScaler()
X[num_cols] = scaler.fit_transform(X[num_cols])
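
The z-score formula can be checked against `StandardScaler` on a small array. This is a minimal sketch, and the `scores` values are made up for illustration. Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical GRE-like scores in a single feature column
scores = np.array([[300.0], [310.0], [320.0], [330.0], [340.0]])

# Manual z-score: subtract the column mean, divide by the column standard deviation
manual = (scores - scores.mean(axis=0)) / scores.std(axis=0)

scaled = StandardScaler().fit_transform(scores)

print("matches manual z-score:", np.allclose(manual, scaled))
print("mean after scaling:", scaled.mean(axis=0))
print("std after scaling:", scaled.std(axis=0))
```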

# Train-Test Split

***Splitting the data into training (70%) and test (30%) sets so that the models are evaluated on data they were not fitted on.***

In [1]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
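
This notebook scales X before splitting. A common variation fits the scaler on the training rows only, so that no statistics from the test set reach the model. The sketch below shows that arrangement with a scikit-learn pipeline on synthetic data. The `X_demo`, `y_demo`, and `model` names are hypothetical, and `make_regression` stands in for the admissions features.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 7 feature columns (an assumption for illustration)
X_demo, y_demo = make_regression(n_samples=500, n_features=7, noise=10.0, random_state=42)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

# The scaler is fitted inside the pipeline, so it only ever sees X_tr
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_tr, y_tr)

print("train shape:", X_tr.shape, "test shape:", X_te.shape)
print("test R2:", model.score(X_te, y_te))
```

Because the pipeline is a single estimator, it can also be passed to `cross_val_score` or `GridSearchCV`, which then refit the scaler within each fold.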

***Defining several evaluation functions for convenience.***

In [1]:
def evaluate(y_test, predictions):
    mae = mean_absolute_error(y_test, predictions)
    mse = mean_squared_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)
    return mae, mse, r2

def rmse_cv(model):
    rmse = np.sqrt(-cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)).mean()
    return rmse
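
To make these metrics concrete, here is a small sketch on made-up values. It computes MAE, MSE, and R2 by hand, compares them with the scikit-learn functions, and then shows why `rmse_cv` negates the cross-validation scores. All arrays and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

# Hypothetical true and predicted admission chances
y_true = np.array([0.50, 0.60, 0.70, 0.80])
y_pred = np.array([0.55, 0.55, 0.75, 0.75])

errors = y_true - y_pred
mae_manual = np.abs(errors).mean()
mse_manual = (errors ** 2).mean()
# R2 = 1 - (residual sum of squares) / (total sum of squares)
r2_manual = 1 - (errors ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()

print(mae_manual, mean_absolute_error(y_true, y_pred))
print(mse_manual, mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))

# scikit-learn scorers follow "higher is better", so MSE is returned negated
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo,
                          scoring="neg_mean_squared_error", cv=5)
fold_rmse = np.sqrt(-neg_mse)  # flip the sign, then take the root per fold
print("per-fold RMSE:", fold_rmse, "mean:", fold_rmse.mean())
```

For the toy arrays every error has magnitude 0.05, so MAE is 0.05, MSE is 0.0025, and R2 is 0.8, up to floating-point rounding.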

# Machine Learning Models

In [1]:
models = pd.DataFrame(columns=["Model", "MAE", "MSE", "R2 Score", "RMSE (Cross-Validated)"])

<h3>Linear Regression</h3>

In [1]:
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
predictions = lin_reg.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(lin_reg)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "LinearRegression", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
models = pd.concat([models, pd.DataFrame([new_row])], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

<h3>Lasso (L1 Regularization)</h3>

In [1]:
lasso = Lasso(random_state=42)
lasso.fit(X_train, y_train)
predictions = lasso.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(lasso)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "Lasso", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
models = pd.concat([models, pd.DataFrame([new_row])], ignore_index=True)

<h3>Ridge (L2 Regularization)</h3>

In [1]:
ridge = Ridge(random_state=42)
ridge.fit(X_train, y_train)
predictions = ridge.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(ridge)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "Ridge", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
models = pd.concat([models, pd.DataFrame([new_row])], ignore_index=True)
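
The difference between the L1 and L2 penalties can be sketched on synthetic data. The target below is built on a small scale, similar to a probability-like value. With scikit-learn's default `alpha=1.0` the L1 penalty is then large enough to set every Lasso coefficient to exactly zero, whereas Ridge only shrinks them. All names and values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Three standardized-looking features; only the first two drive the target
X_demo = rng.standard_normal((200, 3))
y_demo = 0.7 + 0.10 * X_demo[:, 0] + 0.05 * X_demo[:, 1] + 0.01 * rng.standard_normal(200)

lasso_default = Lasso(alpha=1.0).fit(X_demo, y_demo)   # strong L1 penalty
lasso_small = Lasso(alpha=0.01).fit(X_demo, y_demo)    # mild L1 penalty
ridge_default = Ridge(alpha=1.0).fit(X_demo, y_demo)   # L2 penalty

print("Lasso alpha=1.0 :", lasso_default.coef_)
print("Lasso alpha=0.01:", lasso_small.coef_)
print("Ridge alpha=1.0 :", ridge_default.coef_)
```

A Lasso whose coefficients are all zero predicts only the intercept, which is one reason the regularization strength is worth tuning when the target has a small range.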

<h3>Elastic Net</h3>

In [1]:
elastic_net = ElasticNet(random_state=42)
elastic_net.fit(X_train, y_train)
predictions = elastic_net.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(elastic_net)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "ElasticNet", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
models = pd.concat([models, pd.DataFrame([new_row])], ignore_index=True)

<h3>Polynomial Regression (degree=2)</h3>

In [1]:
# Chain the feature expansion and the regression in one estimator so that
# rmse_cv cross-validates the degree-2 model rather than a plain linear one
from sklearn.pipeline import make_pipeline

poly_reg = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_reg.fit(X_train, y_train)
predictions = poly_reg.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(poly_reg)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "PolynomialRegression(degree=2)", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
models = pd.concat([models, pd.DataFrame([new_row])], ignore_index=True)

<h3>LightGBM Regressor</h3>

In [1]:
lgbm = LGBMRegressor(random_state=42)
lgbm.fit(X_train, y_train)
predictions = lgbm.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(lgbm)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "LGBMRegressor", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
models = pd.concat([models, pd.DataFrame([new_row])], ignore_index=True)

In [1]:
models.sort_values(by="RMSE (Cross-Validated)")

In [1]:
plt.figure(figsize=(12,8))
sns.barplot(x=models["Model"], y=models["RMSE (Cross-Validated)"])
plt.title("Models' Cross Validated RMSE Scores", size=15)
plt.xticks(rotation=30)
plt.show()

# Hyperparameter Tuning

In [1]:
tuned_models = pd.DataFrame(columns=["Model", "MAE", "MSE", "R2 Score", "RMSE (Cross-Validated)"])

<h3>Tuning the Lasso</h3>

In [1]:
param_grid_lasso = {"alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10],
                    "random_state": [42]}

grid_lasso = GridSearchCV(Lasso(), param_grid_lasso, scoring="neg_root_mean_squared_error", cv=5, verbose=0, n_jobs=-1)

grid_lasso.fit(X_train, y_train)

In [1]:
lasso_params = grid_lasso.best_params_

lasso = Lasso(**lasso_params)
lasso.fit(X_train, y_train)
predictions = lasso.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(lasso)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "Lasso", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
tuned_models = pd.concat([tuned_models, pd.DataFrame([new_row])], ignore_index=True)
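
The grid search pattern used in this section can be sketched on synthetic data as follows. The data, the grid, and the variable names are hypothetical. The sketch shows how to read the best parameters and how to turn the negated score back into an RMSE.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training split
X_demo, y_demo = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=42)

alphas = [0.001, 0.01, 0.1, 1, 10]
grid = GridSearchCV(Lasso(random_state=42), {"alpha": alphas},
                    scoring="neg_root_mean_squared_error", cv=5)
grid.fit(X_demo, y_demo)

best_rmse = -grid.best_score_  # the scorer is negated, so flip the sign
print("best params:", grid.best_params_)
print("best cross-validated RMSE:", best_rmse)

# With the default refit=True, best_estimator_ is already fitted on all data passed to fit
print("coefficients of refitted model:", grid.best_estimator_.coef_.shape)
```

Since `refit=True` is the default, `grid.best_estimator_` can be used directly for prediction. Building a new model from `best_params_` and fitting it on the same data gives an equivalent result.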

<h3>Tuning the Ridge</h3>

In [1]:
param_grid_ridge = {"alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10],
                    "random_state": [42]}

grid_ridge = GridSearchCV(Ridge(), param_grid_ridge, scoring="neg_root_mean_squared_error", cv=5, verbose=0, n_jobs=-1)

grid_ridge.fit(X_train, y_train)

In [1]:
ridge_params = grid_ridge.best_params_

ridge = Ridge(**ridge_params)
ridge.fit(X_train, y_train)
predictions = ridge.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(ridge)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "Ridge", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
tuned_models = pd.concat([tuned_models, pd.DataFrame([new_row])], ignore_index=True)

<h3>Tuning the Elastic Net</h3>

In [1]:
param_grid_elasticnet = {"alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10],
                         "l1_ratio": np.arange(0, 1, 0.05), 
                         "random_state": [42]}

grid_elasticnet = GridSearchCV(ElasticNet(), param_grid_elasticnet, scoring="neg_root_mean_squared_error", cv=5, verbose=0, n_jobs=-1)

grid_elasticnet.fit(X_train, y_train)

In [1]:
elasticnet_params = grid_elasticnet.best_params_

elastic_net = ElasticNet(**elasticnet_params)
elastic_net.fit(X_train, y_train)
predictions = elastic_net.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(elastic_net)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "ElasticNet", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
tuned_models = pd.concat([tuned_models, pd.DataFrame([new_row])], ignore_index=True)

<h3>Tuning the LightGBM Regressor</h3>

In [1]:
param_grid_lgbm = {"num_leaves": [2, 3, 5, 7],
                   "learning_rate": [0.01, 0.05],
                   "n_estimators": [200, 500, 1000, 5000],
                   "max_bin": [100, 150, 200],
                   "random_state": [42]}

grid_lgbm = GridSearchCV(LGBMRegressor(), param_grid_lgbm, scoring="neg_root_mean_squared_error", cv=5, verbose=0, n_jobs=-1)

grid_lgbm.fit(X_train, y_train)

In [1]:
lgbm_params = grid_lgbm.best_params_

lgbm = LGBMRegressor(**lgbm_params)
lgbm.fit(X_train, y_train)
predictions = lgbm.predict(X_test)

mae, mse, r2 = evaluate(y_test, predictions)
rmse = rmse_cv(lgbm)
print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)
print("RMSE (Cross-Validated)", rmse)

new_row = {"Model": "LGBMRegressor", "MAE": mae, "MSE": mse, "R2 Score": r2, "RMSE (Cross-Validated)": rmse}
tuned_models = pd.concat([tuned_models, pd.DataFrame([new_row])], ignore_index=True)

# Model Comparison

In [1]:
tuned_models.sort_values(by="RMSE (Cross-Validated)")

In [1]:
plt.figure(figsize=(12,8))
sns.barplot(x=tuned_models["Model"], y=tuned_models["RMSE (Cross-Validated)"])
plt.title("Models' Cross Validated RMSE Scores After Hyperparameter Tuning", size=15)
plt.xticks(rotation=30)
plt.show()

# Conclusion

<h3>Because the Lasso model yields the lowest cross-validated RMSE after hyperparameter tuning, the winner in this comparison is Lasso (L1 Regularization).</h3>

<h1 style="font-family: Times New Roman;">Thank you so much for reading this notebook. Preparing notebooks takes a great deal of time, so if you liked it, please do not forget to upvote. Peace out ✌️</h1>