
#**Importing Libraries**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from jupyterthemes import jtplot
jtplot.style(theme='monokai', context='notebook', ticks=True, grid=False)

#**Importing Dataset**

In [None]:
admission_df=pd.read_csv('/content/Admission_Predict (1).csv')

In [None]:
admission_df.head()

**Dropping Serial No.**

In [None]:
admission_df.drop('Serial No.', axis=1 , inplace=True)
admission_df.head()

#**Exploratory Data Analysis**
In the following blocks of code, we:

1.   check for null values
2.   study the data frame using .info() and .describe()
3.   extract meaningful insights from these statistics


In [None]:
admission_df.isnull().sum()

In [None]:
admission_df.info()

In [None]:
# Statistical summary of the dataframe
admission_df.describe()

In [None]:
# Mean of every column, grouped by University Rating
df_university = admission_df.groupby(by = 'University Rating').mean()
df_university

From the above statistics, we can see:


1.   The mean GRE score is about 316
2.   The mean TOEFL score is about 107
3.   The standard deviation of the GRE score is about 11, so if the scores are roughly normally distributed, about 68% of the students score between 305 and 327
4.   The average University Rating is about 3
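
The 68% figure in the list above relies on the empirical rule for roughly normal data. A minimal sketch of how that share could be checked directly is shown below. The sample is synthetic (an assumption, drawn with mean 316 and standard deviation 11), and `share_within_one_std` is a hypothetical helper; with the real data, the GRE column of `admission_df` would be passed instead.

```python
import numpy as np

# Synthetic stand-in for the GRE column (assumption: roughly normal,
# mean 316 and standard deviation 11, as in the summary statistics).
rng = np.random.default_rng(0)
scores = rng.normal(loc=316, scale=11, size=10_000)

def share_within_one_std(values):
    """Return the fraction of values within one standard deviation of the mean."""
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    return float(np.mean(np.abs(values - mean) <= std))

share = share_within_one_std(scores)
print(f"Share within one standard deviation: {share:.3f}")
```

If the share computed on the real column is far from 0.68, the distribution is probably not close to normal and the 305 to 327 range should be read with care.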


#**Data Visualization**

In [None]:
admission_df.hist(bins = 30, figsize = (20,20),color = 'r')

In [None]:
sns.pairplot(admission_df)

In [None]:
corr_matrix = admission_df.corr()
plt.figure(figsize = (12,12))
sns.heatmap(corr_matrix,annot = True)
plt.show()

From the above graphs, we can observe that:


1.   GRE and TOEFL scores are very highly correlated: a student with a high GRE score tends to have a similarly high TOEFL score
2.   The chance of admission increases as GPA, SOP and University Rating increase
3.   Students with past research experience tend to have a higher chance of acceptance to a university
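
The heatmap observations above can also be read off numerically by ranking each feature's correlation with the target. The sketch below uses a made-up toy frame (the column names mimic the dataset, the values are synthetic), and `correlations_with_target` is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names mimic the dataset, values are made up.
rng = np.random.default_rng(1)
gre = rng.normal(316, 11, 200)
toy_df = pd.DataFrame({
    "GRE Score": gre,
    "TOEFL Score": 0.3 * gre + rng.normal(12, 3, 200),
    "Chance of Admit": 0.01 * gre + rng.normal(-2.4, 0.05, 200),
})

def correlations_with_target(df, target):
    """Pearson correlation of every other column with the target, highest first."""
    corr = df.corr()[target].drop(target)
    return corr.sort_values(ascending=False)

ranked = correlations_with_target(toy_df, "Chance of Admit")
print(ranked)
```

On the real data, the same call with `admission_df` and its target column name would list the features in the order the heatmap suggests.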





#**Creation of training and testing dataset**
In the following blocks of code, we separate the features from the target, scale both, and divide the original dataset into training and testing datasets.

In [None]:
admission_df.columns

In [None]:
X = admission_df.drop(columns = ['Chance of Admit']) # Features: every column except the target; the name must match admission_df.columns exactly, including any trailing space

In [None]:
y = admission_df['Chance of Admit']

In [None]:
X.shape

In [None]:
y.shape

In [None]:
y

In [None]:
X = np.array(X)
y = np.array(y)

In [None]:
y = y.reshape(-1,1)
y.shape

In [None]:
# Scale the data before training the model.
# The features have very different ranges, so standardizing them keeps
# large-valued features from dominating the fit.
# Note: the scaler is fitted on the full dataset before the split, so test-set statistics influence the scaling.
from sklearn.preprocessing import StandardScaler
scaler_x = StandardScaler()
X = scaler_x.fit_transform(X)

In [None]:
scaler_y = StandardScaler()
y = scaler_y.fit_transform(y)

In [None]:
# Split the data into train and test sets
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.15)
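
The cells above scale the full dataset and only then split it, which lets test-set statistics leak into the scaling. One possible alternative, sketched here on synthetic stand-in data (an assumption, not the admissions dataset), is to split first and fit the scalers on the training rows only. The variable names and `random_state=42` are illustrative choices, not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 100 rows, 7 features, 1 target column.
rng = np.random.default_rng(42)
X_raw = rng.normal(size=(100, 7))
y_raw = rng.normal(size=(100, 1))

# Split first, so the test rows never influence the scaling statistics.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.15, random_state=42
)

# Fit the scalers on the training rows only, then apply them to both splits.
x_scaler = StandardScaler().fit(X_tr)
y_scaler = StandardScaler().fit(y_tr)
X_tr_s, X_te_s = x_scaler.transform(X_tr), x_scaler.transform(X_te)
y_tr_s, y_te_s = y_scaler.transform(y_tr), y_scaler.transform(y_te)

print(X_tr_s.shape, X_te_s.shape)
```

With this ordering the training features have exactly zero mean and unit variance, while the test features are only approximately standardized, which is the expected behaviour for unseen data.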

#**Linear Regression Model**
In the following blocks of code, I have implemented a multiple linear regression model, which uses ordinary least squares (minimizing the sum of squared residuals) to find the best-fit hyperplane.

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [None]:
LinearRegression_model = LinearRegression()
LinearRegression_model.fit(X_train,y_train)

In [None]:
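# For regressors, .score() returns the R² coefficient of determination, not a classification accuracy
accuracy_LinearRegression = LinearRegression_model.score(X_test,y_test)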
accuracy_LinearRegression
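
To make the least-squares fit used in this section concrete, the sketch below solves the same minimization by hand with `np.linalg.lstsq` and compares the result with `LinearRegression`. The data are synthetic (an assumption), and the names `X_demo`, `y_demo` and `design` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): 3 features, known coefficients plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([0.5, -1.0, 2.0]) + 0.3 + rng.normal(scale=0.1, size=200)

# Least squares by hand: append a column of ones for the intercept and
# minimise the sum of squared residuals with numpy's solver.
design = np.column_stack([X_demo, np.ones(len(X_demo))])
solution, *_ = np.linalg.lstsq(design, y_demo, rcond=None)
coef_manual, intercept_manual = solution[:-1], solution[-1]

# The scikit-learn estimator solves the same minimisation problem.
model = LinearRegression().fit(X_demo, y_demo)
print(np.allclose(coef_manual, model.coef_))
print(np.isclose(intercept_manual, model.intercept_))
```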

#**Artificial Neural Networks**
In the following blocks of code, I have implemented a neural network model with 4 hidden layers and 1 output layer. I have also added Dropout regularization after the second and third hidden layers to reduce co-adaptation between the neurons.

I have implemented this model using the Keras API on top of TensorFlow.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam

In [None]:
ANN_model = keras.Sequential()
ANN_model.add(Dense(50, input_dim = 7))
ANN_model.add(Activation('relu'))
ANN_model.add(Dense(150))
ANN_model.add(Activation('relu'))
ANN_model.add(Dropout(0.5))
ANN_model.add(Dense(150))
ANN_model.add(Activation('relu'))
ANN_model.add(Dropout(0.5))
ANN_model.add(Dense(50))
ANN_model.add(Activation('linear'))
ANN_model.add(Dense(1))
ANN_model.compile(loss = 'mse', optimizer = 'adam')
ANN_model.summary()

In [None]:
# Redundant: the model was already compiled with the same loss and optimizer in the previous cell
ANN_model.compile(optimizer='Adam', loss='mean_squared_error')

In [None]:
epochs_hist = ANN_model.fit(X_train, y_train, epochs = 100, batch_size = 20, validation_split = 0.2)

In [None]:
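# evaluate() returns the test loss, here the MSE on the standardized target
result = ANN_model.evaluate(X_test, y_test)
# 1 - MSE is only a rough proxy for R² (the target was scaled to unit variance); it is not a classification accuracy
accuracy_ANN = 1 - result
print("1 - MSE (approximate R2) : {}".format(accuracy_ANN))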
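
The quantity `1 - result` above equals R² only when the variance of the test targets is exactly 1. A small sketch of the exact relationship, R² = 1 - MSE / Var(y), is given below with made-up numbers; `r2_from_mse` is a hypothetical helper written for this illustration.

```python
import numpy as np

def r2_from_mse(y_true, y_pred):
    """R^2 = 1 - MSE / Var(y_true), using the population variance of y_true."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mse = np.mean((y_true - y_pred) ** 2)
    return 1.0 - mse / np.var(y_true)

# Made-up values for illustration only.
y_true = np.array([-1.0, 0.0, 1.0, 2.0])
y_pred = np.array([-0.8, 0.1, 0.9, 1.7])

mse = np.mean((y_true - y_pred) ** 2)
print("1 - MSE:", 1 - mse)
print("R^2    :", r2_from_mse(y_true, y_pred))
```

The two numbers differ here because the variance of `y_true` is 1.25 rather than 1. The same gap can appear on a test split, whose variance is rarely exactly 1 even after standardizing the full target.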

In [None]:
epochs_hist.history.keys()

In [None]:
plt.plot(epochs_hist.history['loss'])
plt.title('Model Loss Progress During Training')
plt.xlabel('Epoch')
plt.ylabel('Training Loss')
plt.legend(['Training Loss'])

From the above graph, we can see that as the number of epochs increases, the training loss decreases.
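
A falling training loss alone does not show whether the model generalizes. Because `fit` was called with `validation_split`, the history also records `val_loss`, and a small helper can locate the epoch where it was lowest. The sketch below uses a made-up history dictionary in the same shape as `epochs_hist.history`; `best_epoch` is a hypothetical helper, not a Keras function.

```python
def best_epoch(history):
    """Return (epoch_number, val_loss) for the lowest validation loss; epochs count from 1."""
    val_losses = history["val_loss"]
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]

# Made-up history in the shape of `epochs_hist.history` (illustrative values only).
demo_history = {
    "loss": [0.90, 0.50, 0.30, 0.20, 0.15],
    "val_loss": [0.95, 0.60, 0.40, 0.42, 0.47],
}

epoch, loss = best_epoch(demo_history)
print(f"Lowest validation loss {loss} at epoch {epoch}")
```

In this toy history the training loss keeps falling while the validation loss rises after epoch 3, which is the typical signature of overfitting.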

#**Decision Tree and Random Forest Models**
In the following blocks of code, I have implemented decision tree and random forest regression models using scikit-learn.

In [None]:
# Decision tree builds regression or classification models in the form of a tree structure.
# Decision tree breaks down a dataset into smaller subsets while at the same time an associated decision tree is incrementally developed.
# The final result is a tree with decision nodes and leaf nodes.
# Great resource: https://www.saedsayad.com/decision_tree_reg.htm

from sklearn.tree import DecisionTreeRegressor
DecisionTree_model = DecisionTreeRegressor()
DecisionTree_model.fit(X_train, y_train)

In [None]:
accuracy_DecisionTree = DecisionTree_model.score(X_test,y_test)
accuracy_DecisionTree

A random forest is an ensemble model made up of many decision trees. A random forest regressor fits a number of regression trees on various sub-samples of the dataset and averages their predictions, which improves predictive accuracy and helps control over-fitting.

In [None]:
from sklearn.ensemble import RandomForestRegressor
RandomForest_model = RandomForestRegressor(n_estimators = 100, max_depth = 10)
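# ravel() flattens y_train from shape (n, 1) to (n,), which avoids a DataConversionWarning
RandomForest_model.fit(X_train, y_train.ravel())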

In [None]:
accuracy_RandomForest = RandomForest_model.score(X_test,y_test)
accuracy_RandomForest
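
The averaging behaviour of a random forest regressor can be verified directly: the forest's prediction should match the mean of the predictions of its individual trees, which scikit-learn exposes as `estimators_`. The sketch below uses synthetic data (an assumption), with smaller illustrative settings than the model above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (assumption), not the admissions dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=120)

forest = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=0)
forest.fit(X_demo, y_demo)

# Average the predictions of the individual fitted trees by hand.
tree_predictions = np.stack([tree.predict(X_demo) for tree in forest.estimators_])
manual_average = tree_predictions.mean(axis=0)

print(np.allclose(manual_average, forest.predict(X_demo)))
```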

#**Regression KPIs**
After fitting the models, I assess performance by comparing predictions with the true labels. The cells below do this for the linear regression model.

In [None]:
y_predict = LinearRegression_model.predict(X_test)
plt.plot(y_test, y_predict, '^', color = 'r')

In [None]:
# Undo the standardization so that predictions and labels are back on the original scale
y_predict_orig = scaler_y.inverse_transform(y_predict)
y_test_orig = scaler_y.inverse_transform(y_test)
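
As a quick illustration of what `inverse_transform` does for a `StandardScaler`, the sketch below checks that it multiplies by the fitted standard deviation (`scale_`) and adds back the fitted mean (`mean_`). The target values are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up target values, shaped (n, 1) as the scaler expects.
y_demo = np.array([[0.45], [0.62], [0.71], [0.80], [0.92]])

scaler = StandardScaler()
y_scaled = scaler.fit_transform(y_demo)

# inverse_transform multiplies by the fitted standard deviation and adds the mean back.
restored = scaler.inverse_transform(y_scaled)
manual = y_scaled * scaler.scale_ + scaler.mean_

print(np.allclose(restored, y_demo), np.allclose(manual, y_demo))
```

Error metrics computed after this step are expressed in the original units of the target rather than in standard deviations.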

In [None]:
plt.plot(y_test_orig,y_predict_orig,"^", color='r')

In [None]:
k = X_test.shape[1]
n = len(X_test)
n

In [None]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# MSE (Mean Squared Error)
MSE = mean_squared_error(y_test_orig, y_predict_orig)

# RMSE (Root Mean Squared Error), rounded to 3 decimal places
RMSE = round(float(np.sqrt(MSE)), 3)

# MAE (Mean Absolute Error)
MAE = mean_absolute_error(y_test_orig, y_predict_orig)

# R² score (coefficient of determination)
r2 = r2_score(y_test_orig, y_predict_orig)

# Adjusted R² score: penalizes R² for the number of features k given n samples
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Print the results
print('RMSE =', RMSE, '\nMSE =', MSE, '\nMAE =', MAE, '\nR2 =', r2, '\nAdjusted R2 =', adj_r2)
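
The adjusted R² formula used above can be wrapped in a small function so it is easy to reuse for the other models. This is a minimal sketch; `adjusted_r2` is a hypothetical helper and the numbers in the call are illustrative, not results from this notebook.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("n_samples must be greater than n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Illustrative numbers only: R^2 = 0.80 on 75 test rows with 7 features.
example = adjusted_r2(0.80, 75, 7)
print(round(example, 4))
```

The adjusted value is always at most R², and the gap grows as the number of features approaches the number of samples.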
