## Artificial Neural Network
Context: this dataset was created to predict graduate admissions from an Indian perspective.

Content: the dataset contains several parameters that are considered important when applying to Masters programs. The parameters included are:

1. GRE Score (out of 340)
2. TOEFL Score (out of 120)
3. University Rating (out of 5)
4. Statement of Purpose (SOP) and Letter of Recommendation (LOR) strength (out of 5)
5. Undergraduate GPA (CGPA, out of 10)
6. Research Experience (either 0 or 1)
7. Chance of Admit (ranging from 0 to 1)

Reference: Mohan S Acharya, Asfia Armaan, Aneeta S Antony : A Comparison of Regression Models for Prediction of Graduate Admissions, IEEE International Conference on Computational Intelligence in Data Science 2019

In [0]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 1) Download the dataset from https://www.kaggle.com/mohansacharya/graduate-admissions and put the CSV file in the same directory as this notebook.

In [4]:
df = pd.read_csv("./Admission_Predict.csv")
df.head()

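| | Serial No. | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 1 | 2 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 2 | 3 | 316 | 104 | 3 | 3.0 | 3.5 | 8.0 | 1 | 0.72 |
| 3 | 4 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 0.8 |
| 4 | 5 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |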


### 2) Find the size of the dataset (the number of rows and columns)

In [5]:
df.shape

(400, 9)

### 3) df[['A', 'Z']] returns all the rows and only columns A and Z. We call X the input, composed of the following columns of the dataset: ["GRE Score", "TOEFL Score", "University Rating", "SOP", "LOR ", "CGPA", "Research"], and y the output (target), composed of the single column ["Chance of Admit "]. Note that the names "LOR " and "Chance of Admit " end with a trailing space. Define X and y. (You can also try positional slicing, for example df.iloc[:, 2:4].)

In [6]:
X = df[["GRE Score", "TOEFL Score", "University Rating", "SOP", "LOR ", "CGPA", "Research"]] 
y = df[["Chance of Admit "]]

y.head()

| | Chance of Admit |
|---|---|
| 0 | 0.92 |
| 1 | 0.76 |
| 2 | 0.72 |
| 3 | 0.8 |
| 4 | 0.65 |
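
The trailing spaces in "LOR " and "Chance of Admit " are easy to miss when selecting columns by name. The sketch below assumes a tiny in-memory CSV that mimics the header of the file, with two data rows copied from the df.head() output. The `csv_text` variable and the optional name clean-up are illustrative additions and are not required by the exercise.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt; note the trailing spaces in two header names.
csv_text = (
    "Serial No.,GRE Score,TOEFL Score,University Rating,SOP,LOR ,CGPA,Research,Chance of Admit \n"
    "1,337,118,4,4.5,4.5,9.65,1,0.92\n"
    "2,324,107,4,4.0,4.5,8.87,1,0.76\n"
)
demo = pd.read_csv(io.StringIO(csv_text))

# The raw names keep their trailing space, so demo["LOR"] would raise a KeyError here.
raw_names = list(demo.columns)

# Optional clean-up: strip surrounding whitespace from every column name.
demo.columns = demo.columns.str.strip()

# Selecting by name and slicing by position are two ways to pick the same columns.
by_name = demo[["GRE Score", "TOEFL Score"]]
by_position = demo.iloc[:, 1:3]   # columns at positions 1 and 2

print(raw_names)
print(by_name.equals(by_position))
```

If the names are stripped like this, the rest of the notebook would have to use "LOR" and "Chance of Admit" without the trailing space.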


### 4) X and y are DataFrames. To use them as NumPy arrays, convert them with ".values".

In [7]:
X = X.values
y = y.values

type(y)

numpy.ndarray

In [8]:
y.T[:,:5]

array([[0.92, 0.76, 0.72, 0.8 , 0.65]])

### 5) Split the data into a training set and a testing set: the training set is used to train the model, and the testing set is used to evaluate it. This documentation page explains how to split the data: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

In [0]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
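
To make the idea of a shuffled 80/20 split concrete, here is a minimal NumPy sketch. It is not scikit-learn's implementation and will not select the same rows as `train_test_split`. The helper name `shuffled_split` and the placeholder arrays are assumptions made for illustration.

```python
import numpy as np

def shuffled_split(X, y, test_size=0.2, seed=42):
    """Shuffle the row indices once, then cut them into a test part and a train part."""
    n_samples = X.shape[0]
    n_test = int(np.ceil(test_size * n_samples))
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The same indices are applied to X and y so that rows stay paired.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data with the same shape as the dataset: 400 rows, 7 features.
X_demo = np.arange(400 * 7, dtype=float).reshape(400, 7)
y_demo = np.arange(400, dtype=float).reshape(400, 1)

Xtr, Xte, ytr, yte = shuffled_split(X_demo, y_demo)
print(Xtr.shape, Xte.shape, ytr.shape, yte.shape)
```

With 400 rows and test_size=0.2, the split leaves 320 rows for training and 80 for testing.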

### 6) Using the "print" function, display the resulting splits: X_train, X_test, y_train, y_test.

In [10]:
print("X_train :\n", X_train) 
print("\nX_test :\n", X_test)
print("\ny_train :\n", y_train.T) 
print("\ny_test :\n", y_test.T)

X_train :
 [[322.   110.     3.   ...   2.5    8.67   1.  ]
 [318.   110.     3.   ...   3.     8.8    0.  ]
 [340.   120.     5.   ...   4.5    9.91   1.  ]
 ...
 [306.   105.     2.   ...   3.     8.22   1.  ]
 [302.    99.     1.   ...   2.     7.25   0.  ]
 [314.   106.     2.   ...   3.5    8.25   0.  ]]

X_test :
 [[301.   104.     3.     3.5    4.     8.12   1.  ]
 [311.   102.     3.     4.5    4.     8.64   1.  ]
 [340.   114.     5.     4.     4.     9.6    1.  ]
 [325.   108.     4.     4.5    4.     9.06   1.  ]
 [301.    97.     2.     3.     3.     7.88   1.  ]
 [340.   115.     5.     4.5    4.5    9.45   1.  ]
 [297.    96.     2.     2.5    1.5    7.89   0.  ]
 [303.    99.     3.     2.     2.5    7.66   0.  ]
 [312.   105.     2.     2.     2.5    8.45   0.  ]
 [323.   113.     3.     4.     3.     9.32   1.  ]
 [323.   108.     3.     3.5    3.     8.6    0.  ]
 [334.   116.     4.     4.     3.5    9.54   1.  ]
 [316.   102.     3.     2.     3.     7.4    0.  ]
 [

## Feature scaling
#### 7) Feature scaling is a method used to standardize the range of the independent variables (features) of the data. It is also known as data normalization and is generally performed during the data preprocessing step. See https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html.
#### Is it useful to normalize your features? Apply feature scaling to your data.

In [0]:
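# Feature scaling: standardize each column to zero mean and unit variance
from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()
# Fit on the training data only, then reuse the same statistics for the test data
X_train = std_scaler.fit_transform(X_train)
X_test = std_scaler.transform(X_test)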
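
The standardization applied here can be sketched in plain NumPy: each column is centred on its training mean and divided by its training standard deviation, z = (x - mean) / std. The example below is a simplified illustration, not the scaler's actual code. It uses only the GRE Score and CGPA values of the five rows shown by df.head() as a stand-in training set, plus one test row taken from the printed X_test.

```python
import numpy as np

# GRE Score and CGPA of the five rows shown by df.head() (illustrative subset).
train = np.array([[337, 9.65], [324, 8.87], [316, 8.00], [322, 8.67], [314, 8.21]])
test = np.array([[340, 9.60]])

mean = train.mean(axis=0)
std = train.std(axis=0)      # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std    # reuse the training statistics

print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
print(test_scaled)
```

After scaling, a feature measured in the hundreds (GRE Score) and one measured below ten (CGPA) live on comparable scales, which generally helps gradient-based training. The test row is transformed with the training mean and standard deviation so that no information from the test set enters preprocessing.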

In [12]:
print("\t\tX_train & X_test after standard scaling")
print("X_train :\n", X_train) 
print("\nX_test :\n", X_test)

		X_train & X_test after standard scaling
X_train :
 [[ 0.45711129  0.42466178 -0.057308   ... -1.05965163  0.13986648
   0.92761259]
 [ 0.1022887   0.42466178 -0.057308   ... -0.50194025  0.36110014
  -1.07803625]
 [ 2.05381293  2.08593034  1.6892215  ...  1.17119391  2.25009529
   0.92761259]
 ...
 [-0.96217907 -0.40597251 -0.93057275 ... -0.50194025 -0.62594237
   0.92761259]
 [-1.31700165 -1.40273364 -1.8038375  ... -1.61736302 -2.27668588
  -1.07803625]
 [-0.25253389 -0.23984565 -0.93057275 ...  0.05577114 -0.57488845
  -1.07803625]]

X_test :
 [[-1.4057073  -0.57209936 -0.057308    0.12715607  0.61348253 -0.79612211
   0.92761259]
 [-0.51865083 -0.90435307 -0.057308    1.10763663  0.61348253  0.08881255
   0.92761259]
 [ 2.05381293  1.0891692   1.6892215   0.61739635  0.61348253  1.72253809
   0.92761259]
 [ 0.72322823  0.09240806  0.81595675  1.10763663  0.61348253  0.80356748
   0.92761259]
 [-1.4057073  -1.73498736 -0.93057275 -0.36308421 -0.50194025 -1.2045535
   0.92761259]


## Make your first simple neural network, also known as a multilayer perceptron
### 8) Initialize the network. Since we want to construct a very simple network, follow the documentation at https://keras.io/. We will build it from Dense layers (in this case two Dense layers, one hidden layer and one output layer, are sufficient). For details about creating a sequential model, please see the Keras documentation on the Sequential model.

### 9) Compile the model using: model.compile()

In [13]:
# Import libraries
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation

Using TensorFlow backend.


In [14]:
# 8) and 9) Build and compile the model
# Initialize the network
model = Sequential()

# Define the layers
model.add(Dense(16, input_dim=7))     # Hidden layer: 7 input features -> 16 units
model.add(Activation('relu'))
model.add(Dense(1))                   # Output layer: a single value
model.add(Activation('sigmoid'))      # Sigmoid keeps the prediction between 0 and 1

# Compile the model: MSE loss, Adam optimizer with a learning rate of 0.01
model.compile(loss='mean_squared_error', optimizer=keras.optimizers.Adam(lr=0.01), metrics=['mse'])
model.summary()





Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 16)                128       
_________________________________________________________________
activation_1 (Activation)    (None, 16)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 17        
_________________________________________________________________
activation_2 (Activation)    (None, 1)                 0         
Total params: 145
Trainable params: 145
Non-trainable params: 0
_________________________________________________________________
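
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. The sketch below reproduces that arithmetic and runs a forward pass through a network of the same shape in NumPy. The weights are random placeholders, not the trained Keras weights, and the helper names are assumptions made for illustration.

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 7, 16, 1

# One weight per (input, unit) pair plus one bias per unit.
hidden_params = n_inputs * n_hidden + n_hidden      # 7 * 16 + 16
output_params = n_hidden * n_outputs + n_outputs    # 16 * 1 + 1
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random placeholder weights, only to show the shapes involved.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_outputs)), np.zeros(n_outputs)

x = rng.normal(size=(5, n_inputs))          # a batch of 5 scaled samples
hidden = relu(x @ W1 + b1)                  # shape (5, 16)
output = sigmoid(hidden @ W2 + b2)          # shape (5, 1), squashed into [0, 1]
print(hidden.shape, output.shape)
```

The three counts agree with the 128, 17 and 145 reported by model.summary().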


## Fitting the model
#### A well-fitted model generalizes well: both its train loss and its test loss are low. An overfitted model matches the train data too closely (low train loss, but high test loss), and an underfitted model doesn’t match the train data closely enough (high train loss).
#### a) An epoch is one pass over the entire X and y data provided.
#### b) Batch size = number of samples per gradient update. The higher the batch size, the more memory you'll need.
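
To see how epochs and batch size interact, the sketch below counts gradient updates for the values used in the fitting cell of this notebook (batch_size = 10, epochs = 100) and the 320 training rows. The `iterate_minibatches` helper is a hypothetical illustration of one epoch of shuffled mini-batches, not the Keras internals.

```python
import math
import numpy as np

n_train, batch_size, epochs = 320, 10, 100   # 320 = 80% of the 400 rows

updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches that together cover the data once (one epoch)."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X_demo = np.zeros((n_train, 7))   # placeholder features
y_demo = np.zeros((n_train, 1))   # placeholder targets
batches = list(iterate_minibatches(X_demo, y_demo, batch_size, rng))
print(len(batches), batches[0][0].shape)
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer updates that each need more memory.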


### 10) Fit (train) your model

In [15]:
#10- Fit the model
batch_size = 10
epochs = 100
model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=2)




Epoch 1/100
 - 1s - loss: 0.0229 - mean_squared_error: 0.0229
Epoch 2/100
 - 0s - loss: 0.0067 - mean_squared_error: 0.0067
Epoch 3/100
 - 0s - loss: 0.0053 - mean_squared_error: 0.0053
Epoch 4/100
 - 0s - loss: 0.0045 - mean_squared_error: 0.0045
Epoch 5/100
 - 0s - loss: 0.0047 - mean_squared_error: 0.0047
Epoch 6/100
 - 0s - loss: 0.0044 - mean_squared_error: 0.0044
Epoch 7/100
 - 0s - loss: 0.0042 - mean_squared_error: 0.0042
Epoch 8/100
 - 0s - loss: 0.0040 - mean_squared_error: 0.0040
Epoch 9/100
 - 0s - loss: 0.0040 - mean_squared_error: 0.0040
Epoch 10/100
 - 0s - loss: 0.0042 - mean_squared_error: 0.0042
Epoch 11/100
 - 0s - loss: 0.0040 - mean_squared_error: 0.0040
Epoch 12/100
 - 0s - loss: 0.0039 - mean_squared_error: 0.0039
Epoch 13/100
 - 0s - loss: 0.0039 - mean_squared_error: 0.0039
Epoch 14/100
 - 0s - loss: 0.0039 - mean_squared_error: 0.0039
Epoch 15/100
 - 0s - loss: 0.0038 - mean_squared_error: 0.0038
Epoch 16/100
 - 0s - loss: 0.0040 - mean_squared_error: 

<keras.callbacks.History at 0x7f658ffde358>

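### 11) Evaluate the model on the test data

In [16]:
# evaluate() returns the loss followed by the compiled metrics (here both are the MSE)
pred_loss, pred_mse = model.evaluate(X_test, y_test)

print("\npred_mse : ", pred_mse)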


pred_mse :  0.00758865752723068
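
The mean squared error reported above is the average of the squared differences between targets and predictions. A minimal NumPy sketch follows: the five targets come from df.head(), while the predictions `y_hat` are made up for illustration. Taking the square root of the reported test MSE gives an error on the same 0-to-1 scale as the target.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Targets from df.head(); the predictions are hypothetical.
y_true = [0.92, 0.76, 0.72, 0.80, 0.65]
y_hat = [0.90, 0.80, 0.70, 0.75, 0.70]
example_mse = mean_squared_error(y_true, y_hat)
print(example_mse)

# RMSE puts the reported test MSE back on the scale of "Chance of Admit".
test_mse = 0.00758865752723068
test_rmse = float(np.sqrt(test_mse))
print(round(test_rmse, 4))
```

The test RMSE is roughly 0.087, so a typical prediction is off by a little under 0.09 in chance of admit.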


### 12) Predict the chance of admission of a student with the following information:
#### GRE Score: 220, TOEFL Score: 100, University Rating: 2, SOP: 3.5, LOR: 3, CGPA: 8, Research: 0



In [17]:
# The new input must be scaled with the scaler fitted on the training data
new_input = np.array([[220, 100, 2, 3.5, 3, 8, 0]])
y_pred = model.predict(std_scaler.transform(new_input))

y_pred

array([[0.52405584]], dtype=float32)
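
It can help to check where the new input sits relative to the training data. The sketch below back-calculates the GRE Score mean and standard deviation from two rows that appear in both the raw and the scaled X_train printouts, then standardizes the GRE Score of 220. This is an approximation derived from the printed values, not a readout of the fitted scaler.

```python
# Two (raw, scaled) GRE Score pairs read from the printed X_train rows.
raw_a, scaled_a = 322.0, 0.45711129
raw_b, scaled_b = 318.0, 0.1022887

# scaled = (raw - mean) / std, so two points determine std and mean.
gre_std = (raw_a - raw_b) / (scaled_a - scaled_b)
gre_mean = raw_a - scaled_a * gre_std

# Standardized value of the GRE Score used in the new input.
z_new = (220.0 - gre_mean) / gre_std
print(round(gre_mean, 2), round(gre_std, 2), round(z_new, 2))
```

Under these assumptions the GRE Score of 220 lies more than eight standard deviations below the training mean, while the printed rows show GRE scores between 297 and 340. The prediction for this student is therefore an extrapolation far outside the data the network was trained on.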