# Predicting Student Admissions with Neural Networks
In this notebook, we predict student admissions to graduate school at UCLA based on three pieces of data:
- GRE Scores (Test)
- GPA Scores (Grades)
- Class rank (1-4)

The dataset originally came from here: http://www.ats.ucla.edu/

## Loading the data
To load the data and format it nicely, we will use two very useful packages called Pandas and Numpy. You can read the documentation here:
- https://pandas.pydata.org/pandas-docs/stable/
- https://docs.scipy.org/

In [29]:
# Importing pandas and numpy
import pandas as pd
import numpy as np

# Reading the csv file into a pandas DataFrame
data = pd.read_csv('student_data.csv')

# Printing out the first 5 rows of our data (head() defaults to 5)
data.head()

Unnamed: 0,admit,gre,gpa,rank
0,0,380,3.61,3
1,1,660,3.67,3
2,1,800,4.0,1
3,1,640,3.19,4
4,0,520,2.93,4


In [30]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   admit   400 non-null    int64  
 1   gre     400 non-null    int64  
 2   gpa     400 non-null    float64
 3   rank    400 non-null    int64  
dtypes: float64(1), int64(3)
memory usage: 12.6 KB


In [31]:
data.describe()

Unnamed: 0,admit,gre,gpa,rank
count,400.0,400.0,400.0,400.0
mean,0.3175,587.7,3.3899,2.485
std,0.466087,115.516536,0.380567,0.94446
min,0.0,220.0,2.26,1.0
25%,0.0,520.0,3.13,2.0
50%,0.0,580.0,3.395,2.0
75%,1.0,660.0,3.67,3.0
max,1.0,800.0,4.0,4.0


## Plotting the data

First let's make a plot of our data to see how it looks. In order to have a 2D plot, let's ignore the rank.

In [32]:
# %matplotlib inline
import matplotlib.pyplot as plt

# Function to help us plot
def plot_points(data):
    X = np.array(data[["gre","gpa"]])
    y = np.array(data["admit"])
    # Boolean masks select the rows belonging to each class
    admitted = X[y == 1]
    rejected = X[y == 0]
    plt.scatter(rejected[:, 0], rejected[:, 1], s = 25, color = 'red', edgecolor = 'k')
    plt.scatter(admitted[:, 0], admitted[:, 1], s = 25, color = 'cyan', edgecolor = 'k')
    plt.xlabel('Test (GRE)')
    plt.ylabel('Grades (GPA)')
    
# Plotting the points
plot_points(data)
plt.show()

Roughly, it looks like the students with high grades and test scores were admitted, while the ones with low scores weren't, but the data is not as nicely separable as we hoped it would be. Maybe it would help to take the rank into account? Let's make 4 plots, one for each rank.

In [33]:
# Separating the ranks
data_rank1 = data[data["rank"]==1]
data_rank2 = data[data["rank"]==2]
data_rank3 = data[data["rank"]==3]
data_rank4 = data[data["rank"]==4]

# Plotting the graphs
plot_points(data_rank1)
plt.title("Rank 1")
plt.show()
plot_points(data_rank2)
plt.title("Rank 2")
plt.show()
plot_points(data_rank3)
plt.title("Rank 3")
plt.show()
plot_points(data_rank4)
plt.title("Rank 4")
plt.show()

This looks more promising, as it seems that the lower the rank, the higher the acceptance rate. Let's use the rank as one of our inputs. In order to do this, we should one-hot encode it.
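
One way to back up this visual impression with numbers is to group the rows by rank and average the `admit` column. The sketch below uses a small made-up frame (`toy`, a hypothetical stand-in for `student_data.csv`), so the values it prints are illustrative only.

```python
import pandas as pd

# Hypothetical toy data, not the real admissions file
toy = pd.DataFrame({
    "admit": [1, 1, 0, 1, 0, 0, 0, 0],
    "rank":  [1, 1, 2, 2, 3, 3, 4, 4],
})

# The mean of a 0/1 column is the fraction of ones,
# i.e. the acceptance rate within each rank
acceptance_by_rank = toy.groupby("rank")["admit"].mean()
print(acceptance_by_rank)
```

On the notebook's own data, the same call would be `data.groupby("rank")["admit"].mean()`.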

## TODO: One-hot encoding the rank
Use the `get_dummies` function in pandas in order to one-hot encode the data.

Hint: To drop a column, it's suggested that you use `one_hot_data`[.drop( )](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html).
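
As a small illustration of what one-hot encoding does, here is a sketch on a made-up three-row frame (`toy` is hypothetical). It passes `columns=` so that `get_dummies` replaces the `rank` column in one step, which is an alternative to creating the dummies and dropping the column separately. The `dtype=int` argument is an assumption made here to get 0/1 values, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical toy data with two distinct ranks
toy = pd.DataFrame({"gre": [380, 660, 800], "rank": [3, 3, 1]})

# columns= names the column(s) to encode; each encoded column is
# replaced by one indicator column per distinct value
encoded = pd.get_dummies(toy, columns=["rank"], prefix="rank", dtype=int)
print(encoded)
```

Only the ranks that actually occur in the data get a column, so the toy frame yields `rank_1` and `rank_3` but no `rank_2` or `rank_4`.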

In [34]:
# TODO:  Make dummy variables for rank and concat existing columns
rank_dummies = pd.get_dummies(data['rank'], prefix='rank')
one_hot_data = data.join(rank_dummies)

# TODO: Drop the previous rank column
one_hot_data = one_hot_data.drop(columns='rank')

# Print the first 10 rows of our data
one_hot_data[:10]

Unnamed: 0,admit,gre,gpa,rank_1,rank_2,rank_3,rank_4
0,0,380,3.61,0,0,1,0
1,1,660,3.67,0,0,1,0
2,1,800,4.0,1,0,0,0
3,1,640,3.19,0,0,0,1
4,0,520,2.93,0,0,0,1
5,1,760,3.0,0,1,0,0
6,1,560,2.98,1,0,0,0
7,0,400,3.08,0,1,0,0
8,1,540,3.39,0,0,1,0
9,0,700,3.92,0,1,0,0


## TODO: Scaling the data
The next step is to scale the data. We notice that the range for grades is 1.0-4.0, whereas the range for test scores is roughly 200-800, which is much larger. This means the two features are on very different scales, which makes the data hard for a neural network to handle. Let's fit our two features into a range of 0-1 by dividing the grades by 4.0 and the test scores by 800.
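
To see why large inputs are a problem, consider a single sigmoid unit. The sketch below assumes a hypothetical weight `w = 0.1` and one GRE score of 660. With the raw score the sigmoid saturates and its gradient vanishes, while the scaled score leaves a usable gradient.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w = 0.1  # hypothetical weight, chosen only for illustration

results = {}
for name, x in [("raw", 660.0), ("scaled", 660.0 / 800)]:
    out = sigmoid(w * x)
    # out * (1 - out) is the derivative of the sigmoid at this input
    results[name] = (out, out * (1 - out))
    print(name, results[name])
```

A gradient that is numerically zero means the weight update is zero too, so the unit would stop learning from that record.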

In [35]:
# Making a copy of our data
processed_data = one_hot_data.copy()

# TODO: Scale the columns
processed_data['gre'] = processed_data['gre'] / 800
processed_data['gpa'] = processed_data['gpa'] / 4

# Printing the first 10 rows of our processed data
processed_data[:10]

Unnamed: 0,admit,gre,gpa,rank_1,rank_2,rank_3,rank_4
0,0,0.475,0.9025,0,0,1,0
1,1,0.825,0.9175,0,0,1,0
2,1,1.0,1.0,1,0,0,0
3,1,0.8,0.7975,0,0,0,1
4,0,0.65,0.7325,0,0,0,1
5,1,0.95,0.75,0,1,0,0
6,1,0.7,0.745,1,0,0,0
7,0,0.5,0.77,0,1,0,0
8,1,0.675,0.8475,0,0,1,0
9,0,0.875,0.98,0,1,0,0


## Splitting the data into Training and Testing

In order to test our algorithm, we'll split the data into a Training and a Testing set. The size of the testing set will be 10% of the total data.
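
A random split changes on every run unless the random number generator is seeded. The following is a minimal sketch of a repeatable 90/10 split on a made-up frame (`toy` and the seed value `0` are assumptions for illustration). It selects rows with `.loc` because the sampled values are index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 20 rows
toy = pd.DataFrame({"x": np.arange(20)})

# A seeded generator makes the split repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)
train_idx = rng.choice(toy.index.to_numpy(), size=int(len(toy) * 0.9), replace=False)

# Rows whose labels were sampled go to training, the rest to testing
train, test = toy.loc[train_idx], toy.drop(train_idx)
print(len(train), len(test))
```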

In [36]:
sample = np.random.choice(processed_data.index, size=int(len(processed_data)*0.9), replace=False)
train_data, test_data = processed_data.loc[sample], processed_data.drop(sample)

print("Number of training samples is", len(train_data))
print("Number of testing samples is", len(test_data))
print(train_data[:10])
print(test_data[:10])

Number of training samples is 360
Number of testing samples is 40
     admit    gre     gpa  rank_1  rank_2  rank_3  rank_4
172      0  0.850  0.8700       0       0       1       0
137      0  0.875  1.0000       0       0       1       0
126      1  0.750  0.8850       1       0       0       0
94       1  0.825  0.8600       0       1       0       0
72       0  0.600  0.8475       0       0       0       1
33       1  1.000  1.0000       0       0       1       0
380      0  0.875  0.9125       0       1       0       0
223      0  1.000  0.8675       0       0       1       0
307      0  0.725  0.8775       0       1       0       0
227      0  0.675  0.7550       0       0       0       1
     admit    gre     gpa  rank_1  rank_2  rank_3  rank_4
20       0  0.625  0.7925       0       0       1       0
21       1  0.825  0.9075       0       1       0       0
48       0  0.550  0.6200       0       0       0       1
50       0  0.800  0.9650       0       0       1       0
54    

## Splitting the data into features and targets (labels)
Now, as a final step before the training, we'll split the data into features (X) and targets (y).

In [37]:
features = train_data.drop('admit', axis=1)
targets = train_data['admit']
features_test = test_data.drop('admit', axis=1)
targets_test = test_data['admit']

print(features[:10])
print(targets[:10])

       gre     gpa  rank_1  rank_2  rank_3  rank_4
172  0.850  0.8700       0       0       1       0
137  0.875  1.0000       0       0       1       0
126  0.750  0.8850       1       0       0       0
94   0.825  0.8600       0       1       0       0
72   0.600  0.8475       0       0       0       1
33   1.000  1.0000       0       0       1       0
380  0.875  0.9125       0       1       0       0
223  1.000  0.8675       0       0       1       0
307  0.725  0.8775       0       1       0       0
227  0.675  0.7550       0       0       0       1
172    0
137    0
126    1
94     1
72     0
33     1
380    0
223    0
307    0
227    0
Name: admit, dtype: int64


## Training the 1-layer Neural Network
The following function trains the 1-layer neural network.  
First, we'll write some helper functions.

In [38]:
# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoid_prime(x):
    return sigmoid(x) * (1-sigmoid(x))

# log loss error formula
def error_formula(y, output):
    return - y*np.log(output) - (1 - y) * np.log(1-output)
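
One caveat with the log loss formula: if the output is exactly 0 or 1, `np.log` receives 0 and the result can be `inf` or `nan`. A common workaround is to clip the output slightly away from 0 and 1 first. The helper below (`error_formula_clipped`, with an `eps` of `1e-12`) is a hypothetical variant for illustration, not part of the original notebook.

```python
import numpy as np

def error_formula_clipped(y, output, eps=1e-12):
    # Hypothetical variant of error_formula: keep the output inside
    # [eps, 1 - eps] so that np.log never receives 0
    output = np.clip(output, eps, 1 - eps)
    return -y * np.log(output) - (1 - y) * np.log(1 - output)

y = np.array([1.0, 0.0])
saturated = np.array([1.0, 0.0])  # perfectly confident, correct outputs
losses = error_formula_clipped(y, saturated)
print(losses)
```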

## TODO: Backpropagate the error
Now it's your turn to shine. Write the error term. Remember that this is given by the equation $$ (y-\hat{y})x $$ for the binary cross-entropy loss function and 
$$ (y-\hat{y})\sigma'(x)x $$ for mean squared error. 

*Me: I think for MSE it should be:* $$ (y-\hat{y})\sigma'(h)x $$

*Me 2: Actually, in my view the error terms should not include the x*

NOTE: As I understood it, the error term does not involve x.

In [39]:
# TODO: Write the error term formula for LOG LOSS
def error_term_formula(x, y, output):
    return (y - output) * x
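
A quick way to check these formulas, including whether the factor `x` belongs in them, is to compare them with a numerical gradient with respect to the weights. The sketch below uses random toy values and hypothetical helper names (`log_loss`, `mse_half`, `numerical_grad`). It assumes the MSE is written with a factor 1/2 so that the 2 from differentiation cancels. Both formulas should match the negative gradient of their loss.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def log_loss(w, x, y):
    out = sigmoid(np.dot(x, w))
    return -y * np.log(out) - (1 - y) * np.log(1 - out)

def mse_half(w, x, y):
    # 1/2 (y - out)^2; the 1/2 is an assumption that cancels the factor 2
    out = sigmoid(np.dot(x, w))
    return 0.5 * (y - out) ** 2

def numerical_grad(loss, w, x, y, eps=1e-6):
    # Central finite differences, one weight at a time
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
w, x, y = rng.normal(size=3), rng.normal(size=3), 1.0
out = sigmoid(np.dot(x, w))

log_loss_term = (y - out) * x
mse_term = (y - out) * out * (1 - out) * x

# term + gradient should be close to zero if term == -gradient
log_loss_gap = np.max(np.abs(log_loss_term + numerical_grad(log_loss, w, x, y)))
mse_gap = np.max(np.abs(mse_term + numerical_grad(mse_half, w, x, y)))
print(log_loss_gap, mse_gap)
```

Under this reading, the quantity with the `x` is the weight update direction, while `(y - output)` (times the sigmoid derivative for MSE) is the error at the output unit alone.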

In [50]:
# Neural Network hyperparameters
epochs = 1000
# learnrate = 0.0001
learnrate = 0.01


In [51]:
def train_nn(features, targets, epochs, learnrate, error_term_formula, error_formula):
    '''
    Training function
    
    :param features: size(m,n)
    :param targets: size(m,)
    :param epochs: int
    :param learnrate: float
    :param error_term_formula: func(x, y, output)
    :param error_formula: func(y, output)
    :return: weights size(n,)
    '''
    
    # Use the same seed to make debugging easier
    np.random.seed(42)

    n_records, n_features = features.shape
    last_loss = None

    # Initialize weights
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

    for e in range(epochs):
        del_w = np.zeros(weights.shape)
        for x, y in zip(features, targets):
            # Loop through all records, x is the input, y is the target

            # Activation of the output unit
            #   Notice we multiply the inputs and the weights here 
            #   rather than storing h as a separate variable 
            output = sigmoid(np.dot(x, weights))

            # The error term
            error_term = error_term_formula(x, y, output)

            # The gradient descent step, the error times the gradient times the inputs
            del_w += error_term
            ##weights += learnrate * error_term
            

        # Update the weights here: the learning rate times the
        # change in weights, averaged over the records
        # (dividing by n_records keeps the step size independent of the dataset size)
        weights += learnrate * del_w / n_records  

        # Printing out the mean training loss (as given by error_formula)
        if e % (epochs / 10) == 0:
            out = sigmoid(np.dot(features, weights))
            loss = np.mean(error_formula(targets, out))
            print("Epoch:", e)
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            print("=========")
    print("Finished training!")
    return weights


In [52]:
# Train with LOG LOSS
weights = train_nn(features.to_numpy(), targets.to_numpy(), epochs, learnrate, error_term_formula, error_formula)


Epoch: 0
Train loss:  0.7484594807897331
Epoch: 100
Train loss:  0.6822223118434836
Epoch: 200
Train loss:  0.6504986565042005
Epoch: 300
Train loss:  0.6349416381190033
Epoch: 400
Train loss:  0.6269223747121504
Epoch: 500
Train loss:  0.6224764463153829
Epoch: 600
Train loss:  0.6197691795949172
Epoch: 700
Train loss:  0.6179372566980591
Epoch: 800
Train loss:  0.6165668293386073
Epoch: 900
Train loss:  0.6154556764330879
Finished training!


In [53]:
# Train with MSE
def error_term_formula_mse(x, y, output):    
    return (y - output) * output * (1 - output) * x
    
def error_formula_mse(y, output):
    # return np.mean((output - y) ** 2)
    return (output - y) ** 2

weights = train_nn(features.to_numpy(), targets.to_numpy(), epochs, learnrate, error_term_formula_mse, error_formula_mse)


Epoch: 0
Train loss:  0.276566848699054
Epoch: 100
Train loss:  0.2668909245580549
Epoch: 200
Train loss:  0.25859575393181344
Epoch: 300
Train loss:  0.25154929027824297
Epoch: 400
Train loss:  0.24560435077011936
Epoch: 500
Train loss:  0.24061255553365477
Epoch: 600
Train loss:  0.2364334876612602
Epoch: 700
Train loss:  0.23293985824626853
Epoch: 800
Train loss:  0.23001972655101188
Epoch: 900
Train loss:  0.22757676757226153
Finished training!


In [47]:
def train_nn_vectorial(features, targets, epochs, learnrate, error_term_formula, error_formula):
    '''
    Training function, vectorial implementation
    
    :param features: size(m,n)
    :param targets: size(m,)
    :param epochs: int
    :param learnrate: float
    :param error_term_formula: func(x, y, output)
    :param error_formula: func(y, output)
    :return: weights size(n,)
    '''
    
    # Use the same seed to make debugging easier
    np.random.seed(42)

    n_records, n_features = features.shape
    last_loss = None

    # Initialize weights
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

    for e in range(epochs):        
        # Activation of the output unit
        #   Notice we multiply the inputs and the weights here 
        #   rather than storing h as a separate variable 
        # outputs has shape (m,)
        outputs = sigmoid(np.dot(features, weights))
        # The gradient descent step, the error times the gradient times the inputs
        # targets and outputs are reshaped to (m, 1) so that they broadcast
        # against features (m, n); error_terms has shape (m, n)
        error_terms = error_term_formula(features, targets[:, np.newaxis], outputs[:, np.newaxis])
        
        # (n,) Sum of the error terms of all records
        del_w = error_terms.sum(axis=0)
                       
        # Update the weights here: the learning rate times the
        # change in weights, averaged over the records
        # (dividing by n_records keeps the step size independent of the dataset size)
        weights += learnrate * del_w / n_records  

        # Printing out the mean training loss (as given by error_formula)
        if e % (epochs / 10) == 0:            
            loss = np.mean(error_formula(targets, outputs))
            print("Epoch:", e)
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            print("=========")
    print("Finished training!")
    return weights
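
For the per-record error term formula to work on a whole `(m, n)` feature matrix at once, the targets and outputs need to be column vectors of shape `(m, 1)` so that NumPy broadcasts them across the feature columns. The sketch below checks, on random toy data (all values hypothetical), that the vectorized sum matches the record-by-record loop.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def error_term_formula(x, y, output):
    return (y - output) * x

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))    # m=5 records, n=3 features (toy sizes)
targets = rng.integers(0, 2, size=5)  # random 0/1 labels
weights = rng.normal(size=3)

# Record-by-record accumulation
del_w_loop = np.zeros(3)
for x, y in zip(features, targets):
    output = sigmoid(np.dot(x, weights))
    del_w_loop += error_term_formula(x, y, output)

# Vectorized: (m, 1) * (m, n) broadcasts to (m, n), then sum over records
outputs = sigmoid(np.dot(features, weights))
error_terms = error_term_formula(features, targets[:, None], outputs[:, None])
del_w_vec = error_terms.sum(axis=0)

print(np.allclose(del_w_loop, del_w_vec))
```

Without the reshape, multiplying a `(m,)` array by a `(m, n)` array raises a broadcasting error whenever `m` differs from `n`.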
   

In [48]:
weights = train_nn(features.to_numpy(), targets.to_numpy(), epochs, learnrate, error_term_formula, error_formula)

Epoch: 0
Train loss:  0.7493887438467536
Epoch: 100
Train loss:  0.7484518251963341
Epoch: 200
Train loss:  0.747521680408004
Epoch: 300
Train loss:  0.7465982641498909
Epoch: 400
Train loss:  0.7456815313032333
Epoch: 500
Train loss:  0.7447714369630561
Epoch: 600
Train loss:  0.7438679364388138
Epoch: 700
Train loss:  0.7429709852549953
Epoch: 800
Train loss:  0.7420805391516935
Epoch: 900
Train loss:  0.7411965540851397
Finished training!


In [49]:
weights = train_nn(features.to_numpy(), targets.to_numpy(), epochs, learnrate, error_term_formula_mse, error_formula_mse)

Epoch: 0
Train loss:  0.2766699107638816
Epoch: 100
Train loss:  0.27656584547570645
Epoch: 200
Train loss:  0.2764619284745197
Epoch: 300
Train loss:  0.2763581596838675
Epoch: 400
Train loss:  0.27625453902674174
Epoch: 500
Train loss:  0.2761510664255828
Epoch: 600
Train loss:  0.27604774180228203
Epoch: 700
Train loss:  0.2759445650781837
Epoch: 800
Train loss:  0.27584153617408774
Epoch: 900
Train loss:  0.27573865501025163
Finished training!


## Calculating the Accuracy on the Test Data

In [32]:
# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

Prediction accuracy: 0.650
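
To put this number in context, it helps to compare it with a baseline that always predicts the majority class. Since the mean of `admit` reported by `data.describe()` is 0.3175, always predicting "rejected" would be right about 68% of the time on the full dataset. The sketch below computes such a baseline on made-up labels (`targets_toy` is hypothetical). On the notebook's data, `targets_test` would take its place.

```python
import numpy as np

# Hypothetical labels with 30% admitted, close to the dataset's 32%
targets_toy = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])

# The majority class is 1 if at least half of the labels are 1, else 0
majority_class = int(targets_toy.mean() >= 0.5)

# Accuracy of always predicting the majority class
baseline_accuracy = np.mean(targets_toy == majority_class)
print(baseline_accuracy)
```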
