# Week 18 Homework: Neural Networks

# Question 1.	What is a neural network? What are the general steps required to build a neural network? 

A neural network is a computer algorithm that is designed to mimic the way the human brain works.  It is comprised of a layers of nodes that are connected to make a network which is similar to neurons in the human brain.  Neural networks are used to solve complex problems and are made up of an input layer which holds the predictive features of the dataset, hidden layers which are where the computations are done so that the computer can find the relationship between the input features themselves and the output, and the output layer which is where the neural network's prediction is stored. A deep neural network (DNN) is a neural network with multiple hidden layers.  Each hidden layer is progressively more complex. 

To build a neural network, you need to create an input layer with the predictive features from the dataset.  Typically each node in the input layer is a different feature.  Next, you will create at least one hidden layer by specifying the amount of nodes. Then you need to assign weights between the input layer and the hidden layer.  The weights can be adjusted to make the prediction closer to the target.  You will also need to specify the activation function.  This can adjust the values in each node based on the calculation result in the node.  For example, ReLU is an activation function that makes the node value 0 if the input value from the previous node multiplied by the weight is a negative number, otherwise a positive result is unchanged.  Additionally, a loss function should be specified.  This is how the performance of the model is measured and it wil be different for a regression vs. a classification model.  For example, for a regression model, mean squared error is typically used for a loss function.  Finally, after the hidden layers are created, an output layer is created with connections (weights) between the last hidden layer and the output.  The output should represent the prediction that the model is making. 

# Question 2:	Generally, how do you check the performance of a neural network? Why? 

The performance of a neural network is assessed with use of a loss function. A loss function is a function that represents the error in the model, or how far away the prediction of the model is from the target.  The loss function should be different for a regression problem than for a classification problem.  For a regression model, mean squared error or mean absolute error are common for loss functions, while for a classification problem, a cross entropy loss function is common.  Additionally, for classification problems, you can print out the accuracy of the model to see the model's progress through iterations.  The goal of the model is to minimize the loss function (or in other words, you want to reduce the error in the predictions).  Ideally, you want to find where the slope of the loss function is zero (where the derivative is zero), because that would give you a minimum.  A way of doing this is gradient descent.  This is using a learning rate to slowing change the model parameters and see how the loss function changes.  If the loss decreases, then the model is headed in the correct direction.     

# Question 3.	Create a neural network using keras to predict the outcome of either of these datasets: 
Cardiac Arrhythmia: https://archive.ics.uci.edu/ml/datasets/Arrhythmia 
Abalone age: https://archive.ics.uci.edu/ml/datasets/Abalone


In [1]:
import csv

#import the data file and write out each row into a csv file
with open("abalone.data") as infile, open("abalone.csv", "w") as outfile:
    csv_writer = csv.writer(outfile)
    prev = ''
    csv_writer.writerow(['Sex', 'Length', 'Diameter', 'Height', 'Whole Weight', 'Shucked Weight', 'Viscera Weight', 'Shell Weight', 'Rings'])
    for line in infile:
        row = [field.strip() for field in line.split(',')]
        csv_writer.writerow(row)

In [32]:
import pandas as pd
import numpy as np

#load the abalone dataset from csv file and save as a pandas dataframe
abalone_df = pd.read_csv('./abalone.csv')
abalone_df.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole Weight,Shucked Weight,Viscera Weight,Shell Weight,Rings
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7


In [33]:
abalone_df['Rings'].describe()

count    4177.000000
mean        9.933684
std         3.224169
min         1.000000
25%         8.000000
50%         9.000000
75%        11.000000
max        29.000000
Name: Rings, dtype: float64

In [34]:
#Remove Outliers from the Dataset
# calculate summary statistics
data_mean, data_std = np.mean(abalone_df['Rings']), np.std(abalone_df['Rings'])
# identify outliers
cut_off = data_std * 3
lower, upper = data_mean - cut_off, data_mean + cut_off

In [35]:
print(lower)
print(upper)

0.26233526506932847
19.605033659996508


In [36]:
# identify outliers
outliers = [x for x in abalone_df['Rings'] if x < lower or x > upper]

In [37]:
print(outliers)

[20, 20, 21, 20, 20, 21, 22, 22, 22, 20, 26, 21, 23, 23, 22, 20, 20, 20, 20, 20, 21, 20, 22, 21, 21, 29, 23, 20, 20, 21, 21, 23, 22, 23, 20, 20, 20, 21, 27, 20, 21, 21, 25, 27, 20, 23, 23, 23, 21, 20, 23, 20, 20, 20, 24, 21, 20, 24, 20, 20, 21, 20]


In [40]:
# remove outliers
abalone_df = abalone_df[(abalone_df['Rings'] > lower) & (abalone_df['Rings'] < upper)]
abalone_df['Rings'].describe()

count    4115.000000
mean        9.758931
std         2.904193
min         1.000000
25%         8.000000
50%         9.000000
75%        11.000000
max        19.000000
Name: Rings, dtype: float64

In [59]:
#save the predictor variables into the dataframe X
X = abalone_df.drop('Rings', axis=1)
#save the independent variable y
y = abalone_df['Rings']

In [60]:
#Perform OneHotEncoding on only the 'Sex' Column to turn it into a numerical column instead of a categorical column. Drop the first column since it is repetitive data. 
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([("Sex", OneHotEncoder(drop='first'), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)

In [61]:
#Rename X and y to predictors and target for convention
predictors=X
target=y

In [76]:
#import necessary modules
from keras.layers import Dense
from keras.models import Sequential
from keras.callbacks import EarlyStopping
from keras.optimizers import SGD

In [63]:
#get the number of columns in the predictors array
n_cols = predictors.shape[1]
print(n_cols)

9


In [94]:
#instantiate the keras model
model=Sequential()

#add the layers 
model.add(Dense(200, activation='relu', input_shape=(n_cols,)))
model.add(Dense(200, activation='relu'))
model.add(Dense(200, activation='relu'))
model.add(Dense(200, activation='relu'))
model.add(Dense(1))

#compile the model
model.compile(optimizer=SGD(lr=0.001), loss='mean_squared_error')

#set an early stopping monitor so that the model will stop running if improvement to the loss function is not seen after a specified number of epochs
early_stopping_monitor = EarlyStopping(patience=4)

#fit the model
model.fit(predictors, target, validation_split=0.3, epochs=40, callbacks=[early_stopping_monitor])

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40


<tensorflow.python.keras.callbacks.History at 0x1ab28fed0d0>

|Hidden Layers| Nodes Per Layer| Optimizer | Learning Rate| Mean Squared Error|
|---|---|---|---|---|
|1 | 100 | Adam | NA | 4.0427|
|2 | 100 | Adam | NA | 3.8451|
|2 | 200 | Adam | NA | 3.9322|
|3 | 100 | Adam | NA | 3.7858|
|3 | 200 | Adam | NA | 3.7863|
|3 | 100 | SGD  | 0.01 | 4.2544 |
|3 | 100 | SGD  | 0.001 | 4.0569 |
|2 | 100 | SGD  | 0.001 | 4.0645 |
|2 | 200 | SGD  | 0.001 | 3.6489 |
|**3** | **200** | **SGD**  | **0.001** | **3.6176** |
|3 | 300 | SGD  | 0.001 | 4.0179 |
|4 | 200 | SGD  | 0.001 | 3.7209 |






In [91]:
rmse = np.sqrt(3.6176)

In [92]:
print(rmse)

1.9019989484749984


After tuning the neural network, the best optimizer was SGD with 3 hidden layers and 200 nodes per layer and a learning rate of 0.001.  This resulted in a mean squared error of 3.6176 or a rmse of 1.90.   

# Question 4.	Write another algorithm to predict the same result as the previous question using either KNN or logistic regression.

In [95]:
from sklearn.neighbors import KNeighborsClassifier

In [96]:
from sklearn.model_selection import train_test_split

#split the dataset into testing and training portions, with the testing portion making up 20% of the data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

In [97]:
from sklearn.preprocessing import StandardScaler

#Standardize the dataset
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)

In [135]:
knn=KNeighborsClassifier(n_neighbors=6)

In [136]:
knn.fit(X_train, y_train)

KNeighborsClassifier(n_neighbors=6)

In [137]:
y_pred = knn.predict(X_test)

In [138]:
from sklearn.metrics import accuracy_score
score = accuracy_score(y_test, y_pred)

In [139]:
print(score)

0.25273390036452004


# Compare with a linear regression model

In [140]:
from sklearn.linear_model import LinearRegression

In [141]:
lr_model = LinearRegression()

In [142]:
lr_model.fit(X_train, y_train)

LinearRegression()

In [143]:
y_pred = lr_model.predict(X_test)

In [144]:
from sklearn.metrics import mean_squared_error as MSE

In [146]:
mse_model = MSE(y_test, y_pred)
rmse_model = mse_model**(1/2)

In [147]:
print(rmse_model)

2.1009479757278187


# Question 5.	Create a neural network using pytorch to predict the same result as question 3.

In [299]:
import torch

In [300]:
X = abalone_df.drop('Rings', axis=1).values
y = abalone_df['Rings'].values

#Perform OneHotEncoding on only the 'Sex' Column to turn it into a numerical column instead of a categorical column. Drop the first column since it is repetitive data. 
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([("Sex", OneHotEncoder(drop='first'), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)

#Split the dataset into training and testing portions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

#standardize the dataset
from sklearn.preprocessing import StandardScaler

#Standardize the dataset
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)

In [301]:
import torch.nn as nn
import torch.nn.functional as F 

#Convert numpy arrays to tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

X_train = X_train.float()
y_train = y_train.float()
y_train = y_train.view(-1,1)
X_test = X_test.float()
y_test = y_test.float()
y_test = y_test.view(-1,1)

#print(X_train.shape)
print(y_train.shape)
#print(X_test.shape)
print(y_test.shape)

torch.Size([3292, 9])
torch.Size([3292])
torch.Size([823, 9])
torch.Size([823])
torch.Size([3292, 1])
torch.Size([823, 1])


In [302]:
class ANN_Model(nn.Module):
    def __init__(self, input_features=9, hidden1=200, hidden2=200, hidden3=200, out_features=1):
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.layer_3_connection = nn.Linear(hidden2, hidden3)
        self.out = nn.Linear(hidden3, out_features)
        
    def forward(self, x): 
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = F.relu(self.layer_3_connection(x))
        x = self.out(x)
        return x

In [303]:
torch.manual_seed(42)

#instantiate the model
model = ANN_Model()

In [304]:
#define learning_rate
learning_rate = 0.001

#define loss function. Use MSE for regression
loss_function = nn.MSELoss()

#set optimizer
optimizer = torch.optim.SGD(model.parameters(), lr =learning_rate )

In [305]:
#run model through multiple epochs
final_loss = []
n_epochs = 200
for epoch in range(n_epochs):
    y_pred = model.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1: 
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
        
    optimizer.zero_grad()  #clears the gradient before running backwards propagation
    loss.backward() #for backward propagation
    optimizer.step() #performs one optimization step each epoch

Epoch number: 1 with loss: 102.95594787597656
Epoch number: 11 with loss: 92.97309875488281
Epoch number: 21 with loss: 76.56812286376953
Epoch number: 31 with loss: 40.18474197387695
Epoch number: 41 with loss: 16.46268653869629
Epoch number: 51 with loss: 13.993751525878906
Epoch number: 61 with loss: 12.295690536499023
Epoch number: 71 with loss: 10.8180570602417
Epoch number: 81 with loss: 9.534835815429688
Epoch number: 91 with loss: 8.439022064208984
Epoch number: 101 with loss: 7.5276031494140625
Epoch number: 111 with loss: 6.790976047515869
Epoch number: 121 with loss: 6.212918281555176
Epoch number: 131 with loss: 5.769845008850098
Epoch number: 141 with loss: 5.435546875
Epoch number: 151 with loss: 5.185831546783447
Epoch number: 161 with loss: 4.999022960662842
Epoch number: 171 with loss: 4.85835075378418
Epoch number: 181 with loss: 4.75035285949707
Epoch number: 191 with loss: 4.665534496307373


In [306]:
rmse = np.sqrt(4.6655)
print(rmse)

2.1599768517278144


# Question 6.	Compare the performance of the neural networks to the other model you created. Which performed better? Why do you think that is?

The neural network model that I created using keras performed the best for me out of all the models that I tried.  I ended up with a rmse of 1.90 using this model as opposed to the decision trees that I tried last week where I could only get a rmse of around 2.16.  Neural networks are able to tune themselves and update parameters to find the optimal relationships between variables, so they often perform better than other types of models, though not always. Keras gave me a lower mse than pytorch, though I used the same optimizer, learning rate, number of hidden layers and number of nodes per hidden layer, so I'm not sure what caused the variation between the two models.      