# 2-Layer Neural Network

In a separate notebook, we fit a neural network with a single hidden layer to our data. Here we fit a network with two hidden layers in the hope of achieving better results.

In [1]:
import pandas as pd
from datetime import datetime, timedelta

Read in csv

In [2]:
taxi = pd.read_csv("taxi_clean_lg.csv")

Convert pickup and dropoff times from strings to datetimes

In [3]:
# Parse the timestamp strings into pandas datetime values
taxi['pickup_datetime'] = pd.to_datetime(taxi["pickup_datetime"])
taxi['dropoff_datetime'] = pd.to_datetime(taxi["dropoff_datetime"])

Convert ride_duration from a string to whole minutes (seconds are dropped)

In [4]:
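def duration_to_minutes(duration):
    # The third whitespace-separated token holds HH:MM:SS, optionally followed by fractional seconds
    parsed = datetime.strptime(duration.split()[2].split('.')[0], "%H:%M:%S")
    return parsed.hour * 60 + parsed.minute

taxi['ride_duration (min)'] = taxi['ride_duration'].apply(duration_to_minutes)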
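
As an alternative sketch, pandas can parse duration strings directly with `pd.to_timedelta`. The sample values below are hypothetical and assume the column looks like `0 days 00:12:34.000000000`; the real format is not shown in this notebook. Unlike the cell above, this version would also count any whole days in the duration.

```python
import pandas as pd

# Hypothetical sample values; the real column's exact format is an assumption.
sample = pd.Series(["0 days 00:12:34.000000000", "0 days 01:05:10.000000000"])

# Parse to Timedelta, then floor the total seconds to whole minutes.
minutes = (pd.to_timedelta(sample).dt.total_seconds() // 60).astype(int)
print(minutes.tolist())
```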

Divide into features and labels

In [5]:
taxi.columns

Index(['trip_distance', 'fare_amount', 'winter', 'spring', 'summer', 'fall',
       'PULongitude', 'PULatitude', 'DOLongitude', 'DOLatitude',
       'pickup_datetime', 'dropoff_datetime', 'ride_duration', 'Early morning',
       'Morning', 'Afternoon', 'Night', 'Holiday Proximity', 'label',
       'ride_duration (min)'],
      dtype='object')

In [6]:
X = taxi[['trip_distance', 'winter', 'spring', 'summer', 'fall','PULongitude', 'PULatitude', 'DOLongitude', 
          'DOLatitude', 'ride_duration (min)', 'Early morning','Morning', 'Afternoon', 'Night', 'Holiday Proximity']]

In [11]:
y = taxi['label']

Begin the cross-validation process to determine the best activation function and the widths of the two hidden layers

In [13]:
from sklearn.neural_network import MLPClassifier

In [65]:
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Every (width1, width2) pair with widths 50, 100, 150, 200
sizes = list(range(50, 201, 50))
layers = [(width1, width2) for width1 in sizes for width2 in sizes]
layers

[(50, 50),
 (50, 100),
 (50, 150),
 (50, 200),
 (100, 50),
 (100, 100),
 (100, 150),
 (100, 200),
 (150, 50),
 (150, 100),
 (150, 150),
 (150, 200),
 (200, 50),
 (200, 100),
 (200, 150),
 (200, 200)]

In [66]:
scaler = StandardScaler()

# Note: learning_rate='adaptive' only takes effect with solver='sgd'; the default solver is 'adam'
mlp = MLPClassifier(early_stopping = True, learning_rate = 'adaptive')
pipe = Pipeline([('scaler',scaler),('MLPClassifier', mlp)])

grid = {"MLPClassifier__hidden_layer_sizes":layers, "MLPClassifier__activation":['logistic', 'tanh', 'relu', 'identity']}

grid_search = GridSearchCV(pipe, grid, cv = 3, verbose = 2)
grid_search.fit(X,y)

Fitting 3 folds for each of 64 candidates, totalling 192 fits
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 50) 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 50), total=   6.9s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 50) 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    6.8s remaining:    0.0s


[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 50), total=   4.9s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 50) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 50), total=   5.8s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 100) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 100), total=   7.3s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 100) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 100), total=   9.3s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 100) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 100), total=   9.3s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(50, 150) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_laye

[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(200, 200), total=  23.8s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(200, 200) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(200, 200), total=  25.6s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(200, 200) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(200, 200), total=  20.5s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(50, 50) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(50, 50), total=  12.8s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(50, 50) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(50, 50), total=  12.6s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(50, 50) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(50, 50), tot

[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 150), total=  15.9s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 150) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 150), total=  21.3s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 200) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 200), total=  31.3s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 200) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 200), total=  22.2s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 200) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(200, 200), total=  23.9s
[CV] MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(50, 50) 
[CV]  MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(50, 50), total=  14.5s
[

[CV]  MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 100), total=  30.5s
[CV] MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 150) 
[CV]  MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 150), total=  34.5s
[CV] MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 150) 
[CV]  MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 150), total=  30.6s
[CV] MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 150) 
[CV]  MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 150), total=  38.0s
[CV] MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 200) 
[CV]  MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 200), total=  39.0s
[CV] MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 200) 
[CV]  MLPClassifier__activation=relu, MLPClassifier__hidden_layer_sizes=(200, 200), total=  35.

[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 50), total=   5.4s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 50) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 50), total=   8.1s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 100) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 100), total=   5.8s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 100) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 100), total=   5.6s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 100) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 100), total=   8.2s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(200, 150) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__h

[Parallel(n_jobs=1)]: Done 192 out of 192 | elapsed: 78.5min finished


GridSearchCV(cv=3, error_score='raise-deprecating',
             estimator=Pipeline(memory=None,
                                steps=[('scaler',
                                        StandardScaler(copy=True,
                                                       with_mean=True,
                                                       with_std=True)),
                                       ('MLPClassifier',
                                        MLPClassifier(activation='relu',
                                                      alpha=0.0001,
                                                      batch_size='auto',
                                                      beta_1=0.9, beta_2=0.999,
                                                      early_stopping=True,
                                                      epsilon=1e-08,
                                                      hidden_layer_sizes=(100,),
                                                      learning_rate=

In [67]:
print(grid_search.best_score_)
print(grid_search.best_params_)

0.910634650501236
{'MLPClassifier__activation': 'tanh', 'MLPClassifier__hidden_layer_sizes': (100, 100)}
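
Beyond the single best candidate, `cv_results_` shows how close the other candidates came, which helps judge whether a finer search is worthwhile. The following is a minimal sketch on synthetic data (an assumption, standing in for the taxi features) with a deliberately tiny grid so it runs quickly; the widths and sample sizes are illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the notebook itself uses the taxi features and labels.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

demo_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("MLPClassifier", MLPClassifier(max_iter=500, random_state=0)),
])
demo_grid = {
    "MLPClassifier__hidden_layer_sizes": [(5, 5), (10, 10)],
    "MLPClassifier__activation": ["tanh", "relu"],
}
search = GridSearchCV(demo_pipe, demo_grid, cv=3).fit(X_demo, y_demo)

# One row per candidate, best first, with the spread across folds.
results = pd.DataFrame(search.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
top = results.sort_values("rank_test_score")[cols]
print(top.to_string(index=False))
```

If the gap between the top few `mean_test_score` values is smaller than their `std_test_score`, the ranking among them is probably not meaningful.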


The hyperbolic tangent activation function with hidden layer widths of 100 and 100 produced the best CV accuracy (about 91.1%). Next, search a finer grid of widths around (100, 100).

In [68]:
# Every (width1, width2) pair with widths 70, 100, 130
sizes = list(range(70, 131, 30))
layers = [(width1, width2) for width1 in sizes for width2 in sizes]
layers

[(70, 70),
 (70, 100),
 (70, 130),
 (100, 70),
 (100, 100),
 (100, 130),
 (130, 70),
 (130, 100),
 (130, 130)]

In [69]:
scaler = StandardScaler()

mlp = MLPClassifier(early_stopping = True, learning_rate = 'adaptive')
pipe = Pipeline([('scaler',scaler),('MLPClassifier', mlp)])

grid = {"MLPClassifier__hidden_layer_sizes":layers, "MLPClassifier__activation":['logistic', 'tanh', 'relu', 'identity']}

grid_search = GridSearchCV(pipe, grid, cv = 3, verbose = 2)
grid_search.fit(X,y)

Fitting 3 folds for each of 36 candidates, totalling 108 fits
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 70) 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 70), total=  11.5s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 70) 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   11.4s remaining:    0.0s


[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 70), total=  23.8s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 70) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 70), total=   8.8s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 100) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 100), total=   6.5s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 100) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 100), total=   8.6s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 100) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 100), total=   7.8s
[CV] MLPClassifier__activation=logistic, MLPClassifier__hidden_layer_sizes=(70, 130) 
[CV]  MLPClassifier__activation=logistic, MLPClassifier__hidden_laye

[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 70), total=  14.0s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 70) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 70), total=  14.0s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 100) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 100), total=  23.6s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 100) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 100), total=  21.9s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 100) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 100), total=  29.3s
[CV] MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 130) 
[CV]  MLPClassifier__activation=tanh, MLPClassifier__hidden_layer_sizes=(130, 130), total=  20.0s


[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 70), total=   4.6s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 100) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 100), total=   5.8s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 100) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 100), total=   5.2s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 100) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 100), total=  10.4s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 130) 
[CV]  MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 130), total=   7.8s
[CV] MLPClassifier__activation=identity, MLPClassifier__hidden_layer_sizes=(100, 130) 
[CV]  MLPClassifier__activation=identity, MLPClassifier_

[Parallel(n_jobs=1)]: Done 108 out of 108 | elapsed: 26.6min finished


GridSearchCV(cv=3, error_score='raise-deprecating',
             estimator=Pipeline(memory=None,
                                steps=[('scaler',
                                        StandardScaler(copy=True,
                                                       with_mean=True,
                                                       with_std=True)),
                                       ('MLPClassifier',
                                        MLPClassifier(activation='relu',
                                                      alpha=0.0001,
                                                      batch_size='auto',
                                                      beta_1=0.9, beta_2=0.999,
                                                      early_stopping=True,
                                                      epsilon=1e-08,
                                                      hidden_layer_sizes=(100,),
                                                      learning_rate=

In [70]:
print(grid_search.best_score_)
print(grid_search.best_params_)

0.9102247177053702
{'MLPClassifier__activation': 'tanh', 'MLPClassifier__hidden_layer_sizes': (130, 130)}


Cross-validation accuracy did not improve with the finer grid (0.9102 versus 0.9106 in the first search), so stop the grid search here and compute a final CV accuracy for the tanh network with widths (130, 130)

Scale the data

In [14]:
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X_sc = scaler.transform(X)
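
Fitting the scaler on all rows before cross-validation lets the held-out folds influence the scaling statistics. One way to avoid that, sketched here on synthetic data rather than the taxi set, is to put the scaler and the classifier in a single pipeline so the scaler is refit on each training fold. The data, the smaller layer widths, and the variable names are assumptions chosen so the example runs quickly; this is not what the cells in this notebook do.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the notebook itself uses X and y from the taxi file.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# The scaler is fit inside each CV split, on the training portion only.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(activation="tanh", hidden_layer_sizes=(10, 10),
                  max_iter=1000, random_state=0),
)
scores = cross_val_score(model, X_demo, y_demo, cv=5)
print(scores.mean(), scores.std())
```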

Compute the 5-fold CV accuracy

In [15]:
cv = cross_val_score(MLPClassifier(activation = 'tanh', hidden_layer_sizes = (130,130), max_iter = 1000), X_sc, y, cv = 5, verbose = True)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 33.2min finished


In [16]:
"5-fold CV score: " + str(round(cv.mean()*100,2)) + '% accuracy'

'5-fold CV score: 88.37% accuracy'

Make cross validation predictions to obtain a classification report and a confusion matrix

In [17]:
from sklearn.model_selection import cross_val_predict
cv_pred = cross_val_predict(MLPClassifier(activation = 'tanh', hidden_layer_sizes = (130,130), max_iter = 1000), X_sc, y, cv = 5)

In [18]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

In [19]:
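# Arguments are passed as (predicted, true), so rows are predicted classes and columns are true classes
print(confusion_matrix(cv_pred, y))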

[[ 5155   491     0     1     0     0     0     0     0     0]
 [  712 17778   867     2     1     0     0     0     0     1]
 [    3   866 14955   774     5     1     1     0     0     1]
 [    0     3   832 10002   642     2     0     1     1     2]
 [    1     2     6   633  6132   467     7    10     2     4]
 [    0     0     5     1   513  3790   388    15     0    11]
 [    0     0     1     1     4   387  2450   295     3    12]
 [    0     0     0     3     2    12   329  1649   254    22]
 [    0     0     0     0     0     4     5   247  1113   244]
 [    0     2     2     2     3    14    10    17   241  8087]]


In [20]:
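# Arguments are passed as (predicted, true), so the precision and recall columns are swapped
# relative to sklearn's (y_true, y_pred) convention, and support counts predictions per class
print(classification_report(cv_pred, y))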

              precision    recall  f1-score   support

           A       0.88      0.91      0.90      5647
           B       0.93      0.92      0.92     19361
           C       0.90      0.90      0.90     16606
           D       0.88      0.87      0.87     11485
           E       0.84      0.84      0.84      7264
           F       0.81      0.80      0.81      4723
           G       0.77      0.78      0.77      3153
           H       0.74      0.73      0.73      2271
           I       0.69      0.69      0.69      1613
           J       0.96      0.97      0.96      8378

    accuracy                           0.88     80501
   macro avg       0.84      0.84      0.84     80501
weighted avg       0.88      0.88      0.88     80501



The 5-fold CV accuracy of a neural net with a hyperbolic tangent activation function and hidden layer widths of 130 and 130 was 88.37%. In terms of F1 score, the model did best at identifying members of class J (0.96), which contains rides with fares greater than \$25. It did worst on class I (0.69), which contains rides with fares less than \$25 but greater than or equal to \$22.50. True members of class I were most often mislabeled as class H (254 rides) or class J (241 rides), with almost equal frequency.
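
The per-class figures can be checked by hand from the confusion matrix. The sketch below copies the matrix printed earlier and assumes, based on the `(cv_pred, y)` argument order used in that cell, that rows are predicted classes and columns are true classes; the helper name `class_metrics` is illustrative.

```python
# Confusion matrix copied from the notebook output.
# Rows are predicted classes, columns are true classes (A through J).
labels = list("ABCDEFGHIJ")
cm = [
    [5155, 491, 0, 1, 0, 0, 0, 0, 0, 0],
    [712, 17778, 867, 2, 1, 0, 0, 0, 0, 1],
    [3, 866, 14955, 774, 5, 1, 1, 0, 0, 1],
    [0, 3, 832, 10002, 642, 2, 0, 1, 1, 2],
    [1, 2, 6, 633, 6132, 467, 7, 10, 2, 4],
    [0, 0, 5, 1, 513, 3790, 388, 15, 0, 11],
    [0, 0, 1, 1, 4, 387, 2450, 295, 3, 12],
    [0, 0, 0, 3, 2, 12, 329, 1649, 254, 22],
    [0, 0, 0, 0, 0, 4, 5, 247, 1113, 244],
    [0, 2, 2, 2, 3, 14, 10, 17, 241, 8087],
]


def class_metrics(cm, k):
    """Precision, recall, and F1 for class index k (rows = predicted)."""
    tp = cm[k][k]
    predicted_k = sum(cm[k])               # row sum: everything predicted as k
    true_k = sum(row[k] for row in cm)     # column sum: everything truly k
    precision = tp / predicted_k
    recall = tp / true_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


i = labels.index("I")
precision_i, recall_i, f1_i = class_metrics(cm, i)

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))

# True class I rides predicted as H or as J (column I, rows H and J).
i_as_h = cm[labels.index("H")][i]
i_as_j = cm[labels.index("J")][i]
print(round(f1_i, 2), round(accuracy, 4), i_as_h, i_as_j)
```

The accuracy computed from this matrix is close to, but not identical to, the 88.37% reported by `cross_val_score`. A likely reason is that the two cells trained separate sets of models with different random initializations.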