# Drill: Playing with layers of a Neural Network -- Kristofer Schobert

Here I run Thinkful's code for creating a neural network that classifies a museum's artwork by the department it belongs to. 

The categories are 'Architecture & Design', 'Drawings & Prints', 'Media and Performance', 'Painting & Sculpture', and 'Photography'. 

I then try out a few different hidden layer structures and use cross-validation to determine which is best. This is a very short drill. After trying four different structures, I found that two layers with 20 neurons each scored best. 

In [64]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [65]:
artworks = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv')

In [66]:
artworks.columns

Index(['Title', 'Artist', 'ConstituentID', 'ArtistBio', 'Nationality',
       'BeginDate', 'EndDate', 'Gender', 'Date', 'Medium', 'Dimensions',
       'CreditLine', 'AccessionNumber', 'Classification', 'Department',
       'DateAcquired', 'Cataloged', 'ObjectID', 'URL', 'ThumbnailURL',
       'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)', 'Height (cm)',
       'Length (cm)', 'Weight (kg)', 'Width (cm)', 'Seat Height (cm)',
       'Duration (sec.)'],
      dtype='object')

In [67]:
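# Select columns. The .copy() makes this an independent frame, so the
# column assignments below do not trigger a SettingWithCopyWarning.
artworks = artworks[['Artist', 'Nationality', 'Gender', 'Date', 'Department',
                    'DateAcquired', 'URL', 'ThumbnailURL', 'Height (cm)', 'Width (cm)']].copy()

# Convert URLs to booleans (True when a URL is present).
artworks['URL'] = artworks['URL'].notnull()
artworks['ThumbnailURL'] = artworks['ThumbnailURL'].notnull()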

# Drop films and some other tricky rows.
artworks = artworks[artworks['Department']!='Film']
artworks = artworks[artworks['Department']!='Media and Performance Art']
artworks = artworks[artworks['Department']!='Fluxus Collection']

# Drop missing data.
artworks = artworks.dropna()

In [68]:
artworks.head()

|   | Artist | Nationality | Gender | Date | Department | DateAcquired | URL | ThumbnailURL | Height (cm) | Width (cm) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Otto Wagner | (Austrian) | (Male) | 1896 | Architecture & Design | 1996-04-09 | True | True | 48.6 | 168.9 |
| 1 | Christian de Portzamparc | (French) | (Male) | 1987 | Architecture & Design | 1995-01-17 | True | True | 40.6401 | 29.8451 |
| 2 | Emil Hoppe | (Austrian) | (Male) | 1903 | Architecture & Design | 1997-01-15 | True | True | 34.3 | 31.8 |
| 3 | Bernard Tschumi | () | (Male) | 1980 | Architecture & Design | 1995-01-17 | True | True | 50.8 | 50.8 |
| 4 | Emil Hoppe | (Austrian) | (Male) | 1903 | Architecture & Design | 1997-01-15 | True | True | 38.4 | 19.1 |


In [69]:
# Get data types.
artworks.dtypes

Artist           object
Nationality      object
Gender           object
Date             object
Department       object
DateAcquired     object
URL                bool
ThumbnailURL       bool
Height (cm)     float64
Width (cm)      float64
dtype: object

In [70]:
artworks['DateAcquired'] = pd.to_datetime(artworks.DateAcquired)
artworks['YearAcquired'] = artworks.DateAcquired.dt.year
artworks['YearAcquired'].dtype

dtype('int64')

In [71]:
artworks.Gender[65]

'(Male) (Male)'

In [72]:
# Remove multiple nationalities, genders, and artists.
# Raw strings keep the regex escapes explicit. The replacement labels keep
# their backslashes literally, which is why they show up in set(artworks.Gender).
artworks.loc[artworks['Gender'].str.contains(r'\) \('), 'Gender'] = r'\(multiple_persons\)'
artworks.loc[artworks['Nationality'].str.contains(r'\) \('), 'Nationality'] = r'\(multiple_nationalities\)'
artworks.loc[artworks['Artist'].str.contains(','), 'Artist'] = 'Multiple_Artists'

# Convert dates to start date, cutting down number of distinct examples.
artworks['Date'] = pd.Series(artworks.Date.str.extract(
    '([0-9]{4})', expand=False))[:-1]
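
To see what the two regular expressions in this cell actually match, here is a small sketch using Python's `re` module on single strings. The helper names `has_multiple` and `first_year` are made up for illustration, and the two Date strings are hypothetical examples, not values taken from the dataset.

```python
import re

# Pattern from the cell above: a closing parenthesis, a space, an opening parenthesis.
MULTIPLE = re.compile(r'\) \(')
# Pattern from the cell above: the first run of four digits.
YEAR = re.compile(r'([0-9]{4})')


def has_multiple(value):
    """Return True when the field holds more than one parenthesized entry."""
    return MULTIPLE.search(value) is not None


def first_year(value):
    """Return the first four-digit run as a string, or None if there is none."""
    match = YEAR.search(value)
    return match.group(1) if match else None


print(has_multiple('(Male) (Male)'))  # the value seen at artworks.Gender[65]
print(has_multiple('(Male)'))
print(first_year('c. 1976-77'))       # hypothetical Date string
print(first_year('Unknown'))          # hypothetical Date string
```

Rows where `first_year` would return `None` end up as missing values in the `Date` column after `str.extract`.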

In [73]:
set(artworks.Gender)

{'()', '(Female)', '(Male)', '(male)', '\\(multiple_persons\\)'}

In [74]:

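# Final column drops. The axis is passed by keyword because newer
# pandas versions no longer accept it as a positional argument.
X = artworks.drop(['Department', 'DateAcquired', 'Artist', 'Nationality', 'Date'], axis=1)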

# Create dummies separately.
artists = pd.get_dummies(artworks.Artist)
nationalities = pd.get_dummies(artworks.Nationality)
dates = pd.get_dummies(artworks.Date)

# Concat with other variables, but artists slows this wayyyyy down so we'll keep it out for now
X = pd.get_dummies(X, sparse=True)
X = pd.concat([X, nationalities, dates], axis=1)

Y = artworks.Department
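
As a quick illustration of why the artist dummies are so expensive, `pd.get_dummies` creates one column per distinct value, so a column with thousands of distinct artists becomes thousands of columns. The sketch below uses a tiny, hypothetical stand-in for the `Nationality` column rather than the real data.

```python
import pandas as pd

# Hypothetical miniature of the Nationality column (not real rows).
toy = pd.Series(['(Austrian)', '(French)', '(Austrian)', '()'], name='Nationality')

dummies = pd.get_dummies(toy)

# One row per artwork, one column per distinct value.
print(dummies.shape)
print(list(dummies.columns))
```

On the real data, `artworks.Artist.nunique()` would give the number of columns the artist dummies add.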

In [99]:
set(Y)

{'Architecture & Design',
 'Drawings & Prints',
 'Media and Performance',
 'Painting & Sculpture',
 'Photography'}

## Using one layer of 1000 neurons

In [10]:
# Alright! We've done our prep, let's build the model.
# Neural networks are hugely computationally intensive.
# This may take several minutes to run.

# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with a single, 1000 perceptron layer.
mlp = MLPClassifier(hidden_layer_sizes=(1000,))
mlp.fit(X, Y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1000,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [11]:
mlp.score(X, Y)

0.5225808843291536

In [12]:
Y.value_counts()/len(Y)

Drawings & Prints        0.623627
Photography              0.226170
Architecture & Design    0.112928
Painting & Sculpture     0.033660
Media and Performance    0.003614
Name: Department, dtype: float64

In [13]:
from sklearn.model_selection import cross_val_score
cross_val_score(mlp, X, Y, cv=5)

array([0.67420416, 0.69649228, 0.67099166, 0.50991659, 0.56170351])

In [87]:
pd.Series(['a', 'a', '4']).value_counts()/len(pd.Series(['a', 'a', '4']))

a    0.666667
4    0.333333
dtype: float64


In [93]:
np.mean([0.67420416, 0.69649228, 0.67099166, 0.50991659, 0.56170351])


0.62266164

We have a mean cross-validation score of about 62%. Let's see if we can improve that score by trying different hidden layers. 
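
One way to put the 62% in context is to compare it with the share of the largest class from the `Y.value_counts()/len(Y)` output above, since a model that always predicts that class would score roughly that accuracy. The sketch below only reuses numbers already printed in this notebook; the variable names are my own.

```python
from statistics import mean, stdev

# Fold scores copied from the cross_val_score output above.
cv_scores = [0.67420416, 0.69649228, 0.67099166, 0.50991659, 0.56170351]

# Class shares copied from the Y.value_counts()/len(Y) output above.
class_shares = {
    'Drawings & Prints': 0.623627,
    'Photography': 0.226170,
    'Architecture & Design': 0.112928,
    'Painting & Sculpture': 0.033660,
    'Media and Performance': 0.003614,
}

# Accuracy of always predicting the most common department.
baseline_label = max(class_shares, key=class_shares.get)
baseline = class_shares[baseline_label]

print('mean CV accuracy: {:.4f} (std {:.4f})'.format(mean(cv_scores), stdev(cv_scores)))
print('majority-class baseline: {:.4f} ({})'.format(baseline, baseline_label))
print('difference: {:+.4f}'.format(mean(cv_scores) - baseline))
```

The baseline here is computed on the full dataset, so it is only an approximation of what each fold would see.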

## Using two layers of 20 neurons each

In [90]:
# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with two hidden layers of 20 perceptrons each.
mlp = MLPClassifier(hidden_layer_sizes=(20,20,))
mlp.fit(X, Y)

mlp.score(X, Y)


0.7503174205985116

In [91]:
from sklearn.model_selection import cross_val_score
cross_val_score(mlp, X, Y, cv=5)

array([0.69459247, 0.70608406, 0.68507878, 0.66241891, 0.62417165])

In [94]:
np.mean([0.69459247, 0.70608406, 0.68507878, 0.66241891, 0.62417165])

0.6744691740000001

Our mean cross-validation score is 67%. Okay, we have an improvement. On this data, at least, two small layers seem to work better than one large layer. 
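
One possible factor, which I have not tested here, is how many parameters each network has to fit. The sketch below counts weights and biases for a fully connected network. `count_parameters` is a made-up helper, and `N_INPUTS = 300` is a hypothetical placeholder; the real value would be `X.shape[1]`.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each layer has fan_in * fan_out weights plus one bias per output unit.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


N_INPUTS = 300  # hypothetical; use X.shape[1] for the real feature count
N_OUTPUTS = 5   # the five departments in set(Y)

# The four structures tried in this drill.
for layers in [(1000,), (20, 20), (20, 10), (16, 16, 16)]:
    print(layers, count_parameters(N_INPUTS, layers, N_OUTPUTS))
```

Under that assumed input size, the single 1000-neuron layer has far more parameters than any of the small stacked networks, which may make it harder to train well in the default 200 iterations.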

## Using two layers: the first with 20 neurons, the second with 10

In [97]:
# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with hidden layers of 20 and 10 perceptrons.
mlp = MLPClassifier(hidden_layer_sizes=(20,10,))
mlp.fit(X, Y)

mlp.score(X, Y)

cvs = cross_val_score(mlp, X, Y, cv=5)
print(cvs)
print('Our mean cross_val_score is {}'.format(np.mean(cvs)))

[0.69255364 0.69426811 0.67437442 0.63225209 0.61184485]
Our mean cross_val_score is 0.661058619305911


Using a smaller second layer has led to a slightly lower cross-validation score. I wonder if there are times when having fewer perceptrons in a layer is helpful?

## Using three layers each with 16 neurons

In [98]:
# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with three hidden layers of 16 perceptrons each.
mlp = MLPClassifier(hidden_layer_sizes=(16,16,16,))
mlp.fit(X, Y)

mlp.score(X, Y)

cvs = cross_val_score(mlp, X, Y, cv=5)
print(cvs)
print('Our mean cross_val_score is {}'.format(np.mean(cvs)))

[0.69816042 0.75181873 0.64240037 0.66696015 0.55725474]
Our mean cross_val_score is 0.6633188811454132


This is about as good as the last model. One could play around with the number of layers and neurons forever, it seems. It is interesting to note that 1000 perceptrons in one hidden layer is noticeably worse than two layers of 20. There is a ton of room for experimentation in these models. 
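
For further experimentation, the copy-and-paste cells above could be folded into one loop. This is a minimal sketch, not what I ran: it uses a small synthetic dataset from `make_classification` as a stand-in so it runs on its own, and it fixes `random_state` and raises `max_iter`, neither of which the cells above do. With the real data, swap `X_demo` and `y_demo` for `X` and `Y`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for X and Y (an assumption, so the sketch is self-contained).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=3, random_state=0)

# The four hidden layer structures tried in this drill.
candidates = [(1000,), (20, 20), (20, 10), (16, 16, 16)]

results = {}
for layers in candidates:
    # A fixed random_state keeps the weight initialization the same across runs.
    mlp = MLPClassifier(hidden_layer_sizes=layers, max_iter=300, random_state=0)
    scores = cross_val_score(mlp, X_demo, y_demo, cv=5)
    results[layers] = (np.mean(scores), np.std(scores))

# Report the structures from best to worst mean score.
for layers, (avg, spread) in sorted(results.items(),
                                    key=lambda kv: kv[1][0], reverse=True):
    print('{!s:>14}  mean={:.3f}  std={:.3f}'.format(layers, avg, spread))
```

Printing the standard deviation next to the mean helps judge whether a gap of a point or two between structures is larger than the spread across folds.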

The best neural network we have found uses two hidden layers with 20 neurons each. 

## Our winner: Two layers with 20 neurons each