We'll be building our first neural network. Multiple input features are fed into layers of perceptrons, and the network's output is trained to predict our target variable.
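To make "layers of perceptrons" concrete, here is a minimal NumPy sketch of a forward pass through a tiny network with one hidden layer. The weights are random and untrained, and the sizes (4 features, 5 hidden units, 3 classes) are arbitrary assumptions for illustration. The ReLU hidden layer and softmax output mirror what sklearn's `MLPClassifier` uses by default for multiclass problems, but this is not sklearn's own implementation.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: max(0, z), applied elementwise.
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract each row's max before exponentiating, for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X, weights, biases):
    # Each hidden layer: weighted sum of its inputs plus a bias, then ReLU.
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # Output layer: weighted sum, then softmax to get class probabilities.
    return softmax(a @ weights[-1] + biases[-1])

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4, 5, 3
weights = [rng.normal(size=(n_features, n_hidden)),
           rng.normal(size=(n_hidden, n_classes))]
biases = [np.zeros(n_hidden), np.zeros(n_classes)]

X_demo = rng.normal(size=(2, n_features))   # two made-up example rows
probs = forward(X_demo, weights, biases)
print(probs.shape)        # (2, 3): one probability per class, per row
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Training consists of adjusting `weights` and `biases` so that these probabilities match the observed classes; that is the part `MLPClassifier.fit` handles for us.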

Like many of the other models in this course so far, a neural network can be used for either regression or classification.
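As a quick illustration of the two uses, sklearn provides `MLPClassifier` for class labels and `MLPRegressor` for continuous targets. The sketch below fits both to made-up data (the features and targets are invented for the example). The classifier's `score` is accuracy, while the regressor's is R².

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # synthetic features
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)    # binary label
y_reg = 2 * X[:, 0] - X[:, 2]                    # continuous target

clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y_class)
reg = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y_reg)

print(clf.predict(X[:5]), clf.score(X, y_class))  # class labels, accuracy
print(reg.predict(X[:5]), reg.score(X, y_reg))    # real numbers, R^2
```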

We'll be using a public [dataset](https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv) from the Museum of Modern Art (MoMA).

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
artworks = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv')

Data cleaning and selection of the columns we're interested in...

In [3]:
# Select columns. .copy() makes this an independent DataFrame, so the
# column assignments below don't raise SettingWithCopyWarning.
artworks = artworks[['Artist', 'Nationality', 'Gender', 'Date', 'Department',
                    'DateAcquired', 'URL', 'ThumbnailURL', 'Height (cm)', 'Width (cm)']].copy()

# Convert URLs to booleans (True if the artwork has one).
artworks['URL'] = artworks['URL'].notnull()
artworks['ThumbnailURL'] = artworks['ThumbnailURL'].notnull()

# Drop films and some other tricky departments.
artworks = artworks[~artworks['Department'].isin(
    ['Film', 'Media and Performance Art', 'Fluxus Collection'])]

# Drop rows with missing data.
artworks = artworks.dropna()

In [4]:
artworks.head()

|   | Artist | Nationality | Gender | Date | Department | DateAcquired | URL | ThumbnailURL | Height (cm) | Width (cm) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Otto Wagner | (Austrian) | (Male) | 1896 | Architecture & Design | 1996-04-09 | True | True | 48.6 | 168.9 |
| 1 | Christian de Portzamparc | (French) | (Male) | 1987 | Architecture & Design | 1995-01-17 | True | True | 40.6401 | 29.8451 |
| 2 | Emil Hoppe | (Austrian) | (Male) | 1903 | Architecture & Design | 1997-01-15 | True | True | 34.3 | 31.8 |
| 3 | Bernard Tschumi | () | (Male) | 1980 | Architecture & Design | 1995-01-17 | True | True | 50.8 | 50.8 |
| 4 | Emil Hoppe | (Austrian) | (Male) | 1903 | Architecture & Design | 1997-01-15 | True | True | 38.4 | 19.1 |


The model needs numeric features, so we'll do some more cleaning and encoding...

In [5]:
artworks['DateAcquired'] = pd.to_datetime(artworks.DateAcquired)
artworks['YearAcquired'] = artworks.DateAcquired.dt.year
artworks['YearAcquired'].dtype

dtype('int64')

In [6]:
# Collapse rows with multiple nationalities, genders, or artists into one category each.
# Raw strings keep the regex escapes intact; the replacement values are plain text,
# so they need no backslashes.
artworks.loc[artworks['Gender'].str.contains(r'\) \('), 'Gender'] = '(multiple_persons)'
artworks.loc[artworks['Nationality'].str.contains(r'\) \('), 'Nationality'] = '(multiple_nationalities)'
artworks.loc[artworks['Artist'].str.contains(','), 'Artist'] = 'Multiple_Artists'

# Convert dates to a start year, cutting down the number of distinct values.
# Note the nifty regular expression (RegEx) pattern that pulls out just the first
# four digits; rows with no four-digit year become NaN.
# Don't slice the result (e.g. with [:-1]) before assigning it back: pandas aligns
# on the index, so the missing last element would just turn the last row's Date into NaN.
artworks['Date'] = artworks['Date'].str.extract('([0-9]{4})', expand=False)

# Final column drops.
X = artworks.drop(columns=['Department', 'DateAcquired', 'Artist', 'Nationality', 'Date'])

# Create dummies separately.
artists = pd.get_dummies(artworks.Artist)
nationalities = pd.get_dummies(artworks.Nationality)
dates = pd.get_dummies(artworks.Date)

# Concatenate with the other variables. The artist dummies slow fitting down
# considerably, so we'll keep them out for now.
X = pd.get_dummies(X, sparse=True)
X = pd.concat([X, nationalities, dates], axis=1)

Y = artworks.Department
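To see what the dummy encoding produces, here is `pd.get_dummies` applied to a toy frame whose values are borrowed from the `head()` output (the rows themselves are made up). On a DataFrame, string columns are expanded into prefixed indicator columns (0/1 or boolean, depending on the pandas version) and numeric columns pass through unchanged. On a single Series, as with `nationalities` and `dates` in the cell above, the new columns carry no prefix.

```python
import pandas as pd

toy = pd.DataFrame({
    'Gender': ['(Male)', '(Female)', '(Male)'],
    'Nationality': ['(Austrian)', '(French)', '(multiple_nationalities)'],
    'Height (cm)': [48.6, 40.6, 34.3],
})

# DataFrame: numeric columns first, then one indicator column per category.
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())

# Series: one indicator column per distinct value, named by the value itself.
print(pd.get_dummies(toy['Nationality']).columns.tolist())
```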

In [7]:
len(pd.Series(artworks.Date.str.extract(
    '([0-9]{4})', expand=False))[:-1])

104672
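Cell 7 slices the extracted dates with `[:-1]`, which drops the last element. The toy example below (made-up date strings) shows what happens when such a shortened Series is assigned back to a DataFrame column: pandas aligns on the index rather than on position, so nothing shifts, and the row whose index is missing simply becomes NaN.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['1896', 'c. 1987', 'n.d.', '1950']})

# Pull out the first run of four digits; 'n.d.' has no match and becomes NaN.
years = df['Date'].str.extract('([0-9]{4})', expand=False)
print(years.tolist())

# Assigning a Series that is one element short: the last index label is
# missing, so the last row is filled with NaN instead of being shifted.
df['Date_sliced'] = years.iloc[:-1]
print(df['Date_sliced'].isna().tolist())  # [False, False, True, True]
```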

In [8]:
# Alright! We've done our prep, let's build the model.
# Neural networks are hugely computationally intensive.
# This may take several minutes to run.

# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with a single hidden layer of 1000 perceptrons.
mlp = MLPClassifier(hidden_layer_sizes=(1000,))
mlp.fit(X, Y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1000,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

Let's check the accuracy on the training data, and then perform some cross validation...

In [10]:
mlp.score(X,Y)

0.6908467322041023

In [9]:
from sklearn.model_selection import cross_val_score
cross_val_score(mlp, X, Y, cv=5)

array([0.59593065, 0.64711502, 0.33532362, 0.45674294, 0.52168928])
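The fold scores above range from about 0.34 to 0.65. With an integer `cv`, `cross_val_score` uses `StratifiedKFold` without shuffling for classifiers, so each fold takes contiguous blocks of rows within each class. If the CSV is ordered (for example by date), folds can differ systematically, which may account for part of this spread. A sketch of shuffled folds, on synthetic data from `make_classification` standing in for `X` and `Y`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: 300 rows, 10 features, 3 classes.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, n_informative=5,
                                     n_classes=3, random_state=0)

# Shuffle rows before splitting; stratification keeps class proportions
# similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0)
scores = cross_val_score(model, X_demo, y_demo, cv=cv)
print(scores.round(3), round(scores.mean(), 3), round(scores.std(), 3))
```

On the real data this would be `cross_val_score(mlp, X, Y, cv=cv)`, with the same long runtime as before.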

What did we learn?
- For one, the model is fairly accurate on the training data (0.69), but its cross validation scores are lower and vary widely (about 0.34 to 0.65, averaging roughly 0.51), which suggests it is overfit. This can happen when a neural net doesn't have much data relative to the number of features.
- In general, neural networks are known to need *a lot* of data.
- They can also take a very long time to train.
- It was wise to leave the artist dummies out: they would only have added to the number of features relative to the number of observations and worsened both problems above (overfitting and slow training).
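One way to check whether a network is starved for data is a learning curve: train on increasing amounts of data and compare training and validation accuracy. A gap that narrows as rows are added suggests more data would help. A sketch with `sklearn.model_selection.learning_curve`, on synthetic data (an assumption; on the MoMA data each point would mean several slow fits):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data.
X_demo, y_demo = make_classification(n_samples=400, n_features=20, n_informative=6,
                                     random_state=0)
model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)

# Train on 20%, 50%, and 100% of each training fold; score on the held-out fold.
sizes, train_scores, val_scores = learning_curve(
    model, X_demo, y_demo,
    train_sizes=[0.2, 0.5, 1.0],
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:4d} rows  train={tr:.3f}  validation={va:.3f}")
```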

### Model parameters
Let's quickly go over some of the parameters.
- __hidden_layer_sizes__: specifies both the number of hidden layers and the number of perceptrons in each, as a tuple.
    - We passed (1000,) for a network with a single hidden layer of 1000 perceptrons. For two hidden layers of 300 and 200 perceptrons, respectively, you would pass (300, 200).
    - How many to use depends on cross validation performance and computational constraints (which can get severe pretty quickly).
- __alpha__: the strength of an L2 regularization penalty on large weights, in much the same vein as the regularized regressors we studied previously. The higher alpha is, the more large weights are penalized.
- __activation__: the activation function applied to the output of each hidden layer. The *sigmoid* function we discussed earlier is called 'logistic' in sklearn, and the default for the MLP model is 'relu' (rectified linear unit). The output layer uses its own function (softmax for multiclass problems).
    - [Activation functions](https://en.wikipedia.org/wiki/Activation_function) (Wikipedia)
    - [Multilayer perceptrons](https://en.wikipedia.org/wiki/Multilayer_perceptron) (Wikipedia), for more information on MLPs
    - [MLPClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html) in the sklearn documentation
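These parameters map directly onto attributes of a fitted model. This sketch fits the two-layer (300, 200) network mentioned above, with the logistic activation and a non-default alpha, to random data with 8 features and 3 classes (all invented for the example), then inspects `coefs_`, which holds one weight matrix per connection between consecutive layers.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))        # 8 made-up features
y_demo = rng.integers(0, 3, size=100)     # 3 arbitrary classes

mlp_demo = MLPClassifier(hidden_layer_sizes=(300, 200), activation='logistic',
                         alpha=0.001, max_iter=50, random_state=0)
mlp_demo.fit(X_demo, y_demo)  # may warn that it hasn't converged; fine for inspection

print([w.shape for w in mlp_demo.coefs_])  # [(8, 300), (300, 200), (200, 3)]
print(mlp_demo.n_layers_)                  # 4: input + 2 hidden + output
print(mlp_demo.out_activation_)            # 'softmax' for a multiclass target
```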
    
Cross validation is usually the best course of action for determining parameters.
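A sketch of choosing `hidden_layer_sizes` and `alpha` by cross validation with `GridSearchCV`. It runs on synthetic data, since every candidate means several full fits, and the grid values are arbitrary examples rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data.
X_demo, y_demo = make_classification(n_samples=300, n_features=15, n_informative=5,
                                     random_state=0)

param_grid = {
    'hidden_layer_sizes': [(50,), (10, 10, 10, 10), (100, 100)],
    'alpha': [1e-4, 1e-2, 1.0],
}
# Every combination (3 x 3 = 9) is scored with 3-fold shuffled cross validation.
search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```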

#### Drill: Now it's our job to play with layers to see how they affect results. Using the space below, experiment with different hidden layer structures. You can try this on a subset of the data to improve runtime. See how things vary. See what seems to matter the most. Feel free to manipulate other parameters as well. It may also be beneficial to do some real feature selection work...

In [15]:
# Establish and fit the model, with four hidden layers of 10 perceptrons each.
mlp = MLPClassifier(hidden_layer_sizes=(10,10,10,10,))
mlp.fit(X, Y)
mlp.score(X,Y)

0.665118989615278

Interestingly, a model with far fewer perceptrons overall (40 instead of 1000) but more layers scores nearly as well on the training data (0.665 vs. 0.691).
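Part of why the deeper network is so much cheaper is simple arithmetic: every layer has one weight per input-unit pair plus one bias per unit, so a single wide layer dominates the count. The helper `n_parameters` below is hypothetical (written for this example), and the sizes (100 input columns, 8 classes) are placeholders rather than the real width of `X`. For multiclass problems sklearn uses one output unit per class; for binary problems it uses a single output unit.

```python
def n_parameters(n_features, hidden_layer_sizes, n_outputs):
    """Count weights plus biases in a fully connected network (hypothetical helper)."""
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    # Each layer contributes n_in * n_out weights and n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# Placeholder sizes: 100 input columns, 8 output classes.
print(n_parameters(100, (1000,), 8))          # 109008
print(n_parameters(100, (10, 10, 10, 10), 8)) # 1428
```

For the actual model, something like `n_parameters(X.shape[1], (1000,), Y.nunique())` would give the real count.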

In [16]:
# Switch to training on half the data going forward, so each run is faster.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.5, random_state=42)
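If the departments are not equally common, passing `stratify` to `train_test_split` keeps each department's share the same in both halves. A sketch on made-up, imbalanced labels (80% 'A', 20% 'B'):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels and a single dummy feature.
y_demo = pd.Series(['A'] * 80 + ['B'] * 20)
X_demo = pd.DataFrame({'feature': np.arange(100)})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=42, stratify=y_demo)

# Both halves keep the 80/20 split exactly: 40 'A' and 10 'B' each.
print(y_tr.value_counts().to_dict(), y_te.value_counts().to_dict())
```

On the real data this would be `train_test_split(X, Y, test_size=0.5, random_state=42, stratify=Y)`.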

In [18]:
# Establish and fit the model, with four hidden layers of 10 perceptrons each.
mlp = MLPClassifier(hidden_layer_sizes=(10,10,10,10,))
mlp.fit(X_train, y_train)
mlp.score(X_train, y_train)

0.6441646285539591

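Since we know that neural networks like a lot of data, it makes sense that training accuracy drops a little when we train on half as much. Some of the difference may also be run-to-run noise: without a `random_state`, every fit starts from a different random initialization of the weights.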
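Because none of the fits in this notebook set `random_state`, small score differences between runs can be noise rather than the effect of a change. A sketch on synthetic data showing that fixing `random_state` makes a fit repeatable, while varying it spreads the scores:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

def fit_score(seed):
    # Same architecture each time; only the random initialization changes.
    model = MLPClassifier(hidden_layer_sizes=(10, 10, 10, 10), max_iter=300,
                          random_state=seed)
    return model.fit(X_demo, y_demo).score(X_demo, y_demo)

same_seed = [fit_score(0), fit_score(0)]         # identical results
diff_seeds = [fit_score(s) for s in range(5)]    # scores vary with the seed
print(same_seed, diff_seeds)
```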

In [19]:
# Establish and fit the model, with seven hidden layers of 10 perceptrons each.
mlp = MLPClassifier(hidden_layer_sizes=(10,10,10,10,10,10,10,))
mlp.fit(X_train, y_train)
mlp.score(X_train, y_train)

0.6642655151329868

Training accuracy keeps increasing as we add layers.
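All of the scores in these drill cells are on the data the model was trained on, which rewards overfitting. A sketch that compares the same three architectures on both halves of a split, using synthetic stand-in data and a fixed `random_state`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data, split in half like X_train / X_test.
X_demo, y_demo = make_classification(n_samples=400, n_features=20, n_informative=6,
                                     n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.5,
                                          random_state=42, stratify=y_demo)

results = {}
for layers in [(10, 10, 10, 10), (10,) * 7, (100, 100, 100, 100)]:
    model = MLPClassifier(hidden_layer_sizes=layers, max_iter=500,
                          random_state=0).fit(X_tr, y_tr)
    # Training accuracy vs. accuracy on rows the model never saw.
    results[layers] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(layers, 'train=%.3f test=%.3f' % results[layers])
```

With the real data, swapping in `X_train, y_train, X_test, y_test` gives a fairer comparison than training accuracy alone.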

In [20]:
# Establish and fit the model, with four hidden layers of 100 perceptrons each.
mlp = MLPClassifier(hidden_layer_sizes=(100,100,100,100,))
mlp.fit(X_train, y_train)
mlp.score(X_train, y_train)

0.7015629776826658

In [27]:
# Messing with different alphas: alpha=0 removes the regularization entirely.
# Establish and fit the model, with four hidden layers of 100 perceptrons each.
mlp = MLPClassifier(hidden_layer_sizes=(100,100,100,100,),alpha=0)
mlp.fit(X_train, y_train)
mlp.score(X_train, y_train)

0.7820811678385815

In [29]:
cross_val_score(mlp, X_train, y_train, cv=5)

array([0.63075454, 0.6025984 , 0.6532913 , 0.64055035, 0.64921166])

Removing the regularization dramatically improves accuracy on the training set (0.78, up from 0.70), but the cross validation scores (0.60 to 0.65) show that it overfits, just as it does in the case of many regressions. Of the values I tried, the default (alpha=0.0001) worked best; increasing the regularization above the default also reduced performance.
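`validation_curve` automates this kind of alpha comparison, reporting training and cross-validated accuracy for each value. A sketch on synthetic data; the alpha values are examples only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, validation_curve
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data.
X_demo, y_demo = make_classification(n_samples=300, n_features=20, n_informative=5,
                                     random_state=0)
alphas = [0.0, 1e-4, 1e-2, 1.0, 10.0]

# One row per alpha, one column per fold.
train_scores, val_scores = validation_curve(
    MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=300, random_state=0),
    X_demo, y_demo,
    param_name='alpha', param_range=alphas,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
for a, tr, va in zip(alphas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"alpha={a:<8g} train={tr:.3f} validation={va:.3f}")
```

A widening gap between the two columns as alpha shrinks is the overfitting pattern described above.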

In [24]:
print(artworks.columns)

Index(['Artist', 'Nationality', 'Gender', 'Date', 'Department', 'DateAcquired',
       'URL', 'ThumbnailURL', 'Height (cm)', 'Width (cm)', 'YearAcquired'],
      dtype='object')


There's probably some feature selection work to be done, but I'm going to move on to the next lesson, since every single model fit takes my computer ten minutes!
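As a possible starting point for that feature selection, one cheap step is to drop dummy columns that are almost always zero (for example, nationalities or years that appear only a handful of times), since each one widens the first layer. The helper `drop_rare_dummies` and the threshold are hypothetical. It is meant for indicator columns only; applying it to the `nationalities` and `dates` frames before the `pd.concat` would be one way to use it.

```python
import pandas as pd

def drop_rare_dummies(X, min_count):
    """Keep only columns with at least `min_count` nonzero entries (hypothetical helper).

    Intended for dummy/indicator columns: a numeric column with many zeros
    would also be dropped.
    """
    counts = (X != 0).sum(axis=0)
    return X.loc[:, counts >= min_count]

# Made-up example: '(French)' appears only once.
toy = pd.DataFrame({
    '(Austrian)': [1, 0, 1, 0, 1],
    '(French)':   [0, 1, 0, 0, 0],
    '(German)':   [0, 0, 0, 1, 0],
})
print(drop_rare_dummies(toy, min_count=2).columns.tolist())  # ['(Austrian)']
```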