# Week 5: Neural Network

### What's on this week
1. [Resuming from week 4](#resume)
2. [Building your first neural network model](#build)
3. [Understanding your neural network model](#viz)
4. [Finding optimal hyperparameters with GridSearchCV](#gridsearch)
5. [Feature selection](#fselect)
6. [Comparing each model](#comparison)

---

This week's practical note introduces neural network mining in Python, specifically with the multilayer perceptron classifier. Neural networks are a class of predictive models loosely inspired by the structure of the human brain. A network consists of layers of neurons, each of which takes the outputs of the previous layer as its inputs. The neural network is the most complex model we have used so far.
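
To make the layered structure concrete, here is a minimal NumPy sketch of one forward pass through a network with a single hidden layer. The layer sizes, the random weights, and the helper name `forward` are illustrative assumptions, not values from this tutorial. A ReLU hidden layer and a logistic output are used because they match the defaults of scikit-learn's `MLPClassifier` for a binary target.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """One forward pass: inputs -> hidden layer (ReLU) -> output (logistic)."""
    hidden = np.maximum(0, X @ W1 + b1)     # each hidden neuron consumes all inputs
    logits = hidden @ W2 + b2               # the output neuron consumes all hidden outputs
    return 1.0 / (1.0 + np.exp(-logits))    # squash to a probability in (0, 1)

rng = np.random.default_rng(0)
n_samples, n_inputs, n_hidden = 4, 3, 5     # hypothetical sizes for illustration

X = rng.normal(size=(n_samples, n_inputs))
W1 = rng.normal(size=(n_inputs, n_hidden))  # input-to-hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))         # hidden-to-output weights
b2 = np.zeros(1)

probs = forward(X, W1, b1, W2, b2)
print(probs.shape)                          # one probability per sample
```

Training a network amounts to adjusting `W1`, `b1`, `W2` and `b2` so that these probabilities match the observed targets; `MLPClassifier` handles that step internally.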

**These tutorial notes are an experimental version. Please send us feedback and suggestions on how to improve them, and ask your tutor if you have any questions or need clarification.**

## 1. Resuming from week 4<a name="resume"></a>
Last week, we learned how to perform data mining with decision trees in Python. This week, we again reuse the data preprocessing code. Like regression models, neural networks are sensitive to skewed and differently scaled inputs, so we also standardize the features:

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import GridSearchCV
from dm_tools import data_prep
from sklearn.preprocessing import StandardScaler

# preprocessing step
df = data_prep()

# train test split
y = df['TargetB']
X = df.drop(['TargetB'], axis=1)
# as_matrix() was removed in pandas 1.0; on older pandas, X.values is equivalent
X_mat = X.to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X_mat, y, test_size=0.5, random_state=42, stratify=y)

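# standardize: learn mean and standard deviation from the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train, y_train)
X_test = scaler.transform(X_test)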
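
The following sketch illustrates what the scaler does, using a small synthetic array rather than the tutorial's dataset (the numbers are made up). The scaler learns the mean and standard deviation from the training rows only and reuses them on the test rows, so information from the test set does not leak into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# synthetic data: two features on very different scales (illustrative only)
train = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.1], size=(100, 2))
test = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.1], size=(20, 2))

scaler = StandardScaler()
train_std = scaler.fit_transform(train)   # learn mean/std from the training data
test_std = scaler.transform(test)         # reuse the training mean/std

# equivalent manual computation with the training statistics
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(train_std.mean(axis=0).round(6))    # approximately 0 for each feature
print(train_std.std(axis=0).round(6))     # approximately 1 for each feature
print(np.allclose(test_std, manual))
```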

## 2. Building your first neural network model<a name="build"></a>

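Start by importing `MLPClassifier`, scikit-learn's multilayer perceptron, and fitting it to the training data. Compare the training and test accuracy printed afterwards: a large gap between them is a sign of overfitting.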

In [2]:
from sklearn.neural_network import MLPClassifier

model = MLPClassifier(max_iter=1000)
model.fit(X_train, y_train)

print(model.score(X_train, y_train))
print(model.score(X_test, y_test))

print(model)

0.924839975222
0.526533140615
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=1000, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)


In [3]:
print(X_train.shape[1])

85


## 4. Finding optimal hyperparameters with GridSearchCV<a name="gridsearch"></a>

In [4]:
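# step is computed here but is not used by the grid below
step = int((X_train.shape[1] + 5)/5)
# candidate hidden layer sizes and L2 penalty strengths (alpha)
params = {'hidden_layer_sizes': [7, 9, 11, 13], 'alpha': [0.01, 0.001, 0.0001, 0.00001]}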

cv = GridSearchCV(param_grid=params, estimator=MLPClassifier(max_iter=1000), cv=10, n_jobs=-1)
cv.fit(X_train, y_train)

print(cv.score(X_train, y_train))
print(cv.score(X_test, y_test))

print(cv.best_params_)

0.660747470576
0.550072269255
{'alpha': 0.01, 'hidden_layer_sizes': 7}
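
Beyond `best_params_`, the fitted search object keeps the score of every parameter combination in `cv_results_`. The sketch below shows one way to tabulate it. It runs on a small synthetic dataset from `make_classification` with a reduced grid, `cv=3` and a fixed `random_state` so that it finishes quickly; these are assumptions for illustration, not the tutorial's settings.

```python
import warnings
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# synthetic stand-in for X_train / y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

params = {'hidden_layer_sizes': [(3,), (5,)], 'alpha': [0.01, 0.0001]}
search = GridSearchCV(param_grid=params,
                      estimator=MLPClassifier(max_iter=200, random_state=0),
                      cv=3)

with warnings.catch_warnings():
    # the small iteration budget may not converge; silence that warning here
    warnings.simplefilter('ignore', category=ConvergenceWarning)
    search.fit(X_demo, y_demo)

# one row per parameter combination, best-ranked first
results = pd.DataFrame(search.cv_results_)
cols = ['param_hidden_layer_sizes', 'param_alpha',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))
```

Looking at `mean_test_score` together with `std_test_score` helps judge whether the best combination is clearly better than the rest or only marginally so.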


## 5. Feature selection<a name="fselect"></a>

In [5]:
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

rfe = RFECV(estimator = LogisticRegression(), cv=10)
rfe.fit(X_train, y_train)

X_train_rfe = rfe.transform(X_train)
X_test_rfe = rfe.transform(X_test)

step = int((X_train_rfe.shape[1] + 5)/5)
params = {'hidden_layer_sizes': [7, 9, 11, 13], 'alpha': [0.01, 0.001, 0.0001, 0.00001]}

cv = GridSearchCV(param_grid=params, estimator=MLPClassifier(max_iter=1000), cv=10, n_jobs=-1)
cv.fit(X_train_rfe, y_train)

print(cv.score(X_train_rfe, y_train))
print(cv.score(X_test_rfe, y_test))

print(cv.best_params_)

0.607474705761
0.569688209787
{'alpha': 1e-05, 'hidden_layer_sizes': 7}
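
After fitting, `RFECV` records how many features it kept in `n_features_` and which ones in the boolean mask `support_`. The sketch below inspects both on a synthetic dataset generated with `make_classification`; the dataset, `cv=5` and `max_iter=1000` are illustrative assumptions rather than the tutorial's settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# synthetic stand-in: 15 features, only 4 of which carry signal
X_demo, y_demo = make_classification(n_samples=300, n_features=15, n_informative=4,
                                     n_redundant=0, random_state=0)

rfe_demo = RFECV(estimator=LogisticRegression(max_iter=1000), cv=5)
rfe_demo.fit(X_demo, y_demo)

X_demo_sel = rfe_demo.transform(X_demo)   # keeps only the selected columns

print("Original feature count:", X_demo.shape[1])
print("Selected feature count:", rfe_demo.n_features_)
print("Selected column indices:", rfe_demo.support_.nonzero()[0])
```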


In [6]:
from sklearn.decomposition import PCA

pca = PCA()
pca.fit(X_train)

sum_var = 0
for idx, val in enumerate(pca.explained_variance_ratio_):
    sum_var += val
    if (sum_var >= 0.95):
        print("N components with > 95% variance =", idx+1)
        break

N components with > 95% variance = 66
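
The same count can be obtained without an explicit loop. This sketch uses `np.cumsum` on the explained variance ratios and also shows scikit-learn's option of passing a fraction as `n_components`, which keeps enough components to explain that share of the variance. The data is synthetic and its shape is made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# synthetic data: 10 features with decreasing variance (illustrative only)
X_demo = rng.normal(size=(200, 10)) * np.arange(10, 0, -1)

pca_full = PCA().fit(X_demo)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)
# position of the first component at which cumulative variance reaches 95%
n_95 = int(np.argmax(cum_var >= 0.95)) + 1
print("Components needed for 95% variance:", n_95)

# alternative: let PCA choose the number of components for a variance target
pca_95 = PCA(n_components=0.95, svd_solver='full').fit(X_demo)
print("Components kept by PCA(n_components=0.95):", pca_95.n_components_)
```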


In [7]:
pca = PCA(n_components=66)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

step = int((X_train_pca.shape[1] + 5)/5)
params = {'hidden_layer_sizes': [7, 9, 11, 13], 'alpha': [0.01, 0.001, 0.0001, 0.00001]}

cv = GridSearchCV(param_grid=params, estimator=MLPClassifier(max_iter=1000), cv=10, n_jobs=-1)
cv.fit(X_train_pca, y_train)

print(cv.score(X_train_pca, y_train))
print(cv.score(X_test_pca, y_test))

# print parameters of the best model
print(cv.best_params_)

0.637001858352
0.537270287012
{'alpha': 1e-05, 'hidden_layer_sizes': 7}


In [8]:
from sklearn.tree import DecisionTreeClassifier

params = {'criterion': ['gini', 'entropy'],
          'max_depth': range(3, 10),
          'min_samples_leaf': range(20, 200, 20)}

cv = GridSearchCV(param_grid=params, estimator=DecisionTreeClassifier(), cv=10)
cv.fit(X_train, y_train)

GridSearchCV(cv=10, error_score='raise',
       estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'criterion': ['gini', 'entropy'], 'min_samples_leaf': range(20, 200, 20), 'max_depth': range(3, 10)},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)

In [9]:
from dm_tools import analyse_feature_importance

analyse_feature_importance(cv.best_estimator_, X.columns)

GiftAvgLast : 0.424147866229
DemMedHomeValue : 0.17874460397
GiftTimeLast : 0.135715271083
GiftAvgCard36 : 0.0996893965541
DemAge : 0.0654544865005
PromCntCard36 : 0.0494236862349
GiftCntAll : 0.0468246894295
DemGender_U : 0.0
DemCluster_11 : 0.0
StatusCat96NK_N : 0.0
StatusCat96NK_S : 0.0
DemCluster_0 : 0.0
DemCluster_1 : 0.0
DemCluster_10 : 0.0
DemCluster_13 : 0.0
DemCluster_12 : 0.0
StatusCat96NK_F : 0.0
DemCluster_14 : 0.0
DemCluster_15 : 0.0
DemCluster_16 : 0.0


In [10]:
from sklearn.feature_selection import SelectFromModel

selectmodel = SelectFromModel(cv.best_estimator_, prefit=True)
X_train_sel_model = selectmodel.transform(X_train)
X_test_sel_model = selectmodel.transform(X_test)

print(X_train_sel_model.shape)

(4843, 7)
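
To see which columns survive the selection, the boolean mask from `get_support()` can be applied to the column names. The sketch below demonstrates this on synthetic data with hypothetical feature names (`feat_0`, `feat_1`, ...); in the notebook above the names would come from `X.columns`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in with hypothetical column names
X_demo, y_demo = make_classification(n_samples=300, n_features=12, n_informative=3,
                                     n_redundant=0, random_state=0)
feature_names = np.array(['feat_%d' % i for i in range(X_demo.shape[1])])

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_demo, y_demo)

# prefit=True reuses the already fitted tree; by default, features whose
# importance is at least the mean importance are kept
selector = SelectFromModel(tree, prefit=True)
X_demo_sel = selector.transform(X_demo)

mask = selector.get_support()
print("Kept features:", list(feature_names[mask]))
print("Shape after selection:", X_demo_sel.shape)
```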


In [11]:
step = int((X_train_sel_model.shape[1] + 5)/5)
params = {'hidden_layer_sizes': [7, 9, 11, 13], 'alpha': [0.01, 0.001, 0.0001, 0.00001]}

cv = GridSearchCV(param_grid=params, estimator=MLPClassifier(max_iter=1000), cv=10, n_jobs=-1)
cv.fit(X_train_sel_model, y_train)

print(cv.score(X_train_sel_model, y_train))
print(cv.score(X_test_sel_model, y_test))

print(cv.best_params_)

0.580838323353
0.569068759034
{'alpha': 1e-05, 'hidden_layer_sizes': 13}
