# Neural Networks
 
Neural Networks are a machine learning framework that attempts to mimic the learning pattern of natural biological neural networks. Biological neural networks have interconnected neurons with dendrites that receive inputs, then based on these inputs they produce an output signal through an axon to another neuron. We will try to mimic this process through the use of Artificial Neural Networks (ANN), which we will just refer to as neural networks from now on. The process of creating a neural network begins with the most basic form, a single perceptron.

### The Perceptron
 
Let's start our discussion by talking about the Perceptron! A perceptron has one or more inputs, a bias, an activation function, and a single output. The perceptron receives inputs, multiplies each one by a weight, sums the results together with the bias, and then passes that sum into an activation function to produce an output. There are many possible activation functions to choose from, such as the logistic function, a trigonometric function, a step function, etc. The bias matters because it avoids issues where all inputs could be equal to zero (meaning no multiplicative weight would have an effect). Check out the diagram below for a visualization of a perceptron:

Once we have the output we can compare it to a known label and adjust the weights accordingly (the weights usually start off with random initialization values). We keep repeating this process until we have reached a maximum number of allowed iterations, or an acceptable error rate.
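
To make the loop described above concrete, here is a minimal sketch of a single perceptron in plain Python. It is an illustration rather than how SciKit Learn implements things: the function names `predict` and `train_perceptron`, the step activation, the zero initial weights (instead of random ones) and the toy AND data are all assumptions chosen to keep the example small and reproducible.

```python
def predict(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a step activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum >= 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, max_iterations=100):
    """Adjust weights and bias after each wrong prediction until a pass has no errors."""
    # Assumption: zeros keep the run reproducible; random initial values are more usual.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_iterations):
        errors = 0
        for inputs, label in zip(samples, labels):
            error = label - predict(inputs, weights, bias)  # -1, 0, or +1
            if error != 0:
                errors += 1
                # Nudge each weight in the direction that reduces the error.
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if errors == 0:  # acceptable error rate reached (here: no mistakes at all)
            break
    return weights, bias


# Toy data: logical AND, which a single perceptron can separate.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(s, weights, bias) for s in samples]
print(predictions)  # [0, 0, 0, 1]
```

Notice that for the input `(0, 0)` the weights contribute nothing, so the bias alone decides the output. That is exactly the situation the bias term is there to handle.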

To create a neural network, we simply begin to add layers of perceptrons together, creating a multi-layer perceptron model of a neural network. You'll have an input layer which directly takes in your feature inputs and an output layer which will create the resulting outputs. Any layers in between are known as hidden layers because they don't directly "see" the feature inputs or outputs. For a visualization of this check out the diagram below (source: Wikipedia).

Let's move on to actually creating a neural network with Python!

### SciKit-Learn
 
In order to follow along with this tutorial, you'll need to have the latest version of SciKit Learn installed! It is easily installable either through pip or conda, but you can reference the official installation documentation for complete details on this.
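
Once it is installed, a quick way to confirm that your environment is ready is to print the installed version and try the import used later in this tutorial. This is just a sanity check, not part of the original notebook.

```python
import sklearn

# The neural_network module with MLPClassifier was added in scikit-learn 0.18,
# so any older installation will fail on the import below.
from sklearn.neural_network import MLPClassifier

print(sklearn.__version__)
```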

### Data
 
We'll use SciKit Learn's built in Breast Cancer Data Set which has several features of tumors with a labeled class indicating whether the tumor was Malignant or Benign. We will try to create a neural network model that can take in these features and attempt to predict malignant or benign labels for tumors it has not seen before. Let's go ahead and start by getting the data!

The loaded object behaves like a dictionary: it contains a description of the data along with the features and targets:

In [2]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

In [3]:
cancer.keys()

['target_names', 'data', 'target', 'DESCR', 'feature_names']

In [3]:
print(cancer)

{'target_names': array(['malignant', 'benign'],
      dtype='|S9'), 'data': array([[  1.79900000e+01,   1.03800000e+01,   1.22800000e+02, ...,
          2.65400000e-01,   4.60100000e-01,   1.18900000e-01],
       [  2.05700000e+01,   1.77700000e+01,   1.32900000e+02, ...,
          1.86000000e-01,   2.75000000e-01,   8.90200000e-02],
       [  1.96900000e+01,   2.12500000e+01,   1.30000000e+02, ...,
          2.43000000e-01,   3.61300000e-01,   8.75800000e-02],
       ..., 
       [  1.66000000e+01,   2.80800000e+01,   1.08300000e+02, ...,
          1.41800000e-01,   2.21800000e-01,   7.82000000e-02],
       [  2.06000000e+01,   2.93300000e+01,   1.40100000e+02, ...,
          2.65000000e-01,   4.08700000e-01,   1.24000000e-01],
       [  7.76000000e+00,   2.45400000e+01,   4.79200000e+01, ...,
          0.00000000e+00,   2.87100000e-01,   7.03900000e-02]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

In [4]:
# Print full description by running:
# print(cancer['DESCR'])
# 569 data points with 30 features
cancer['data'].shape

(569L, 30L)

In [5]:
print(cancer['DESCR'])

Breast Cancer Wisconsin (Diagnostic) Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, field
        13 is Radius SE, field 23 is Worst Radius.

        

In [6]:
X = cancer['data']
y = cancer['target']

In [7]:
X

array([[  1.79900000e+01,   1.03800000e+01,   1.22800000e+02, ...,
          2.65400000e-01,   4.60100000e-01,   1.18900000e-01],
       [  2.05700000e+01,   1.77700000e+01,   1.32900000e+02, ...,
          1.86000000e-01,   2.75000000e-01,   8.90200000e-02],
       [  1.96900000e+01,   2.12500000e+01,   1.30000000e+02, ...,
          2.43000000e-01,   3.61300000e-01,   8.75800000e-02],
       ..., 
       [  1.66000000e+01,   2.80800000e+01,   1.08300000e+02, ...,
          1.41800000e-01,   2.21800000e-01,   7.82000000e-02],
       [  2.06000000e+01,   2.93300000e+01,   1.40100000e+02, ...,
          2.65000000e-01,   4.08700000e-01,   1.24000000e-01],
       [  7.76000000e+00,   2.45400000e+01,   4.79200000e+01, ...,
          0.00000000e+00,   2.87100000e-01,   7.03900000e-02]])

### Train Test Split
 
Let's split our data into training and testing sets. This is done easily with SciKit Learn's train_test_split function from model_selection:

In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
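
The call above uses the defaults, so the split changes on every run. If you want a reproducible split, one option is sketched below. The `random_state` and `stratify` arguments are additions for illustration (they are not used in the original cell), and `test_size=0.25` simply spells out the default.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X, y = cancer['data'], cancer['target']

# random_state fixes the shuffle; stratify=y keeps the malignant/benign
# ratio roughly the same in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (426, 30) (143, 30)
```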

### Data Preprocessing
 
The neural network may have difficulty converging before the maximum number of iterations allowed if the data is not normalized. The Multi-layer Perceptron is sensitive to feature scaling, so it is highly recommended to scale your data. Note that you must apply the same scaling to the test set for meaningful results. There are a lot of different methods for normalization of data; we will use the built-in StandardScaler for standardization.

In [9]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [10]:
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
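
If you are curious what the scaler actually does, the sketch below checks it against the textbook formula (subtract the training mean, divide by the training standard deviation). It uses made-up random data rather than the tumor features so that it runs on its own; the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 3 features centered around 50 with spread 10.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

scaler = StandardScaler().fit(train)       # statistics come from the training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)       # the test set reuses the training statistics

# Manual standardization with the training mean and (population) standard deviation.
manual = (train - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled, manual))   # True
print(train_scaled.mean(axis=0).round(6))  # approximately 0 for every feature
print(train_scaled.std(axis=0).round(6))   # approximately 1 for every feature
```

The scaled test set will usually have means close to, but not exactly, zero. That is expected, because it is transformed with the training statistics rather than its own.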

Now it is time to train our model. SciKit Learn makes this incredibly easy, by using estimator objects. In this case we will import our estimator (the Multi-Layer Perceptron Classifier model) from the neural_network library of SciKit-Learn!

In [11]:
from sklearn.neural_network import MLPClassifier

Next we create an instance of the model. There are a lot of parameters you can choose to define and customize here, but we will only define hidden_layer_sizes. For this parameter you pass in a tuple consisting of the number of neurons you want at each hidden layer, where the nth entry in the tuple represents the number of neurons in the nth hidden layer of the MLP model. There are many ways to choose these numbers, but for simplicity we will choose 3 hidden layers, each with the same number of neurons as there are features in our data set (30):

In [12]:
MLPClassifier?

In [13]:
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

Now that the model has been made we can fit the training data to our model, remember that this data has already been processed and scaled:

In [14]:
mlp.fit(X_train,y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(30, 30, 30), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

You can see the output that shows the default values of the other parameters in the model. I encourage you to play around with them and discover what effects they have on your model!
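
As one way to start experimenting, the sketch below trains a few different `hidden_layer_sizes` settings and compares their test accuracy. The particular sizes, `max_iter=500` and `random_state=0` are arbitrary choices for illustration, and the numbers you get may differ between SciKit Learn versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale using statistics from the training data only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

results = {}
for sizes in [(10,), (30, 30, 30), (100,)]:
    mlp = MLPClassifier(hidden_layer_sizes=sizes, max_iter=500, random_state=0)
    mlp.fit(X_train, y_train)
    results[sizes] = mlp.score(X_test, y_test)  # mean accuracy on the test set
    print(sizes, round(results[sizes], 3))
```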

### Predictions and Evaluation
Now that we have a model it is time to use it to get predictions! We can do this simply with the predict() method off of our fitted model:

In [15]:
predictions = mlp.predict(X_test)

Now we can use SciKit-Learn's built in metrics such as a classification report and confusion matrix to evaluate how well our model performed:

In [16]:
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,predictions))

[[56  2]
 [ 3 82]]


In [17]:
print(classification_report(y_test,predictions))

             precision    recall  f1-score   support

          0       0.95      0.97      0.96        58
          1       0.98      0.96      0.97        85

avg / total       0.97      0.97      0.97       143



Looks like we only misclassified 5 tumors out of 143 (2 + 3, the off-diagonal entries of the confusion matrix), leaving us with roughly a 97% accuracy rate (as well as 97% average precision and recall). This is pretty good considering how few lines of code we had to write! The downside, however, to using a Multi-Layer Perceptron model is how difficult it is to interpret the model itself. The weights and biases won't be easily interpretable in relation to which features are important to the model itself.
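
The figures in the classification report can be recomputed by hand from the confusion matrix printed earlier. In SciKit Learn's confusion matrix, rows are the true classes and columns are the predicted classes. The sketch below hard-codes the matrix from this particular run, so your own numbers will differ.

```python
# Confusion matrix from the run above: rows = true class, columns = predicted class.
cm = [[56, 2],
      [3, 82]]

correct = cm[0][0] + cm[1][1]              # diagonal entries
total = sum(sum(row) for row in cm)
misclassified = total - correct
accuracy = correct / total

# Precision: of everything predicted as a class, how much was right (column-wise).
precision_0 = cm[0][0] / (cm[0][0] + cm[1][0])
precision_1 = cm[1][1] / (cm[1][1] + cm[0][1])

# Recall: of everything that truly is a class, how much was found (row-wise).
recall_0 = cm[0][0] / (cm[0][0] + cm[0][1])
recall_1 = cm[1][1] / (cm[1][1] + cm[1][0])

print(misclassified, total)                       # 5 143
print(round(accuracy, 3))                         # 0.965
print(round(precision_0, 2), round(recall_0, 2))  # 0.95 0.97
print(round(precision_1, 2), round(recall_1, 2))  # 0.98 0.96
```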

However, if you do want to extract the MLP weights and biases after training your model, you use its public attributes coefs_ and intercepts_.

coefs_ is a list of weight matrices, where the weight matrix at index i represents the weights between layer i and layer i+1.

intercepts_ is a list of bias vectors, where the vector at index i represents the bias values added to layer i+1.

In [18]:
len(mlp.coefs_)

4

In [19]:
len(mlp.coefs_[0])

30

In [20]:
len(mlp.intercepts_[0])

30
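
The lengths above only show one dimension at a time, so printing the full shapes may make the layout clearer. The sketch below assumes the same (30, 30, 30) architecture but refits a fresh model so that it runs on its own. `max_iter=500` and `random_state=0` are additions for illustration, and the scaler is fitted on all of the data only because we are inspecting shapes here, not evaluating the model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # fine for a shape check, not for evaluation

mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=500, random_state=0)
mlp.fit(X, y)

# One weight matrix per pair of adjacent layers: input -> h1 -> h2 -> h3 -> output.
weight_shapes = [w.shape for w in mlp.coefs_]
# One bias vector per non-input layer.
bias_shapes = [b.shape for b in mlp.intercepts_]

print(weight_shapes)  # [(30, 30), (30, 30), (30, 30), (30, 1)]
print(bias_shapes)    # [(30,), (30,), (30,), (1,)]
```

The final matrix has a single column because a binary classifier only needs one output neuron.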

### HR Case study

In [4]:
import pandas as pd
import numpy as np

In [5]:
# Load the data
hr_df = pd.read_csv( 'HR_comma_sep.csv' )

In [6]:
import matplotlib.pyplot as plt
import seaborn as sn
%matplotlib inline

In [7]:
# Names of the numerical and categorical features (the categorical ones are encoded below)
numerical_features = ['satisfaction_level', 'last_evaluation', 'number_project',
     'average_montly_hours', 'time_spend_company']

categorical_features = ['Work_accident','promotion_last_5years', 'department', 'salary']

In [8]:
# A utility function to create dummy variables (the first level is dropped as the baseline)
def create_dummies( df, colname ):
    col_dummies = pd.get_dummies(df[colname], prefix=colname)
    col_dummies.drop(col_dummies.columns[0], axis=1, inplace=True)
    df = pd.concat([df, col_dummies], axis=1)
    df.drop( colname, axis = 1, inplace = True )
    return df

In [9]:
for c_feature in categorical_features:
    hr_df = create_dummies( hr_df, c_feature )
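
For comparison, pandas can do the same "one dummy column per level, minus the first" encoding in a single call using `drop_first=True`. The sketch below uses a tiny made-up DataFrame (`toy_df` and its values are hypothetical) rather than the HR file, so it runs without the CSV.

```python
import pandas as pd

# Hypothetical miniature version of the HR data.
toy_df = pd.DataFrame({
    'satisfaction_level': [0.38, 0.80, 0.11, 0.72],
    'salary': ['low', 'medium', 'high', 'low'],
})

# drop_first=True removes the first level in sorted order ('high' here),
# which then acts as the baseline category.
encoded = pd.get_dummies(toy_df, columns=['salary'], drop_first=True)

print(list(encoded.columns))  # ['satisfaction_level', 'salary_low', 'salary_medium']
```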

In [10]:
#Splitting the data

feature_columns = hr_df.columns.difference( ['left'] )
feature_columns1 = feature_columns[1:5]

In [11]:
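# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split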


train_X, test_X, train_y, test_y = train_test_split( hr_df[feature_columns],
                                                  hr_df['left'],
                                                  test_size = 0.2,
                                                  random_state = 42 )



In [12]:
# Creating a confusion matrix

from sklearn import metrics

In [14]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(train_X)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [15]:
# Now apply the transformations to the data:
X_train = scaler.transform(train_X)
X_test = scaler.transform(test_X)

In [16]:
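from sklearn.neural_network import MLPClassifier
# Note: (3) is just the integer 3, not a tuple; it gives a single hidden layer of 3 neurons.
# The explicit tuple form would be (3,).
mlp = MLPClassifier(hidden_layer_sizes=(3), verbose=True)
mlp.fit(X_train,train_y)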

Iteration 1, loss = 0.55296445
Iteration 2, loss = 0.52898148
Iteration 3, loss = 0.51605617
Iteration 4, loss = 0.50016269
Iteration 5, loss = 0.47897298
Iteration 6, loss = 0.45734926
Iteration 7, loss = 0.43893145
Iteration 8, loss = 0.42313194
Iteration 9, loss = 0.40929472
Iteration 10, loss = 0.39651844
Iteration 11, loss = 0.38486090
Iteration 12, loss = 0.37419872
Iteration 13, loss = 0.36434304
Iteration 14, loss = 0.35517988
Iteration 15, loss = 0.34680184
Iteration 16, loss = 0.33896633
Iteration 17, loss = 0.33069394
Iteration 18, loss = 0.32225558
Iteration 19, loss = 0.31170870
Iteration 20, loss = 0.29946781
Iteration 21, loss = 0.28604618
Iteration 22, loss = 0.27147215
Iteration 23, loss = 0.25682261
Iteration 24, loss = 0.24468986
Iteration 25, loss = 0.23505262
Iteration 26, loss = 0.22737318
Iteration 27, loss = 0.22100051
Iteration 28, loss = 0.21552018
Iteration 29, loss = 0.21080868
Iteration 30, loss = 0.20667423
Iteration 31, loss = 0.20293949
Iteration 32, los

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=3, learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=True, warm_start=False)
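
The scale-then-fit steps above can also be bundled into a single estimator with a pipeline, which makes it harder to accidentally fit the scaler on test data. This is an alternative sketch, not what the notebook does: it uses synthetic data from `make_classification` in place of the HR file, and `max_iter=1000` and the `random_state` values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the HR features and the 'left' target.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training data only, then trains the MLP on
# the scaled result; at prediction time it applies the same scaling automatically.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(3,), max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(round(accuracy, 3))
```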

In [17]:
dir(mlp)

['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__getstate__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_cache',
 '_abc_negative_cache',
 '_abc_negative_cache_version',
 '_abc_registry',
 '_backprop',
 '_compute_loss_grad',
 '_estimator_type',
 '_fit',
 '_fit_lbfgs',
 '_fit_stochastic',
 '_forward_pass',
 '_get_param_names',
 '_init_coef',
 '_initialize',
 '_label_binarizer',
 '_loss_grad_lbfgs',
 '_no_improvement_count',
 '_optimizer',
 '_partial_fit',
 '_predict',
 '_random_state',
 '_unpack',
 '_update_no_improvement_count',
 '_validate_hyperparameters',
 '_validate_input',
 'activation',
 'alpha',
 'batch_size',
 'best_loss_',
 'beta_1',
 'beta_2',
 'classes_',
 'coefs_',
 'early_stopping',
 'epsilon',
 'fit',
 'get_params',
 'hidden_layer_sizes',
 'interc

In [21]:
mlp.intercepts_

[array([-0.39740484,  0.71596906,  0.93882086]), array([-3.14096063])]

In [22]:
predictions = mlp.predict(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(test_y,predictions))
print(classification_report(test_y,predictions))

[[2232   62]
 [  81  625]]
             precision    recall  f1-score   support

          0       0.96      0.97      0.97      2294
          1       0.91      0.89      0.90       706

avg / total       0.95      0.95      0.95      3000



In [35]:
len(mlp.coefs_)
len(mlp.coefs_[0])
len(mlp.intercepts_[0])

3

In [18]:
mlp.coefs_

[array([[ -2.69084873e-01,  -2.81888328e-01,  -1.21388090e-02],
        [  3.84538476e-01,   1.86983408e+00,   6.98924552e-01],
        [ -6.68781010e-02,  -8.65197994e-02,  -1.66250605e-03],
        [  2.49899747e-02,  -7.41896575e-03,  -6.06193254e-03],
        [  4.73618326e-02,   1.25515915e-01,   6.73489237e-03],
        [  1.20540013e-02,  -5.38384787e-02,  -8.75811887e-03],
        [  9.48848286e-02,   5.58788753e-02,   2.42207396e-02],
        [  3.82144570e-02,  -6.54589857e-02,   2.25251406e-02],
        [  1.05218603e-01,   5.35527382e-02,   2.72690655e-03],
        [  2.98835183e-02,   2.12068751e-01,   3.41058724e-02],
        [  1.21668693e-01,   1.05674057e-01,   1.86818597e-02],
        [ -1.61234943e-01,   1.40241940e+00,   4.39421810e-01],
        [ -1.91701291e+00,   1.01465371e+00,  -7.37519707e-02],
        [ -1.06013676e-01,  -3.60713265e-01,  -1.06127569e-01],
        [  3.26743452e-01,   6.53906271e-01,  -7.17695668e-02],
        [  2.53764521e-01,   5.82557312e
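
To see how these weights and biases are actually used, the sketch below repeats the network's forward pass by hand and compares it with `predict_proba`. It assumes the default settings shown in the fitted model's output (ReLU hidden activation) and a binary target, for which the output layer is a single logistic unit. Synthetic data is used so the example is self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(3,), max_iter=1000, random_state=0).fit(X, y)

# Hidden layers: multiply by the weights, add the biases, apply ReLU.
activation = X
for weights, biases in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
    activation = np.maximum(activation @ weights + biases, 0.0)

# Output layer: same linear step, then the logistic function for the class-1 probability.
logits = activation @ mlp.coefs_[-1] + mlp.intercepts_[-1]
manual_proba = (1.0 / (1.0 + np.exp(-logits))).ravel()

matches = np.allclose(manual_proba, mlp.predict_proba(X)[:, 1])
print(matches)  # True
```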

In [23]:
train_X.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11999 entries, 9838 to 7270
Data columns (total 18 columns):
Work_accident_1            11999 non-null uint8
average_montly_hours       11999 non-null int64
department_RandD           11999 non-null uint8
department_accounting      11999 non-null uint8
department_hr              11999 non-null uint8
department_management      11999 non-null uint8
department_marketing       11999 non-null uint8
department_product_mng     11999 non-null uint8
department_sales           11999 non-null uint8
department_support         11999 non-null uint8
department_technical       11999 non-null uint8
last_evaluation            11999 non-null float64
number_project             11999 non-null int64
promotion_last_5years_1    11999 non-null uint8
salary_low                 11999 non-null uint8
salary_medium              11999 non-null uint8
satisfaction_level         11999 non-null float64
time_spend_company         11999 non-null int64
dtypes: float64(2), i

In [63]:
train_X.head(5)

Unnamed: 0,Work_accident_1,average_montly_hours,department_RandD,department_accounting,department_hr,department_management,department_marketing,department_product_mng,department_sales,department_support,department_technical,last_evaluation,number_project,promotion_last_5years_1,salary_low,salary_medium,satisfaction_level,time_spend_company
9838,0,188,0,0,0,0,0,1,0,0,0,0.61,3,0,1,0,1.0,4
7689,0,196,0,0,0,0,0,0,0,0,1,0.78,4,0,0,0,0.16,5
6557,0,175,1,0,0,0,0,0,0,0,0,0.8,3,0,0,1,0.8,2
6872,0,112,0,1,0,0,0,0,0,0,0,0.86,4,0,0,1,0.66,6
820,0,284,0,0,0,0,0,0,0,0,1,0.93,7,0,1,0,0.11,4
