<a href="https://colab.research.google.com/github/mkmritunjay/machineLearning/blob/master/ANNClassifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks
 
Neural networks are a machine learning framework that attempts to mimic the way biological neural networks learn. A biological neural network consists of interconnected neurons: each neuron receives inputs through its dendrites and, based on those inputs, sends an output signal through its axon to another neuron. We will try to mimic this process with Artificial Neural Networks (ANNs), which we will simply call neural networks from now on. Building a neural network begins with its most basic form, a single perceptron.

---
### The Perceptron
 
Let's start with the perceptron. A perceptron has one or more inputs, a bias, an activation function, and a single output. It multiplies each input by a weight, sums the weighted inputs together with the bias, and passes the result into an activation function to produce the output. There are many possible activation functions to choose from, such as the logistic function, the hyperbolic tangent, or a step function. The bias matters because it avoids the case where all inputs are zero, in which no multiplicative weight would have any effect.
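The computation a perceptron performs can be sketched in a few lines of plain Python. This is only an illustration: the function name `perceptron_output`, the logistic activation, and the weight and bias values are assumptions chosen for the example, not anything taken from scikit-learn.

```python
import math

def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a logistic activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))

# Hypothetical weights and bias, chosen only for illustration.
weights = [0.5, -0.25]
bias = 0.1

# Weighted sum here is 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
print(perceptron_output([1.0, 2.0], weights, bias))

# With all-zero inputs the weights contribute nothing; only the bias shifts the output.
print(perceptron_output([0.0, 0.0], weights, bias))
```

Without the bias term, the second call would always return exactly 0.5 no matter what the weights were, which is the situation the bias is there to avoid.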


---
Once we have the output, we can compare it to a known label and adjust the weights accordingly (the weights usually start with random initial values). We repeat this process until we reach a maximum number of allowed iterations or an acceptable error rate.
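As a rough sketch of this compare-and-adjust loop, the example below trains a single perceptron on a logical AND using the classic perceptron learning rule with a step activation. The rule, the toy data, the learning rate, and all function names are assumptions made for illustration; scikit-learn's MLPClassifier trains with a gradient-based solver instead.

```python
import random

def step(z):
    """Step activation: output 1 when the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0

def predict(x, weights, bias):
    return step(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

def train_perceptron(samples, labels, learning_rate=0.1, max_iter=1000, seed=0):
    rng = random.Random(seed)
    # Weights and bias start from small random values.
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0]]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_iter):
        mistakes = 0
        for x, target in zip(samples, labels):
            # Compare the output to the known label.
            error = target - predict(x, weights, bias)
            if error != 0:
                mistakes += 1
                # Nudge each weight (and the bias) in the direction that reduces the error.
                weights = [wi + learning_rate * error * xi for wi, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # stop early once every sample is classified correctly
            break
    return weights, bias

# Toy data: the logical AND of two binary inputs (linearly separable).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
print([predict(x, weights, bias) for x in samples])
```

The two stopping conditions in the loop correspond to the two mentioned above: `max_iter` caps the number of passes over the data, and the `mistakes == 0` check stands in for an acceptable error rate.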

To create a neural network, we stack layers of perceptrons together, creating a multi-layer perceptron (MLP) model. The input layer takes in your feature inputs directly, and the output layer produces the resulting outputs. Any layers in between are known as hidden layers because they don't directly "see" the feature inputs or the outputs.
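A forward pass through such a stack of layers might look like the NumPy sketch below. The layer sizes, the random weights, and the choice of ReLU for hidden layers with a logistic output are assumptions for illustration only; nothing here is trained.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Pass a batch of inputs through every hidden layer, then the output layer."""
    activation = x
    for W, b in zip(weights[:-1], biases[:-1]):
        # Hidden layers only ever see the previous layer's activations.
        activation = relu(activation @ W + b)
    return sigmoid(activation @ weights[-1] + biases[-1])

rng = np.random.default_rng(0)

# Hypothetical architecture: 4 input features, two hidden layers of 3 units, 1 output.
layer_sizes = [4, 3, 3, 1]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

batch = rng.normal(size=(5, 4))  # 5 samples with 4 features each
outputs = forward(batch, weights, biases)
print(outputs.shape)
```

Each sample in the batch yields one value between 0 and 1, which a binary classifier would threshold to pick a class.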


---
### Data
 
We'll use scikit-learn's built-in Breast Cancer data set, which has several features of tumors along with a class label indicating whether each tumor is malignant or benign. We will build a neural network model that takes in these features and tries to predict the malignant or benign label for tumors it has not seen before. Let's start by getting the data!

In [0]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report,confusion_matrix

In [0]:
# This object is like a dictionary: it contains a description of the data, the features, and the targets.
cancer = load_breast_cancer()

In [7]:
cancer.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

In [8]:
cancer['data'].shape

(569, 30)

In [0]:
X = cancer['data']
Y = cancer['target']

### Train Test Split

In [0]:
X_train, X_test, y_train, y_test = train_test_split(X, Y)

### Data Preprocessing
 
If the data is not normalized, the neural network may have difficulty converging before the maximum number of iterations is reached. The multi-layer perceptron is sensitive to feature scaling, so scaling your data is highly recommended. Note that you must apply the same scaling to the test set for meaningful results. There are many different normalization methods; we will use the built-in StandardScaler for standardization.
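To make "the same scaling" concrete, here is a small NumPy sketch of what standardization does: the mean and standard deviation are computed from the training data only and then reused for the test data. The tiny arrays are made up for illustration; the use of the population standard deviation (`ddof=0`) mirrors what StandardScaler documents.

```python
import numpy as np

# Made-up data: two features on very different scales.
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# Statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# The same shift and scale are applied to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(train_scaled.std(axis=0))   # approximately 1 for each feature
print(test_scaled)
```

Note that the scaled test row is not itself zero-mean; it is expressed relative to the training distribution, which is what the model was fitted on.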

In [13]:
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [0]:
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

### Model Building

Now it's time to train our model. Scikit-learn makes this easy through its estimator objects. In this case we import our estimator (the Multi-Layer Perceptron Classifier model) from the neural_network module of scikit-learn:

**from sklearn.neural_network import MLPClassifier**

Next we create an instance of the model. There are many parameters you can define and customize here, but we will only set hidden_layer_sizes. For this parameter you pass in a tuple giving the number of neurons in each hidden layer, where the nth entry in the tuple is the number of neurons in the nth hidden layer of the MLP model. There are many ways to choose these numbers, but for simplicity we will use 3 hidden layers, each with as many neurons as there are features in our data set (30).

In [0]:
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

Now that the model has been created, we can fit it to the training data. Remember that this data has already been processed and scaled.

In [17]:
mlp.fit(X_train,y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(30, 30, 30), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

### Prediction and evaluation

Now that we have a model, it's time to use it to get predictions.

We can do this simply with the predict() method of our fitted model.

In [0]:
predictions = mlp.predict(X_test)

In [19]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       0.98      0.96      0.97        53
           1       0.98      0.99      0.98        90

    accuracy                           0.98       143
   macro avg       0.98      0.98      0.98       143
weighted avg       0.98      0.98      0.98       143



With 98% accuracy (and 98% weighted-average precision and recall), this is a good result considering how few lines of code we had to write. Because neither train_test_split nor MLPClassifier was given a random_state, the exact figures will vary from run to run. The downside of a Multi-Layer Perceptron model, however, is how difficult the model itself is to interpret. The weights and biases do not readily show which features are important to the model.

However, if you do want to extract the MLP weights and biases after training your model, you can use its public attributes coefs_ and intercepts_.

coefs_ is a list of weight matrices, where the weight matrix at index i represents the weights between layer i and layer i+1.

intercepts_ is a list of bias vectors, where the vector at index i represents the bias values added to layer i+1.
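The sizes of these lists follow directly from the architecture. The helper below is a hypothetical pure-Python sketch (not part of scikit-learn) that works out the expected shapes for a model with 30 input features and three hidden layers of 30 units. It assumes a single output unit, which is how a binary classification problem is usually represented.

```python
def mlp_parameter_shapes(n_features, hidden_layer_sizes, n_outputs):
    """Expected shapes of the weight matrices and bias vectors, layer by layer."""
    layer_sizes = [n_features, *hidden_layer_sizes, n_outputs]
    # One weight matrix per pair of consecutive layers: (units in layer i, units in layer i+1).
    coef_shapes = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    # One bias vector per non-input layer.
    intercept_shapes = [(n_out,) for n_out in layer_sizes[1:]]
    return coef_shapes, intercept_shapes

# Assumed setup: 30 features, hidden_layer_sizes=(30, 30, 30), one output unit.
coef_shapes, intercept_shapes = mlp_parameter_shapes(30, (30, 30, 30), 1)

print(len(coef_shapes))     # number of weight matrices
print(coef_shapes)          # shape of each weight matrix
print(intercept_shapes[0])  # shape of the first bias vector
```

Three hidden layers plus the output layer give four weight matrices, and the first matrix has one row per input feature.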

In [20]:
len(mlp.coefs_)

4

In [21]:
len(mlp.coefs_[0])

30

In [22]:
len(mlp.intercepts_[0])

30

# HR case study

Let's look at one more data set and build an artificial neural network.
We will use the same data set that we used for bagging and boosting.

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
import sklearn.neural_network as nn
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix
url = 'https://raw.githubusercontent.com/mkmritunjay/machineLearning/master/HR_comma_sep.csv'

In [0]:
hr_df = pd.read_csv(url)

In [26]:
hr_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 10 columns):
satisfaction_level       14999 non-null float64
last_evaluation          14999 non-null float64
number_project           14999 non-null int64
average_montly_hours     14999 non-null int64
time_spend_company       14999 non-null int64
Work_accident            14999 non-null int64
left                     14999 non-null int64
promotion_last_5years    14999 non-null int64
department               14999 non-null object
salary                   14999 non-null object
dtypes: float64(2), int64(6), object(2)
memory usage: 1.1+ MB


In [0]:
# Encoding Categorical Features
numerical_features = ['satisfaction_level', 'last_evaluation', 'number_project',
     'average_montly_hours', 'time_spend_company']

categorical_features = ['Work_accident','promotion_last_5years', 'department', 'salary']

In [0]:
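# A utility function that replaces a categorical column with dummy variables
def create_dummies(df, colname):
    # One indicator column per category, named <colname>_<category>
    col_dummies = pd.get_dummies(df[colname], prefix=colname)
    # Drop the first indicator: it is implied by the others being zero
    col_dummies = col_dummies.drop(col_dummies.columns[0], axis=1)
    # Append the dummies and remove the original column
    df = pd.concat([df, col_dummies], axis=1)
    return df.drop(colname, axis=1)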

In [0]:
for c_feature in categorical_features:
    hr_df = create_dummies( hr_df, c_feature )

### Train Test Split

In [0]:
feature_columns = hr_df.columns.difference( ['left'] )

In [0]:
train_X, test_X, train_y, test_y = train_test_split( hr_df[feature_columns],
                                                  hr_df['left'],
                                                  test_size = 0.2,
                                                  random_state = 42 )

In [32]:
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(train_X)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [0]:
# Now apply the transformations to the data:
X_train = scaler.transform(train_X)
X_test = scaler.transform(test_X)

In [34]:
mlp = MLPClassifier(hidden_layer_sizes=(3,2), verbose=True)
mlp.fit(X_train,train_y)

Iteration 1, loss = 0.58779275
Iteration 2, loss = 0.56179120
Iteration 3, loss = 0.53318591
Iteration 4, loss = 0.49977348
Iteration 5, loss = 0.46978772
Iteration 6, loss = 0.44972666
Iteration 7, loss = 0.43456158
Iteration 8, loss = 0.42171515
Iteration 9, loss = 0.40867200
Iteration 10, loss = 0.39461142
Iteration 11, loss = 0.38379212
Iteration 12, loss = 0.37441935
Iteration 13, loss = 0.36613954
Iteration 14, loss = 0.35891378
Iteration 15, loss = 0.35247080
Iteration 16, loss = 0.34644269
Iteration 17, loss = 0.34101914
Iteration 18, loss = 0.33592822
Iteration 19, loss = 0.33080714
Iteration 20, loss = 0.32562051
Iteration 21, loss = 0.32014340
Iteration 22, loss = 0.31421007
Iteration 23, loss = 0.30813982
Iteration 24, loss = 0.30259685
Iteration 25, loss = 0.29777809
Iteration 26, loss = 0.29351497
Iteration 27, loss = 0.28932915
Iteration 28, loss = 0.28545985
Iteration 29, loss = 0.28181557
Iteration 30, loss = 0.27860862
Iteration 31, loss = 0.27563963
Iteration 32, los

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(3, 2), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=True,
              warm_start=False)

In [35]:
mlp.coefs_

[array([[ 1.33158312e-01,  9.26196253e-02,  1.63619623e-02],
        [-6.22340973e-01,  4.54692915e-03, -5.35144293e-01],
        [ 1.67656172e-01, -1.60172174e+00,  1.90288345e-01],
        [-1.67243246e+00, -1.26241530e+00,  9.60035413e-01],
        [-9.12001405e-02, -1.42712087e-02, -4.70135911e-02],
        [ 1.90357105e-02,  3.54062209e-03,  8.65901157e-02],
        [-1.55059043e-02,  8.51892255e-03,  2.36026734e-02],
        [ 2.27978610e-02,  2.45832618e-02,  1.34233869e-02],
        [-4.81665724e-02, -2.52987087e-02, -2.23653442e-02],
        [-1.00754864e-01, -1.43216461e-02, -7.99228778e-02],
        [-1.00719762e-01, -1.89480792e-03,  1.61875076e-02],
        [-6.02776751e-01,  1.59439632e-01, -6.22061790e-01],
        [-4.84880771e-01, -1.23753227e-01, -1.18814251e+00],
        [ 1.50372732e-01,  4.04228033e-02,  1.27232103e-01],
        [-2.78002139e-01, -1.88985505e-01, -4.79197360e-02],
        [-2.03838430e-01, -1.26200459e-01, -2.88048930e-02],
        [-2.46901258e-01

In [36]:
mlp.intercepts_

[array([ 1.07929607,  0.78397525, -0.75217458]),
 array([-0.00357943, -0.02403663]),
 array([1.30503351])]

In [0]:
predictions = mlp.predict(X_test)

In [38]:
print(confusion_matrix(test_y, predictions))

[[2156  138]
 [  98  608]]


In [39]:
print(classification_report(test_y, predictions))

              precision    recall  f1-score   support

           0       0.96      0.94      0.95      2294
           1       0.82      0.86      0.84       706

    accuracy                           0.92      3000
   macro avg       0.89      0.90      0.89      3000
weighted avg       0.92      0.92      0.92      3000
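
The figures for class 1 in this report can be recomputed by hand from the confusion matrix printed earlier, where rows are true labels and columns are predicted labels. The short sketch below does that arithmetic in plain Python; the variable names are only for illustration.

```python
# Confusion matrix as printed above: rows are true classes, columns are predicted classes.
confusion = [[2156, 138],
             [98, 608]]

tn, fp = confusion[0]  # true class 0: predicted 0, predicted 1
fn, tp = confusion[1]  # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

precision = tp / (tp + fp)  # of everyone predicted to leave, how many actually left
recall = tp / (tp + fn)     # of everyone who actually left, how many were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

Accuracy looks better than the class 1 precision and recall because roughly three quarters of the test rows belong to class 0, which the model predicts more reliably.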



In [42]:
print(len(mlp.coefs_))
print(len(mlp.coefs_[0]))
print(len(mlp.intercepts_[0]))

3
18
3
