<h1 align="center">Neural Nets Tutorial</h1> 

<img src="neural-net-head.jpg" style="width: 500px;"  />

## Team Members
* Jorge Molina
* Moiz Kirmani
* Ishan Shah
* Zehui (Poppy) Wu

### Agenda

* Motivation
* Introduction
* Alternative solutions for solving the problem
* Installation instructions, platform restriction and dependent libraries
* Example with a data set
* 2 examples of typical use-cases
* Extra useful features
* Summary
* References
___
<br>

<h2 align="center">Motivation</h2>

**Main Objective & Problems Seeking to Solve:** 
The package (scikit-learn's `sklearn.neural_network`) focuses on the Multi-Layer Perceptron (MLP), a type of artificial neural network. This type of neural
network is known as a supervised network because it requires a desired output in order to learn. The goal of this
type of network is to create a model that correctly maps inputs to outputs using historical data, so that the model can then be used to produce outputs for new, unseen data. It is mostly used for classification and regression, to make meaningful predictions.

___
<br>

<h2 align="center">Introduction</h2> 

### What is a Neural Network?

* **Neural Networks** are a computational approach based on a large collection of neural units, which seeks to model the way a biological brain solves certain problems with large clusters of biological neurons connected by axons. 

* It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. **Neural Networks**, like people, learn by example and are configured for a specific application, such as pattern recognition or data classification.

<img src="intro.PNG" style="width: 500px;"  />

* In machine learning, the **perceptron** is an algorithm for supervised learning of binary classifiers: functions that decide whether an input belongs to a specific class or not.

<img src="perceptron.PNG" style="width: 500px;"  />
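A minimal sketch of the classic perceptron learning rule in NumPy, to make the idea concrete. It assumes labels in {0, 1} and a step activation; `train_perceptron` and `predict_perceptron` are hypothetical helper names, not part of any library. The example learns the logical AND function, which is linearly separable, so the rule is guaranteed to converge.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron learning rule: on each mistake, w += lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])  # one weight per input feature
    b = 0.0                   # bias (threshold) term
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0  # step activation
            update = lr * (target - pred)      # zero when the prediction is correct
            w += update * xi
            b += update
    return w, b

def predict_perceptron(X, w, b):
    """Apply the learned linear decision rule to every row of X."""
    return (X @ w + b > 0).astype(int)

# Logical AND: only the input (1, 1) belongs to the positive class
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(predict_perceptron(X, w, b))  # [0 0 0 1]
```

A single perceptron can only draw one linear boundary, which is why problems that are not linearly separable need the multi-layer version described next.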

* **Multilayer Perceptron (MLP)** is a supervised learning algorithm that learns a function $f: \mathbb{R}^m \rightarrow \mathbb{R}^o$ by training on a dataset, where **_m_** is the number of input dimensions and **_o_** is the number of output dimensions.
The leftmost layer, the input layer, consists of a set of neurons representing the input features. Each neuron in a hidden layer transforms the values from the previous layer with a weighted linear summation $w_1x_1 + w_2x_2 + \dots + w_mx_m$, followed by a non-linear activation function. The output layer receives the values from the last hidden layer and transforms them into output values.

<img src="multi-layered perceptron.PNG" style="width: 500px;"  />
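The layer-by-layer computation described above can be sketched as a NumPy forward pass. This is an illustration only: the weights are random placeholders (no training happens here), and it assumes ReLU in the hidden layers and a logistic (sigmoid) output, which is what scikit-learn's `MLPClassifier` uses by default for a binary problem. `mlp_forward` is a hypothetical helper name.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Each hidden layer computes relu(a @ W + b); the output layer applies a sigmoid."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)  # weighted linear summation followed by a non-linearity
    return sigmoid(a @ weights[-1] + biases[-1])

rng = np.random.default_rng(0)
layer_sizes = [3, 5, 2, 1]  # m = 3 inputs, hidden layers of 5 and 2 neurons, o = 1 output
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

X = rng.normal(size=(4, 3))  # 4 samples with m = 3 features each
probs = mlp_forward(X, weights, biases)
print(probs.shape)  # (4, 1)
```

Training consists of adjusting `weights` and `biases` (via backpropagation) so that these outputs match the desired labels.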

___
<br>

<h2 align="center">Alternative Solutions & Comparative Analysis</h2>

##### Comparative Analysis
* There are many different models and algorithms for approaching a classification problem, each with its own pros and cons; some models work better than others on certain types of problems. The table below describes some techniques and models used in the analytics field, with the pros and cons of each.

<img src="models.PNG" style="width: 500px;"  />
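One practical way to weigh these trade-offs is to evaluate several candidate models on the same data with cross-validation. The sketch below assumes a synthetic dataset from `make_classification` (a stand-in, not the bank data used later) and uses scikit-learn's `cross_val_score`; the resulting scores depend entirely on the data and the chosen settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (placeholder for real data)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds for each candidate model
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```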

##### Alternative packages in Python implementing Multi-Layer Perceptron:
* perceptron 1.1.0
* sknn.mlp

___
<br>

<h2 align="center">Installation Instructions</h2> 

<img src="install.png" style="width: 150px;"  />

#### First Step
If you already have a working installation of NumPy and SciPy, install scikit-learn using either:
* pip: `pip install -U scikit-learn`
* conda: `conda install scikit-learn`
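To confirm the installation worked, a quick check from Python. `MLPClassifier` was added in scikit-learn 0.18, so the import below fails on older versions.

```python
import sklearn
from sklearn.neural_network import MLPClassifier  # raises ImportError before scikit-learn 0.18

# Print the installed scikit-learn version
print(sklearn.__version__)
```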

#### Platform restriction
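_At the time of writing, scikit-learn required:_
* Python (>= 2.6 or >= 3.3)
* NumPy (>= 1.6.1)
* SciPy (>= 0.9)

Current releases require newer versions of Python, NumPy and SciPy; check the official scikit-learn installation guide for the version you install.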

#### Dependent libraries
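* `sklearn.neural_network` (the scikit-learn module that provides `MLPClassifier` and `MLPRegressor`)
* scikit-learn
* NumPy
* SciPy
* pandas (used to load and prepare the data in the example below)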
___
<br>

<h2 align="center">Example with a Data Set</h2> 

We are going to use a bank's past data to predict whether a person is likely to set up an IRA (Individual Retirement Account), based on certain features and attributes of each person. We will build a neural network using the Multi-Layer Perceptron (MLP) classifier to help the marketing department target customers based on our predictions, which will help them reduce advertising costs.

```python
from sklearn.neural_network import MLPClassifier
import pandas as pd
```

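```python
# Import the CSV file
rawbankdata = pd.read_csv("bank-data.csv")
rawbankdata.head()
```

| Unnamed: 0 | age | sex | region | income | married | children | car | save_act | ch_act | mortgage | ira |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48 | FEMALE | INNER_CITY | 17546.0 | NO | 1 | NO | NO | NO | NO | YES |
| 1 | 40 | MALE | TOWN | 30085.1 | YES | 3 | YES | NO | YES | YES | NO |
| 2 | 51 | FEMALE | INNER_CITY | 16575.4 | YES | 0 | YES | YES | YES | NO | NO |
| 3 | 23 | FEMALE | TOWN | 20375.4 | YES | 3 | NO | NO | YES | NO | NO |
| 4 | 57 | FEMALE | RURAL | 50576.3 | YES | 0 | NO | YES | NO | NO | NO |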


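```python
# Create an independent copy of the dataset
# (plain assignment, bankdata = rawbankdata, would only create a second reference)
bankdata = rawbankdata.copy()
bankdata.head()
```

| Unnamed: 0 | age | sex | region | income | married | children | car | save_act | ch_act | mortgage | ira |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48 | FEMALE | INNER_CITY | 17546.0 | NO | 1 | NO | NO | NO | NO | YES |
| 1 | 40 | MALE | TOWN | 30085.1 | YES | 3 | YES | NO | YES | YES | NO |
| 2 | 51 | FEMALE | INNER_CITY | 16575.4 | YES | 0 | YES | YES | YES | NO | NO |
| 3 | 23 | FEMALE | TOWN | 20375.4 | YES | 3 | NO | NO | YES | NO | NO |
| 4 | 57 | FEMALE | RURAL | 50576.3 | YES | 0 | NO | YES | NO | NO | NO |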
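In pandas, plain assignment (`b = a`) only binds a second name to the same DataFrame, so in-place changes made through one name are visible through the other; `DataFrame.copy()` creates an independent object. A small illustration with a toy frame (not the bank data):

```python
import pandas as pd

raw = pd.DataFrame({"ira": ["YES", "NO"]})

alias = raw               # same object, just another name
independent = raw.copy()  # separate copy of the data

alias.loc[0, "ira"] = "NO"       # also changes raw, because alias is raw
print(raw.loc[0, "ira"])         # NO
print(independent.loc[0, "ira"]) # YES
```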


```python
# One-hot encode ("binarize") the categorical variables.
# dtype=int keeps the new columns as 0/1 (recent pandas versions default to booleans).

# Two new features, FEMALE and MALE, that are 1 if sex is FEMALE or MALE, respectively
df_sex = pd.get_dummies(bankdata['sex'], dtype=int)
bankdata = pd.concat([bankdata, df_sex], axis=1)

# Do the same for region (INNER_CITY, RURAL, SUBURBAN, TOWN)
df_region = pd.get_dummies(bankdata['region'], dtype=int)
bankdata = pd.concat([bankdata, df_region], axis=1)

# Do the same for married, renaming the generic NO/YES columns
df_married = pd.get_dummies(bankdata['married'], dtype=int)
bankdata = pd.concat([bankdata, df_married], axis=1)
bankdata.rename(columns={'NO': 'Married_NO', 'YES': 'Married_YES'}, inplace=True)

# Do the same for car
df_car = pd.get_dummies(bankdata['car'], dtype=int)
bankdata = pd.concat([bankdata, df_car], axis=1)
bankdata.rename(columns={'NO': 'Car_NO', 'YES': 'Car_YES'}, inplace=True)

# Do the same for save_act
df_save_act = pd.get_dummies(bankdata['save_act'], dtype=int)
bankdata = pd.concat([bankdata, df_save_act], axis=1)
bankdata.rename(columns={'NO': 'save_acc_NO', 'YES': 'save_acc_YES'}, inplace=True)

# Do the same for ch_act
df_ch_act = pd.get_dummies(bankdata['ch_act'], dtype=int)
bankdata = pd.concat([bankdata, df_ch_act], axis=1)
bankdata.rename(columns={'NO': 'ch_act_NO', 'YES': 'ch_act_YES'}, inplace=True)

# Do the same for mortgage
df_mortgage = pd.get_dummies(bankdata['mortgage'], dtype=int)
bankdata = pd.concat([bankdata, df_mortgage], axis=1)
bankdata.rename(columns={'NO': 'mortgage_NO', 'YES': 'mortgage_YES'}, inplace=True)

# Replace YES/NO in the ira (target) column with 1 and 0
bankdata['ira'] = bankdata['ira'].map({'YES': 1, 'NO': 0})

bankdata.head()
```

| Unnamed: 0 | age | sex | region | income | married | children | car | save_act | ch_act | mortgage | ... | Married_NO | Married_YES | Car_NO | Car_YES | save_acc_NO | save_acc_YES | ch_act_NO | ch_act_YES | mortgage_NO | mortgage_YES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48 | FEMALE | INNER_CITY | 17546.0 | NO | 1 | NO | NO | NO | NO | ... | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 40 | MALE | TOWN | 30085.1 | YES | 3 | YES | NO | YES | YES | ... | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 2 | 51 | FEMALE | INNER_CITY | 16575.4 | YES | 0 | YES | YES | YES | NO | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 3 | 23 | FEMALE | TOWN | 20375.4 | YES | 3 | NO | NO | YES | NO | ... | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| 4 | 57 | FEMALE | RURAL | 50576.3 | YES | 0 | NO | YES | NO | NO | ... | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
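The repeated `get_dummies`/`concat`/`rename` steps above can also be collapsed into a single call: `pd.get_dummies` accepts a `columns` list and prefixes each new column with the original column name. A sketch on a toy frame; note that the generated names (for example `married_YES`) differ from the hand-renamed ones used in this notebook, and that the original categorical columns are dropped from the result.

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["FEMALE", "MALE"],
    "married": ["NO", "YES"],
    "ira": ["YES", "NO"],
})

# One-hot encode several categorical columns at once; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(df, columns=["sex", "married"], dtype=int)
encoded["ira"] = encoded["ira"].map({"YES": 1, "NO": 0})
print(encoded.columns.tolist())
# ['ira', 'sex_FEMALE', 'sex_MALE', 'married_NO', 'married_YES']
```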


```python
# Rescale the continuous variables into [0, 1]:
# subtract each column's minimum, then divide by that column's maximum
min_age = bankdata['age'] - bankdata['age'].min()
bankdata['scaled_age'] = min_age / bankdata['age'].max()

min_income = bankdata['income'] - bankdata['income'].min()
bankdata['scaled_income'] = min_income / bankdata['income'].max()

min_children = bankdata['children'] - bankdata['children'].min()
bankdata['scaled_children'] = min_children / bankdata['children'].max()

bankdata.head()
```

| Unnamed: 0 | age | sex | region | income | married | children | car | save_act | ch_act | mortgage | ... | Car_YES | save_acc_NO | save_acc_YES | ch_act_NO | ch_act_YES | mortgage_NO | mortgage_YES | scaled_age | scaled_income | scaled_children |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48 | FEMALE | INNER_CITY | 17546.0 | NO | 1 | NO | NO | NO | NO | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0.447761 | 0.198507 | 0.333333 |
| 1 | 40 | MALE | TOWN | 30085.1 | YES | 3 | YES | NO | YES | YES | ... | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0.328358 | 0.397131 | 1.0 |
| 2 | 51 | FEMALE | INNER_CITY | 16575.4 | YES | 0 | YES | YES | YES | NO | ... | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0.492537 | 0.183133 | 0.0 |
| 3 | 23 | FEMALE | TOWN | 20375.4 | YES | 3 | NO | NO | YES | NO | ... | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0.074627 | 0.243326 | 1.0 |
| 4 | 57 | FEMALE | RURAL | 50576.3 | YES | 0 | NO | YES | NO | NO | ... | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0.58209 | 0.721717 | 0.0 |
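The scaling above subtracts each column's minimum and then divides by the column's maximum, which keeps values in [0, 1] but maps the largest value to exactly 1 only when the minimum is 0 (as with `children`). Conventional min-max scaling divides by the range instead, $(x - \min)/(\max - \min)$, and scikit-learn's `MinMaxScaler` implements it. A sketch on a toy column of ages; in practice the scaler would ideally be fit on the training split only and then applied to the test split, so no test-set information leaks into training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

ages = np.array([[18.0], [48.0], [67.0]])  # toy column of ages

# Approach used above: (x - min) / max, so the oldest age maps to 49/67, not 1
shift_then_divide = (ages - ages.min()) / ages.max()

# Conventional min-max scaling: (x - min) / (max - min), so the range is exactly [0, 1]
scaler = MinMaxScaler()
min_max = scaler.fit_transform(ages)

print(shift_then_divide.ravel())
print(min_max.ravel())
```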


```python
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
bankdata_train, bankdata_test = train_test_split(bankdata, test_size=0.33, random_state=4)

bankdata_train.head()
```

| Unnamed: 0 | age | sex | region | income | married | children | car | save_act | ch_act | mortgage | ... | Car_YES | save_acc_NO | save_acc_YES | ch_act_NO | ch_act_YES | mortgage_NO | mortgage_YES | scaled_age | scaled_income | scaled_children |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 236 | 65 | MALE | TOWN | 52255.9 | NO | 2 | YES | YES | YES | NO | ... | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0.701493 | 0.748323 | 0.666667 |
| 298 | 47 | FEMALE | INNER_CITY | 17139.5 | NO | 2 | YES | NO | YES | NO | ... | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0.432836 | 0.192068 | 0.666667 |
| 292 | 51 | MALE | RURAL | 46323.8 | YES | 2 | YES | YES | YES | YES | ... | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0.492537 | 0.654356 | 0.666667 |
| 129 | 27 | FEMALE | RURAL | 21350.3 | NO | 0 | YES | YES | YES | NO | ... | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0.134328 | 0.258769 | 0.0 |
| 13 | 66 | FEMALE | TOWN | 55204.7 | YES | 1 | YES | YES | YES | YES | ... | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0.716418 | 0.795033 | 0.333333 |
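When one class is rarer than the other, passing `stratify=` to `train_test_split` keeps the class proportions similar in the training and test sets. A sketch with a hypothetical label vector (30% positives) and a placeholder feature column:

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([1] * 30 + [0] * 70)     # hypothetical labels: 30% positive
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=4, stratify=y
)
# Both splits keep roughly the same 30% share of positives
print(y_train.mean(), y_test.mean())
```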


```python
# Create an MLP classifier and fit the model on the training data

# Input features: the one-hot encoded categorical columns plus the scaled continuous columns
feature_cols = ['FEMALE', 'MALE', 'INNER_CITY', 'RURAL', 'SUBURBAN', 'TOWN',
                'Married_NO', 'Married_YES', 'Car_NO', 'Car_YES',
                'save_acc_NO', 'save_acc_YES', 'ch_act_NO', 'ch_act_YES',
                'mortgage_NO', 'mortgage_YES',
                'scaled_age', 'scaled_income', 'scaled_children']
X1_train = bankdata_train[feature_cols]
Y1_train = bankdata_train['ira']

# Two hidden layers (5 and 2 neurons), trained with the L-BFGS solver
clf = MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto',
                    beta_1=0.9, beta_2=0.999, early_stopping=False,
                    epsilon=1e-08, hidden_layer_sizes=(5, 2), learning_rate='constant',
                    learning_rate_init=0.001, max_iter=200, momentum=0.9,
                    nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
                    solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
                    warm_start=False)
clf.fit(X1_train, Y1_train)
```

```
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(5, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)
```
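After fitting, the learned parameters are exposed as `coefs_` (one weight matrix per layer transition) and `intercepts_` (one bias vector per layer). With 19 input features and `hidden_layer_sizes=(5, 2)` on a binary target, the weight matrices have shapes (19, 5), (5, 2) and (2, 1). A self-contained sketch on random placeholder data of the same width:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.random((100, 19))        # placeholder: 100 samples, 19 features like X1_train
y = (X[:, 0] > 0.5).astype(int)  # placeholder binary target

clf = MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

# Weight matrices: input -> hidden 1, hidden 1 -> hidden 2, hidden 2 -> output
print([W.shape for W in clf.coefs_])       # [(19, 5), (5, 2), (2, 1)]
print([b.shape for b in clf.intercepts_])  # [(5,), (2,), (1,)]
```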

```python
# Predict ira values on the test data, using the same feature columns as the training set
X1_test = bankdata_test[X1_train.columns]

clf.predict(X1_test)
```

```
array([0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0,
       1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,
       0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0,
       0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,
       1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1], dtype=int64)
```
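`predict` returns hard 0/1 labels. For targeting customers it can be more useful to rank them with `predict_proba`, which returns the estimated probability of each class; for a binary `MLPClassifier`, `predict` is equivalent to thresholding the positive-class probability at 0.5. A sketch on placeholder data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.random((200, 4))                 # placeholder features
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # placeholder binary target

clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=1).fit(X, y)

proba = clf.predict_proba(X)                # columns: P(class 0), P(class 1)
top10 = np.argsort(proba[:, 1])[::-1][:10]  # indices of the 10 most likely positives
print(top10)

# predict() matches thresholding P(class 1) at 0.5
print(np.array_equal(clf.predict(X), (proba[:, 1] > 0.5).astype(int)))  # True
```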

```python
# Accuracy of our model on the test set
clf.score(X1_test, bankdata_test['ira'])
```

```
0.71717171717171713
```
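Accuracy alone can be misleading when one class dominates, because a model that always predicts the majority class can already score well. Comparing against scikit-learn's `DummyClassifier` and inspecting the confusion matrix gives more context. A sketch on placeholder data with roughly 70% negatives:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Placeholder imbalanced data: about 70% class 0 and 30% class 1
X, y = make_classification(n_samples=600, n_features=8, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
mlp = MLPClassifier(hidden_layer_sizes=(5, 2), max_iter=2000, random_state=1).fit(X_train, y_train)

print("baseline accuracy:", baseline.score(X_test, y_test))
print("MLP accuracy:     ", mlp.score(X_test, y_test))

# Rows are the true classes (0, 1); columns are the predicted classes (0, 1)
cm = confusion_matrix(y_test, mlp.predict(X_test))
print(cm)
```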

___
<br>

<h2 align="center">Two Examples of Typical Use-Cases</h2> 

#### Multi-layer perceptron has many applications, such as:
* Statistical analysis
* Pattern recognition
* Optical character recognition


<h4 align="center">Example 1</h4> 

A Multilayer Perceptron (MLP) can be used to classify people's faces in images by recognizing facial characteristics. The human face has about 80 different characteristic parameters, some of which are:
1. Distance between the middles of the eyes
2. Distance between the middle of the left eye and the middle point of the mouth
3. Distance between the middle of the right eye and the middle point of the mouth
4. Distance between the middle of the left eye and the middle point of the nose
5. Distance between the middle of the right eye and the middle point of the nose
6. Distance between the middle point of the mouth and the middle point of the nose
7. Distance between the middle point of J1 and the middle of the nose
8. Width of the nose

Face characteristics are built from these parameters and stored in a database. An MLP classifier is then trained on them, adjusting the network's weights and thresholds to minimize the error of its predictions on the training set. If the model is properly constructed, it can subsequently be used to make predictions where the output is unknown, such as recognizing which face in a picture belongs to a particular person.
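Turning facial landmarks into characteristic parameters like those listed above comes down to computing Euclidean distances between points. A minimal sketch; the landmark names and pixel coordinates are hypothetical, and a real system would first have to detect the landmarks in each image.

```python
import numpy as np

# Hypothetical (x, y) pixel coordinates of a few facial landmarks
landmarks = {
    "left_eye": np.array([30.0, 40.0]),
    "right_eye": np.array([70.0, 40.0]),
    "nose": np.array([50.0, 60.0]),
    "mouth": np.array([50.0, 80.0]),
}

def dist(a, b):
    """Euclidean distance between two named landmarks."""
    return float(np.linalg.norm(landmarks[a] - landmarks[b]))

# Feature vector built from the first six characteristic distances in the list
features = [
    dist("left_eye", "right_eye"),  # 1. between the eyes
    dist("left_eye", "mouth"),      # 2. left eye to mouth
    dist("right_eye", "mouth"),     # 3. right eye to mouth
    dist("left_eye", "nose"),       # 4. left eye to nose
    dist("right_eye", "nose"),      # 5. right eye to nose
    dist("mouth", "nose"),          # 6. mouth to nose
]
print(features)
```

One such vector per face image would form a row of the training data for an `MLPClassifier`, with the person's identity as the label.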

<h4 align="center">Example 2</h4> 

For example, a credit card company typically receives hundreds of thousands of applications for new credit cards. The applications contain several pieces of information, such as:
1. Annual salary
2. Outstanding debts
3. Age
4. Household size
5. Marital status
6. Employment status
7. Credit score

In this example, an MLP is very useful for sorting applications by the applicant's likelihood of default: some applications can be approved or rejected right away, while others are placed in a category that needs further human analysis.
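The three-way decision described above (approve, reject, or refer for human review) can be sketched by thresholding the predicted probability of default. Everything here is a hypothetical placeholder: the three features, the synthetic data, the labeling rule and the 0.2 / 0.8 cut-offs; `route` is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical applicant features: annual salary, outstanding debt, credit score
X = np.column_stack([
    rng.normal(50_000, 15_000, 500),
    rng.normal(10_000, 5_000, 500),
    rng.normal(650, 80, 500),
])
# Hypothetical label: default when the debt ratio is high and the credit score is low
y = ((X[:, 1] / X[:, 0] > 0.25) & (X[:, 2] < 650)).astype(int)

scaler = StandardScaler().fit(X)  # MLPs train more reliably on standardized inputs
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(scaler.transform(X), y)

def route(applicants, low=0.2, high=0.8):
    """Approve if P(default) < low, reject if P(default) > high, otherwise send to review."""
    p_default = clf.predict_proba(scaler.transform(applicants))[:, 1]
    return np.where(p_default < low, "approve",
                    np.where(p_default > high, "reject", "review"))

print(route(X[:10]))
```

Widening the gap between `low` and `high` sends more applications to human review; narrowing it automates more decisions at the cost of more errors.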
___
<br>

<h2 align="center">Extra Useful Features</h2> 

The multi-layer perceptron models in `sklearn.neural_network` (`MLPClassifier` and `MLPRegressor`) can also be used for:

* Regression
* Regularization


<h4 align="center">Regression</h4> 

Class `MLPRegressor` implements a multi-layer perceptron (MLP) that trains using backpropagation with no activation function in the output layer, which can also be seen as using the identity function as the activation function. It therefore uses the squared error as the loss function, and the output is a set of continuous values. `MLPRegressor` also supports multi-output regression, in which a sample can have more than one target.
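A short sketch of multi-output regression with `MLPRegressor` on synthetic data (the two targets are placeholders): each sample has two continuous targets, and `predict` returns one column per target.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
# Two continuous targets per sample (multi-output regression)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] ** 2])

reg = MLPRegressor(hidden_layer_sizes=(20,), max_iter=3000, random_state=0)
reg.fit(X, Y)

pred = reg.predict(X[:5])
print(pred.shape)       # (5, 2): one column per target
print(reg.score(X, Y))  # R^2, averaged uniformly over the two targets
```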

<h4 align="center">Regularization</h4> 

Both `MLPRegressor` and `MLPClassifier` use the parameter `alpha` for the regularization (L2 regularization) term, which helps avoid overfitting by penalizing weights with large magnitudes. The following plot shows how the decision function varies with the value of `alpha`.

<img src="regularization.png" style="width: 400px;"  />
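The effect of `alpha` can also be checked numerically: with a larger penalty, the fitted weights tend to be smaller. A sketch on the synthetic two-moons dataset; the alpha values, network size and solver are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

norms = {}
for alpha in [1e-5, 1e-1, 10.0]:
    clf = MLPClassifier(hidden_layer_sizes=(10,), alpha=alpha, solver="lbfgs",
                        max_iter=2000, random_state=0).fit(X, y)
    # Sum of squared weights: the quantity the L2 penalty discourages
    norms[alpha] = sum(np.sum(W ** 2) for W in clf.coefs_)
    print(f"alpha={alpha:g}: squared weight norm={norms[alpha]:.2f}, "
          f"training accuracy={clf.score(X, y):.2f}")
```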


___
<br>

<h2 align="center">Summary</h2> 

MLP networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions. 
What sets MLP networks apart is their capability to generalize: they classify an unknown pattern together with known patterns that share the same distinguishing features. This means noisy or incomplete inputs can still be classified because of their similarity to clean, complete inputs. They are also highly fault tolerant, a characteristic known as "graceful degradation": because of its distributed nature, a neural network keeps working even when a significant fraction of its neurons and interconnections fail. Relearning after damage can also be relatively quick.

However, training an MLP is computationally expensive. A large number of iterations is required for learning, so it is not suitable for real-time learning. Moreover, there is no guaranteed solution: an MLP performs well on some kinds of problems and poorly on others. It is a black-box learning approach, so it cannot explain the relationships between the input and output variables, and it cannot deal with uncertainties. How well an MLP performs therefore depends on the situation, and users should have a fair idea of which problems the MLP classifier suits. For instance, an MLP classifier would be best suited to speech synthesis or pattern recognition.
___
<br>

<h2 align="center">References</h2> 

1. Tutorials - sklearn.neural_network.MLPClassifier, retrieved from http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
2. User Guide - 1.17. Neural network models (supervised), retrieved from http://scikit-learn.org/stable/modules/neural_networks_supervised.html
3. Perceptrons and Multilayer Perceptrons - Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks, retrieved from http://www.cogsys.wiai.uni-bamberg.de/teaching/ss05/ml/slides/cogsysII-4.pdf
4. Stojilkovic, J, Face Recognition Using Neural Network - An example of face recognition using characteristic points of face, retrieved from  http://neuroph.sourceforge.net/tutorials/FaceRecognition/FaceRecognitionUsingNeuralNetwork.html
5. Wankhede, Sonali B. Analytical Study of Neural Network Techniques: SOM, MLP and Classifier-A Survey, retrieved from http://www.iosrjournals.org/iosr-jce/papers/Vol16-issue3/Version-7/N016378692.pdf
6. Why MultiLayer Perceptron/Neural Network?, retrieved from http://courses.media.mit.edu/2006fall/mas622j/Projects/manu-rita-MAS_Proj/MLP.pdf