# Basics of MLP
- Objective: Create vanilla neural networks (i.e., Multilayer perceptrons) for simple regression/classification tasks with Keras

In [218]:
#!pip install tensorflow

In [220]:
#!pip install keras

## MLP Structures
- Each MLP model is consisted of one input layer, several hidden layers, and one output layer
- Number of neurons in each layer is not limited
<img src="http://cs231n.github.io/assets/nn1/neural_net.jpeg" style="width: 300px"/>
<br>
<center>**MLP with one hidden layer**</center>
- Number of input neurons: 3
- Number of hidden neurons: 4
- Number of output neurons: 2


<img src="http://cs231n.github.io/assets/nn1/neural_net2.jpeg" style="width: 500px"/>
<br>
<center>**MLP with two hidden layers**</center>

- Number of input neurons: 3
- Number of hidden neurons: (4, 4)
- Number of output neurons: 1


## (I) MLP for Regression tasks - Predict house price
- When the target (**y**) is continuous (real)
- For loss function and evaluation metric, mean squared error (MSE) is commonly used

- Data:

https://keras.io/api/datasets/boston_housing/

This is a dataset taken from the StatLib library which is maintained at Carnegie Mellon University.

Samples contain 13 attributes of houses at different locations around the Boston suburbs in the late 1970s. 

Targets are the median values of the houses at a location (in k$).

The attributes themselves are defined in the StatLib website.


http://lib.stat.cmu.edu/datasets/boston

Variables in order:

- CRIM     per capita crime rate by town
-  ZN       proportion of residential land zoned for lots over 25,000 - sq.ft.
- INDUS    proportion of non-retail business acres per town
- CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX      nitric oxides concentration (parts per 10 million)
- RM       average number of rooms per dwelling
- AGE      proportion of owner-occupied units built prior to 1940
- DIS      weighted distances to five Boston employment centres
- RAD      index of accessibility to radial highways
- TAX      full-value property-tax rate per $10,000
- PTRATIO  pupil-teacher ratio by town
- B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT    % lower status of the population
- MEDV     Median value of owner-occupied homes in $1000's


In [222]:
# Load data
from keras.datasets import boston_housing

In [224]:
(X_train, y_train), (X_test, y_test) = boston_housing.load_data()

In [226]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(404, 13)
(102, 13)
(404,)
(102,)


### 1. Creating a model
- Keras model object can be created with Sequential class
- At the outset, the model is empty per se. It is completed by **'adding'** additional layers and compilation
- Ref: https://keras.io/models/sequential/
- Ref: https://keras.io/getting-started/sequential-model-guide/

In [228]:
# The Sequential model is a linear stack of layers.
# You can create a Sequential model by passing a list of 
# layer instances to the constructor:

# OR 

# You can also simply add layers via the .add() method:

# Ref: https://keras.io/api/models/sequential/

from keras.models import Sequential

In [230]:
model = Sequential() # Instantiate an empty model

In [232]:
# model.<press TAB key> # TODO: Explore its attributes

- **compile** - First, we want to decide a model architecture, this is the number of hidden layers and activation functions, etc. (compile)

- **fit** - Secondly, we want to train our model to get all the paramters to the correct value to map our inputs to our outputs. (fit)

- **predict** - Lastly, we will want to use this model to do some feed-forward passes to predict novel inputs. (predict)

### 1-1. Adding layers
- Keras layers can be **added** to the model
- Adding layers are like stacking lego blocks one by one
- Doc: https://keras.io/layers/core/

In [234]:
from keras.layers import Activation, Dense

In [236]:
# Method 1 - Activation will be mentioned at the time of 
# layer creation

# References for Dense layer:
# https://medium.com/@hunterheidenreich/understanding-keras-dense-layers-2abadff9b990
# https://keras.io/api/layers/core_layers/dense/

model.add(Dense(10, input_shape = (13,), 
                activation = 'sigmoid')
         ) # Input layer

model.add(Dense(10, activation = 'sigmoid')) # Hideen Layer1

model.add(Dense(10, activation = 'sigmoid')) # Hidden Layer2

model.add(Dense(1)) # Output Layer

In [238]:
# This is equivalent to the above code block
# Keras model with two hidden layer with 10 neurons each 
# Method 2

# You should execute either previous cell or this cell, 
# otherwise it will add 4 more layers

# Input layer => input_shape should be explicitly designated
# 
model.add(Dense(10, input_shape = (13,)))    # A sample has 13 attributes
model.add(Activation('sigmoid'))

# Hidden layer1 => only output dimension should be designated
model.add(Dense(10))                         
model.add(Activation('sigmoid'))

# Hidden layer2 => only output dimension should be designated
model.add(Dense(10))                         
model.add(Activation('sigmoid'))

# Output layer => output dimension = 1 since it is regression problem
# In regression, it should output a single continuous value
model.add(Dense(1))                          

In [240]:
# Get the model configuration - Too detailed..may require time to understand
# Optional exploration..can be done after understanding MLP fully

# model.get_config()
# model.get_weights()

# Similarly we can get lot of details abot model...but we have
# trained the model yet

### 1-2. Model compile
- Keras model should be "compiled" prior to training
- Types of loss (function) and optimizer should be designated
    - Doc (optimizers): https://keras.io/optimizers/
    - Doc (losses): https://keras.io/losses/


**Learning Rate**

Ref: https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging as a value too small may result in a long training process that could get stuck, whereas a value too large may result in learning a sub-optimal set of weights too fast or an unstable training process.

<img src="learning_rate.png">

In [242]:
# Instantiate an optimizaer
# Stochastic --> Random
# SGD: while selecting data points at each step to calculate the derivatives. SGD randomly picks one data point from the whole data set at each iteration to reduce the computations enormously.
from keras import optimizers
sgd = optimizers.SGD(lr = 0.01)    # stochastic gradient descent optimizer

In [244]:
# Build MLP model
model.compile(optimizer = sgd, 
              loss = 'mean_squared_error', 
              metrics = ['mse'])  # for regression problems, mean squared error (MSE) is often employed

### Summary of the model

In [246]:
model.summary() 
# dense - 13x10=130 weights and 10 biases, 1 bias for every neuron

# dense1 - 10 outputs from 10 neurons of dense, so,
#          10x10=100 weights and 10 biases
#          Total params = 100+10 = 110

# dense2 - 10 outputs from 10 neurons of previous dense layer
#          and dense2 has 10 neurons, all are connected
#          10x10=100 weights and 10 biases
#          Total params = 100+10 = 110

# dense3 - 10 outputs from 10 neurons of dense2 layer
#          only one neuron in this layer
#          10x1=10 weights and 1 bias for that single neuron
#          Total params = 10+1 = 11

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_92 (Dense)            (None, 10)                140       
                                                                 
 dense_93 (Dense)            (None, 10)                110       
                                                                 
 dense_94 (Dense)            (None, 10)                110       
                                                                 
 dense_95 (Dense)            (None, 1)                 11        
                                                                 
 dense_96 (Dense)            (None, 10)                20        
                                                                 
 activation_43 (Activation)  (None, 10)                0         
                                                                 
 dense_97 (Dense)            (None, 10)              

### 2. Training
- Training the model with training data provided

In [248]:
# Train your MLP model
model.fit(X_train, 
          y_train, 
          batch_size = 50, 
          epochs = 10, 
          verbose = 1)

# verbose: 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (eg, in a production environment).

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1b495f06eb0>

### 3. Evaluation
- Keras model can be evaluated with evaluate() function
- Evaluation results are contained in a list
    - Doc (metrics): https://keras.io/metrics/

In [250]:
results = model.evaluate(X_test, y_test)



In [252]:
print(model.metrics_names)     # list of metric names the model is employing
print(results)                 # actual figure of metrics computed

['loss', 'mse']
[83.3297348022461, 83.3297348022461]


In [254]:
print('loss: ', results[0])
print('mse: ', results[1])

loss:  83.3297348022461
mse:  83.3297348022461


## (II) MLP for classification tasks
- When the target (**y**) is discrete (categorical)
- For loss function, cross-entropy is used and for evaluation metric, accuracy is commonly used

In [256]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

In [258]:
whole_data = load_breast_cancer()

In [260]:
X_data = whole_data.data
y_data = whole_data.target

In [262]:
#y_data

In [264]:
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size = 0.3, random_state = 7) 

### Dataset Description
- Breast cancer dataset has total 569 data instances (212 malign, 357 benign instances)
- 30 attributes (features) to predict the binary class (M/B)
- Doc: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer

In [266]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(398, 30)
(171, 30)
(398,)
(171,)


### 1. Creating a model
- Same with regression model at the outset

In [268]:
from keras.models import Sequential

In [270]:
model = Sequential()

### 1-1. Adding layers
- Keras layers can be **added** to the model
- Adding layers are like stacking lego blocks one by one
- It should be noted that as this is a classification problem, sigmoid layer (softmax for multi-class problems) should be added
- Doc: https://keras.io/layers/core/

In [272]:
# Method 1: This is equivalent to the below code block
model.add(Dense(10, input_shape = (30,), activation = 'sigmoid'))
model.add(Dense(10, activation = 'sigmoid'))
model.add(Dense(10, activation = 'sigmoid'))
model.add(Dense(2, activation = 'softmax'))

In [274]:
# Method 2: Keras model with two hidden layer with 10 neurons each 
model.add(Dense(10, input_shape = (30,)))    # Input layer => input_shape should be explicitly designated
model.add(Activation('sigmoid'))
model.add(Dense(10))                         # Hidden layer => only output dimension should be designated
model.add(Activation('sigmoid'))
model.add(Dense(10))                         # Hidden layer => only output dimension should be designated
model.add(Activation('sigmoid'))
model.add(Dense(1))                          # Output layer => output dimension = 1 since it is binary classification problem
model.add(Activation('softmax'))

### 1-2. Model compile
- Keras model should be "compiled" prior to training
- Types of loss (function) and optimizer should be designated
    - Doc (optimizers): https://keras.io/optimizers/
    - Doc (losses): https://keras.io/losses/

In [276]:
from keras import optimizers

In [278]:
sgd = optimizers.SGD(lr = 0.01)    # stochastic gradient descent optimizer

In [280]:
model.compile(optimizer = sgd, 
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])

### Summary of the model

In [282]:
model.summary()

Model: "sequential_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_116 (Dense)           (None, 10)                310       
                                                                 
 dense_117 (Dense)           (None, 10)                110       
                                                                 
 dense_118 (Dense)           (None, 10)                110       
                                                                 
 dense_119 (Dense)           (None, 2)                 22        
                                                                 
 dense_120 (Dense)           (None, 10)                30        
                                                                 
 activation_53 (Activation)  (None, 10)                0         
                                                                 
 dense_121 (Dense)           (None, 10)              

### 2. Training
- Training the model with training data provided

In [None]:
model.fit(X_train, 
          y_train, 
          batch_size = 2, 
          epochs = 10, 
          verbose = 1)

In [286]:
X_train.shape 

(398, 30)

In [288]:
y_train.shape

(398,)

In [290]:
y_test.shape


(171,)

### 3. Evaluation
- Keras model can be evaluated with evaluate() function
- Evaluation results are contained in a list
    - Doc (metrics): https://keras.io/metrics/

In [292]:
results = model.evaluate(X_test, y_test)



In [294]:
print(model.metrics_names)     # list of metric names the model is employing
print(results)                 # actual figure of metrics computed

['loss', 'accuracy']
[0.6377596259117126, 0.6783625483512878]


In [296]:
print('loss: ', results[0])
print('accuracy: ', results[1])

loss:  0.6377596259117126
accuracy:  0.6783625483512878
