# Part 8 - Deep Learning (DL)

![Alt](./deeplearning.png "Deep Learning")

### Neuron
In DL, a neuron is a node that combines its weighted inputs, applies an activation function, and passes the resulting signal on to the neurons in the next layer.
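A minimal NumPy sketch of a single neuron. It assumes a sigmoid activation and adds a bias term `b`, which the notes do not mention. The input, weight and bias values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation):
    """Weighted sum of the inputs plus a bias (assumed), passed through an activation."""
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([0.5, -1.0, 2.0])   # input signals (illustrative)
w = np.array([0.4, 0.3, -0.1])   # one weight per input
b = 0.1                          # bias term (an assumption, not in the notes)
out = neuron(x, w, b, sigmoid)   # z = 0.2 - 0.3 - 0.2 + 0.1 = -0.2, sigmoid(-0.2) ≈ 0.4502
```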

### Activation function
* Threshold $ \phi(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{otherwise} \end{cases} $

* Sigmoid $ \phi(x) = \frac{1}{1+e^{-x}} $
    - Smooth transition from 0 to 1
    - Good for predicting probabilities in the final/output layer
    
* Rectifier (ReLU) $ \phi(x) = \max(x,0) $
    - _[Deep sparse rectifier neural networks](http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf) by Xavier Glorot et al. 2011_

* Hyperbolic Tangent (tanh) $ \phi(x) = \frac{1-e^{-2x}}{1+e^{-2x}} $
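The four activation functions above can be sketched in NumPy as follows. The function names are hypothetical helpers, not a library API; `tanh` uses the formula from the notes and is checked against `np.tanh`.

```python
import numpy as np

def threshold(x):
    # 0 if x < 0, 1 otherwise
    return np.where(x < 0, 0.0, 1.0)

def sigmoid(x):
    # smooth transition from 0 to 1
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # max(x, 0), applied element-wise
    return np.maximum(x, 0.0)

def tanh(x):
    # formula from the notes; fine for moderate x (np.tanh is numerically safer)
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
results = {f.__name__: f(xs) for f in (threshold, sigmoid, relu, tanh)}
```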

__Cost function__: $C = \frac{1}{2}(\hat{y}-y)^2$, the function to be minimized, where $\hat{y}$ is the predicted value and $y$ the actual value

[A list of cost functions used in neural network, alongside applications](http://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-network-alongside-applications) CrossValidated 2015
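A short sketch of the quadratic cost and its derivative $\partial C / \partial \hat{y} = \hat{y} - y$, which is the signal gradient descent follows. Summing the cost over several samples is an assumed convention here; the values are illustrative.

```python
import numpy as np

def cost(y_hat, y):
    """Quadratic cost C = 0.5 * (y_hat - y)**2, summed over the samples."""
    return 0.5 * np.sum((y_hat - y) ** 2)

def cost_grad(y_hat, y):
    """Derivative of C with respect to each prediction: y_hat - y."""
    return y_hat - y

y_hat = np.array([0.9, 0.2, 0.6])   # predictions
y = np.array([1.0, 0.0, 1.0])       # actual values
C = cost(y_hat, y)                  # 0.5 * (0.01 + 0.04 + 0.16) = 0.105
```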

### Gradient Descent
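* Finding the weights that minimize the cost function is a multi-dimensional optimization problem; gradient descent solves it iteratively by moving each weight in the direction that decreases the cost (the negative gradient)
* It works reliably when the cost function is convex, i.e. has a single (global) minimum
* Batch gradient descent: all the samples are used to calculate the cost function, and then all the weights are adjusted at once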
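A minimal sketch of batch gradient descent, assuming a one-weight linear model $\hat{y} = wx + b$ on noise-free toy data (both are illustrative assumptions). With the quadratic cost this problem is convex, and each update uses the gradient averaged over all the samples.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2.0 * X + 1.0          # toy target: slope 2, intercept 1

w, b = 0.0, 0.0
lr = 0.5                   # learning rate (assumed)
for epoch in range(200):
    err = (w * X + b) - y
    # gradient of the mean cost 0.5 * mean(err**2), computed over ALL samples
    grad_w = np.mean(err * X)
    grad_b = np.mean(err)
    # exactly one weight update per pass through the dataset
    w -= lr * grad_w
    b -= lr * grad_b
```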

### Stochastic Gradient Descent
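* Useful when the cost function is not convex and has local minima: updating from one sample at a time makes the updates noisy, which makes it more likely to escape a local minimum and find the __global minimum__
* The weights are adjusted after each individual sample is evaluated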
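The same toy problem trained with stochastic gradient descent: the samples are shuffled every epoch and the weights are updated after every single sample. This convex example has no local minima, so it only illustrates the update rule; the learning rate and epoch count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2.0 * X + 1.0          # toy target: slope 2, intercept 1

w, b = 0.0, 0.0
lr = 0.1                   # learning rate (assumed)
for epoch in range(50):
    for i in rng.permutation(len(X)):   # visit the samples in a random order
        err = (w * X[i] + b) - y[i]     # error on ONE sample
        w -= lr * err * X[i]            # update immediately after this sample
        b -= lr * err
```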

[A Neural Network in 13 lines of Python (part 2 - gradient descent)](https://iamtrask.github.io/2015/07/27/python-network-part2/)

[Neural Networks and Deep Learning by Michael Nielsen (2015)](http://neuralnetworksanddeeplearning.com/chap2.html)

### Backpropagation
* Information propagates backwards, from the output to the input, which allows all the weights to be updated simultaneously
* Allows us to see how much each part of the network (each weight) is responsible for the error
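A sketch of backpropagation for a tiny network (2 inputs, 2 sigmoid hidden units, 1 sigmoid output) with the quadratic cost $C = \frac{1}{2}(\hat{y}-y)^2$. The architecture, the bias terms and the values are illustrative assumptions. The error is pushed from the output back to the input with the chain rule, and the result is checked against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    W1, b1, W2, b2 = params
    h = sigmoid(W1 @ x + b1)        # hidden activations
    y_hat = sigmoid(W2 @ h + b2)    # output, shape (1,)
    return h, y_hat

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * np.sum((y_hat - y) ** 2)

def backprop(params, x, y):
    """Gradients of C with respect to every weight, computed output -> input."""
    W1, b1, W2, b2 = params
    h, y_hat = forward(params, x)
    # output layer: dC/dz2 = (y_hat - y) * sigmoid'(z2)
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)
    dW2, db2 = np.outer(delta2, h), delta2
    # hidden layer: send the error back through W2, then through sigmoid'
    delta1 = (W2.T @ delta2) * h * (1 - h)
    dW1, db1 = np.outer(delta1, x), delta1
    return [dW1, db1, dW2, db2]

rng = np.random.default_rng(1)
params = [rng.normal(0, 0.5, (2, 2)), np.zeros(2),
          rng.normal(0, 0.5, (1, 2)), np.zeros(1)]
x, y = np.array([0.3, -0.7]), np.array([1.0])
grads = backprop(params, x, y)

# Sanity check: central finite differences, one weight at a time
eps = 1e-6
numeric = []
for p in params:
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        up = loss(params, x, y)
        p[idx] = old - eps
        down = loss(params, x, y)
        p[idx] = old                # restore the weight
        g[idx] = (up - down) / (2 * eps)
    numeric.append(g)
```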

## Artificial Neural Network (ANN) with Stochastic Gradient Descent

__Step 1__: Randomly initialize the weights to small numbers close to 0, but not 0.

__Step 2__: Input the first observation of your dataset in the input layer, each feature in one input node.

__Step 3__: Forward-Propagation: from left to right, each neuron's activation is passed on to the next layer, scaled by the weights of its connections. Propagate the activations until getting the predicted result $\hat{y}$.

__Step 4__: Compare the predicted result to the actual result $y$. Measure the generated error.

__Step 5__: Back-Propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.

__Step 6__: Repeat steps 2-5 and update the weights after each observation (online learning, i.e. stochastic gradient descent). Or: repeat steps 2-5 but update the weights only after a batch of observations (batch learning).

__Step 7__: When the whole training set has passed through the ANN, that makes an epoch. Redo more epochs.
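Steps 1-7 can be sketched in NumPy as one training loop with per-observation updates. Everything here is an illustrative assumption: the toy dataset (label 1 when $x_1 + x_2 > 0$), 4 sigmoid hidden units (sigmoid keeps the derivatives short; the Keras model below uses ReLU), the learning rate and the number of epochs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical toy dataset: label 1 when x1 + x2 > 0
X = rng.normal(size=(200, 2))
Y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Step 1: small random weights, close to 0 but not 0
n_in, n_hidden, lr = 2, 4, 0.5
W1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.uniform(-0.1, 0.1, (1, n_hidden))
b2 = np.zeros(1)

def mean_cost():
    """Average quadratic cost over the whole training set."""
    H = sigmoid(X @ W1.T + b1)
    Y_hat = sigmoid(H @ W2.T + b2).ravel()
    return np.mean(0.5 * (Y_hat - Y) ** 2)

cost_before = mean_cost()
for epoch in range(50):                      # Step 7: several epochs
    for i in rng.permutation(len(X)):        # Step 2: one observation at a time
        x, y = X[i], Y[i]
        h = sigmoid(W1 @ x + b1)             # Step 3: forward propagation
        y_hat = sigmoid(W2 @ h + b2)
        err = y_hat - y                      # Step 4: error of this prediction
        delta2 = err * y_hat * (1 - y_hat)   # Step 5: back-propagate the error
        delta1 = (W2.T @ delta2) * h * (1 - h)
        W2 -= lr * np.outer(delta2, h)       # Step 6: update after each observation
        b2 -= lr * delta2
        W1 -= lr * np.outer(delta1, x)
        b1 -= lr * delta1
cost_after = mean_cost()
```

Swapping the inner update for an accumulated gradient applied once per batch would turn this into the batch-learning variant of Step 6.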

```python
# import the libraries that will be used
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix

# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten

from keras.preprocessing.image import ImageDataGenerator
```

```python
# import dataset
dir1 = '/disk1/sousae/Classes/udemy_machineLearning/Machine_Learning_A-Z/Part8_Deep_Learning/'
dataset = pd.read_csv(dir1+'Churn_Modelling.csv')
dataset.head()
```

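|   | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |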


```python
# Preprocess the data
from sklearn.compose import ColumnTransformer

X = dataset.iloc[:, 3:13].values   # CreditScore ... EstimatedSalary
y = dataset.iloc[:, 13].values     # Exited

# Encoding categorical data
# Gender (column 2): Female/Male -> 0/1
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# Geography (column 1): one-hot encode it; the ColumnTransformer places the
# dummy columns first, followed by the untouched columns
onehotencoder = ColumnTransformer([('geography', OneHotEncoder(), [1])],
                                  remainder = 'passthrough', sparse_threshold = 0)
X = np.array(onehotencoder.fit_transform(X), dtype = float)
X = X[:, 1:]   # drop one dummy column to avoid the dummy variable trap

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

```python
# Initialising the ANN
classifier = Sequential()

# Adding the input layer (11 features) and the first hidden layer (6 units)
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer (sigmoid gives the probability that the customer leaves)
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'sgd', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set (weights updated after every batch of 10 observations)
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
```


```python
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

# Making the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

```
[[1507   88]
 [ 185  220]]
```
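The confusion matrix above can be turned into summary metrics. In scikit-learn's `confusion_matrix`, rows are the actual classes and columns the predicted classes (order 0, 1), so `ravel()` yields TN, FP, FN, TP. The numbers are copied from the output above.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted
cm = np.array([[1507, 88],
               [185, 220]])
tn, fp, fn, tp = cm.ravel()
accuracy = (tn + tp) / cm.sum()    # (1507 + 220) / 2000 = 0.8635
precision = tp / (tp + fp)         # share of predicted churners who actually left
recall = tp / (tp + fn)            # share of actual churners that were caught
```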


## Convolutional Neural Network

__Step 1: Convolution__
$$ (f*g)(t) = \int_{-\infty}^{\infty} f(\tau)g(t-\tau) d\tau $$

[Introduction to Convolutional Neural Networks by Jianxin Wu](http://cs.nju.edu.cn/wujx/paper/CNN.pdf)

[Understanding Convolutional Neural Networks with a Mathematical Model by C.-C. Jay Kuo 2016](https://arxiv.org/pdf/1609.04112.pdf)

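Input image $*$ feature detector (e.g. a 3x3 matrix, also called a kernel or filter) = feature map
* Without padding, the feature map is smaller than the input image (a 3x3 detector trims one pixel from each border).
* It finds the features of the image while preserving the spatial relationships between pixels.

Multiple feature maps are created, one for each filter (feature detector).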

__Step 1 (B): a ReLU layer is applied to the feature maps to break up the linearity.__
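A minimal sketch of Step 1 and Step 1 (B) in NumPy, assuming stride 1 and no padding. Like most deep-learning libraries, it computes a cross-correlation (the kernel is not flipped, unlike the integral above), which is what CNN layers call "convolution". The toy image and the vertical-edge detector are illustrative choices.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the window and the kernel, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

# 5x5 toy image: a bright vertical stripe in columns 1-2
image = np.array([[0, 1, 1, 0, 0]] * 5, dtype=float)
# 3x3 vertical-edge detector: positive where brightness rises left to right
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)   # 5x5 -> 3x3: the map is smaller
activated = relu(feature_map)               # Step 1 (B): negative responses set to 0
```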


__Step 2: Max Pooling__
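* Gives the network spatial invariance, i.e. the ability to recognize a feature despite tilt, stretch or other distortion in the image
* Feature Map ---- Max pooling ----> Pooled Feature Map
* Slide a window (e.g. 2x2) over the feature map and keep the maximum value in each window
* Preserves the features
* Reduces the size of the feature maps, and with it the number of parameters in the following layers
* Helps prevent overfitting by disregarding unnecessary information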

[Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition by Dominik Scherer et al. 2010](http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf)
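A NumPy sketch of 2x2 max pooling with non-overlapping windows. Dropping rows or columns that do not fill a whole window mirrors Keras' default `MaxPooling2D` behaviour (strides equal to `pool_size`, `padding='valid'`); the helper name and the toy feature map are illustrative.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling (stride = size)."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]     # drop incomplete windows
    blocks = trimmed.reshape(h, size, w, size)      # blocks[i, a, j, b]
    return blocks.max(axis=(1, 3))                  # maximum of each window

fm = np.array([[1, 0, 2, 3],
               [4, 6, 6, 8],
               [3, 1, 1, 0],
               [1, 2, 2, 4]], dtype=float)
pooled = max_pool(fm)   # [[6, 8], [3, 4]]
```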


__Step 3: Flattening__
* Pooled Feature Map ---- Flattening ----> vector to be the input to an ANN
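Flattening is a reshape of the pooled feature maps into one vector. The sketch below works out the vector length for the CNN built in the code further down (64x64x3 input, two rounds of 32 3x3 filters followed by 2x2 max pooling), assuming the Keras defaults of stride 1 and `padding='valid'`.

```python
import numpy as np

def conv_out(size, kernel=3):
    # 'valid' convolution with stride 1: the map shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping pooling; leftover rows/columns are dropped
    return size // pool

side = 64                          # input images are 64x64x3
side = pool_out(conv_out(side))    # conv 3x3 -> 62, pool 2x2 -> 31
side = pool_out(conv_out(side))    # conv 3x3 -> 29, pool 2x2 -> 14
n_filters = 32
flat_len = side * side * n_filters  # 14 * 14 * 32 = 6272 inputs to the ANN

# The flattening step itself: stack the pooled maps into one vector
pooled_maps = np.zeros((side, side, n_filters))
vector = pooled_maps.reshape(-1)
```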


__Step 4: Full Connection__
* Add an ANN to the previous steps

[The 9 Deep learning papers You need to know about by Adit Deshpande 2016](https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html)

#### Softmax and Cross-entropy
* Softmax function $ f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}} $
* Makes sure the predicted probabilities of the different classes all add up to 1
* Cross-entropy function: used as the classification loss instead of the mean squared error (MSE). For sample $i$ with correct class $y_i$:
$$ L_i = -\log \left(\frac{e^{z_{y_i}}}{\sum_k e^{z_k}} \right) $$
$$H(p,q) = -\sum_x p(x) \log q(x)$$

[A Friendly Introduction to Cross-Entropy Loss by Rob DiPietro 2016](https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/)
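A NumPy sketch of softmax followed by cross-entropy. With a one-hot label $p$, $H(p,q)$ reduces to $L_i = -\log q_{y_i}$. Subtracting the maximum score before exponentiating is a standard overflow guard that leaves the result unchanged; the scores are illustrative.

```python
import numpy as np

def softmax(z):
    # subtracting max(z) avoids overflow without changing the probabilities
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x); p = true distribution, q = prediction."""
    return -np.sum(p * np.log(q))

scores = np.array([2.0, 1.0, 0.1])   # raw outputs z for 3 classes (illustrative)
q = softmax(scores)                  # probabilities summing to 1
p = np.array([1.0, 0.0, 0.0])        # one-hot label: class 0 is correct
loss = cross_entropy(p, q)           # equals -log(q[0]) for a one-hot label
```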

```python
# Part 1 - Building the CNN

# Initialising the CNN
classifier = Sequential()
```

```python
help(Convolution2D)
```

```
Help on class Conv2D in module keras.layers.convolutional:

class Conv2D(_Conv)
 |  2D convolution layer (e.g. spatial convolution over images).
 |  
 |  This layer creates a convolution kernel that is convolved
 |  with the layer input to produce a tensor of
 |  outputs. If `use_bias` is True,
 |  a bias vector is created and added to the outputs. Finally, if
 |  `activation` is not `None`, it is applied to the outputs as well.
 |  
 |  When using this layer as the first layer in a model,
 |  provide the keyword argument `input_shape`
 |  (tuple of integers, does not include the sample axis),
 |  e.g. `input_shape=(128, 128, 3)` for 128x128 RGB pictures
 |  in `data_format="channels_last"`.
 |  
 |  # Arguments
 |      filters: Integer, the dimensionality of the output space
 |          (i.e. the number output of filters in the convolution).
 |      kernel_size: An integer or tuple/list of 2 integers, specifying the
 |          width and height of the 2D convolution window.
```

```python
# Step 1 - Convolution: 32 feature detectors of size 3x3 on 64x64 RGB images
classifier.add(Convolution2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))

# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# Adding a second convolutional layer
classifier.add(Convolution2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
classifier.add(Dense(units = 128, activation='relu'))
classifier.add(Dense(units = 1, activation='sigmoid'))

# Compiling the CNN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```


```python
# Part 2 - Fitting the CNN to the images
# Training images: rescale pixels to [0, 1] and augment (shear, zoom, horizontal flips)
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

# Test images: rescale only
test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('/disk1/sousae/Classes/udemy_machineLearning/Machine_Learning_A-Z/Part8_Deep_Learning/dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('/disk1/sousae/Classes/udemy_machineLearning/Machine_Learning_A-Z/Part8_Deep_Learning/dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

classifier.fit(training_set,
               steps_per_epoch = len(training_set),   # 8000 images / 32 per batch = 250 batches
               epochs = 25,
               validation_data = test_set,
               validation_steps = len(test_set))      # all 2000 test images, in batches of 32
```

```
Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
```