# Deep Learning with Python

## Overview

- [Python Libraries](#Python-Libraries)  
    - [Introduction to Theano](#Introduction-to-Theano)
    - [Introduction to TensorFlow](#Introduction-to-TensorFlow)
- [Keras](#Keras)
- [ARTIFICIAL NEURAL NETWORK using Keras](#ARTIFICIAL-NEURAL-NETWORK-using-Keras)
- [Evaluating the ANN model with k-Fold Cross Validation](#Evaluating-the-ANN-model-with-k-Fold-Cross-Validation)
- [Tuning the ANN model with GridSearchCV](#Tuning-the-ANN-model-with-GridSearchCV)

## Python Libraries 

There are several Python libraries for building deep learning models. The two main libraries for building deep artificial neural networks are Theano and TensorFlow.

### Introduction to Theano

[Theano](http://deeplearning.net/software/theano/) is a Python library for **fast numerical computation** to aid in the development of deep learning models. At its heart, Theano is a compiler for mathematical expressions in Python. It knows how to take your structures and turn them into very efficient code that uses NumPy and efficient native libraries to run as fast as possible on CPUs or GPUs.

The actual syntax of Theano expressions is symbolic, which can be off-putting to beginners used to normal software development. Specifically, expressions are first defined in the abstract sense, then compiled, and only later used to perform calculations.
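
The define, compile, evaluate workflow described above can be illustrated with a small pure-Python toy. This is only a sketch of the idea and is not Theano's API: the names `Symbol`, `Add`, `Mul` and `compile_expr` are hypothetical and exist only for this illustration.

```python
# Toy illustration of "define symbolically, compile, then evaluate".
# NOT Theano's API: Symbol, Add, Mul and compile_expr are hypothetical names.

class Expr:
    """Base class: operators build a graph instead of computing a value."""
    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)


class Symbol(Expr):
    def __init__(self, name):
        self.name = name

    def to_source(self):
        return self.name


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def to_source(self):
        return "({} + {})".format(self.left.to_source(), self.right.to_source())


class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def to_source(self):
        return "({} * {})".format(self.left.to_source(), self.right.to_source())


def compile_expr(inputs, output):
    """Turn the expression graph into an ordinary Python function."""
    args = ", ".join(symbol.name for symbol in inputs)
    source = "lambda {}: {}".format(args, output.to_source())
    return eval(source)  # acceptable here because the source is generated by us


# 1. Define: no numbers are involved yet, only symbols.
x = Symbol("x")
y = Symbol("y")
z = x * y + x

# 2. Compile: the graph becomes a callable.
f = compile_expr([x, y], z)

# 3. Evaluate: only now are concrete values supplied.
result = f(2.0, 3.0)
print(z.to_source(), "=", result)
```

A real symbolic library would, in addition, optimize the graph and generate native code during the compile step; the toy only shows why the definition and the calculation are separate stages.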

### Introduction to TensorFlow

[TensorFlow](https://www.tensorflow.org/) is a Python library for **fast numerical computing** created and released by Google. Like Theano, TensorFlow is intended to be used to develop deep learning models. With the backing of Google, possible use in some of its production systems, and use by the Google DeepMind research group, it is a platform that we cannot ignore. Unlike Theano, TensorFlow has more of a production focus, with the capability to run on CPUs, GPUs and even very large clusters.

## Keras

A difficulty of both Theano and TensorFlow is that it can take a lot of code to create even very simple neural network models. These libraries were designed primarily as a platform for research and development more than for the practical concerns of applied deep learning. The [Keras](https://keras.io/) library addresses these concerns by providing a wrapper for both Theano and TensorFlow. It provides a clean and simple API that allows you to define and evaluate deep learning models in just a few lines of code.  

Because of the ease of use and because it leverages the power of Theano and TensorFlow, Keras is quickly becoming the go-to library for applied deep learning. The focus of Keras is the concept of a model. The life-cycle of a model can be summarized as follows:
1. Define your model. Create a **Sequential model** and **add configured layers**.
2. Compile your model. Specify the **loss function** and the **optimizer**, then call the **compile()** function on the model.
3. Fit your model. Train the model on a sample of data by calling the **fit()** function on the model.
4. Make predictions. Use the model to generate predictions on new data by calling functions such as **evaluate()** or **predict()** on the model.

## ARTIFICIAL NEURAL NETWORK using Keras

In [1]:
# load libraries and set plot parameters
import numpy as np
import pandas as pd
# import PrettyTable as pt

import matplotlib.pyplot as plt
%matplotlib inline

# plots configuration
# plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = 10, 6
plt.rcParams['axes.labelsize'] = 11
plt.rcParams['axes.titlesize'] = 11
plt.rcParams['legend.fontsize'] = 10

In [2]:
# load keras classes
from keras.models import Sequential
from keras.layers import Dense

Using Theano backend.


In [3]:
# loading the dataset
from sklearn.datasets import load_iris

dataset = load_iris()
print(dataset['DESCR'])

Iris Plants Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML iris d

In [4]:
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df['class'] = pd.Series(dataset.target, name='class')
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [5]:
df.describe()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 | 1.0 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 | 0.819232 |
| min | 4.3 | 2.0 | 1.0 | 0.1 | 0.0 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 | 0.0 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 | 1.0 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 | 2.0 |
| max | 7.9 | 4.4 | 6.9 | 2.5 | 2.0 |


In [6]:
X = df.drop('class', axis=1).values
y = df['class'].values

Since the output variable contains three different values, we are dealing with a **multi-class classification problem**.

In [7]:
df['class'].unique()

array([0, 1, 2])

In [8]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

When modeling multi-class classification problems with neural networks, it is good practice to reshape the output variable from a vector of class labels into a matrix with one boolean column per class, where each entry indicates whether or not a given instance belongs to that class.

This is called **one-hot encoding**.
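
As a minimal sketch of what the encoding does, the transformation can be reproduced with plain NumPy. The four example labels and the variable names below are made up for illustration; the notebook itself uses `LabelEncoder` and `to_categorical`.

```python
import numpy as np

# Hypothetical class labels for four samples (three possible classes).
labels = np.array([0, 1, 1, 2])
n_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels encodes all samples at once.
one_hot = np.eye(n_classes)[labels]
print(one_hot)

# The original labels can be recovered as the column index of the 1 in each row.
recovered = one_hot.argmax(axis=1)
print(recovered)
```

Each row contains exactly one 1, in the column of the sample's class, which is the target shape a three-unit output layer is trained against.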

In [9]:
from sklearn.preprocessing import LabelEncoder
labelEncoder = LabelEncoder()
y_encoded = labelEncoder.fit_transform(y_train)

from keras.utils import np_utils
y_train = np_utils.to_categorical(y_encoded)

In [10]:
y_train[0:10,:]

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.]])

In [11]:
# Feature scaling helps an artificial neural network train reliably;
# the scaler is fitted on the training set only and then applied to both sets
from sklearn.preprocessing import StandardScaler
sd = StandardScaler(with_mean=True, with_std=True)
sd.fit(X_train)

X_train_std = sd.transform(X_train)
X_test_std = sd.transform(X_test)
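
To make the scaling step concrete, here is a small NumPy sketch of the same idea: the mean and standard deviation are computed from the training data only and then reused for the test data. The tiny arrays are invented for illustration, and the sketch assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# Made-up data: three training rows and one test row, two features each.
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# Statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# The same shift and scale are applied to both sets.
train_std = (train - mean) / std
test_std = (test - mean) / std

print(train_std.mean(axis=0), train_std.std(axis=0))
print(test_std)
```

The scaled training columns end up with mean 0 and standard deviation 1, while the test row is only guaranteed to be on the same scale, not to be centered itself.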

In [12]:
# Building the Artificial Neural Network
ann_classifier = Sequential() 
ann_classifier.add(Dense(units=6, input_dim=4, kernel_initializer='uniform', activation='relu')) # First hidden layer
ann_classifier.add(Dense(units=4, kernel_initializer='uniform', activation='relu')) # Second hidden layer
ann_classifier.add(Dense(units=3, kernel_initializer='uniform', activation='sigmoid')) # Output layer
ann_classifier.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [13]:
# Training the model
ann_classifier.fit(x=X_train_std, y=y_train, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
...

<keras.callbacks.History at 0x1124ae978>

In [14]:
ann_classifier.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 6)                 30        
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 28        
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 15        
Total params: 73
Trainable params: 73
Non-trainable params: 0
_________________________________________________________________
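
The parameter counts in the summary can be checked by hand: each unit of a dense layer has one weight per input plus one bias. A short sketch of that arithmetic, where `dense_params` is a helper name introduced only for this illustration:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs per unit) plus one bias per unit."""
    return (n_inputs + 1) * n_units

# Layer widths of the network above: 4 inputs -> 6 -> 4 -> 3 outputs.
layer_sizes = [4, 6, 4, 3]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts, total)  # [30, 28, 15] 73
```

These values agree with the `Param #` column and the `Total params` line printed by `summary()`.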


In [15]:
# Testing the model
y_pred = ann_classifier.predict_classes(X_test_std)

from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print('\nAccuracy: {0}'.format(accuracy))
print('Number of mislabeled points: {0}'.format((y_test!=y_pred).sum()))

Accuracy: 0.6222222222222222
Number of mislabeled points: 17


## Evaluating the ANN model with k-Fold Cross Validation

In [16]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from keras.wrappers.scikit_learn import KerasClassifier

def build_classifier():
    # create model
    model = Sequential()
    model.add(Dense(units=6, input_dim=4, kernel_initializer='uniform', activation='relu')) # First hidden layer
    model.add(Dense(units=4, kernel_initializer='uniform', activation='relu')) # Second hidden layer
    model.add(Dense(units=3, kernel_initializer='uniform', activation='sigmoid')) # Output layer
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

classifier = KerasClassifier(build_fn=build_classifier, batch_size=5, epochs=200, verbose=0)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(estimator=classifier, X=X_train_std, y=y_train, cv=kfold)
print("Accuracy: {0:.2f} ({1:.2f})".format(results.mean()*100, results.std()*100))

Accuracy: 75.45 (22.62)
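
For readers unfamiliar with k-fold cross validation, the splitting logic can be sketched in plain Python. This is an illustrative re-implementation, not scikit-learn's code, so the exact shuffling differs from `KFold`; `kfold_indices` is a hypothetical helper name, and 105 is the size of the training set produced by the 70/30 split above.

```python
import random

def kfold_indices(n_samples, n_splits, seed=7):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, val_idx

splits = list(kfold_indices(n_samples=105, n_splits=10))
fold_sizes = [len(val_idx) for _, val_idx in splits]
print(fold_sizes)
```

A fresh model is trained on each `train_idx` and scored on the matching `val_idx`; the mean and standard deviation of those ten scores are what the cell above reports.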


## Tuning the ANN model with GridSearchCV

In [17]:
from sklearn.model_selection import GridSearchCV
import time

def build_ann_classifier(optimizer, debug=False):
    # create model
    model = Sequential()
    model.add(Dense(units=6, input_dim=4, kernel_initializer='uniform', activation='relu')) # First hidden layer
    model.add(Dense(units=4, kernel_initializer='uniform', activation='relu')) # Second hidden layer
    model.add(Dense(units=3, kernel_initializer='uniform', activation='sigmoid')) # Output layer
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

classifier = KerasClassifier(build_fn=build_ann_classifier, verbose=0)
parameters = {'batch_size':[5, 10, 20],
              'epochs':[100, 200, 500],
              'optimizer':['adam', 'rmsprop']}

grid_search = GridSearchCV(estimator= classifier,
                           param_grid=parameters,
                           cv=10)

t0 = time.time()
print('Training the model using grid search for tuning the hyperparameters and 10-fold cross validation')
grid_search.fit(X_train, y_train)
print('Training time: {0}'.format(time.time()-t0))

best_parameters = grid_search.best_params_ 
best_accuracy = grid_search.best_score_

print('\nBest parameters: {0}'.format(best_parameters))
print('Best accuracy: {0}'.format(best_accuracy))

Training the model using grid search for tuning the hyperparameters and 10-fold cross validation
Training time: 738.427412033081

Best parameters: {'batch_size': 5, 'optimizer': 'adam', 'epochs': 200}
Best accuracy: 0.9809523820877075
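
The training time is easier to understand once the size of the search is counted. The sketch below enumerates the same parameter grid with `itertools.product`; it only illustrates the bookkeeping and is not how `GridSearchCV` is implemented internally. The names `combinations` and `n_fits` are introduced for this example.

```python
from itertools import product

# Same grid as in the cell above.
parameters = {'batch_size': [5, 10, 20],
              'epochs': [100, 200, 500],
              'optimizer': ['adam', 'rmsprop']}

# One dictionary per point of the grid.
names = sorted(parameters)
combinations = [dict(zip(names, values))
                for values in product(*(parameters[name] for name in names))]

n_folds = 10
n_fits = len(combinations) * n_folds  # every combination is trained once per fold

print(len(combinations), n_fits)  # 18 180
```

With 18 combinations and 10 folds, the search trains 180 networks, plus one final refit on the whole training set because `refit` defaults to `True`.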


In [18]:
# Building the best model according to the results of the GridSearchCV
best_model = Sequential()
best_model.add(Dense(units=6, input_dim=4, kernel_initializer='uniform', activation='relu')) # First hidden layer
best_model.add(Dense(units=4, kernel_initializer='uniform', activation='relu')) # Second hidden layer
best_model.add(Dense(units=3, kernel_initializer='uniform', activation='sigmoid')) # Output layer
best_model.compile(loss='categorical_crossentropy', optimizer=best_parameters['optimizer'], metrics=['accuracy'])

best_model.fit(x=X_train_std, y=y_train, batch_size=best_parameters['batch_size'], epochs=best_parameters['epochs'])

Epoch 1/200
Epoch 2/200
Epoch 3/200
...
Epoch 199/200
Epoch 200/200


<keras.callbacks.History at 0x115d41e80>

In [19]:
# Testing the model
y_pred = best_model.predict_classes(X_test_std)

from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print('\nAccuracy: {0}'.format(accuracy))
print('Number of mislabeled points: {0}'.format((y_test!=y_pred).sum()))

Accuracy: 0.9333333333333333
Number of mislabeled points: 3
