## Deep Learning using Keras

### Package and Version

Keras and tensorflow have been installed in DSX
Let's get started by importing the libraries we'll need, and check their version as follows:

In [1]:
import keras
import tensorflow as tf
import sys
import sklearn as sk


#Check version
print("Keras Version: {}".format(keras.__version__))
print("Tensor Flow Version: {}".format(tf.__version__))
print("Python {}".format(sys.version))


Using TensorFlow backend.


Keras Version: 2.0.5
Tensor Flow Version: 1.2.1
Python 3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]


### Useful Functions for Data Preprocessing

Several useful functions for data preprocessing, which is created by Dr.Jeff Heaton(https://www.linkedin.com/in/jeffheaton/) for his deep learning class in WashU. You can find it on Jeff's Github https://github.com/jeffheaton/t81_558_deep_learning/blob/master/jeffs_helpful.ipynb 

In [2]:
import pandas as pd
from sklearn import preprocessing

# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name, x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)
    
# Encode text values to indexes(i.e. [1],[2],[3] for red,green,blue).
def encode_text_index(df, name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_

# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)
    
# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df, target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)
    # find out the type of the target column.  Is it really this hard? :(
    target_type = df[target].dtypes
    target_type = target_type[0] if hasattr(target_type, '__iter__') else target_type
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        dummies = pd.get_dummies(df[target])
        return df.as_matrix(result).astype(np.float32), dummies.as_matrix().astype(np.float32)
    else:
        # Regression
        return df.as_matrix(result).astype(np.float32), df.as_matrix([target]).astype(np.float32)
    
# Encode a numeric column as zscores
def encode_numeric_zscore(df, name, mean=None, sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name] - mean) / sd

### Classification Model using Keras

Usually, most tutorial will use two famous bench mark Auto-MPG dataset for regression and iris set for classification. Whereas, I will use Auto-MPG to do classification and iris to do regression in this demo. 

Auto-MPG: https://archive.ics.uci.edu/ml/datasets/auto+mpg
iris: https://archive.ics.uci.edu/ml/datasets/iris

First, let's load mpg dataset. Our intention is to classify 'cylinders' using other variables.  

In [3]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics


url="https://raw.githubusercontent.com/lcx813/data/master/auto-mpg.csv"
df=pd.read_csv(io.StringIO(requests.get(url).content.decode('utf-8')),na_values=['NA','?'])

df.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449,10.5,70,1,ford torino


In [4]:
# Data preprocessing and create feature vector
missing_median(df, 'horsepower')

tmp = df['name']
df.drop('name',1,inplace=True)

encode_numeric_zscore(df, 'mpg')
encode_numeric_zscore(df, 'horsepower')
encode_numeric_zscore(df, 'weight')
encode_numeric_zscore(df, 'displacement')
encode_numeric_zscore(df, 'acceleration')

encode_text_dummy(df, 'origin')

cylinders = encode_text_index(df, 'cylinders')
num_classes = len(cylinders)

x,y = to_xy(df,'cylinders')

from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=45)

Using Keras to build deep neural networks. For the Keras details: https://keras.io/

In [22]:
# Fit neural network
model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(y.shape[1],activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
model.fit(x,y,verbose=1,epochs=100)

# Testing
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy = {:.2f}".format(accuracy))


Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

### Regression Model using Keras

Then, let's load iris dataset. Our intention is to predict 'petal_w' using other variables. 

In [10]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics


url="https://raw.githubusercontent.com/lcx813/data/master/iris.csv"
df=pd.read_csv(io.StringIO(requests.get(url).content.decode('utf-8')),na_values=['NA','?'])

df.head()

Unnamed: 0,sepal_l,sepal_w,petal_l,petal_w,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [11]:
encode_text_dummy(df, 'species')
encode_numeric_zscore(df, 'sepal_l')
encode_numeric_zscore(df, 'sepal_w')
encode_numeric_zscore(df, 'petal_l')

x,y = to_xy(df,['petal_w'])
train_X, test_X, train_y, test_y = train_test_split(x, y, train_size=0.8, random_state=0)

In [14]:
model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_X,train_y,verbose=2,epochs=100)

Epoch 1/100
0s - loss: 2.3070
Epoch 2/100
0s - loss: 2.2546
Epoch 3/100
0s - loss: 2.2028
Epoch 4/100
0s - loss: 2.1495
Epoch 5/100
0s - loss: 2.0971
Epoch 6/100
0s - loss: 2.0467
Epoch 7/100
0s - loss: 1.9955
Epoch 8/100
0s - loss: 1.9453
Epoch 9/100
0s - loss: 1.8941
Epoch 10/100
0s - loss: 1.8455
Epoch 11/100
0s - loss: 1.7944
Epoch 12/100
0s - loss: 1.7435
Epoch 13/100
0s - loss: 1.6904
Epoch 14/100
0s - loss: 1.6394
Epoch 15/100
0s - loss: 1.5847
Epoch 16/100
0s - loss: 1.5307
Epoch 17/100
0s - loss: 1.4745
Epoch 18/100
0s - loss: 1.4200
Epoch 19/100
0s - loss: 1.3650
Epoch 20/100
0s - loss: 1.3075
Epoch 21/100
0s - loss: 1.2489
Epoch 22/100
0s - loss: 1.1916
Epoch 23/100
0s - loss: 1.1299
Epoch 24/100
0s - loss: 1.0750
Epoch 25/100
0s - loss: 1.0171
Epoch 26/100
0s - loss: 0.9584
Epoch 27/100
0s - loss: 0.9012
Epoch 28/100
0s - loss: 0.8454
Epoch 29/100
0s - loss: 0.7890
Epoch 30/100
0s - loss: 0.7338
Epoch 31/100
0s - loss: 0.6821
Epoch 32/100
0s - loss: 0.6327
Epoch 33/100
0s -

<keras.callbacks.History at 0x7f59006ad550>

In [15]:
pred = model.predict(test_X)
# Measure RMSE error.  RMSE is common for regression.
score = np.sqrt(metrics.mean_squared_error(pred,test_y))
print("Final score (RMSE): {}".format(score))


Final score (RMSE): 0.18683761358261108


### Deeper Networks

Still use mpg dataset as an example:

In [16]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics


url="https://raw.githubusercontent.com/lcx813/data/master/auto-mpg.csv"
df=pd.read_csv(io.StringIO(requests.get(url).content.decode('utf-8')),na_values=['NA','?'])

# Data preprocessing and create feature vector
missing_median(df, 'horsepower')

tmp = df['name']
df.drop('name',1,inplace=True)

encode_numeric_zscore(df, 'mpg')
encode_numeric_zscore(df, 'horsepower')
encode_numeric_zscore(df, 'weight')
encode_numeric_zscore(df, 'displacement')
encode_numeric_zscore(df, 'acceleration')

encode_text_dummy(df, 'origin')

cylinders = encode_text_index(df, 'cylinders')
num_classes = len(cylinders)

x,y = to_xy(df,'cylinders')

from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=45)

In [23]:
# Fit neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(15, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_initializer='normal', activation='relu'))
model.add(Dense(y.shape[1],activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
model.fit(x,y,verbose=1,epochs=100)

# Testing
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy = {:.2f}".format(accuracy))


Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78