# Introduction to Neural Networks, MLflow, and SHAP

**Purpose**: Introduces the concepts of neural networks, MLflow, and SHAP. We will train a simple convolutional neural network on the MNIST dataset with Keras (TensorFlow backend) on [Databricks Runtime for Machine Learning](https://databricks.com/blog/2018/06/05/distributed-deep-learning-made-simple.html). For more information, check out the [Deep Learning Fundamentals Series](https://databricks.com/tensorflow/deep-learning).

**Sources**: 
* [`keras/examples/mnist_cnn.py`](https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py)
* [`shap`](https://github.com/slundberg/shap)

In [2]:
#import shap

In [3]:
import warnings
warnings.filterwarnings("ignore")

In [4]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K


# Use TensorFlow Backend
import tensorflow as tf
tf.set_random_seed(42) # For reproducibility

config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0"
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.5
# Register the configured session with Keras so the GPU options take effect
K.set_session(tf.Session(config=config))

# Print out Keras version
print(keras.__version__)

In [5]:
# Configure MLflow Experiment
#mlflow_experiment_id = 2102416

# Import MLflow
import mlflow
import mlflow.keras
import os
print("MLflow Version: %s" % mlflow.__version__)

## Source Data: MNIST
This set of cells is based on TensorFlow's [MNIST for ML Beginners](https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html) tutorial.

The data is loaded through the `mnist` module imported in an earlier cell (`from keras.datasets import mnist`).

The purpose of this notebook is to use Keras (with TensorFlow backend) to **automate the identification of handwritten digits** from the [MNIST Database of Handwritten Digits](http://yann.lecun.com/exdb/mnist/). The digits come from the National Institute of Standards and Technology (NIST) Special Database 3 (Census Bureau employees) and Special Database 1 (high-school students).

<img src="https://github.com/dennyglee/databricks/blob/master/images/mnist.png?raw=true" width="300"/>

In [7]:
# -----------------------------------------------------------
# Hyperparameters
batch_size = 128
num_classes = 10
epochs = 12


# -----------------------------------------------------------
# Image Datasets

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

## What is the image?
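Each sample is a 28 x 28 pixel grayscale image, or 784 pixel values in total. The TensorFlow tutorial flattens each image into an array of 784 values; the Keras code above instead keeps the 2D layout and adds a single channel axis, so each sample has shape `(28, 28, 1)` (or `(1, 28, 28)` when the Keras backend uses `channels_first`).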

* `x_` contains the handwritten digit 
* `y_` contains the labels
* `_train` contains the 60,000 training samples
* `_test` contains the 10,000 test samples
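The difference between the flattened 784-value layout and the `(28, 28, 1)` layout that the Keras code uses can be sketched with NumPy. The random array below is a hypothetical stand-in for a real MNIST image, used only so the example runs without downloading data.

```python
import numpy as np

img_rows, img_cols = 28, 28

# Hypothetical stand-in for one MNIST image: 28 x 28 intensities in [0, 255]
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(img_rows, img_cols)).astype(np.uint8)

# Flattened form (as in the TensorFlow tutorial): 784 values in row-major order
flat = image.reshape(-1)

# channels_last form (as in the Keras code): rows x cols x 1 channel
channels_last = image.reshape(img_rows, img_cols, 1)

# Scale to [0, 1] the same way the notebook does
scaled = channels_last.astype('float32') / 255

print(flat.shape)           # (784,)
print(channels_last.shape)  # (28, 28, 1)
```

Both arrays hold the same 784 numbers; only the shape differs, which is what lets `Conv2D` exploit the spatial neighborhood of each pixel.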


For example, the label of the training sample at index `25168`, `y_train[25168,:]`, is the one-hot vector for the digit `9`.


In [9]:
# One-hot vector for training sample 25168, which represents the digit 9.
#  Digit n is represented as a vector that is 1 in dimension n and 0 elsewhere.
y_train[25168,:]
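`keras.utils.to_categorical`, used in the data-preparation cell, produces these one-hot vectors. A minimal NumPy equivalent is sketched below; `one_hot` is a hypothetical helper for illustration, not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a 1 in each label's column."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Set exactly one entry per row: row i, column labels[i]
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

print(one_hot([9], 10))                  # [[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
print(one_hot([9], 10).argmax(axis=1))   # [9]  (argmax recovers the digit)
```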

`x_train[25168,:]` holds the 784 pixel intensities (28 x 28, scaled to the range [0, 1]) that numerically represent the handwritten digit `9`.


In [11]:
from __future__ import print_function

# Extract training sample 25168 from the training tensor
xt_25168 = x_train[25168,:]

print(xt_25168)

Let's print it as a 28 x 28 grid.


In [13]:
# As this is a 28 x 28 image, print it out row by row
img = xt_25168.reshape(img_rows, img_cols)  # drop the single channel axis
for i in range(img_rows):  # all 28 rows (range(0, 27) would skip the last row)
    print(", ".join("%.3f" % val for val in img[i]))

You can roughly make out the number **9**. Adding a color scale (the higher the value, the darker the cell) gives the following matrix:

<img src="https://dennyglee.files.wordpress.com/2018/09/nine.png" width=500/>

Here, you can access the [full-size version](https://dennyglee.files.wordpress.com/2018/09/nine.png) of this image.

## Oh where art thou GPU?

Or **[How can I run Keras on GPU?](https://keras.io/getting-started/faq/#how-can-i-run-keras-on-gpu)**: If you are running on the TensorFlow backend, your code automatically runs on a GPU if one is detected.

In [16]:
# Check for any available GPUs
K.tensorflow_backend._get_available_gpus()

In [17]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

## How fast did you say?

| Processor | Duration (minutes) |
| --------- | ------------------ |
| GPU       | 1.87               |
| CPU       | 23.08              |

In this comparison the GPU was roughly 12x faster than the CPU.

## Convolutional Neural Networks
![](https://dennyglee.files.wordpress.com/2018/09/keras-cnn-activate.png)

1. The input layer is a 28 x 28 pixel grayscale image.
2. The first convolution layer applies 32 filters of size 3 x 3 with the chosen activation function, mapping the grayscale image to 32 feature maps of 26 x 26.
3. The second convolution layer applies 64 filters of size 3 x 3, producing 64 feature maps of 24 x 24.
4. The max pooling layer downsamples each feature map by 2x, giving 64 maps of 12 x 12.
5. The first dropout layer randomly drops 25% of the activations during training (a regularization technique to avoid overfitting).
6. The flattened features (12 x 12 x 64 = 9,216 values) feed a fully connected layer with 128 hidden neurons.
7. The second dropout layer randomly drops 50% of the activations during training (again to avoid overfitting).
8. A fully connected output layer with 10 units and a `softmax` activation assigns a probability to each digit.
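The feature-map sizes and parameter counts of this architecture follow from simple arithmetic, assuming the Keras defaults used here (`padding='valid'` and stride 1 for `Conv2D`, and a pooling stride equal to `pool_size` for `MaxPooling2D`). Note that the pooled maps are 12 x 12, because the two unpadded 3 x 3 convolutions first shrink 28 to 24. The helper functions below are hypothetical and only reproduce the numbers; `model.summary()` reports them for a built model.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels * filters) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

size = 28
size = conv_output_size(size, 3)            # Conv2D(32, 3x3)   -> 26
size = conv_output_size(size, 3)            # Conv2D(64, 3x3)   -> 24
size = conv_output_size(size, 2, stride=2)  # MaxPooling2D(2x2) -> 12
flat = size * size * 64                     # Flatten           -> 9216

params = (conv_params(3, 1, 32)       # 320
          + conv_params(3, 32, 64)    # 18,496
          + dense_params(flat, 128)   # 1,179,776
          + dense_params(128, 10))    # 1,290
print(size, flat, params)  # 12 9216 1199882
```

Almost all of the roughly 1.2 million trainable parameters sit in the first `Dense(128)` layer, which is one reason the dropout layers around it matter.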

In [20]:
def runCNN(activation, verbose):
  # Building up our CNN
  model = Sequential()
  
  # Convolution Layer
  model.add(Conv2D(32, kernel_size=(3, 3),
                 activation=activation,
                 input_shape=input_shape)) 
  
  # Convolution layer
  model.add(Conv2D(64, (3, 3), activation=activation))
  
  # Pooling with stride (2, 2)
  model.add(MaxPooling2D(pool_size=(2, 2)))
  
  # Randomly drop 25% of activations during training (keep 75%)
  #   Regularization technique to avoid overfitting
  model.add(Dropout(0.25))
  
  # Flatten layer 
  model.add(Flatten())
  
  # Fully connected Layer
  model.add(Dense(128, activation=activation))
  
  # Randomly drop 50% of activations during training (keep 50%)
  #   Regularization technique to avoid overfitting
  model.add(Dropout(0.5))
  
  # Apply Softmax
  model.add(Dense(num_classes, activation='softmax'))

  # Log MLflow
  #with mlflow.start_run(experiment_id = mlflow_experiment_id) as run:
  with mlflow.start_run() as run:
  
    # Loss function (crossentropy) and Optimizer (Adadelta)
    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

    # Fit our model
    model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=verbose,
          validation_data=(x_test, y_test))

    # Evaluate our model
    score = model.evaluate(x_test, y_test, verbose=0)

    # Log Parameters
    mlflow.log_param("activation function", activation)
    mlflow.log_metric("test loss", score[0])
    mlflow.log_metric("test accuracy", score[1])
    
    # Log Model
    mlflow.keras.log_model(model, "model")
    
  # Return
  return score
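`Dropout(0.25)` and `Dropout(0.5)` in `runCNN` zero a random fraction of activations during training only. Keras implements this as *inverted* dropout: the surviving activations are rescaled by `1 / (1 - rate)` during training, so nothing needs rescaling at inference. The NumPy function below is an illustrative sketch of that idea, not Keras's actual implementation; `inverted_dropout` is a hypothetical name.

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Zero a fraction `rate` of activations and rescale the rest by 1 / (1 - rate).
    At inference time (training=False) the input passes through unchanged."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
x = np.ones((1000, 128), dtype='float32')
y = inverted_dropout(x, rate=0.5, rng=rng)
print((y == 0).mean())  # close to 0.5: about half the activations are dropped
print(y.mean())         # close to 1.0: survivors are scaled by 2, preserving the mean
```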

### Using sigmoid

In [22]:
score_sigmoid = runCNN('sigmoid', 0)
print('Test loss:', score_sigmoid[0])
print('Test accuracy:', score_sigmoid[1])

### Using tanh

In [24]:
score_tanh = runCNN('tanh', 0)
print('Test loss:', score_tanh[0])
print('Test accuracy:', score_tanh[1])

### Using ReLU

In [26]:
# Building up our CNN
model = Sequential()

# Convolution Layer
model.add(Conv2D(32, kernel_size=(3, 3),
               activation='relu',
               input_shape=input_shape)) 

# Convolution layer
model.add(Conv2D(64, (3, 3), activation='relu'))

# Pooling with stride (2, 2)
model.add(MaxPooling2D(pool_size=(2, 2)))

# Randomly drop 25% of activations during training (keep 75%)
#   Regularization technique to avoid overfitting
model.add(Dropout(0.25))

# Flatten layer 
model.add(Flatten())

# Fully connected Layer
model.add(Dense(128, activation='relu'))

# Randomly drop 50% of activations during training (keep 50%)
#   Regularization technique to avoid overfitting
model.add(Dropout(0.5))

# Apply Softmax
model.add(Dense(num_classes, activation='softmax'))

# Log MLflow
#with mlflow.start_run(experiment_id = mlflow_experiment_id) as run:
with mlflow.start_run() as run:

  # Loss function (crossentropy) and Optimizer (Adadelta)
  model.compile(loss=keras.losses.categorical_crossentropy,
            optimizer=keras.optimizers.Adadelta(),
            metrics=['accuracy'])

  # Fit our model
  model.fit(x_train, y_train,
        batch_size=batch_size,
        epochs=epochs,
        verbose=1,
        validation_data=(x_test, y_test))

  # Evaluate our model
  score = model.evaluate(x_test, y_test, verbose=0)

  # Log Parameters
  mlflow.log_param("activation function", 'relu')
  mlflow.log_metric("test loss", score[0])
  mlflow.log_metric("test accuracy", score[1])

  # Log Model
  mlflow.keras.log_model(model, "model")

In [27]:
print('Test loss:', score[0])
print('Test accuracy:', score[1])

If you are using the `demo` cluster, click here to [compare the three different models](https://demo.cloud.databricks.com/#mlflow/compare-runs?runs=[%222bb5815d2f564768b09228eb8156f89b%22,%22f4e44dba760c4903a01f7c0de13e059d%22,%2265348b89316d4e5bae5fc8bf8fac02f6%22]&experiment=2102416).  
![](https://pages.databricks.com/rs/094-YMS-629/images/introduction-to-neural-networks-and-mlflow.png)
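The three runs differ only in the activation function passed to the convolutional and hidden dense layers. For reference, a NumPy sketch of the three functions and their derivatives follows. It illustrates that sigmoid and tanh saturate (their gradients shrink toward zero for large |z|), whereas the ReLU gradient stays at 1 for any positive input. These helpers are illustrative and are not the Keras implementations.

```python
import numpy as np

# Activation functions (illustrative NumPy versions)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

# Their derivatives with respect to z
def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    return (np.asarray(z) > 0).astype(float)

z = np.array([-5.0, 0.0, 5.0])
for name, fn, dfn in [('sigmoid', sigmoid, sigmoid_grad),
                      ('tanh', tanh, tanh_grad),
                      ('relu', relu, relu_grad)]:
    # Gradients near zero at |z| = 5 indicate saturation
    print(name, np.round(fn(z), 4), np.round(dfn(z), 4))
```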

In [29]:
#dbutils.notebook.exit("stop") 

## Deep learning example with DeepExplainer (TensorFlow/Keras models)

Deep SHAP is a high-speed approximation algorithm for SHAP values in deep learning models that builds on a connection with [DeepLIFT](https://arxiv.org/abs/1704.02685) described in the SHAP NIPS paper. The implementation here differs from the original [DeepLIFT](https://arxiv.org/abs/1704.02685) by using a distribution of background samples instead of a single reference value, and by using Shapley equations to linearize components such as max, softmax, products, and divisions. Note that some of these enhancements have since also been integrated into DeepLIFT. TensorFlow models and Keras models using the TensorFlow backend are currently supported.

In [31]:
# #import shap
# import numpy as np

# # select a set of background examples to take an expectation over
# background = x_train[np.random.choice(x_train.shape[0], 100, replace=False)]

# # explain predictions of the model on four images (matching the plot below)
# e = shap.DeepExplainer(model, background)
# # ...or pass tensors directly
# # e = shap.DeepExplainer((model.layers[0].input, model.layers[-1].output), background)
# shap_values = e.shap_values(x_test[1:5])

In [32]:
# # plot the feature attributions
# shap_plot = shap.image_plot(shap_values, -x_test[1:5])
# display(shap_plot)

The plot above shows the explanations for each class on four predictions (of the four different images of 2, 1, 0, 4). Note that the explanations are ordered for the classes 0-9 going left to right along the rows, starting with the original image:

* Red pixels increase the model's output 
* Blue pixels decrease the model's output. 

The input images are shown on the left, and as nearly transparent grayscale backings behind each of the explanations. 

The sum of the SHAP values equals the difference between the expected model output (averaged over the background dataset) and the current model output. 
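This is the *efficiency* (local accuracy) property of Shapley values. It can be checked by brute force on a tiny model. The sketch below is an assumption-laden illustration: it uses a hypothetical three-feature toy function (containing a `max` and a product, the kind of non-linearities DeepSHAP linearizes) and a single baseline instead of a background distribution, so the "expected output" is just `f(baseline)`. DeepExplainer does not enumerate subsets like this; it approximates the same quantities efficiently.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, where 'absent' features take the baseline value.
    Enumerates every feature subset, so it is only practical for a handful of features."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of feature i when added to this subset
                phi[i] += weight * (value(set(subset) | {i}) - value(set(subset)))
    return phi

def f(z):
    """Hypothetical toy 'model' standing in for a network."""
    return max(z[0], z[1]) + z[1] * z[2]

x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(f, x, baseline)
print(phi)
# Efficiency: the attributions sum to f(x) - f(baseline), up to floating-point rounding
print(sum(phi), f(x) - f(baseline))
```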

Some observations:
* For the 'zero' image, the blank middle is important (row 3, col 2).

![shap 0](https://pages.databricks.com/rs/094-YMS-629/images/shap-0.png)


* For the 'four' image, the lack of a connection at the top makes it a four instead of a nine (row 4, col 6).

![shap 4](https://pages.databricks.com/rs/094-YMS-629/images/shap-4.png)

This is even more apparent for the "4" at row 6, col 11, where the blue pixels decrease the model's output.

![shap 4.2](https://pages.databricks.com/rs/094-YMS-629/images/shap-4-2.png)

In [34]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K


# Use TensorFlow Backend
import tensorflow as tf
tf.set_random_seed(42) # For reproducibility


# Import supporting packages
import os
import warnings
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyspark
# import databricks.koalas as ks

# Import MLflow and its model flavors
import mlflow
import mlflow.keras
import mlflow.sklearn

In [35]:
# Configure MLflow Tracking
mlflow.set_tracking_uri("databricks")
databricks_host = 'https://demo.cloud.databricks.com'
databricks_token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
os.environ['DATABRICKS_HOST'] = databricks_host
os.environ['DATABRICKS_TOKEN'] = databricks_token

In [36]:
# from tensorflow.python.client import device_lib
# print(device_lib.list_local_devices())

In [37]:
def runCNN(lr, choose_optimizer, epochs, verbose, activation, kernel_size):
  # Building up our CNN
  model = Sequential()
  
  # Convolution Layer
  model.add(Conv2D(32, kernel_size=kernel_size,
                 activation=activation,
                 input_shape=input_shape)) 
  
  # Convolution layer
  model.add(Conv2D(64, (3, 3), activation=activation))
  
  # Pooling with stride (2, 2)
  model.add(MaxPooling2D(pool_size=(2, 2)))
  
  # Randomly drop 25% of activations during training (keep 75%)
  #   Regularization technique to avoid overfitting
  model.add(Dropout(0.25))
  
  # Flatten layer 
  model.add(Flatten())
  
  # Fully connected Layer
  model.add(Dense(128, activation=activation))
  
  # Randomly drop 50% of activations during training (keep 50%)
  #   Regularization technique to avoid overfitting
  model.add(Dropout(0.5))
  
  # Apply Softmax
  model.add(Dense(num_classes, activation='softmax'))

  # Choose the optimizer and set its parameters
  if choose_optimizer == 'adadelta':
      optimizer = keras.optimizers.Adadelta(lr=lr, rho=0.95, epsilon=None, decay=0.0)
  elif choose_optimizer == 'sgd':
      optimizer = keras.optimizers.SGD(lr=lr, momentum=0.0, decay=0.0, nesterov=False)
  elif choose_optimizer == 'nag':
      # Nesterov momentum has no effect with momentum=0.0, so use a non-zero value
      optimizer = keras.optimizers.SGD(lr=lr, momentum=0.9, decay=0.0, nesterov=True)
  elif choose_optimizer == 'rmsprop':
      optimizer = keras.optimizers.RMSprop(lr=lr, rho=0.95, epsilon=None, decay=0.0)
  elif choose_optimizer == 'adam':
      optimizer = keras.optimizers.Adam(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
  else:
      raise ValueError("Unknown optimizer: %s" % choose_optimizer)
  
  # Loss function (categorical crossentropy) and the selected optimizer
  model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=optimizer,
              metrics=['accuracy'])

  # Fit our model
  model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=verbose,
          validation_data=(x_test, y_test))

  # Evaluate our model
  score = model.evaluate(x_test, y_test, verbose=0)
  
  
  # Start an MLflow run; the "with" keyword ensures the run is closed even if this cell fails
  with mlflow.start_run() as run:
    # Log parameters and metrics for the MLflow UI
    mlflow.log_param("lr", lr)
    mlflow.log_param("choose_optimizer", choose_optimizer)
    mlflow.log_param("activation", activation)
    mlflow.log_param("epochs", epochs)
    mlflow.log_param("kernel_size", str(kernel_size))
    mlflow.log_metric("Test loss", score[0])
    mlflow.log_metric("Test accuracy", score[1])

    # Log the trained Keras model as a run artifact
    mlflow.keras.log_model(model, "model")

    print("Inside MLflow Run with id %s" % run.info.run_id)

  # Return [test loss, test accuracy]
  return score

In [38]:
# set up default parameters
my_epochs = 1

In [39]:
# lr comparison analysis
# lr = 1.0, other parameters same
score = runCNN(1.0, 'adadelta', my_epochs, 0, 'relu', (3, 3))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [40]:
# lr = 10.0, other parameters same
score = runCNN(10.0, 'adadelta', my_epochs, 0, 'relu', (3, 3))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [41]:
# lr = 50.0, other parameters same
score = runCNN(50.0, 'adadelta', my_epochs, 0, 'relu', (3, 3))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [42]:
# optimizer comparison analysis
score = runCNN(0.01, 'sgd', my_epochs, 0, 'relu', (3, 3))
print('SGD, Test loss:', score[0])
print('SGD, Test accuracy:', score[1])

score = runCNN(0.01, 'nag', my_epochs, 0, 'relu', (3, 3))
print('NAG, Test loss:', score[0])
print('NAG, Test accuracy:', score[1])

score = runCNN(0.001, 'rmsprop', my_epochs, 0, 'relu', (3, 3))
print('RMSProp, Test loss:', score[0])
print('RMSProp, Test accuracy:', score[1])

score = runCNN(0.001, 'adam', my_epochs, 0, 'relu', (3, 3))
print('ADAM, Test loss:', score[0])
print('ADAM, Test accuracy:', score[1])

score = runCNN(1.0, 'adadelta', my_epochs, 0, 'relu', (3, 3))
print('Adadelta, Test loss:', score[0])
print('Adadelta, Test accuracy:', score[1])
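The optimizers compared above differ in how they turn gradients into parameter updates. The sketch below applies three of those update rules to a hypothetical one-dimensional loss `0.5 * w**2` (gradient `w`, minimum at 0). The loss, starting point, learning rates, step counts, and Adam's `eps` are assumptions chosen for illustration; the Nesterov function uses the common look-ahead formulation rather than the rearranged form Keras uses internally. It also shows that Nesterov momentum with `momentum=0.0` reduces exactly to plain SGD, so a `'nag'` run configured that way would be indistinguishable from the `'sgd'` run.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss 0.5 * w**2."""
    return w

def sgd(w, lr, steps):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def nesterov(w, lr, momentum, steps):
    v = 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point w + momentum * v
        v = momentum * v - lr * grad(w + momentum * v)
        w = w + v
    return w

def adam(w, lr, steps, beta_1=0.9, beta_2=0.999, eps=1e-7):
    m = s = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g        # first-moment estimate
        s = beta_2 * s + (1 - beta_2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta_1 ** t)            # bias corrections
        s_hat = s / (1 - beta_2 ** t)
        w = w - lr * m_hat / (np.sqrt(s_hat) + eps)
    return w

w0 = 5.0
print(sgd(w0, 0.1, 50), nesterov(w0, 0.1, 0.9, 50), adam(w0, 0.1, 50))
# With momentum=0.0, Nesterov performs the same arithmetic as plain SGD
print(nesterov(w0, 0.1, 0.0, 50) == sgd(w0, 0.1, 50))  # True
```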

In [43]:
# activation function comparison analysis
# activation = 'sigmoid', other parameters same
score = runCNN(1.0, 'adadelta', my_epochs, 0, 'sigmoid', (3, 3))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [44]:
# activation = 'tanh', other parameters same
score = runCNN(1.0, 'adadelta', my_epochs, 0, 'tanh', (3, 3))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [45]:
# activation = 'relu', other parameters same
score = runCNN(1.0, 'adadelta', my_epochs, 0, 'relu', (3, 3))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [46]:
# kernel size comparison analysis
# kernel_size = (3, 3), other parameters same
score = runCNN(1.0, 'adadelta', my_epochs, 0, 'relu', (3, 3))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [47]:
# kernel_size = (4, 4), other parameters same
score = runCNN(1.0, 'adadelta', my_epochs, 0, 'relu', (4, 4))
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [48]:
# kernel_size = (5, 5), other parameters same
score = runCNN(1.0, 'adadelta', my_epochs, 0, 'relu', (5, 5))
print('Test loss:', score[0])
print('Test accuracy:', score[1])
