# How to Score TensorFlow Models in SAS Event Stream Processing

## 0. Setting Up the Python Environment

To run TensorFlow models in SAS Event Stream Processing, you must have TensorFlow installed on the same system where you run the ESP server. To manage TensorFlow dependencies, use Anaconda or Miniconda to set up the Python environment and manage packages.

<b>Use the following steps to set up the environment. All steps should be run on the system where you run the ESP server.</b>

1. Download Miniconda (https://docs.conda.io/en/latest/miniconda.html) or Anaconda (https://www.anaconda.com/distribution/) to the machine on which the ESP server will be running. Then run the installer.

  At the end of the installation, choose "no" when you see the question "Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]".

  The following steps assume that Miniconda is installed at <code>~/miniconda3</code>.

*Note: If you choose to use Miniconda, you must install several additional packages to run this notebook properly. These packages include:*

- esp
- Image
- ws4py
- pandas
- numpy

2. Create a Python environment (for example, "tf") and activate it for later use.
```bash
   ~/miniconda3/bin/conda create -n tf python=3.4.1
   source ~/miniconda3/bin/activate tf
```

3. Install TensorFlow using pip.
```bash
   pip install --upgrade tensorflow
```

4. Set environment variables.
```bash
   ## Environment variable for Python location
   export PYTHONHOME=~/miniconda3/envs/tf

   ## Environment variables for libraries (the standard library and libpython live under lib/, not bin/)
   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PYTHONHOME/lib
   export PYTHONPATH=$PYTHONHOME/lib/python3.4:$PYTHONHOME/lib/python3.4/site-packages:$PYTHONHOME/lib/python3.4/lib-dynload:$PYTHONHOME/lib/python3.4/plat-linux

   ## Environment variables for MAS to run Python
   export MAS_PYPATH=$PYTHONHOME/bin/python
   export MAS_M2PATH=/opt/sas/viya/home/SASFoundation/misc/embscoreeng/mas2py.py
```
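Before you start the ESP server, you might want to confirm that these variables are set in the shell that launches it. The following minimal Python sketch is not part of ESP or esppy, and the function name `check_esp_python_env` is hypothetical. It reports any variables from step 4 that are unset, along with any MAS paths that do not point to an existing file.

```python
import os


def check_esp_python_env(env=None):
    """Return a list of problems with the variables exported in step 4.

    `env` defaults to os.environ; pass a dict to check another environment.
    """
    env = os.environ if env is None else env
    problems = []
    for name in ("PYTHONHOME", "PYTHONPATH", "LD_LIBRARY_PATH",
                 "MAS_PYPATH", "MAS_M2PATH"):
        if not env.get(name):
            problems.append("{} is not set".format(name))
    # MAS_PYPATH and MAS_M2PATH each name a single file that must exist.
    for name in ("MAS_PYPATH", "MAS_M2PATH"):
        value = env.get(name)
        if value and not os.path.isfile(value):
            problems.append("{} points to a missing file: {}".format(name, value))
    return problems


if __name__ == "__main__":
    for problem in check_esp_python_env():
        print(problem)
```

An empty result means that every variable is set and that both MAS paths exist. The check does not verify that TensorFlow itself imports correctly.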

In [None]:
import sys
sys.path.append("<pathto>/python-esppy") # Replace <pathto> with the location of your python-esppy directory

## 1. Loading Data

Begin by loading the MNIST data with the mnist_input_data module. The MNIST data set contains 60,000 training examples and 10,000 test examples of handwritten digits.

From the test split, you extract two arrays: test_images, which holds the pixel values of each image, and test_labels, which holds the one-hot encoded digit for each image. Later, you stream these arrays to ESP and score them with a pretrained TensorFlow model.

In [None]:
import mnist_input_data
mnist = mnist_input_data.read_data_sets("<pathto>/MNIST_data/", one_hot=True)
test_images = mnist.test.images
test_labels = mnist.test.labels
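Because `read_data_sets` is called with `one_hot=True`, each label is a 10-element vector with a 1 at the position of the digit. The publishing step later recovers the digits with `np.argmax(test_labels, axis=1)`. The following pure-Python sketch uses a hypothetical helper name and shows the same argmax idea for a single label.

```python
def one_hot_to_digit(one_hot_row):
    """Return the index of the largest entry, i.e. the digit a one-hot row encodes."""
    return max(range(len(one_hot_row)), key=lambda k: one_hot_row[k])


label_for_seven = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(one_hot_to_digit(label_for_seven))  # 7

# Each image row is a flattened 28 x 28 grid, which is why the display
# step later calls reshape(28, 28).
print(28 * 28)  # 784 pixel values per image
```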

## 2. Creating the Demo Project

To create a SAS Event Stream Processing project, you first need to import the esppy library. 

Ensure that you have the latest version of esppy by running <code>git pull</code> in your python-esppy directory. (This example requires SAS Event Stream Processing 6.1 or later.)

In [None]:
import esppy

Run <code>esppy.ESP</code> to establish a connection with your ESP server. You must specify a host and port to successfully establish a server connection.

In [None]:
esp = esppy.ESP('<server>:<port>')

Create a SAS Event Stream Processing project by running <code>esp.create_project(*project*)</code>. Here, you name the project *esp_mnist* and assign it to the variable <code>proj</code>.

In [None]:
proj = esp.create_project('esp_mnist')

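Here, you create a Source window, assign it to the variable *TF_src*, and add it to the project as *w_data1*. The schema defines the fields of each event that you publish: the key field <code>id</code>, the pixel array <code>input</code>, and the true label <code>digit</code>.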

In [None]:
TF_src = esp.SourceWindow(schema=('id*:int64', 'input:array(dbl)','digit:string'))
proj.windows['w_data1'] = TF_src

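You must point to where you stored your TensorFlow model file. To do this, replace <code>&lt;pathto&gt;</code> with the directory path that leads to your model file. You then create a Calculate window that scores events with your TensorFlow model by using <code>esp.CalculateWindow.TensorflowHelper</code>. The schema defines the fields that this window outputs: the key <code>id</code>, the predicted digit <code>output</code>, and the true label <code>digit</code>. Next, you add the model to the window by using <code>add_model_info</code>. With this method, you specify the model name, the model file, the source window (<code>w_data1</code>), the input and scoring operations in the TensorFlow graph (<code>input_op</code> and <code>score_op</code>), and the names of the input and output fields.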

For more information on creating Calculate Windows, see [Creating and Using Windows](https://go.documentation.sas.com/?cdcId=espcdc&cdcVersion=6.1&docsetId=espcreatewindows&docsetTarget=n1n1erunro8yqgn16fiqs1tn17fn.htm&locale=en).

In [None]:
TF_model_file = '<pathto>/TF_model.meta'
TF_win = esp.CalculateWindow.TensorflowHelper(schema=('id*:int64', 'output:int64','digit:string'))
TF_win.add_model_info(model_name='TF_NN', model_file=TF_model_file, source='w_data1', 
                      input_op='x', score_op='score_op', input_name='input', output_name='output')

An edge connects two windows. Here, you add *TF_win* to the project as *w_TF* and then use an edge with the role of data to connect the *TF_src* Source window to *TF_win*. For more information on using edges, see [Edge Roles](https://go.documentation.sas.com/?cdcId=espcdc&cdcVersion=6.1&docsetId=espan&docsetTarget=p0v2sood1298h8n10tvox93xh2tb.htm).

In [None]:
proj.windows["w_TF"] = TF_win

TF_src.add_target(TF_win, role='data')

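Now define the schema for the Compute window that follows. The schema contains the key <code>id</code>, the true label <code>digit</code>, the predicted label <code>I_digit</code>, and ten fields, <code>P_0</code> through <code>P_9</code>, that each hold a probability for one digit class. The FitStat window reads these fields later.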

In [None]:
schema = ['id*:int64','digit:string', 'I_digit:string'] + ['P_{}:double'.format(i) for i in range(10)]

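Create a Compute window and connect it to *TF_win*. The Compute window evaluates one expression for each non-key field in the schema, in order. The <code>digit</code> expression passes the true label through, and the <code>I_digit</code> expression takes the model's predicted <code>output</code>. Each <code>P_i</code> is set to 0.91 if the prediction is digit i and to 0.01 otherwise. The TensorFlow window returns only a predicted class, so these fixed values stand in for class probabilities, and they sum to 1 for each event.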

In [None]:
TF_comp = esp.ComputeWindow("w_TF_comp", schema=schema)
TF_comp.add_field_expression("digit")
TF_comp.add_field_expression("output")
for i in range(10):
    TF_comp.add_field_expression('''
if output=='{}' then
    return 0.91
else
    return 0.01
    '''.format(i))
proj.windows['w_TF_comp'] = TF_comp
TF_win.add_target(TF_comp, role='data')
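The following pure-Python sketch reproduces what the Compute window expressions calculate for a single event. The function `pseudo_probabilities` is hypothetical and is not an ESP or esppy API. The sketch also shows why 0.91 and 0.01 were chosen: one hit plus nine misses sum to 1.

```python
def pseudo_probabilities(predicted_digit, hit=0.91, miss=0.01, n_classes=10):
    """Mirror the Compute window expressions: P_i = hit if i is the predicted digit, else miss."""
    return [hit if i == predicted_digit else miss for i in range(n_classes)]


probs = pseudo_probabilities(3)
print(probs[3], probs[0])     # 0.91 0.01
print(round(sum(probs), 10))  # 1.0
```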

Create a Calculate window that computes the model's fit statistics, commonly referred to as FitStat. You use <code>esp.calculate.FitStat</code> and name this Calculate window *TF_fitstat*. You must specify several parameters, such as <code>schema</code>, <code>classLabels</code>, and <code>windowLength</code>. You also must map the inputs (the <code>P_0</code> through <code>P_9</code> fields and the response field <code>digit</code>) and the output (<code>mceOut</code>, the misclassification error). For more information on FitStat windows, see [Computing Fit Statistics for Scored Results](https://go.documentation.sas.com/?cdcId=espcdc&cdcVersion=6.1&docsetId=espan&docsetTarget=p1k5j3rok1x59on15i884xa66ajq.htm&locale=e).

In [None]:
TF_fitstat = esp.calculate.FitStat(schema=('id*:int64','mceOut:double'),
                                      classLabels='0,1,2,3,4,5,6,7,8,9',
                                      windowLength=100)
inputs = tuple(['P_{}:double'.format(i) for i in range(10)])
TF_fitstat.set_inputs(inputs=inputs, 
                         response=('digit:string'))
TF_fitstat.set_outputs(mceOut='mceOut:double')
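To build intuition for <code>mceOut</code>, the following sketch computes a misclassification error over the most recent `windowLength` events. It assumes that the predicted class is the one with the highest `P_i` and that the error is the fraction of misclassified events in the window. Both are assumptions about what FitStat computes, so see the FitStat documentation linked above for the exact definition. The class name is hypothetical.

```python
from collections import deque


class SlidingMisclassificationError:
    """Hypothetical helper: fraction of misclassified events among the last `window_length`."""

    def __init__(self, window_length=100):
        # True = misclassified; the deque drops the oldest outcome when full.
        self.outcomes = deque(maxlen=window_length)

    def update(self, probabilities, true_label):
        # Predicted class = index of the largest probability (argmax).
        predicted = max(range(len(probabilities)), key=lambda k: probabilities[k])
        self.outcomes.append(predicted != int(true_label))
        return sum(self.outcomes) / len(self.outcomes)


tracker = SlidingMisclassificationError(window_length=100)
p7 = [0.01] * 10
p7[7] = 0.91  # the model predicted 7
print(tracker.update(p7, "7"))  # 0.0: one correct event
print(tracker.update(p7, "1"))  # 0.5: one of two events misclassified
```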

Here, you add *TF_fitstat* to the project as *w_TF_fitstat* and use an edge with the role of data to connect the *TF_comp* window to *TF_fitstat*.

In [None]:
proj.windows['w_TF_fitstat'] = TF_fitstat

TF_comp.add_target(TF_fitstat, role='data')
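The four windows and three data edges defined in this section form a linear pipeline. As a plain-Python illustration, the following sketch writes the edges as a dictionary that maps each window name to its targets and walks the dictionary from the Source window. The dictionary and the `event_path` helper are hypothetical and are not esppy objects.

```python
# Edges added in this section: window name -> downstream window names.
edges = {
    "w_data1": ["w_TF"],            # TF_src.add_target(TF_win, role='data')
    "w_TF": ["w_TF_comp"],          # TF_win.add_target(TF_comp, role='data')
    "w_TF_comp": ["w_TF_fitstat"],  # TF_comp.add_target(TF_fitstat, role='data')
    "w_TF_fitstat": [],
}


def event_path(graph, start):
    """Follow the single data edge out of each window, starting at `start`."""
    path = [start]
    while graph[path[-1]]:
        path.append(graph[path[-1]][0])
    return path


print(" -> ".join(event_path(edges, "w_data1")))
# w_data1 -> w_TF -> w_TF_comp -> w_TF_fitstat
```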

Optionally, print the XML definition of your project to review it.

In [None]:
print(proj.to_xml(pretty=True))

In [None]:
proj

## 3. Loading the Project into ESP

Load your project to the ESP server using <code>esp.load_project</code>.

In [None]:
esp.load_project(proj)

## 4. Publishing Data and Subscribing to Results

To view results, subscribe to the windows that you want to monitor. After you subscribe, each window acts as a DataFrame that is updated with the events that the window produces.

In [None]:
TF_src.subscribe()
TF_win.subscribe()

Import NumPy under the conventional alias <code>np</code>.

In [None]:
import numpy as np

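Define a helper function, <code>array2str</code>, that formats one image's pixel values as a string of the form <code>[v1;v2;...]</code>, which this example uses for the <code>input:array(dbl)</code> field. Then apply the function to each row of <code>test_images</code>.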

In [None]:
def array2str(arr):
    x_arrstr = np.char.mod('%f', arr)
    return '[' +";".join(x_arrstr) + ']'

pixel_array = np.apply_along_axis(array2str, 1, test_images)
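For a single row, `array2str` formats each value with `%f`, joins the results with semicolons, and wraps them in square brackets. The following standard-library sketch, `array2str_py`, is a hypothetical stand-in for illustration and produces the same string for a short list of pixel values.

```python
def array2str_py(values):
    """Format numbers like array2str: '%f' per value, ';' separators, square brackets."""
    return "[" + ";".join("%f" % v for v in values) + "]"


print(array2str_py([0.0, 0.5, 1.0]))  # [0.000000;0.500000;1.000000]
```

A full 784-pixel image produces a much longer string of the same form.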

Import the <code>time</code> module, which the publishing function uses to pause between events.

In [None]:
import time

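Next, define a function that creates a publisher for a window and streams the test images to your ESP server as CSV events, one event every 0.04 seconds. Each event starts with the opcode <code>i</code> (insert) and the flag <code>n</code> (normal), followed by the <code>id</code>, the pixel string, and the true label, which <code>np.argmax</code> recovers from the one-hot encoded <code>test_labels</code>.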

In [None]:
def publish_thread2(window):
    pub = window.create_publisher(blocksize=1, rate=0, pause=0,
                                  dateformat='%Y%m%dT%H:%M:%S.%f', opcode='insert', format='csv')

    labels = np.argmax(test_labels,axis=1)
    for i in range(len(pixel_array)):
        strToSend = 'i,n,{},'.format(i)+pixel_array[i]+',{}\n'.format(labels[i])
        pub.send(strToSend)
        time.sleep(0.04)
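The following sketch builds one event line in the same layout as `publish_thread2` and splits it back into fields, which can help when you debug events that the Source window rejects. The helper names are hypothetical. Because the pixel string uses semicolons rather than commas, it remains a single CSV field.

```python
def build_event_line(event_id, pixel_str, label):
    """Build one CSV event in the layout used by publish_thread2: opcode, flag, id, input, digit."""
    return "i,n,{},{},{}\n".format(event_id, pixel_str, label)


def parse_event_line(line):
    """Split a CSV event back into its parts (the pixel string contains no commas)."""
    opcode, flag, event_id, pixel_str, label = line.rstrip("\n").split(",")
    return opcode, flag, int(event_id), pixel_str, label


line = build_event_line(0, "[0.000000;1.000000]", 7)
print(repr(line))              # 'i,n,0,[0.000000;1.000000],7\n'
print(parse_event_line(line))  # ('i', 'n', 0, '[0.000000;1.000000]', '7')
```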

Start the publisher in a separate thread so that the notebook remains available while data streams to the server.

In [None]:
from threading import Thread
thread = Thread(target = publish_thread2, args = (TF_src, ))
thread.start()

You can use the <code>.tail</code> method to display the most recent rows of the *TF_src* and *TF_win* windows. By default, <code>.tail</code> displays the last 5 rows.

In [None]:
TF_src.tail()

In [None]:
TF_win.tail()

## 5. Displaying Results

Use the matplotlib.pyplot library to display images of the handwritten digits from the MNIST data set. To use this library, you must first import it.

In [None]:
import matplotlib.pyplot as plt

The following block of code displays two images selected from the most recent events in the *TF_win* DataFrame. The first image shows the most recent correct prediction from your model, and the second image shows the most recent incorrect prediction. Several parts of this block are important to understand.

First, <code>%matplotlib inline</code> renders plots directly in the Jupyter notebook. Include this line to view the two images that you create.

Second, you use <code>fig.add_subplot</code> to describe how to arrange your plots and which position each plot occupies. For example, <code>ax1 = fig.add_subplot(121)</code> specifies a grid of 1 row and 2 columns and places this plot at index 1 (the left position).

Third, you filter the rows to find the most recent correct and incorrect predictions. Two conditional <code>if</code> statements then draw each image only when such a prediction exists.
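matplotlib accepts a three-digit integer as shorthand for the three arguments `nrows`, `ncols`, and `index`, so `fig.add_subplot(121)` is equivalent to `fig.add_subplot(1, 2, 1)`. The following small helper is hypothetical and exists only to spell out that decoding.

```python
def decode_subplot_code(code):
    """Split a three-digit subplot code such as 121 into (nrows, ncols, index)."""
    if not 100 <= code <= 999:
        raise ValueError("expected a three-digit code")
    return code // 100, (code // 10) % 10, code % 10


print(decode_subplot_code(121))  # (1, 2, 1)
print(decode_subplot_code(122))  # (1, 2, 2)
```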

In [None]:
%matplotlib inline

fig = plt.figure(figsize=(7,3), dpi=80)
plt.tight_layout()

ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

fig.canvas.draw()

n = len(TF_win)
tmp = TF_win[:n]

index = tmp[tmp['output'] == (tmp['digit'].astype(np.int64))].tail(1).index.values
correct_id = index[0] if len(index) > 0 else None

index = tmp[tmp['output'] != (tmp['digit'].astype(np.int64))].tail(1).index.values
incorrect_id = index[0] if len(index) > 0 else None

if correct_id is not None:
    ax1.clear()
    ax1.imshow(test_images[correct_id].reshape(28,28), cmap='gray', interpolation='nearest')
    # Read the predicted digit by column name from the same snapshot used for filtering
    ax1.set_title("TensorFlow Correct Prediction: {}".format(tmp.loc[correct_id, 'output']), fontsize=10)

if incorrect_id is not None:
    ax2.clear()
    ax2.imshow(test_images[incorrect_id].reshape(28,28), cmap='gray', interpolation='nearest')
    ax2.set_title("TensorFlow Incorrect Prediction: {}".format(tmp.loc[incorrect_id, 'output']), fontsize=10)
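The two filters in this cell keep the last row where `output` equals the integer value of `digit` and the last row where the two differ. The following sketch expresses the same selection with a hypothetical helper that works on a plain list of `(id, output, digit)` tuples instead of the subscribed DataFrame. This form can make the logic easier to test in isolation.

```python
def latest_correct_and_incorrect(rows):
    """Return (correct_id, incorrect_id) for the most recent matching rows, or None.

    `rows` is a list of (id, output, digit) tuples in arrival order, where
    output is the predicted digit (int) and digit is the true label (str),
    as in the TF_win schema.
    """
    correct_id = incorrect_id = None
    for event_id, output, digit in rows:
        if output == int(digit):
            correct_id = event_id
        else:
            incorrect_id = event_id
    return correct_id, incorrect_id


rows = [(0, 7, "7"), (1, 2, "2"), (2, 6, "5"), (3, 1, "1")]
print(latest_correct_and_incorrect(rows))  # (3, 2)
```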

## 6. Cleanup

Finally, it is good practice to clean up your workspace. Here, you unsubscribe from *TF_win* and *TF_src* and delete the project.

In [None]:
TF_win.unsubscribe()
TF_src.unsubscribe()

esp.delete_project("esp_mnist")

After you finish running your ESP project, you might want to shut down your ESP server. To do so, uncomment the following line and run <code>esp.shutdown()</code>.

In [None]:
#esp.shutdown()