# MNIST Model EDI Deployment Tutorial
This tutorial demonstrates how to:
1. Convert an MNIST model inference script to an EDI package for deployment
2. Deploy the MNIST model
3. Start the model
4. Run inference with EDI
5. Clean up the deployed model

Import the WMLA packages

In [1]:
import ibm_wmla,ibm_wmla_client
import tensorflow as tf 

## 1. Convert an MNIST script to an EDI package
This section shows how to convert an MNIST Python script into a package that can be uploaded to EDI.

The package built in this tutorial contains the following files:
```
kernel.py
model.json
README.md
mnist_model.h5
```

An example package can be found in [`wmla-python-client/examples/mnist_example`](https://github.ibm.com/anz-tech-garage/wmla-python-client/tree/master/examples/mnist_example/mnist)

When we have trained an MNIST model with Keras, we usually save the model as a `.h5` file. WMLA can take the Keras model and load it for inference.

Without WMLA, a data scientist would write the inference code like:

```
import numpy as np
import tensorflow as tf

# load the model
mnist_model = tf.keras.models.load_model('mnist_model.h5')

# make test images
img_shape = (28, 28, 1)
x_test = np.random.random_sample((1,) + img_shape)

# use the model to predict
results = mnist_model.predict(x_test)
```

To perform these actions in WMLA EDI, simply separate them into `on_kernel_start`, which loads the model once when the kernel starts, and `on_task_invoke`, which runs the prediction for each incoming task.


First we define a file `kernel.py` with the below template:

```
#!/usr/bin/env python

import redhareapiversion
from redhareapi import Kernel

import numpy as np
import os, json, base64, time
import tensorflow as tf

class TestKernel(Kernel):
    def on_kernel_start(self, kernel_context):
        try:
            Kernel.log_info("kernel input: " + kernel_context.get_model_description())
            
            model_desc = json.loads(kernel_context.get_model_description())
            model_path = model_desc['model_path']
            if model_path == '':
                model_path = os.getcwd()
            # os.chdir(model_path)
            Kernel.log_info("current dir: " + os.getcwd())

            model_path = model_path + '/' + model_desc['weight_path']

            ### INSERT LOAD MODEL CODE ###


        except Exception as e:
            Kernel.log_error(str(e))
            
    def on_task_invoke(self, task_context):
        try:
            start = time.time()
            Kernel.log_info(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
            Kernel.log_info('on_task_invoke')
            while task_context is not None:
                input_data = json.loads(task_context.get_input_data())
                img_id = input_data['id']
                img_data = input_data['data']

                ### INSERT PREDICT CODE ###


                task_context.set_output_data(json.dumps(output_data))
                task_context = task_context.next()
            end = time.time()
            Kernel.log_info("exit on_task_invoke, using time %.2f" % (end-start))
            Kernel.log_info("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
        except Exception as e:
            task_context.set_output_data(str(e))
            Kernel.log_error(str(e))

if __name__ == '__main__':
    ppkernel = TestKernel()
    ppkernel.run()
```


Now we can insert the model-loading and prediction code into the two functions.

In [2]:
%%writefile kernel.py

#!/usr/bin/env python

import redhareapiversion
from redhareapi import Kernel

import numpy as np
import os, json, base64, time
import tensorflow as tf

class TestKernel(Kernel):
    def on_kernel_start(self, kernel_context):
        try:
            Kernel.log_info("kernel input: " + kernel_context.get_model_description())
            
            model_desc = json.loads(kernel_context.get_model_description())
            model_path = model_desc['model_path']
            if model_path == '':
                model_path = os.getcwd()
            # os.chdir(model_path)
            Kernel.log_info("current dir: " + os.getcwd())

            model_path = model_path + '/' + model_desc['weight_path']

            # LOAD MODEL
            self.model = tf.keras.models.load_model(model_path)


        except Exception as e:
            Kernel.log_error(str(e))
            
    def on_task_invoke(self, task_context):
        try:
            start = time.time()
            Kernel.log_info(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
            Kernel.log_info('on_task_invoke')
            while task_context is not None:
                input_data = json.loads(task_context.get_input_data())
                img_id = input_data['id']
                img_data = input_data['data']

                img_data = np.asarray(img_data).astype('float32')
                
                # MAKE PREDICTION
                y_keras = self.model.predict(img_data)
                
                output_data = {}
                output_data['key'] = img_id
                output_data['data'] = y_keras.tolist()

                task_context.set_output_data(json.dumps(output_data))
                task_context = task_context.next()
            end = time.time()
            Kernel.log_info("exit on_task_invoke, using time %.2f" % (end-start))
            Kernel.log_info("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
        except Exception as e:
            task_context.set_output_data(str(e))
            Kernel.log_error(str(e))

if __name__ == '__main__':
    ppkernel = TestKernel()
    ppkernel.run()


Writing kernel.py
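
Before packaging, the task loop can be exercised locally without an EDI runtime. The following is a minimal sketch under stated assumptions: `FakeTaskContext` is a hypothetical stand-in that mimics only the three methods the kernel calls (`get_input_data`, `set_output_data`, `next`), and `process_tasks` and `stub_predict` are illustrative names that are not part of `redhareapi`.

```python
import json


class FakeTaskContext:
    """Hypothetical stand-in for the EDI task context (not part of redhareapi)."""

    def __init__(self, payloads):
        self._payloads = payloads
        self._index = 0
        self.outputs = [None] * len(payloads)

    def get_input_data(self):
        return json.dumps(self._payloads[self._index])

    def set_output_data(self, data):
        self.outputs[self._index] = data

    def next(self):
        # Advance to the next task, or return None when the batch is exhausted.
        self._index += 1
        return self if self._index < len(self._payloads) else None


def process_tasks(task_context, predict):
    """Mirror the loop in on_task_invoke, with the model call injected."""
    while task_context is not None:
        input_data = json.loads(task_context.get_input_data())
        output_data = {"key": input_data["id"], "data": predict(input_data["data"])}
        task_context.set_output_data(json.dumps(output_data))
        task_context = task_context.next()


def stub_predict(data):
    # Placeholder for self.model.predict(...).tolist(): returns the element count.
    return [len(data)]


ctx = FakeTaskContext([{"id": 0, "data": [1, 2, 3]}, {"id": 1, "data": [4]}])
process_tasks(ctx, stub_predict)
print(ctx.outputs)
```

Injecting the prediction function keeps the loop testable without TensorFlow; in the real kernel the same loop calls `self.model.predict`.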


We need a `model.json` to define the model metadata and runtime.

In [3]:
%%writefile model.json
{
    "name" : "mnisttest",
    "tag" : "test",
    "weight_path" : "mnist_model.h5",
    "runtime" : "dlipy3",
    "kernel_path" : "kernel.py",
    "schema_version" : "1"
}


Writing model.json
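
A quick consistency check can catch a missing key or a wrong file name before upload. This is a sketch only: `check_package` and `EXPECTED_KEYS` are illustrative names, and treating every key used in this tutorial's `model.json` as required is an assumption rather than a documented EDI rule.

```python
import json
import os
import tempfile

# Keys used by the model.json in this tutorial; treating all of them as
# required is an assumption, not a documented EDI rule.
EXPECTED_KEYS = ("name", "tag", "weight_path", "runtime", "kernel_path", "schema_version")


def check_package(package_dir):
    """Return a list of problems found in package_dir (empty if none)."""
    problems = []
    with open(os.path.join(package_dir, "model.json")) as f:
        meta = json.load(f)
    for key in EXPECTED_KEYS:
        if key not in meta:
            problems.append("missing key: " + key)
    for key in ("weight_path", "kernel_path"):
        path = meta.get(key)
        if path and not os.path.exists(os.path.join(package_dir, path)):
            problems.append("file not found: " + path)
    return problems


# Demonstration on a throwaway directory where the weights file is missing.
with tempfile.TemporaryDirectory() as tmp:
    meta = {"name": "mnisttest", "tag": "test", "weight_path": "mnist_model.h5",
            "runtime": "dlipy3", "kernel_path": "kernel.py", "schema_version": "1"}
    with open(os.path.join(tmp, "model.json"), "w") as f:
        json.dump(meta, f)
    open(os.path.join(tmp, "kernel.py"), "w").close()
    demo_problems = check_package(tmp)
    print(demo_problems)
```

Once the package folder is assembled, the same function can be pointed at it, for example `check_package("mnist")`.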


We will also need a `README.md` to tell users what the model does and what format the test data takes:

In [4]:
%%writefile README.md

# README of MNIST MODEL

## Summary

This is a MNIST model that classifies hand-written digits.


## Input

* Input format: array
* Input body:

```
{
    "id" : "0",
    "data": test_img
}
```

## Output
* Output format: json
* Output body (if there is no error)


Writing README.md


Now we download the `.h5` model that we trained previously:

In [5]:
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()
wslib.download_file("mnist_model.h5", "mnist_model.h5")

{'file_name': 'mnist_model.h5', 'summary': ['loaded data', 'saved to file']}

Now we put the four files in a folder:

In [6]:
!mkdir -p mnist
!mv -t mnist kernel.py README.md model.json mnist_model.h5


Compress the package into a gzip-compressed tar file for uploading:

In [7]:
!tar czf mnist.tar mnist
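
The same archive can be built and inspected from Python with the standard `tarfile` module, which is handy for confirming what ended up in the package. This is a sketch with illustrative helper names (`build_package`, `list_package`) and placeholder files. Note that `tar czf` writes a gzip-compressed archive even though the file is named `mnist.tar`, so the reader below opens it with the compression-detecting mode `r:*`.

```python
import os
import tarfile
import tempfile


def build_package(source_dir, archive_path):
    """Create a gzip-compressed tar archive containing source_dir."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))


def list_package(archive_path):
    """Return the sorted member names of an archive, whatever its compression."""
    with tarfile.open(archive_path, "r:*") as tar:
        return sorted(tar.getnames())


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "mnist")
    os.mkdir(pkg)
    for name in ("kernel.py", "model.json", "README.md", "mnist_model.h5"):
        open(os.path.join(pkg, name), "w").close()
    archive = os.path.join(tmp, "mnist.tar")
    build_package(pkg, archive)
    members = list_package(archive)
    print(members)
```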

## 2. Deploy the MNIST model

In [8]:
import os
import time
import numpy as np
from ibm_wmla_client import Connection

First, get the user access token from the environment variable `USER_ACCESS_TOKEN` for connecting to the WMLA EDI service.

In [9]:
USER_ACCESS_TOKEN = os.getenv('USER_ACCESS_TOKEN')
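
`os.getenv` returns `None` when the variable is missing, which would only surface later as a failed connection. A small guard makes the failure explicit. This is a sketch: `require_env` is an illustrative helper, and `DEMO_ACCESS_TOKEN` is a hypothetical variable used so the real token is left untouched.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError("environment variable %s is not set" % name)
    return value


# Demonstration with a hypothetical variable name.
os.environ["DEMO_ACCESS_TOKEN"] = "example-token"
print(require_env("DEMO_ACCESS_TOKEN"))
```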

Next, define the EDI service parameters:

In [10]:
service_url = "https://wmla-console-wmla.apps.ocp.tanggle.com"
service_instance = 'ibm_wml_elastic_distributed_inference'

Create the connection to EDI using the WMLA Python Client, then connect.

In this CP4D environment, we authenticate with the user access token instead of a username and password, so we set `username` and `password` to `None` and pass in the user access token.

In [11]:
edi_connection = Connection(service_url, service_instance, wmla_v1=True, edi=True,
                            apikey=None, username=None, password=None,
                            user_access_token=USER_ACCESS_TOKEN)

In [12]:
edi_connection.connect()

Connecting to EDI
EDI Token created
EDI Service connected


In [13]:
conn = edi_connection.service_edi

Test the connection by listing all the models:

In [14]:
print(conn.get_models(verify=False))



{
    "result": [
        {
            "name": "pingpong",
            "uid": "03105212-74f6-48eb-a8bf-114649943645",
            "tag": "test",
            "size": 4266,
            "weight_path": "model",
            "model_path": "/opt/wml-edi/repo/pingpong/pingpong-20220622-131252",
            "create_time": 1655903572,
            "last_updated_time": 1655903572,
            "started_at": 1659419890,
            "creator": "admin",
            "runtime": "condaenvpy39",
            "kernel_path": "kernel.py",
            "service_uri": "https://wmla-inference-wmla.apps.ocp.tanggle.com/dlim/v1/inference/pingpong",
            "attributes": [],
            "mk_environments": [],
            "schema_version": "1",
            "stream_uri": "10.100.6.42:8890"
        },
        {
            "name": "resnet18-jihunkim",
            "uid": "db568a28-cced-4704-ae87-bc385dd2c4cb",
            "tag": "",
            "size": 104067,
            "weight_path": "./",
            "model_pat


We first specify the model name that we defined in `model.json`:

In [15]:
model_name = 'mnisttest'

To deploy the model, we use the `deploy_model` function and attach the tar file:

In [16]:
# Open the archive in a context manager so the file is closed after upload
with open("mnist.tar", "rb") as file_handle:
    response = conn.deploy_model(body=file_handle)



In [17]:
print(response.result)

{'name': 'mnisttest', 'uid': '1aff5357-4c1c-492b-b019-1028e955ff43', 'tag': 'test', 'size': 1251075, 'weight_path': 'mnist_model.h5', 'model_path': '/opt/wml-edi/repo/mnisttest/mnisttest-20220810-081746', 'create_time': 1660119466, 'last_updated_time': 1660119466, 'started_at': 0, 'creator': 'admin', 'runtime': 'dlipy3', 'kernel_path': 'kernel.py', 'service_uri': '', 'attributes': [], 'mk_environments': [], 'schema_version': '1'}


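The response echoes the metadata from `model.json` together with the `model_path` where the package was stored, which shows that the upload succeeded.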

## 3. Start the model for inferencing
In this section, we update the model profile and start the model.

To update the profile, we first fetch the current profile and then modify the fields we want to change.

In [18]:
response = conn.get_model_profile(model_name)
model_profile = response.result



In [19]:
model_profile

{'schema_version': '1.2',
 'type': 'inference',
 'name': 'mnisttest',
 'create_time': 'Wed Aug 10 08:17:46 2022 GMT',
 'last_update_time': 'Wed Aug 10 08:17:46 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=2,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '',
   'n

Next we update the fields: we set the kernel GPU mode to `shared` and raise `ncpus_limit` from 2 to 4.

In [20]:
def update_model_profile(model_profile):
    model_profile['kernel']['gpu'] = 'shared'
    model_profile['resource_allocation']['kernel']['resources'] = 'ncpus=0.5,ncpus_limit=4,mem=1024,mem_limit=4096'
    

In [21]:
update_model_profile(model_profile)
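
Rewriting the whole `resources` string by hand makes it easy to change a value by accident. The sketch below parses the string, changes one entry, and writes it back. It assumes the comma-separated `key=value` layout seen in the profile output; `parse_resources`, `format_resources`, and `set_resource` are illustrative helpers, not part of the WMLA client.

```python
def parse_resources(spec):
    """Split 'k1=v1,k2=v2' into a dict of strings, preserving order."""
    return dict(item.split("=", 1) for item in spec.split(",") if item)


def format_resources(resources):
    """Join a dict back into the 'k1=v1,k2=v2' form."""
    return ",".join("%s=%s" % (k, v) for k, v in resources.items())


def set_resource(profile, key, value):
    """Update one entry of the kernel resources string in place."""
    kernel = profile["resource_allocation"]["kernel"]
    resources = parse_resources(kernel["resources"])
    resources[key] = str(value)
    kernel["resources"] = format_resources(resources)


# Demonstration on a trimmed copy of the profile structure.
demo_profile = {"resource_allocation": {"kernel": {
    "resources": "ncpus=0.5,ncpus_limit=2,mem=1024,mem_limit=4096"}}}
set_resource(demo_profile, "ncpus_limit", 4)
print(demo_profile["resource_allocation"]["kernel"]["resources"])
```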

In [22]:
model_profile

{'schema_version': '1.2',
 'type': 'inference',
 'name': 'mnisttest',
 'create_time': 'Wed Aug 10 08:17:46 2022 GMT',
 'last_update_time': 'Wed Aug 10 08:17:46 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=4,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '',
   'n

Now we need to upload the updated profile to WMLA:

In [23]:
response = conn.update_model_profile(model_name, model_profile)



We can check that the model profile has been updated:

In [24]:
response = conn.get_model_profile(model_name)
response.result



{'schema_version': '1.2',
 'type': 'inference',
 'name': 'mnisttest',
 'create_time': 'Wed Aug 10 08:17:46 2022 GMT',
 'last_update_time': 'Wed Aug 10 08:17:48 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=4,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '',
   'n

We can see that the GPU has changed to `shared` and the resources have been updated.

Now we can start the model.

In [25]:
response = conn.start_model_inference(model_name)
print(response)



{
    "result": {},
    "headers": {
        "_store": {
            "server": [
                "Server",
                "nginx/1.20.2"
            ],
            "date": [
                "Date",
                "Wed, 10 Aug 2022 08:17:50 GMT"
            ],
            "content-type": [
                "Content-Type",
                "text/html; charset=ISO-8859-1"
            ],
            "transfer-encoding": [
                "Transfer-Encoding",
                "chunked"
            ],
            "connection": [
                "Connection",
                "keep-alive"
            ],
            "access-control-allow-methods": [
                "Access-Control-Allow-Methods",
                "GET,PUT,POST,DELETE"
            ],
            "access-control-allow-credentials": [
                "Access-Control-Allow-Credentials",
                "true"
            ],
            "access-control-allow-headers": [
                "Access-Control-Allow-Headers",
                "

We might need to wait for a few seconds for the model to go online, then we can check the model status.

In [26]:
response = conn.get_model_instance(model_name)
print(response.result)



{'name': 'mnisttest', 'state': 'not-available'}


You might see the output below if you run the above block immediately:

```
{'name': 'mnisttest', 'state': 'not-available'}
```

This is normal because it takes a few seconds for WMLA to bring the model online.

If the state is `disabled`, either the model deployment was unsuccessful or the model has been stopped.

When the model state is `enabled`, the model has started successfully and is ready for inference.

We can use the script below to make sure the model is enabled before we proceed.

In [27]:
import time
response = conn.get_model_instance(model_name)
timeout = 20
# Poll once per second until the model is enabled or the timeout runs out
while response.result['state'] != 'enabled' and timeout > 0:
    time.sleep(1)
    response = conn.get_model_instance(model_name)
    timeout -= 1
    print('WMLA EDI starting the model, timeout = ', timeout)

print(response.result)



WMLA EDI starting the model, timeout =  19
WMLA EDI starting the model, timeout =  18
WMLA EDI starting the model, timeout =  17
WMLA EDI starting the model, timeout =  16
WMLA EDI starting the model, timeout =  15
WMLA EDI starting the model, timeout =  14
WMLA EDI starting the model, timeout =  13
{'instances': [{'isd_uid': 'e77f981b-41b5-43e9-8a40-56bb32739a65', 'pj_jobid': 'edi-mnisttest-7c6f96578-qf7w9', 'gpu_mode': 'shared', 'gpu_packid': 'edi-mnisttest', 'client_number': 0, 'pending_tasks': 0, 'request_per_sec': 0.0, 'data_size_per_sec': 0.0, 'isd_container': []}], 'name': 'mnisttest', 'state': 'enabled'}
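
The polling pattern above can be factored into a reusable helper that measures a real deadline instead of counting iterations. This is a minimal sketch: `wait_for_state` is an illustrative name, and the state source is passed in as a callable so the helper can be tried without a live EDI service.

```python
import time


def wait_for_state(get_state, target, timeout=20.0, interval=1.0):
    """Poll get_state() until it returns target; return True on success.

    Returns False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_state() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demonstration with a fake state source that becomes 'enabled' on the third poll.
states = iter(["not-available", "not-available", "enabled"])
ok = wait_for_state(lambda: next(states), "enabled", timeout=5.0, interval=0.01)
print(ok)
```

With the objects from this tutorial, the call would look like `wait_for_state(lambda: conn.get_model_instance(model_name).result['state'], 'enabled')`. Returning a boolean lets the caller decide whether a timeout should stop the notebook.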


## 4. Inference with EDI
In this section we demonstrate how to use the uploaded MNIST model to infer a test image. 

We create a random image for testing.

In [28]:
img_shape = (28, 28, 1)
x_test = np.random.random_sample((1,) + img_shape)
x_test = x_test.tolist()

In the package we uploaded, our model takes the following input structure:
```
{'id': id_num, 'data': image_array}
```

We specify the data in the same format:

In [29]:
data = {'id': 0, 'data': x_test}

In [30]:
response = conn.run_inference(model_name, data)
print(response.result)



{'key': 0, 'data': [[-5.415988922119141, -17.939651489257812, 0.7941587567329407, 5.815486431121826, -24.95937728881836, 9.760404586791992, -0.9046012163162231, -1.0441383123397827, 1.9513847827911377, -3.5598299503326416]]}


The `data` field holds the inference results: a list of ten scores for the input image, one per digit class.
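
To turn the returned scores into a predicted digit, take the index of the largest value. The sketch below assumes the ten values are unnormalized class scores ordered by digit 0 to 9, and applies a softmax to express them as probabilities. Index 5 holds the largest score here, although the result carries no meaning because the input image was random noise.

```python
import math

# Scores copied from the response above, rounded for brevity.
scores = [-5.416, -17.940, 0.794, 5.815, -24.959, 9.760, -0.905, -1.044, 1.951, -3.560]


def softmax(values):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Index of the largest score, interpreted as the predicted digit (assumption).
predicted_digit = max(range(len(scores)), key=scores.__getitem__)
probabilities = softmax(scores)
print(predicted_digit, round(probabilities[predicted_digit], 3))
```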

## 5. Clean up
This section demonstrates how to delete a model from WMLA after use.

To delete a model, you first need to stop it. Stopping takes the model offline for inference, but the model stays on the WMLA server, so you can still restart it with `start_model_inference()`.
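
The stop, wait, delete sequence can be wrapped in one function. This is a sketch under stated assumptions: it uses only the connection methods that appear in this tutorial (`stop_model_inference`, `get_model_instance`, `delete_model`), while `stop_and_delete` and the in-memory `FakeConn` are hypothetical and exist only so the flow can be dry-run without a live service.

```python
import time
from types import SimpleNamespace


def stop_and_delete(conn, model_name, timeout=20.0, interval=1.0):
    """Stop a model, wait until it reports 'disabled', then delete it.

    Returns True if the model was deleted, False if it never stopped in time.
    """
    conn.stop_model_inference(model_name)
    deadline = time.monotonic() + timeout
    while conn.get_model_instance(model_name).result["state"] != "disabled":
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    conn.delete_model(model_name)
    return True


class FakeConn:
    """Hypothetical in-memory stand-in for the EDI connection, for a dry run."""

    def __init__(self):
        self.polls_until_disabled = 2
        self.deleted = []

    def stop_model_inference(self, name):
        return SimpleNamespace(result={})

    def get_model_instance(self, name):
        # Report 'enabled' on the first poll and 'disabled' afterwards.
        self.polls_until_disabled -= 1
        state = "disabled" if self.polls_until_disabled <= 0 else "enabled"
        return SimpleNamespace(result={"name": name, "state": state})

    def delete_model(self, name):
        self.deleted.append(name)
        return SimpleNamespace(result={})


fake = FakeConn()
done = stop_and_delete(fake, "mnisttest", timeout=5.0, interval=0.01)
print(done, fake.deleted)
```

The individual steps are walked through one at a time in the cells that follow.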

To stop a model:

In [31]:
response = conn.stop_model_inference("mnisttest")



In [32]:
print(response)

{
    "result": {},
    "headers": {
        "_store": {
            "server": [
                "Server",
                "nginx/1.20.2"
            ],
            "date": [
                "Date",
                "Wed, 10 Aug 2022 08:18:38 GMT"
            ],
            "content-type": [
                "Content-Type",
                "text/html; charset=ISO-8859-1"
            ],
            "transfer-encoding": [
                "Transfer-Encoding",
                "chunked"
            ],
            "connection": [
                "Connection",
                "keep-alive"
            ],
            "access-control-allow-methods": [
                "Access-Control-Allow-Methods",
                "GET,PUT,POST,DELETE"
            ],
            "access-control-allow-credentials": [
                "Access-Control-Allow-Credentials",
                "true"
            ],
            "access-control-allow-headers": [
                "Access-Control-Allow-Headers",
                "

You will need to wait a few seconds for the model to stop.

You can check the model state with `get_model_instance(model_name)` and confirm that the state is `disabled`:

In [33]:
print(conn.get_model_instance(model_name).result)



{'instances': [{'isd_uid': 'e77f981b-41b5-43e9-8a40-56bb32739a65', 'pj_jobid': 'edi-mnisttest-7c6f96578-qf7w9', 'gpu_mode': 'shared', 'gpu_packid': 'edi-mnisttest', 'client_number': 1, 'pending_tasks': 1, 'request_per_sec': 0.0, 'data_size_per_sec': 0.0, 'isd_container': []}], 'name': 'mnisttest', 'state': 'enabled'}


As with starting a model, stopping a model might take a few seconds, which is why the state above still reads `enabled`. We can poll until it changes:

In [34]:
response = conn.get_model_instance(model_name)

timeout = 20
# Poll once per second until the model is disabled or the timeout runs out
while response.result['state'] != 'disabled' and timeout > 0:
    time.sleep(1)
    response = conn.get_model_instance(model_name)
    timeout -= 1
    print('WMLA EDI stopping the model, timeout = ', timeout)




WMLA EDI stopping the model, timeout =  19
WMLA EDI stopping the model, timeout =  18
WMLA EDI stopping the model, timeout =  17


Now that the model has stopped, we can safely delete it.

In [35]:
response = conn.delete_model("mnisttest")



In [36]:
print(response)

{
    "result": {},
    "headers": {
        "_store": {
            "server": [
                "Server",
                "nginx/1.20.2"
            ],
            "date": [
                "Date",
                "Wed, 10 Aug 2022 08:18:44 GMT"
            ],
            "content-type": [
                "Content-Type",
                "text/html; charset=ISO-8859-1"
            ],
            "transfer-encoding": [
                "Transfer-Encoding",
                "chunked"
            ],
            "connection": [
                "Connection",
                "keep-alive"
            ],
            "access-control-allow-methods": [
                "Access-Control-Allow-Methods",
                "GET,PUT,POST,DELETE"
            ],
            "access-control-allow-credentials": [
                "Access-Control-Allow-Credentials",
                "true"
            ],
            "access-control-allow-headers": [
                "Access-Control-Allow-Headers",
                "