# MNIST Model EDI Deployment Tutorial
This tutorial demonstrates how to:
1. Convert an MNIST model inference script to an EDI package for deployment
2. Deploy the MNIST model
3. Start the model
4. Run inference with EDI
5. Clean up the deployed model

Preparation: Installing the WMLA Python Client

In [1]:
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

In [2]:
wslib.download_file("wmla-python-sdk.zip", "wmla-python-sdk.zip")
!ls && unzip wmla-python-sdk.zip | tail -n 1
!(cd wmla-python-sdk && pip install .)

wmla-python-sdk.zip
  inflating: wmla-python-sdk/.git/logs/refs/remotes/origin/sherry  
Processing /home/wsuser/work/wmla-python-sdk
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Building wheels for collected packages: ibm-wmla
  Building wheel for ibm-wmla (setup.py) ... done
  Created wheel for ibm-wmla: filename=ibm_wmla-0.0.2-py3-none-any.whl size=74703 sha256=ed4d0187c4b5e187bb7efd7af98c456eeabbc60b0dbee35f45fef89f293f511a
  Stored in directory: /tmp/1000700000/.cache/pip/wheels/a3/d0/18/a602910843dd8e9ec33889e0737d63820e2c8116abe3991ab2
Successfully built ibm-wmla
Installing collected packages: ibm-wmla
Successfully

In [2]:
wslib.download_file("wmla-python-client.zip", "wmla-python-client.zip")
!ls && unzip wmla-python-client.zip | tail -n 1
!(cd wmla-python-client && pip install .)

ibm-cloud-sdk-core	   ibm-cloud-sdk-core.zip  wmla-python-sdk
ibm-cloud-sdk-core-3.15.3  wmla-python-client.zip  wmla-python-sdk.zip
  inflating: wmla-python-client/.git/logs/refs/remotes/origin/sherry_branch  
Processing /home/wsuser/work/wmla-python-client
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Building wheels for collected packages: ibm-wmla-client
  Building wheel for ibm-wmla-client (setup.py) ... done
  Created wheel for ibm-wmla-client: filename=ibm_wmla_client-0.0.1-py3-none-any.whl size=15088 sha256=4b26670c2774912bcd7cf3f282e6af9effe9ac973a97b9acfe0a337dfd1e8596
  Stored in directory: /tmp/1000700000/.cac

In [4]:
wslib.download_file("ibm-cloud-sdk-core.zip", "ibm-cloud-sdk-core.zip")
!ls && unzip ibm-cloud-sdk-core.zip | tail -n 1
!tar -zxvf /home/wsuser/work/ibm-cloud-sdk-core/ibm-cloud-sdk-core-3.15.3.tar.gz 
!(cd ibm-cloud-sdk-core && pip install ./PyJWT-2.4.0-py3-none-any.whl)
!(cd ibm-cloud-sdk-core-3.15.3 && pip install .)

ibm-cloud-sdk-core.zip	wmla-python-client	wmla-python-sdk
__MACOSX		wmla-python-client.zip	wmla-python-sdk.zip
  inflating: ibm-cloud-sdk-core/idna-3.3-py3-none-any.whl  
ibm-cloud-sdk-core-3.15.3/
ibm-cloud-sdk-core-3.15.3/LICENSE
ibm-cloud-sdk-core-3.15.3/MANIFEST.in
ibm-cloud-sdk-core-3.15.3/PKG-INFO
ibm-cloud-sdk-core-3.15.3/README.md
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/__init__.py
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/api_exception.py
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/authenticators/
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/authenticators/__init__.py
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/authenticators/authenticator.py
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/authenticators/basic_authenticator.py
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/authenticators/bearer_token_authenticator.py
ibm-cloud-sdk-core-3.15.3/ibm_cloud_sdk_core/authenticators/container_authenticator.py
ibm-cloud-sdk-core-3.15.

Import the WMLA packages

In [3]:
import ibm_wmla, ibm_wmla_client

## 1. Convert an MNIST script to an EDI package
This section shows how to convert an MNIST Python script into a package that can be uploaded to EDI.

The package contains:
```
kernel.py
model.json
readme.md
model.h5
```

An example package can be found in `wmla-python-client/examples/mnist_example`.

We trained an MNIST model using Keras and saved it as an `h5` file.
For a non-EDI task, the inference code would be:

In [None]:
import numpy as np
import tensorflow as tf

mnist_model = tf.keras.models.load_model('mnist_model.h5')

img_shape = (28, 28, 1)
x_test = np.random.random_sample((1,) + img_shape)
results = mnist_model.predict(x_test)
print(results)

To use it in `kernel.py`, we split this code into two methods: `on_kernel_start` loads the model, and `on_task_invoke` runs the prediction for each incoming task.

In [None]:
def on_kernel_start(self, kernel_context):
    try:
        Kernel.log_info("kernel input: " + kernel_context.get_model_description())

        model_desc = json.loads(kernel_context.get_model_description())
        model_path = model_desc['model_path']
        if model_path == '':
            model_path = os.getcwd()
        # os.chdir(model_path)
        Kernel.log_info("current dir: " + os.getcwd())

        model_path = model_path + '/' + model_desc['weight_path']

        # Load the saved Keras MNIST model
        self.model = tf.keras.models.load_model(model_path)

        # Generate test samples
        self.img_shape = (28, 28, 1)
        x_test = np.random.random_sample((1,) + self.img_shape)

        # Warm up
        y_keras = self.model.predict(x_test) # initialize the model first, don't take first predict into account
        start = time.time()
        y_keras = self.model.predict(x_test)
        end = time.time()
        Keras_time = end - start
        Kernel.log_info('Keras time : {0} s'.format(Keras_time))

    except Exception as e:
        Kernel.log_error(str(e))

In [None]:
def on_task_invoke(self, task_context):
    try:
        start = time.time()
        Kernel.log_info(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        Kernel.log_info('on_task_invoke')
        # Walk through every task in the batch until next() returns None
        while task_context is not None:
            input_data = json.loads(task_context.get_input_data())
            img_id = input_data['id']
            img_data = input_data['data']

            img_data = np.asarray(img_data).astype('float32')
            y_keras = self.model.predict(img_data)

            output_data = {}
            output_data['key'] = img_id
            output_data['data'] = y_keras.tolist()

            task_context.set_output_data(json.dumps(output_data))
            task_context = task_context.next()
        end = time.time()
        Kernel.log_info("exit on_task_invoke, using time %.2f" % (end-start))
        Kernel.log_info("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
    except Exception as e:
        task_context.set_output_data(str(e))
        Kernel.log_error(str(e))
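
The batch loop in `on_task_invoke` can be tried out locally before packaging. The following is a minimal sketch, not the EDI runtime: `FakeTaskContext` is a hypothetical stand-in that only mimics the three methods used above (`get_input_data`, `set_output_data`, `next`), and `predict` is a placeholder for `self.model.predict`.

```python
import json


class FakeTaskContext:
    """Hypothetical stand-in for the EDI task context (not the real class)."""

    def __init__(self, payload, next_context=None):
        self._input = json.dumps(payload)
        self._next = next_context
        self.output = None

    def get_input_data(self):
        return self._input

    def set_output_data(self, data):
        self.output = data

    def next(self):
        # Returns the next task in the batch, or None when the batch is done
        return self._next


def process_batch(task_context, predict):
    """Same loop shape as on_task_invoke, with the model call injected."""
    while task_context is not None:
        input_data = json.loads(task_context.get_input_data())
        output_data = {'key': input_data['id'], 'data': predict(input_data['data'])}
        task_context.set_output_data(json.dumps(output_data))
        task_context = task_context.next()


# Two chained tasks; the placeholder "model" just sums its input values
second = FakeTaskContext({'id': 1, 'data': [3, 4]})
first = FakeTaskContext({'id': 0, 'data': [1, 2]}, next_context=second)
process_batch(first, predict=lambda values: [sum(values)])
print(first.output)
print(second.output)
```

Each task receives its own output, keyed by the `id` it was submitted with.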


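Then we specify the model parameters in `model.json`. The `weight_path` and `kernel_path` entries name the weights file and kernel script inside the package: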

In [None]:
{
    "name" : "mnisttest",
    "tag" : "test",
    "weight_path" : "mnist_model.h5",
    "runtime" : "dlipy3",
    "kernel_path" : "kernel.py",
    "schema_version" : "1"
}
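
Before packaging, it can help to check that `model.json` and the files it references are consistent. The sketch below is one way to do that with the standard library. `check_package` is a hypothetical helper, and treating all six keys from the example above as required is an assumption of this sketch, not a documented EDI rule.

```python
import json
import os
import tempfile

# Keys that appear in this tutorial's model.json (assumed required here)
EXPECTED_KEYS = ("name", "tag", "weight_path", "runtime", "kernel_path", "schema_version")


def check_package(package_dir):
    """Return a list of problems found in a package directory."""
    config_path = os.path.join(package_dir, "model.json")
    if not os.path.isfile(config_path):
        return ["model.json is missing"]
    with open(config_path) as f:
        config = json.load(f)
    problems = ["model.json lacks key: " + key for key in EXPECTED_KEYS if key not in config]
    # The kernel script and weights file are referenced relative to the package
    for key in ("kernel_path", "weight_path"):
        target = config.get(key)
        if target and not os.path.isfile(os.path.join(package_dir, target)):
            problems.append(key + " points to a missing file: " + target)
    return problems


# Demo on a throwaway directory that has kernel.py but no weights file
with tempfile.TemporaryDirectory() as demo_dir:
    demo_config = {
        "name": "mnisttest", "tag": "test", "weight_path": "mnist_model.h5",
        "runtime": "dlipy3", "kernel_path": "kernel.py", "schema_version": "1",
    }
    with open(os.path.join(demo_dir, "model.json"), "w") as f:
        json.dump(demo_config, f)
    with open(os.path.join(demo_dir, "kernel.py"), "w") as f:
        f.write("# placeholder kernel\n")
    demo_problems = check_package(demo_dir)
print(demo_problems)
```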

We also need a `readme.md` that tells users what the model does and what input format it expects:

```
# README of MNIST MODEL

## Summary

This is a MNIST model that classifies hand-written digits.

## Input

* Input format: json
* Input body:


{
    "id": img_id,
    "data" : img_array
}


```

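Finally, we pack the package directory into a tar file for uploading. Note that the `z` flag gzip-compresses the archive even though the file is named `mnist.tar`: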

`!tar czf mnist.tar mnist`
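
If a shell is not available, the same archive can be produced from Python. This is a sketch using the standard `tarfile` module. `make_package_archive` and `list_archive` are hypothetical helper names, and the demo packs placeholder files rather than the real model.

```python
import os
import tarfile
import tempfile


def make_package_archive(package_dir, archive_path):
    """Write package_dir into a gzip-compressed tar archive, like `tar czf`."""
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the directory's own name inside the archive
        archive.add(package_dir, arcname=os.path.basename(os.path.normpath(package_dir)))


def list_archive(archive_path):
    """Return the sorted member names so the layout can be inspected."""
    with tarfile.open(archive_path, "r:*") as archive:
        return sorted(archive.getnames())


with tempfile.TemporaryDirectory() as work_dir:
    package_dir = os.path.join(work_dir, "mnist")
    os.mkdir(package_dir)
    for file_name in ("kernel.py", "model.json", "readme.md"):
        with open(os.path.join(package_dir, file_name), "w") as f:
            f.write("placeholder\n")
    archive_path = os.path.join(work_dir, "mnist.tar")
    make_package_archive(package_dir, archive_path)
    archive_members = list_archive(archive_path)
print(archive_members)
```

Listing the members before uploading confirms that the files sit under a single top-level `mnist` directory, as they do with the shell command.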

## 2. Deploy the MNIST model

In [4]:
import os
import time
import numpy as np
from ibm_wmla_client import Connection

First, get the user access token from the environment variable `USER_ACCESS_TOKEN`. It is used to connect to the WMLA EDI service.

In [5]:
USER_ACCESS_TOKEN = os.getenv('USER_ACCESS_TOKEN')
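
`os.getenv` returns `None` when the variable is not set, so a missing token would only surface later as a connection failure. A small guard makes the cause obvious. This is a sketch: `require_env` is a hypothetical helper, and `DEMO_TOKEN` is a made-up variable used only for the demo.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError("Environment variable " + name + " is not set")
    return value


# Demo with a made-up variable name
os.environ["DEMO_TOKEN"] = "abc123"
demo_token = require_env("DEMO_TOKEN")

try:
    require_env("DEMO_TOKEN_THAT_DOES_NOT_EXIST")
    missing_detected = False
except RuntimeError:
    missing_detected = True
print(demo_token, missing_detected)
```

In the notebook this would be used as `USER_ACCESS_TOKEN = require_env('USER_ACCESS_TOKEN')`.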

Next, define the EDI service parameters:

In [6]:
service_url = "https://wmla-console-wmla.apps.ocp.tanggle.com"
service_instance = 'ibm_wml_elastic_distributed_inference'

Create a connection object with the WMLA Python Client, then connect to EDI.

In this CP4D environment we authenticate with the user access token instead of a username and password, so `apikey`, `username` and `password` are set to `None` and the token is passed in `user_access_token`.

In [7]:
edi_connection = Connection(service_url, service_instance, wmla_v1=True, edi=True,
                            apikey=None, username=None, password=None,
                            user_access_token=USER_ACCESS_TOKEN)

In [8]:
edi_connection.connect()

Connecting to EDI
EDI Token created
EDI Service connected


In [9]:
conn = edi_connection.service_edi

Test the connection by listing all the models

In [10]:
print(conn.get_models(verify=False))



{
    "result": [
        {
            "name": "mnisttest",
            "uid": "74db835c-b138-4113-82bd-118b30721255",
            "tag": "test",
            "size": 1252846,
            "weight_path": "mnist_model.h5",
            "model_path": "/opt/wml-edi/repo/mnisttest/mnisttest-20220804-000210",
            "create_time": 1659571330,
            "last_updated_time": 1659571330,
            "started_at": 1659571970,
            "creator": "admin",
            "runtime": "dlipy3",
            "kernel_path": "kernel.py",
            "service_uri": "https://wmla-inference-wmla.apps.ocp.tanggle.com/dlim/v1/inference/mnisttest",
            "attributes": [],
            "mk_environments": [],
            "schema_version": "1",
            "stream_uri": "10.100.12.11:8890"
        },
        {
            "name": "pingpong",
            "uid": "03105212-74f6-48eb-a8bf-114649943645",
            "tag": "test",
            "size": 4266,
            "weight_path": "model",
            "mo


We first specify the model name we defined in `model.json`

In [16]:
model_name = 'mnisttest'

To deploy the model, we call the `deploy_model` function and pass the tar file as the request body:

In [17]:
# Open the archive in binary mode; the with-block closes it after the upload
with open("wmla-python-client/examples/mnist_example/mnist.tar", "rb") as file_handle:
    response = conn.deploy_model(body=file_handle)



In [18]:
print(response.result)

{'name': 'mnisttest', 'uid': '1c667cc1-3f07-453c-9320-e4b29b9b2f30', 'tag': 'test', 'size': 1252846, 'weight_path': 'mnist_model.h5', 'model_path': '/opt/wml-edi/repo/mnisttest/mnisttest-20220808-163848', 'create_time': 1659976728, 'last_updated_time': 1659976728, 'started_at': 0, 'creator': 'admin', 'runtime': 'dlipy3', 'kernel_path': 'kernel.py', 'service_uri': '', 'attributes': [], 'mk_environments': [], 'schema_version': '1'}


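The response shows that the model was uploaded successfully. Note that `started_at` is 0 and `service_uri` is empty, which suggests the model is registered but not yet started.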

## 3. Start the model for inferencing
In this section, we update the model profile and start the model.

To update the profile, we first fetch the current profile, then change the fields we need.

In [28]:
response = conn.get_model_profile(model_name)
model_profile = response.result



In [29]:
model_profile

{'schema_version': '1.2',
 'type': 'inference',
 'name': 'mnisttest',
 'create_time': 'Mon Aug  8 16:38:48 2022 GMT',
 'last_update_time': 'Mon Aug  8 16:38:48 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=2,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '',
   'n

Next we update the fields: set the kernel GPU mode to `shared` and raise `ncpus_limit` from 2 to 4.

In [30]:
def update_model_profile(model_profile):
    # Modifies the profile dict in place
    model_profile['kernel']['gpu'] = 'shared'
    model_profile['resource_allocation']['kernel']['resources'] = 'ncpus=0.5,ncpus_limit=4,mem=1024,mem_limit=4096'

In [31]:
update_model_profile(model_profile)
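
Rewriting the whole `resources` string by hand makes it easy to change a value by accident. The sketch below changes a single entry instead. It assumes the string is a comma-separated list of `key=value` pairs, which is inferred from the profile output above rather than from a documented format. All three function names are hypothetical.

```python
def parse_resources(spec):
    """Split 'k1=v1,k2=v2' into an ordered dict of strings."""
    return dict(item.split("=", 1) for item in spec.split(",") if item)


def format_resources(resources):
    """Join a dict back into the 'k1=v1,k2=v2' form."""
    return ",".join(f"{key}={value}" for key, value in resources.items())


def with_resources(spec, **changes):
    """Return a copy of spec with the given entries replaced or added."""
    resources = parse_resources(spec)
    resources.update(changes)
    return format_resources(resources)


original = 'ncpus=0.5,ncpus_limit=2,mem=1024,mem_limit=4096'
updated = with_resources(original, ncpus_limit=4)
print(updated)
```

The result could then be assigned to `model_profile['resource_allocation']['kernel']['resources']`.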

In [32]:
model_profile

{'schema_version': '1.2',
 'type': 'inference',
 'name': 'mnisttest',
 'create_time': 'Mon Aug  8 16:38:48 2022 GMT',
 'last_update_time': 'Mon Aug  8 16:38:48 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=4,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '',
   'n

Now we upload the modified profile to WMLA:

In [33]:
response = conn.update_model_profile(model_name, model_profile)



We can fetch the profile again to check that it has been updated:

In [34]:
response = conn.get_model_profile(model_name)
response.result



{'schema_version': '1.2',
 'type': 'inference',
 'name': 'mnisttest',
 'create_time': 'Mon Aug  8 16:38:48 2022 GMT',
 'last_update_time': 'Mon Aug  8 16:53:38 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=4,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '',
   'n

We can see that the GPU mode has changed to `shared` and the resources have been updated (`ncpus_limit` is now 4).

Now we can start the model.

In [35]:
response = conn.start_model_inference(model_name)
print(response)



{
    "result": {},
    "headers": {
        "_store": {
            "server": [
                "Server",
                "nginx/1.20.2"
            ],
            "date": [
                "Date",
                "Mon, 08 Aug 2022 16:56:00 GMT"
            ],
            "content-type": [
                "Content-Type",
                "text/html; charset=ISO-8859-1"
            ],
            "transfer-encoding": [
                "Transfer-Encoding",
                "chunked"
            ],
            "connection": [
                "Connection",
                "keep-alive"
            ],
            "access-control-allow-methods": [
                "Access-Control-Allow-Methods",
                "GET,PUT,POST,DELETE"
            ],
            "access-control-allow-credentials": [
                "Access-Control-Allow-Credentials",
                "true"
            ],
            "access-control-allow-headers": [
                "Access-Control-Allow-Headers",
                "

We might need to wait for a few seconds for the model to go online, then we can check the model status.

In [37]:
response = conn.get_model_instance(model_name)
print(response.result)



{'instances': [{'isd_uid': '1efa5742-d7e8-46f8-be7d-52fe33e891c4', 'pj_jobid': 'edi-mnisttest-7c6f96578-7rrrz', 'gpu_mode': 'shared', 'gpu_packid': 'edi-mnisttest', 'client_number': 0, 'pending_tasks': 0, 'request_per_sec': 0.0, 'data_size_per_sec': 0.0, 'isd_container': []}], 'name': 'mnisttest', 'state': 'enabled'}


When the model state is `enabled`, the model has started successfully.

If the status is `not available`, WMLA is still bringing the model online. If the status is `disabled`, the model is stopped.
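
Rather than re-running the status cell by hand, the check can be polled. The helper below is a generic sketch: `wait_for_state` is a hypothetical name, the timeout and interval defaults are arbitrary, and the demo feeds it canned states instead of calling WMLA.

```python
import time


def wait_for_state(get_state, target, timeout=60.0, interval=2.0):
    """Call get_state() until it returns target, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("state is still " + repr(state) + ", expected " + repr(target))
        time.sleep(interval)


# Demo with canned states standing in for successive status checks
canned_states = iter(["not available", "not available", "enabled"])
final_state = wait_for_state(lambda: next(canned_states), "enabled", timeout=5.0, interval=0.01)
print(final_state)
```

Against the real service this might be called as `wait_for_state(lambda: conn.get_model_instance(model_name).result['state'], 'enabled')`, assuming the response always carries a `state` field as in the output above.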

## 4. Inference with EDI
In this section we demonstrate how to run inference on a test image with the deployed MNIST model.

We create a random image for testing.

In [38]:
img_shape = (28, 28, 1)
x_test = np.random.random_sample((1,) + img_shape)
x_test = x_test.tolist()

In the package we uploaded, the model expects input with the following structure:
```
{'id': id_num, 'data': image_array}
```

We specify the data in the same format:

In [39]:
data = {'id': 0, 'data': x_test}
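
The image is converted with `tolist()` because the payload has to be JSON-serializable, and the kernel converts the nested lists back into an array on the server side. A quick local check of the payload shape can catch a missing batch or channel dimension before sending. This sketch uses only the standard library. `nested_shape` is a hypothetical helper, and the demo uses an all-zero image.

```python
import json


def nested_shape(value):
    """Return the shape of a regular nested list, following the first element."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


# A 1 x 28 x 28 x 1 image of zeros, built from plain lists
image = [[[[0.0] for _ in range(28)] for _ in range(28)]]
payload = {'id': 0, 'data': image}

payload_shape = nested_shape(payload['data'])
encoded = json.dumps(payload)  # raises TypeError if something is not serializable
print(payload_shape)
```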

In [40]:
response = conn.run_inference(model_name, data)
print(response.result)



{'key': 0, 'data': [[-12.28003215789795, -18.505023956298828, 1.458389163017273, 8.246170997619629, -24.748476028442383, 13.82961368560791, -1.8304352760314941, -6.018824100494385, 3.46991229057312, -2.4551374912261963]]}


The `data` field holds the inference results: one list of 10 scores per input image, one score per digit class. The `key` field echoes the `id` we sent.
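
To turn the scores into a prediction, take the index of the largest value. The sketch below does this for the output shown above. It assumes that index `i` corresponds to digit `i` and that the values are unnormalized scores, which is why a softmax is applied to obtain probabilities. Since the input was a random image, the predicted digit itself carries no meaning.

```python
import math

# Scores copied from the inference output above
scores = [-12.28003215789795, -18.505023956298828, 1.458389163017273,
          8.246170997619629, -24.748476028442383, 13.82961368560791,
          -1.8304352760314941, -6.018824100494385, 3.46991229057312,
          -2.4551374912261963]


def softmax(values):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


predicted_digit = max(range(len(scores)), key=scores.__getitem__)
probabilities = softmax(scores)
print(predicted_digit)
```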

## 5. Clean up
This section demonstrates how to delete a model from WMLA after use.

To delete a model, you need to stop it first. Stopping takes the model offline for inference, but the model stays on the WMLA server. You can start it again with `start_model_inference()`.

To stop a model:

In [41]:
response = conn.stop_model_inference("mnisttest")



In [42]:
print(response)

{
    "result": {},
    "headers": {
        "_store": {
            "server": [
                "Server",
                "nginx/1.20.2"
            ],
            "date": [
                "Date",
                "Mon, 08 Aug 2022 16:58:34 GMT"
            ],
            "content-type": [
                "Content-Type",
                "text/html; charset=ISO-8859-1"
            ],
            "transfer-encoding": [
                "Transfer-Encoding",
                "chunked"
            ],
            "connection": [
                "Connection",
                "keep-alive"
            ],
            "access-control-allow-methods": [
                "Access-Control-Allow-Methods",
                "GET,PUT,POST,DELETE"
            ],
            "access-control-allow-credentials": [
                "Access-Control-Allow-Credentials",
                "true"
            ],
            "access-control-allow-headers": [
                "Access-Control-Allow-Headers",
                "

You will need to wait a few seconds for the model to stop.

You can check the model state with `get_model_instance(model_name)` and confirm that it is `disabled`:

In [47]:
print(conn.get_model_instance(model_name).result)



{'name': 'mnisttest', 'state': 'disabled'}
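
Because deletion should only happen once the model is stopped, the state check can be made part of the delete step. This is a sketch: `delete_if_stopped` is a hypothetical helper built on the `get_model_instance` and `delete_model` calls used in this tutorial, and `FakeConnection` is a stand-in so that the example runs without a WMLA server.

```python
def delete_if_stopped(conn, model_name):
    """Delete the model only when its reported state is 'disabled'."""
    state = conn.get_model_instance(model_name).result.get('state')
    if state != 'disabled':
        raise RuntimeError("model " + model_name + " is " + repr(state) + "; stop it first")
    return conn.delete_model(model_name)


class FakeResponse:
    """Stand-in for the client's response object (only .result is mimicked)."""

    def __init__(self, result):
        self.result = result


class FakeConnection:
    """Stand-in for the EDI connection, reporting a fixed model state."""

    def __init__(self, state):
        self.state = state
        self.deleted = []

    def get_model_instance(self, name):
        return FakeResponse({'name': name, 'state': self.state})

    def delete_model(self, name):
        self.deleted.append(name)
        return FakeResponse({})


stopped_conn = FakeConnection('disabled')
delete_if_stopped(stopped_conn, 'mnisttest')

running_conn = FakeConnection('enabled')
try:
    delete_if_stopped(running_conn, 'mnisttest')
    refused = False
except RuntimeError:
    refused = True
print(stopped_conn.deleted, refused)
```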


Now that the model has stopped, we can safely delete it.

In [48]:
response = conn.delete_model("mnisttest")



In [49]:
print(response)

{
    "result": {},
    "headers": {
        "_store": {
            "server": [
                "Server",
                "nginx/1.20.2"
            ],
            "date": [
                "Date",
                "Mon, 08 Aug 2022 17:01:29 GMT"
            ],
            "content-type": [
                "Content-Type",
                "text/html; charset=ISO-8859-1"
            ],
            "transfer-encoding": [
                "Transfer-Encoding",
                "chunked"
            ],
            "connection": [
                "Connection",
                "keep-alive"
            ],
            "access-control-allow-methods": [
                "Access-Control-Allow-Methods",
                "GET,PUT,POST,DELETE"
            ],
            "access-control-allow-credentials": [
                "Access-Control-Allow-Credentials",
                "true"
            ],
            "access-control-allow-headers": [
                "Access-Control-Allow-Headers",
                "