# WML-A Model Deployment via CLI
Official examples can be found here: https://wmla-console-cpd-wmla.apps.cpd.mskcc.org/ui/#/cliTools

In [1]:
%env DIR=/userfs/deployment-tutorial
%env REST_SERVER=https://wmla-console-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/

%env dlim=../wmla-utils/dlim

env: DIR=/userfs/deployment-tutorial
env: REST_SERVER=https://wmla-console-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/
env: dlim=../wmla-utils/dlim


In [2]:
import os
os.environ['auth'] = f"--rest-server {os.environ['REST_SERVER']} --jwt-token {os.environ['USER_ACCESS_TOKEN']}"

## 1. Create Model Deployment
The model (deployment) name is specified in `model.json`. Whitespace is not allowed in the name.
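
Because a bad name only surfaces at deploy time, it can help to check it before uploading. The following is a minimal sketch, not part of the official tooling: the function names are hypothetical, and it assumes (the tutorial does not show the file) that `model.json` keeps the name under a top-level `"name"` key.

```python
import json
import re
from pathlib import Path


def validate_name(name):
    """Return name unchanged, or raise ValueError if it is empty or has whitespace."""
    if not isinstance(name, str) or not name or re.search(r"\s", name):
        raise ValueError(f"invalid deployment name: {name!r}")
    return name


def name_from_model_json(path):
    """Read model.json and validate the deployment name.

    Assumption: the name is stored under a top-level "name" key.
    """
    config = json.loads(Path(path).read_text())
    return validate_name(config.get("name"))
```

In this notebook it could be called as `name_from_model_json(os.path.join(os.environ['DIR'], 'model.json'))` before running `dlim model deploy`.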

In [3]:
%env DIR_submission=/userfs/deployment-tutorial/deployment_submission
%env file_kernel=kernel.py

!rm -rf $DIR_submission
!mkdir -p $DIR_submission

!cp /userfs/training-tutorial/cifar-visdom/model/model.pt $DIR_submission
!cp $DIR/kernel.py $DIR_submission
!cp $DIR/model.json $DIR_submission
!cp $DIR/README.md $DIR_submission

env: DIR_submission=/userfs/deployment-tutorial/deployment_submission
env: file_kernel=kernel.py


In [4]:
!$dlim model deploy -p $DIR_submission $auth -f

Uploading...
</userfs/deployment-tutorial/deployment_submission/README.md> uploaded to server.
</userfs/deployment-tutorial/deployment_submission/kernel.py> uploaded to server.
</userfs/deployment-tutorial/deployment_submission/model.json> uploaded to server.
</userfs/deployment-tutorial/deployment_submission/model.pt> uploaded to server.
Registering...
Model <cifar-model-wendy> is deployed successfully


A newly created deployment is not in "active" status until it is started. In the listing below, it shows `-` in place of a REST URI.

In [5]:
!$dlim model list $auth

NAME                 REST URI
cifar-model-wendy    -
deepliif-wendy-test  https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/deepliif-wendy-test


In [6]:
%env model_name=cifar-model-wendy

env: model_name=cifar-model-wendy


## 2. Modify Configurations

Some configurations can **only** be specified or modified after the deployment gets created. 

These settings can be changed on an existing deployment: stop the deployment, change the setting, and start the deployment again. The new setting takes effect immediately, with no need to redeploy from scratch.

One example is resource usage.

A full list of the configurable parameters in this category is available in the documentation: https://www.ibm.com/docs/en/wmla/2.3?topic=inference-edit-service
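
The stop, change, restart cycle described above can be sketched as a small helper that only builds the three command lines, using the same `dlim` subcommands that appear in this notebook (`stop`, `updateprofile`, `start`). The function name and its parameters are hypothetical, and nothing is executed here.

```python
import shlex


def reconfigure_commands(dlim, model_name, profile_path, auth):
    """Build the stop -> updateprofile -> start command sequence.

    Returns a list of argument lists; nothing is run.
    `auth` is the option string the notebook stores in $auth.
    """
    auth_args = shlex.split(auth)
    return [
        [dlim, "model", "stop", model_name, *auth_args, "-f"],
        [dlim, "model", "updateprofile", model_name, "-f", profile_path, *auth_args],
        [dlim, "model", "start", model_name, *auth_args],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The `stop` output later in this notebook suggests that stopping is asynchronous, so it is probably wise to confirm the state with `dlim model view <name> -s` before updating the profile.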

In [7]:
!$dlim model viewprofile $model_name -j $auth > model_profile.json

In [8]:
import json

with open('model_profile.json') as f:
    profile = json.load(f)
profile

{'schema_version': '1.2',
 'type': 'inference',
 'name': 'cifar-model-wendy',
 'create_time': 'Wed May 11 00:42:12 2022 GMT',
 'last_update_time': 'Wed May 11 00:42:12 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=2,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '

In [9]:
profile['kernel']['gpu'] = 'exclusive'
profile['policy']['kernel_max'] = 3

In [10]:
with open('model_profile.json','w') as f:
    json.dump(profile, f)

In [11]:
!$dlim model updateprofile $model_name -f model_profile.json $auth

Model is updated successfully


## 3. Start Deployment

In [12]:
!$dlim model start $model_name $auth

Starting model "cifar-model-wendy", run "dlim model view cifar-model-wendy -s" to ensure startup.


In [17]:
!$dlim model view $model_name $auth

Name:		cifar-model-wendy
Tag:		-
Model path:	/opt/wml-edi/repo/cifar-model-wendy/cifar-model-wendy-20220511-004212
Size:		248.07KB
Weight path:	./
Runtime:	dlipy3
Kernel path:	kernel.py
Creator:	wangw6
Create time:	Wed May 11 00:42:12 UTC 2022
Update time:	Wed May 11 00:42:12 UTC 2022
REST URI:	https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/cifar-model-wendy
Attributes:	No attribute defined
Environments:	No environment variable defined
Schema version:	1


In [19]:
!$dlim model view $model_name -s $auth

Name:   cifar-model-wendy
State:  Not-available
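
Right after `start`, the state may still read `Not-available`, as in the output above, so the first request can arrive before the kernel is up. A minimal polling sketch follows. The helper names are hypothetical, and the name of the "ready" state is not shown in this notebook, so it is left as a parameter rather than hard-coded.

```python
import time


def parse_state(view_output):
    """Extract the value of the 'State:' line from `dlim model view -s` output."""
    for line in view_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "State":
            return value.strip()
    return None


def wait_for_state(get_output, target, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Poll get_output() until the parsed state equals target or time runs out.

    get_output is any zero-argument callable returning the CLI text.
    Returns True if the target state was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if parse_state(get_output()) == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

In the notebook, `get_output` could wrap a `subprocess.run([...], capture_output=True, text=True).stdout` call on the same `dlim model view $model_name -s` command used above.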


### Test Deployment

In [20]:
from PIL import Image
from io import BytesIO
import base64

img = Image.open('camion_s_000148.png')

buffer = BytesIO()
img.save(buffer, 'PNG')

img_bytes = base64.b64encode(buffer.getvalue()).decode('utf-8')

In [21]:
img_bytes

'iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAKGklEQVR4nAXBWXNcZ0IA0G+7e/e9va9qqSVZUmTLWxYTJyGhamDMLNRADVDwwh+gigd+Do8UD/BA1cxUUinGkCGTwfG44nG8yIusfe1Wb7fvfr+Vc+C//PL+6evHo4NXQpDm4juLq5vl1qJpkZ3tB0e7z1gYYUHcskdM+87Hn15ZfyebT7dfPJGSUpa93H4e+OOc5ozi6SSJkoyLvF6vlCsFoULOQJYqEsym1VJF1ZuKuO3FFSEZkolMeDabqDTr1hqLvSu9K0ud7kKj0dQ0g5fs3kKLc5plqT+LxuMp0U0AcblqmE46D2aGSaTiGjGCuU9zRQBjNGdJQvvr3SiOKcsqNY9oaG1t/aMP3+82FzyvzoiwTYMoADlP4yhnzLbscqmxunL11as3ALI8Tzy3rOlgHgwVoFKq2SxOk1wpQHiWQi4M3ZqPx9XWwuK1K41eR9N0wBnj2euLSbI/Yoi+ef70g82rn975QCkVBPPjo3NdM3XdrdW7xydvddOO0jgIxkSDrmunaSI44Fwahk7yJC5Yplupv3vzVm9lLeT8zf5JkCSR70/8ycVg5np1gPLP/+M/tb9Fn939RNNYq9UBauzPwj88eUY0wym6XCga+RiBer0iBJ1MxwjYhJBSySOGoTFcTK3CQZB+/7tH00l0dj7UMNSQzDnNMtquk8vBkWvooR/sHBy02zVNI+1eq9NrHQ9O3jw/abTrh8djwKSkUhBh6oZBtDQTrusSYhDbbl76fPfk5OX2C6QRkbM0jDGSaR74YRDG0eHpK8cqbqxuAE7/75v/XVpeXt9Yr1Y9wySeayA+j3OUJnnqh0JkpqVFQegWXcPElLIkSUipUts92bk4PLC1fB7PouASSumHkZ9mxNBqzYZV9Lr9mz0THzz9FkPKhBiNJ9evb15ZW+m164UPbz97fZxnZq5JCVyp+GBwrhuGV24AEKd

In [22]:
import urllib3
urllib3.disable_warnings()

#### Test: A Single Request

In [23]:
%%time
import requests

url = 'https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/cifar-model-wendy'
headers = {'Authorization': 'Bearer ' + os.environ['USER_ACCESS_TOKEN']}

res = requests.post(url=url, headers=headers, json={'img':img_bytes}, verify=False)
res

CPU times: user 46.2 ms, sys: 5.53 ms, total: 51.7 ms
Wall time: 21.6 s


<Response [200]>

In [24]:
res.text

'{"pred_class": 0}'

In [27]:
res = requests.post(url=url, headers=headers, json={'img':''}, verify=False)
res

<Response [200]>

In [28]:
res.text

'{"msg": "cannot identify image file <_io.BytesIO object at 0x7fd0f8ebf3b0>"}'
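
Note that both requests above returned HTTP 200, including the one with an empty image, so the status code alone does not tell success from failure here. A minimal sketch of a body check, assuming the two keys seen in the outputs above (`pred_class` on success, `msg` on failure); the response shape is defined by `kernel.py`, which is not shown, so other deployments may differ. The function name is hypothetical.

```python
import json


def extract_prediction(body_text):
    """Return pred_class from a response body, or raise if an error was reported.

    Success bodies carry "pred_class"; failure bodies carry "msg".
    """
    payload = json.loads(body_text)
    if isinstance(payload, dict) and "pred_class" in payload:
        return payload["pred_class"]
    message = payload.get("msg") if isinstance(payload, dict) else None
    raise RuntimeError(message or f"unexpected response: {body_text}")
```

With the responses above, `extract_prediction(res.text)` would return `0` for the valid image and raise `RuntimeError` for the empty one.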

#### Test: Multiple Concurrent Requests
You can use `requests-futures` to issue asynchronous calls that mimic concurrent requests. Under the hood it uses multithreading. More details can be found in the git repository: https://github.com/ross/requests-futures

This lets you push the WMLA EDI deployment to create new kernel pods to handle the requests, so that you can measure and evaluate its performance.

*If your inference request takes very little time, say 20 ms, the timing estimated by this method can be far off because of additional costs such as starting threads.*

*This is mainly a task for the MLOps team. A researcher doesn't have to worry about concurrent requests.*
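
To see how much of the measured time comes from the client side, the same pattern can be tried against a stand-in function before pointing it at the server. This is a minimal sketch using only the standard library's `ThreadPoolExecutor`; `fake_inference`, `timed_call`, and `run_concurrently` are hypothetical names, and the 50 ms sleep is an arbitrary placeholder for a real request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run_concurrently(func, n_requests, max_workers):
    """Submit n_requests calls to a thread pool and collect (result, latency) pairs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(timed_call, func, i) for i in range(n_requests)]
        return [f.result() for f in futures]


def fake_inference(i):
    """Stand-in for requests.post(...); sleeps instead of calling the server."""
    time.sleep(0.05)
    return {"pred_class": 0}


results = run_concurrently(fake_inference, n_requests=20, max_workers=8)
latencies = sorted(latency for _, latency in results)
print(f"n={len(latencies)} min={latencies[0]:.3f}s max={latencies[-1]:.3f}s")
```

The latency here is measured inside each worker, so it excludes time a request spends queued behind others in the pool. With fewer workers than requests, the total wall time will be longer than any single latency, which is worth keeping in mind when reading the timestamps printed by the cells below.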

In [29]:
!pip install requests-futures



In [30]:
import datetime

In [31]:
%%time
from requests_futures.sessions import FuturesSession

session = FuturesSession()
params = {'url':'https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/cifar-model-wendy',
          'headers':headers, 
          'json':{'img':img_bytes},
          'verify':False}

l_sessions = []
for i in range(100):
    future_session = session.post(**params)
    l_sessions.append(future_session)
    print(datetime.datetime.now(),i)

2022-05-11 00:45:57.479200 0
2022-05-11 00:45:57.481471 1
2022-05-11 00:45:57.483664 2
2022-05-11 00:45:57.484950 3
2022-05-11 00:45:57.488657 4
2022-05-11 00:45:57.489994 5
2022-05-11 00:45:57.492143 6
2022-05-11 00:45:57.493449 7
2022-05-11 00:45:57.493537 8
2022-05-11 00:45:57.493568 9
2022-05-11 00:45:57.493595 10
2022-05-11 00:45:57.493990 11
2022-05-11 00:45:57.494143 12
2022-05-11 00:45:57.494287 13
2022-05-11 00:45:57.494393 14
2022-05-11 00:45:57.494424 15
2022-05-11 00:45:57.494452 16
2022-05-11 00:45:57.494482 17
2022-05-11 00:45:57.494509 18
2022-05-11 00:45:57.494535 19
2022-05-11 00:45:57.494566 20
2022-05-11 00:45:57.494601 21
2022-05-11 00:45:57.495500 22
2022-05-11 00:45:57.495693 23
2022-05-11 00:45:57.495733 24
2022-05-11 00:45:57.508238 25
2022-05-11 00:45:57.508837 26
2022-05-11 00:45:57.508875 27
2022-05-11 00:45:57.537591 28
2022-05-11 00:45:57.537640 29
2022-05-11 00:45:57.537999 30
2022-05-11 00:45:57.538054 31
2022-05-11 00:45:57.551791 32
2022-05-11 00:45:57.

In [32]:
%%time
l_res = []
for i in range(100):
    l_res.append(l_sessions[i].result())
    print(datetime.datetime.now(),i)

2022-05-11 00:45:58.832526 0
2022-05-11 00:46:00.531355 1
2022-05-11 00:46:00.531946 2
2022-05-11 00:46:00.531979 3
2022-05-11 00:46:00.531995 4
2022-05-11 00:46:00.532011 5
2022-05-11 00:46:00.532026 6
2022-05-11 00:46:00.534452 7
2022-05-11 00:46:00.534475 8
2022-05-11 00:46:01.431330 9
2022-05-11 00:46:01.431451 10
2022-05-11 00:46:01.431470 11
2022-05-11 00:46:01.431486 12
2022-05-11 00:46:01.631854 13
2022-05-11 00:46:01.631960 14
2022-05-11 00:46:01.632359 15
2022-05-11 00:46:01.632395 16
2022-05-11 00:46:03.531899 17
2022-05-11 00:46:03.532004 18
2022-05-11 00:46:03.532027 19
2022-05-11 00:46:03.532084 20
2022-05-11 00:46:03.830730 21
2022-05-11 00:46:03.830818 22
2022-05-11 00:46:03.830835 23
2022-05-11 00:46:03.830857 24
2022-05-11 00:46:05.430302 25
2022-05-11 00:46:05.430426 26
2022-05-11 00:46:05.430448 27
2022-05-11 00:46:05.430465 28
2022-05-11 00:46:05.730363 29
2022-05-11 00:46:05.730503 30
2022-05-11 00:46:05.730522 31
2022-05-11 00:46:05.730539 32
2022-05-11 00:46:07.

In [34]:
for res in l_res:
    print(res.text)

{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_class": 0}
{"pred_cla
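
Rather than printing all 100 bodies, the responses can be tallied. A minimal sketch, assuming the `pred_class` key seen in the outputs above; anything without that key, or that is not valid JSON, is counted under `"error"`. The function name is hypothetical.

```python
import json
from collections import Counter


def tally_responses(bodies):
    """Count predicted classes; bodies without "pred_class" count as "error"."""
    counts = Counter()
    for body in bodies:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            counts["error"] += 1
            continue
        if isinstance(payload, dict) and "pred_class" in payload:
            counts[payload["pred_class"]] += 1
        else:
            counts["error"] += 1
    return counts
```

In the notebook this could be called as `tally_responses(res.text for res in l_res)`.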

## 4. Stop Deployment
You need to stop a deployment when:
- you want to change configurable parameters for the existing deployment, or
- you want to delete this deployment

In [35]:
!$dlim model stop $model_name $auth -f

Stopping model "cifar-model-wendy", run "dlim model view cifar-model-wendy -s" to ensure stop.


## 5. Remove Deployment

In [39]:
!$dlim model undeploy $model_name $auth -f

Undeployed model "cifar-model-wendy", run "dlim model list" to ensure deletion.


In [40]:
!$dlim model list $auth

NAME                 REST URI
deepliif-wendy-test  https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/deepliif-wendy-test
