# WML-A Model Deployment via CLI
Official examples can be found here: https://wmla-console-cpd-wmla.apps.cpd.mskcc.org/ui/#/cliTools

In [1]:
%env DIR=/userfs/deployment-tutorial
%env REST_SERVER=https://wmla-console-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/

%env dlim=../wmla-utils/dlim

env: DIR=/userfs/deployment-tutorial
env: REST_SERVER=https://wmla-console-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/
env: dlim=../wmla-utils/dlim


In [2]:
import os
os.environ['auth'] = f"--rest-server {os.environ['REST_SERVER']} --jwt-token {os.environ['USER_ACCESS_TOKEN']}"
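
The cell above raises a bare `KeyError` if either environment variable is missing. A minimal sketch of a more defensive variant follows, assuming the same two variables; `build_auth_args` is a hypothetical helper name and not part of `dlim`.

```python
import os

def build_auth_args(env=None):
    """Return the dlim auth flags, failing early with a readable message."""
    env = os.environ if env is None else env
    # Both variables are required by the auth string used throughout this notebook.
    missing = [k for k in ("REST_SERVER", "USER_ACCESS_TOKEN") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return f"--rest-server {env['REST_SERVER']} --jwt-token {env['USER_ACCESS_TOKEN']}"
```

In the notebook this could be used as `os.environ['auth'] = build_auth_args()`.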

## 1. Create Model Deployment
The model (deployment) name is specified in `model.json`. Whitespace is not allowed in the name.
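
Because a name containing whitespace is rejected, it can help to check `model.json` before deploying. The sketch below assumes the name is stored under a top-level `"name"` key; `read_model_name` is a hypothetical helper, not part of the WML-A tooling.

```python
import json
import re

def read_model_name(model_json_path):
    """Load model.json and return its name, rejecting names with whitespace."""
    with open(model_json_path) as f:
        config = json.load(f)
    # Assumption: the deployment name lives under a top-level "name" key.
    name = config["name"]
    if not name or re.search(r"\s", name):
        raise ValueError(f"Invalid model name {name!r}: whitespace is not allowed")
    return name
```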

In [3]:
%env DIR_submission=/userfs/deployment-tutorial/deployment_submission
%env file_kernel=kernel.py

!rm -rf $DIR_submission
!mkdir -p $DIR_submission

!cp /userfs/training-tutorial/cifar-visdom/model/model.pt $DIR_submission
!cp $DIR/kernel.py $DIR_submission
!cp $DIR/model.json $DIR_submission
!cp $DIR/README.md $DIR_submission

env: DIR_submission=/userfs/deployment-tutorial/deployment_submission
env: file_kernel=kernel.py


In [4]:
!$dlim model deploy -p $DIR_submission $auth -f

Uploading...
</userfs/deployment-tutorial/deployment_submission/README.md> uploaded to server.
</userfs/deployment-tutorial/deployment_submission/kernel.py> uploaded to server.
</userfs/deployment-tutorial/deployment_submission/model.json> uploaded to server.
</userfs/deployment-tutorial/deployment_submission/model.pt> uploaded to server.
Registering...
Model <cifar-model-wendy> is deployed successfully


A newly created deployment is not in "active" status.

In [5]:
!$dlim model list $auth

NAME                 REST URI
cifar-model-wendy    -
deepliif-wendy-test  -


In [6]:
%env model_name=cifar-model-wendy

env: model_name=cifar-model-wendy


## 2. Modify Configurations

Some configurations can **only** be specified or modified after the deployment has been created.

These settings can be changed on an existing deployment: stop the deployment, change the setting, and start the deployment again. The new setting takes effect immediately, with no need to redeploy.

One example is resource usage.

The full list of configurable parameters in this category is in the IBM documentation: https://www.ibm.com/docs/en/wmla/2.3?topic=inference-edit-service
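
The stop, change, start cycle described above can be sketched as a small script. It only uses `dlim` subcommands and flags that appear in this notebook (`model stop ... -f`, `model updateprofile ... -f <file>`, `model start`); the function names are hypothetical. Note that `dlim model stop` returns before the service has necessarily stopped, so a real script may need to wait between steps.

```python
import shlex
import subprocess

def reconfigure_commands(dlim, model_name, profile_path, auth):
    """Build the stop / updateprofile / start commands as argument lists."""
    auth_args = shlex.split(auth)  # e.g. "--rest-server <url> --jwt-token <token>"
    return [
        [dlim, "model", "stop", model_name, *auth_args, "-f"],
        [dlim, "model", "updateprofile", model_name, "-f", profile_path, *auth_args],
        [dlim, "model", "start", model_name, *auth_args],
    ]

def reconfigure(dlim, model_name, profile_path, auth):
    """Run the three commands in order, aborting at the first non-zero exit code."""
    for cmd in reconfigure_commands(dlim, model_name, profile_path, auth):
        subprocess.run(cmd, check=True)
```

Building the commands as lists avoids shell quoting problems if the token or path contains special characters.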

In [7]:
!$dlim model viewprofile $model_name -j $auth > model_profile.json

In [8]:
import json

# Use a context manager so the file handle is closed after reading.
with open('model_profile.json') as f:
    profile = json.load(f)
profile

{'schema_version': '1.2',
 'type': 'inference',
 'name': 'cifar-model-wendy',
 'create_time': 'Wed May  4 17:16:08 2022 GMT',
 'last_update_time': 'Wed May  4 17:16:08 2022 GMT',
 'replica': 1,
 'policy': {'name': 'capacity',
  'schedule_interval': 3,
  'kernel_min': 1,
  'kernel_max': 100,
  'kernel_delay_release_time': 60,
  'task_execution_timeout': 60,
  'task_batch_size': 1,
  'task_pipe_size': 1,
  'task_parallel_size': 1,
  'stream_number_per_group': 0,
  'stream_discard_slow_tasks': True},
 'security': {'ssl': {'enable': True,
   'server_crt': '${REDHARE_TOP}/security/tls.crt',
   'server_key': '${REDHARE_TOP}/security/tls.key'}},
 'resource_allocation': {'service': {'type': 'k8s',
   'namespace': '',
   'image_name': '',
   'node_selector': ''},
  'kernel': {'type': 'msd',
   'namespace': '',
   'image_name': '',
   'resource_plan': 'sample-project/inference',
   'resources': 'ncpus=0.5,ncpus_limit=2,mem=1024,mem_limit=4096',
   'accelerator_resources': '',
   'gpu_pack_id': '

In [9]:
profile['kernel']['gpu'] = 'exclusive'

In [10]:
with open('model_profile.json','w') as f:
    json.dump(profile, f)

In [11]:
!$dlim model updateprofile $model_name -f model_profile.json $auth

Model is updated successfully
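
Resource usage is stored in the profile as a comma-separated string (the `resources` entry under `resource_allocation` → `kernel` in the output above). A minimal sketch for editing one value without retyping the whole string is shown below; the helper names are hypothetical, and the new `mem` value is only an illustration, not a recommended setting.

```python
def parse_resources(spec):
    """Turn 'ncpus=0.5,mem=1024' into {'ncpus': '0.5', 'mem': '1024'}."""
    return dict(item.split("=", 1) for item in spec.split(",") if item)

def format_resources(values):
    """Inverse of parse_resources; key order is preserved."""
    return ",".join(f"{key}={value}" for key, value in values.items())

resources = parse_resources("ncpus=0.5,ncpus_limit=2,mem=1024,mem_limit=4096")
resources["mem"] = "2048"  # illustrative value only
print(format_resources(resources))
# prints: ncpus=0.5,ncpus_limit=2,mem=2048,mem_limit=4096
```

The resulting string could then be written back to `profile['resource_allocation']['kernel']['resources']` before running `updateprofile`.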


## 3. Start Deployment

In [12]:
!$dlim model start $model_name $auth

Starting model "cifar-model-wendy", run "dlim model view cifar-model-wendy -s" to ensure startup.


In [15]:
!$dlim model view $model_name $auth

Name:		cifar-model-wendy
Tag:		-
Model path:	/opt/wml-edi/repo/cifar-model-wendy/cifar-model-wendy-20220504-171607
Size:		248.15KB
Weight path:	./
Runtime:	dlipy3
Kernel path:	kernel.py
Creator:	wangw6
Create time:	Wed May  4 17:16:08 UTC 2022
Update time:	Wed May  4 17:16:08 UTC 2022
REST URI:	https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/cifar-model-wendy
Attributes:	No attribute defined
Environments:	No environment variable defined
Schema version:	1


In [16]:
!$dlim model view $model_name -s $auth

Name:             cifar-model-wendy
State:            Started
Serving replica:  1
Serving service ID:   8c077629-5355-41de-80a6-f55239f540ad
Service JobID:        edi-cifar-model-wendy-c46d8f7c9-w4w2x
GPU Mode:             exclusive
Served clients:       0
Pending requests:     0
Requests per second:  0.00
Data per second:      0.00
Kernel started:       0


### Test Deployment

In [17]:
from PIL import Image
from io import BytesIO
import base64

img = Image.open('camion_s_000148.png')

buffer = BytesIO()
img.save(buffer, 'PNG')

img_bytes = base64.b64encode(buffer.getvalue()).decode('utf-8')

In [18]:
img_bytes

'iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAKGklEQVR4nAXBWXNcZ0IA0G+7e/e9va9qqSVZUmTLWxYTJyGhamDMLNRADVDwwh+gigd+Do8UD/BA1cxUUinGkCGTwfG44nG8yIusfe1Wb7fvfr+Vc+C//PL+6evHo4NXQpDm4juLq5vl1qJpkZ3tB0e7z1gYYUHcskdM+87Hn15ZfyebT7dfPJGSUpa93H4e+OOc5ozi6SSJkoyLvF6vlCsFoULOQJYqEsym1VJF1ZuKuO3FFSEZkolMeDabqDTr1hqLvSu9K0ud7kKj0dQ0g5fs3kKLc5plqT+LxuMp0U0AcblqmE46D2aGSaTiGjGCuU9zRQBjNGdJQvvr3SiOKcsqNY9oaG1t/aMP3+82FzyvzoiwTYMoADlP4yhnzLbscqmxunL11as3ALI8Tzy3rOlgHgwVoFKq2SxOk1wpQHiWQi4M3ZqPx9XWwuK1K41eR9N0wBnj2euLSbI/Yoi+ef70g82rn975QCkVBPPjo3NdM3XdrdW7xydvddOO0jgIxkSDrmunaSI44Fwahk7yJC5Yplupv3vzVm9lLeT8zf5JkCSR70/8ycVg5np1gPLP/+M/tb9Fn939RNNYq9UBauzPwj88eUY0wym6XCga+RiBer0iBJ1MxwjYhJBSySOGoTFcTK3CQZB+/7tH00l0dj7UMNSQzDnNMtquk8vBkWvooR/sHBy02zVNI+1eq9NrHQ9O3jw/abTrh8djwKSkUhBh6oZBtDQTrusSYhDbbl76fPfk5OX2C6QRkbM0jDGSaR74YRDG0eHpK8cqbqxuAE7/75v/XVpeXt9Yr1Y9wySeayA+j3OUJnnqh0JkpqVFQegWXcPElLIkSUipUts92bk4PLC1fB7PouASSumHkZ9mxNBqzYZV9Lr9mz0THzz9FkPKhBiNJ9evb15ZW+m164UPbz97fZxnZq5JCVyp+GBwrhuGV24AEKd

In [19]:
import requests

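# REST URI reported by `dlim model view` once the deployment has started.
url = 'https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/cifar-model-wendy'
headers = {'Authorization': 'Bearer ' + os.environ['USER_ACCESS_TOKEN']}

# verify=False skips TLS certificate verification; only use it with a trusted internal endpoint.
res = requests.post(url=url, headers=headers, json={'img': img_bytes}, verify=False)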
res



<Response [200]>

In [20]:
res.text

'{"pred_class": 0}'
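
The response body is a JSON string, so the predicted class can be extracted with the standard library. A minimal sketch follows, assuming the `pred_class` key shown in the output above (it is produced by this tutorial's `kernel.py`, so other kernels may return different fields); `parse_prediction` is a hypothetical helper name.

```python
import json

def parse_prediction(body):
    """Return the integer class index from a body like '{"pred_class": 0}'."""
    payload = json.loads(body)
    if "pred_class" not in payload:
        raise ValueError(f"Unexpected response: {body!r}")
    return int(payload["pred_class"])
```

In the notebook, calling `res.raise_for_status()` first and then `parse_prediction(res.text)` would surface HTTP errors before the body is parsed.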

## 4. Stop Deployment
You need to stop a deployment before you:
- change configurable parameters of the existing deployment, or
- delete the deployment.

In [21]:
!$dlim model stop $model_name $auth -f

Stopping model "cifar-model-wendy", run "dlim model view cifar-model-wendy -s" to ensure stop.


## 5. Remove Deployment

In [24]:
!$dlim model undeploy $model_name $auth -f

Undeployed model "cifar-model-wendy", run "dlim model list" to ensure deletion.


In [25]:
!$dlim model list $auth

NAME                 REST URI
deepliif-wendy-test  -
