## Deploy the Saved Model in the project to Deployment Space

Before this notebook is executed, we expect: <br>**(1)** A serialized model tracked with DVC <br>**(2)** Knowledge of the model's location within DVCFileSystem (its path within the Git repository used for DVC tracking)

Steps covered in this notebook:
1. Retrieve parameters
2. Retrieve DVC-tracked model from COS
3. Prepare Watson Machine Learning environment for Model Deployment
4. Retrieve DVC-tracked training data reference from COS
5. Deploy Model
6. Model Testing on the Serving Endpoint

### The following cell fetches the utility scripts required for this notebook
Since IBM CPD SaaS doesn't have a filesystem, cloning the repository is the only reliable way to get the scripts into the cloud environment.
```
!rm -rf MLOps-CPD && git clone --quiet -b master https://github.com/IBM/MLOps-CPD.git
```
⚠️ Run the following cells only if you are executing on IBM CPD SaaS.

In [None]:
#!rm -rf MLOps-CPD && git clone --quiet -b master https://github.com/IBM/MLOps-CPD.git

In [None]:
#!mv MLOps-CPD MLOps_CPD

In [None]:
!python3 -m pip install ibm_watson_machine_learning

In [2]:
from ibm_watson_studio_pipelines import WSPipelines
from ibm_watson_machine_learning import APIClient
import ibm_boto3

from botocore.client import Config
from sklearn.model_selection import train_test_split
from dataclasses import dataclass
import numpy as np
import pandas as pd

import pickle
import dvc.api
import io

import logging
import os, types
import warnings

warnings.filterwarnings("ignore")

## The following cell contains the credentials for MLOps COS
```
## PROJECT COS 
AUTH_ENDPOINT = "https://iam.cloud.ibm.com/oidc/token"
ENDPOINT_URL = "https://s3.private.us.cloud-object-storage.appdomain.cloud"
API_KEY_COS = "xxx"
BUCKET_PROJECT_COS = "mlops-donotdelete-pr-qxxcecxi1dtw94"

## MLOPS COS
ENDPOINT_URL_MLOPS = "https://s3.jp-tok.cloud-object-storage.appdomain.cloud"
API_KEY_MLOPS = "xxx"
CRN_MLOPS = "xxx"
BUCKET_MLOPS  = "mlops-asset"

## CATALOG
CATALOG_NAME = "MLOps-ns"
```

### 1. Retrieve Parameters

In [3]:
# For testing: Uncomment this cell and put your credentials in credentials.py to run locally.
# from credentials import set_env_variables_for_credentials
# set_env_variables_for_credentials()

#### Pipeline Environment

In [4]:
CLOUD_API_KEY = os.getenv("CLOUD_API_KEY")

# Model parameters
MODEL_NAME = os.getenv("model_name")
DEPLOYMENT_NAME = os.getenv("deployment_name")
SPACE_ID = os.getenv("space_id") # Deployment Space Id to deploy to
# "ff681eb5-f5aa-4bf9-9c26-a7fbef89853f"
# model_id = os.getenv('model_id')

#### Git Repository and DVC Locations (data stored in COS)

In [6]:
GIT_REPOSITORY = os.getenv("GIT_REPOSITORY")
model_dvc_location = os.getenv("model_dvc_location")
train_package_dvc_location = os.getenv("train_package_dvc_location") 
test_package_dvc_location = os.getenv("test_package_dvc_location")

In [7]:
# For testing
# train_package_dvc_location = "data/train_package.pkl"
# test_package_dvc_location = "data/test_package.pkl"
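
`os.getenv` returns `None` silently when a variable is not set, so a missing pipeline parameter tends to surface only later as an unrelated error. A minimal sketch of an early check is shown below; `require_env` is a hypothetical helper, not part of this notebook or of any IBM library, and the example passes an explicit mapping instead of reading the real environment.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for the given variables; raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError("Missing required environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Example with an explicit mapping (illustrative values only)
params = require_env(
    ["space_id", "model_name"],
    environ={"space_id": "my-space", "model_name": "flood-regression_model"},
)
print(params["model_name"])
```

In the notebook, calling `require_env(["CLOUD_API_KEY", "space_id", "GIT_REPOSITORY"])` right after the parameter cells would fail fast if the pipeline did not pass them in.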

### 2. Retrieve DVC-tracked model from COS

In [None]:
def read_dvc_tracked_data_from_cos(dvc_path, repo, mode='rb'):
    return pickle.load(io.BytesIO(dvc.api.read(dvc_path,repo=repo, mode=mode)))

In [14]:
model = read_dvc_tracked_data_from_cos("model/xgbr.pkl", GIT_REPOSITORY)
model
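
With `mode='rb'`, `dvc.api.read` returns the file contents as bytes, which the helper above wraps in `io.BytesIO` before unpickling. The deserialization step can be sketched in isolation as follows. The in-memory bytes stand in for the DVC download, and `load_pickled_bytes` is a hypothetical name used only for illustration. Only unpickle artifacts from a repository you trust, because `pickle` can execute arbitrary code on load.

```python
import io
import pickle


def load_pickled_bytes(raw: bytes):
    """Deserialize an object from raw pickle bytes (stand-in for the bytes read from DVC)."""
    return pickle.load(io.BytesIO(raw))


# Simulate a DVC-tracked package with a small in-memory pickle
raw = pickle.dumps({"X_train": [[1.0, 2.0]], "y_train": [3.0]})
package = load_pickled_bytes(raw)
print(sorted(package))
```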

### 3. Prepare Watson Machine Learning environment for Model Deployment

#### Instantiate WML Client

In [9]:
url_frankfurt = "https://eu-de.ml.cloud.ibm.com"
url_dallas = "https://us-south.ml.cloud.ibm.com"
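
Instead of editing the credentials cell when switching regions, the endpoint could be looked up by region code. This is only a sketch: the region codes `eu-de` and `us-south` are taken from the two URLs above, and `wml_url_for` is a hypothetical helper.

```python
# Endpoints copied from the cell above, keyed by the region code embedded in each URL
WML_URLS = {
    "eu-de": "https://eu-de.ml.cloud.ibm.com",
    "us-south": "https://us-south.ml.cloud.ibm.com",
}


def wml_url_for(region: str) -> str:
    """Return the WML endpoint for a known region code, or raise ValueError."""
    try:
        return WML_URLS[region]
    except KeyError:
        known = ", ".join(sorted(WML_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}") from None


print(wml_url_for("us-south"))
```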

In [10]:
WML_CREDENTIALS = {
                   "url": url_dallas,
                   "apikey": CLOUD_API_KEY
            }


wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

'1.0.302'

In [12]:
wml_client.set.default_space(SPACE_ID)

'SUCCESS'

In [15]:
software_spec_uid = wml_client.software_specifications.get_id_by_name("runtime-22.2-py3.10")
software_spec_uid

'b56101f1-309d-549b-a849-eaa63f77b2fb'

In [16]:
# wml_client.hardware_specifications.list()  # uncomment to list the available hardware specs
hardware_spec_uid = wml_client.hardware_specifications.get_id_by_name('S')
hardware_spec_uid

'e7ed1d6c-2e89-42d7-aed5-863b972c1d2b'

In [17]:
software_spec_uid

'b56101f1-309d-549b-a849-eaa63f77b2fb'

### 4. Retrieve DVC-tracked training data reference from COS

In [18]:
# Load the DVC-tracked training package from COS
# (pass the GIT_REPOSITORY variable, not the literal string "GIT_REPOSITORY")
train_package = read_dvc_tracked_data_from_cos("data/train_package.pkl", GIT_REPOSITORY)
train_package

{'X_train':               time  latitude  longitude        stl1            tp     swvl1  \
 383418  2023-01-07     49.05      32.35  273.509262  1.308243e-03  0.381918   
 1672526 2023-01-29     41.35      31.95  278.198369  1.080334e-02  0.407033   
 4610241 2023-03-22     56.85      33.05  273.563887  6.858681e-04  0.396030   
 1444657 2023-01-25     38.45      29.85  273.363105  3.725290e-09  0.306040   
 6471636 2023-04-24     54.65      33.35  279.434854  1.430511e-06  0.279207   
 ...            ...       ...        ...         ...           ...       ...   
 2123762 2023-02-07     50.05      26.75  272.136679  7.457859e-04  0.358638   
 3030822 2023-02-23     64.55      34.35  271.310432  1.273321e-04  0.212345   
 6790267 2023-04-29     36.05      37.25  287.851026  6.773084e-04  0.142623   
 5948242 2023-04-15     55.15      27.55  279.759354  7.689683e-04  0.356013   
 2292072 2023-02-10     53.95      31.35  273.128058  7.418920e-05  0.346020   
 
         valid_time  
 3834

In [19]:
X_train = train_package['X_train']
y_train = train_package['y_train']

In [21]:
# Only submit a few training rows to save some resources and time
X = X_train.tail(100000)
y = y_train.tail(100000)

### 5. Deploy Model

In [22]:
model_name = "flood-regression_model"
deployment_name = "flood-regression_deployment"
model_type = "scikit-learn_1.1"
target = "dis24"

meta_props = {
            wml_client.repository.ModelMetaNames.NAME: model_name,
            wml_client.repository.ModelMetaNames.TYPE: model_type,
            wml_client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
            wml_client.repository.ModelMetaNames.LABEL_FIELD: target,
            # wml_client._models.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: train_data_ref,
            wml_client.repository.ModelMetaNames.INPUT_DATA_SCHEMA: [
                {
                    "id": "input_data_schema",
                    "type": "list",
                    "fields": [
                        {"name": index, "type": value}
                        for index, value in X.dtypes.astype(str).items()
                    ],
                },
            ],
        }
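
The `INPUT_DATA_SCHEMA` entry above is derived from `X.dtypes`. To show the resulting structure without pandas, the sketch below builds the same shape from a plain mapping of column names to dtype strings. The dtype strings are illustrative assumptions, not values read from the actual training data, and `build_input_schema` is a hypothetical helper.

```python
def build_input_schema(column_types, schema_id="input_data_schema"):
    """Build a one-element schema list with one {"name", "type"} field per column."""
    return [
        {
            "id": schema_id,
            "type": "list",
            "fields": [
                {"name": name, "type": dtype} for name, dtype in column_types.items()
            ],
        }
    ]


# Assumed dtypes for three of the feature columns
schema = build_input_schema(
    {"latitude": "float64", "longitude": "float64", "tp": "float64"}
)
print(schema[0]["fields"][0])
```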


In [23]:
model_details = wml_client.repository.store_model(
            model=model, meta_props=meta_props, training_data=X, training_target=y
)

In [24]:
model_uid = wml_client.repository.get_model_id(model_details)
model_uid

'cfe93062-e307-40b3-9e2f-fe73e30ea632'

In [25]:
meta_props = {
    wml_client.deployments.ConfigurationMetaNames.NAME: deployment_name,
    wml_client.deployments.ConfigurationMetaNames.ONLINE: {},
}
deployment_details = wml_client.deployments.create(
    model_uid, meta_props=meta_props
)



#######################################################################################

Synchronous deployment creation for uid: 'cfe93062-e307-40b3-9e2f-fe73e30ea632' started

#######################################################################################


initializing
Note: online_url is deprecated and will be removed in a future release. Use serving_urls instead.

ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='46401071-f986-4533-9f23-1596ea34e515'
------------------------------------------------------------------------------------------------




In [34]:
deployment_uid = wml_client.deployments.get_uid(deployment_details)
deployment_uid

'46401071-f986-4533-9f23-1596ea34e515'

### 6. Model Testing on the Serving Endpoint



#### Load Test Data to Score against WML Endpoint

In [26]:
# Load the DVC-tracked testing package from COS (the repo argument is required)
test_package = read_dvc_tracked_data_from_cos(test_package_dvc_location, GIT_REPOSITORY)
test_package

{'X_test':               time  latitude  longitude        stl1        tp     swvl1  \
 196490  2023-01-04     57.45      35.55  274.346373  0.012709  0.383775   
 1695950 2023-01-30     64.15      33.55  271.719986  0.000356  0.635060   
 3171536 2023-02-25     48.35      30.55  273.968750  0.000391  0.360959   
 3228297 2023-02-26     49.25      37.05  272.438057  0.004216  0.458436   
 3629990 2023-03-05     52.35      32.75  272.573876  0.004056  0.383429   
 ...            ...       ...        ...         ...       ...       ...   
 704725  2023-01-13     66.85      30.25  272.277997  0.002061  0.210199   
 1001508 2023-01-18     62.65      38.15  271.255661  0.000497  0.217640   
 880876  2023-01-16     65.55      28.55  271.910438  0.005835  0.474234   
 6754564 2023-04-29     59.55      38.95  279.475731  0.009330  0.415607   
 4671010 2023-03-23     55.05      29.95  272.836868  0.000180  0.423059   
 
         valid_time  
 196490  2023-01-05  
 1695950 2023-01-31  
 3171536 2

In [29]:
# Take a few rows and score them against the deployed model/WML endpoint
a_few_rows = test_package['X_test'].head(5)
a_few_rows = a_few_rows.apply(pd.to_numeric, errors="coerce")
a_few_rows


| | time | latitude | longitude | stl1 | tp | swvl1 | valid_time |
|---|---|---|---|---|---|---|---|
| 196490 | 1672790400000000000 | 57.45 | 35.55 | 274.346373 | 0.012709 | 0.383775 | 1672876800000000000 |
| 1695950 | 1675036800000000000 | 64.15 | 33.55 | 271.719986 | 0.000356 | 0.63506 | 1675123200000000000 |
| 3171536 | 1677283200000000000 | 48.35 | 30.55 | 273.96875 | 0.000391 | 0.360959 | 1677369600000000000 |
| 3228297 | 1677369600000000000 | 49.25 | 37.05 | 272.438057 | 0.004216 | 0.458436 | 1677456000000000000 |
| 3629990 | 1677974400000000000 | 52.35 | 32.75 | 272.573876 | 0.004056 | 0.383429 | 1678060800000000000 |
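
The large integers in the `time` and `valid_time` columns are the original dates expressed as nanoseconds since the Unix epoch, which is what the numeric coercion produced for the datetime columns in this output. The standard-library sketch below reproduces the first row's values; `to_epoch_ns` is a hypothetical helper, and the dates are assumed to be midnight UTC.

```python
from datetime import datetime, timezone


def to_epoch_ns(date_str: str) -> int:
    """Convert a YYYY-MM-DD date (assumed midnight UTC) to nanoseconds since the Unix epoch."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000


print(to_epoch_ns("2023-01-04"))  # 1672790400000000000, the `time` value of row 196490
print(to_epoch_ns("2023-01-05"))  # 1672876800000000000, the `valid_time` value of row 196490
```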


#### Score the Endpoint

In [35]:
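# Build the payload for the first row, then score it
payload_scoring = {"input_data": [{"fields": list(a_few_rows.columns), "values": [a_few_rows.iloc[0].tolist()]}]}
predictions = wml_client.deployments.score(deployment_uid, payload_scoring)
predictions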

{'predictions': [{'fields': ['prediction'], 'values': [[9.992476463317871]]}]}

In [44]:
fields = list(test_package['X_test'].keys()) # feature cols

# For loop to score for each row in "a_few_rows"
for val in range(len(a_few_rows)):
    payload_scoring = {"input_data": [{"fields": fields, "values": [a_few_rows.iloc[val].tolist()]}]}
    predictions = wml_client.deployments.score(deployment_uid, payload_scoring)
    print(predictions)


{'predictions': [{'fields': ['prediction'], 'values': [[9.992476463317871]]}]}
{'predictions': [{'fields': ['prediction'], 'values': [[8.182958602905273]]}]}
{'predictions': [{'fields': ['prediction'], 'values': [[36.460670471191406]]}]}
{'predictions': [{'fields': ['prediction'], 'values': [[48.54684066772461]]}]}
{'predictions': [{'fields': ['prediction'], 'values': [[10.985206604003906]]}]}
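
The payload and response shapes used above can be wrapped in two small helpers. This is a sketch only: `build_payload` and `extract_predictions` are hypothetical names, the feature subset is illustrative, and the sample response is copied from the first output above rather than fetched from the endpoint. Because `values` is a list of rows, several rows can be placed in one payload instead of scoring them one at a time.

```python
def build_payload(fields, rows):
    """Build a scoring payload in the {"input_data": [{"fields", "values"}]} shape used above."""
    return {"input_data": [{"fields": list(fields), "values": [list(row) for row in rows]}]}


def extract_predictions(response):
    """Return the first value of every prediction row in a scoring response."""
    return [row[0] for block in response["predictions"] for row in block["values"]]


# Two rows batched into a single payload (illustrative feature subset)
payload = build_payload(["latitude", "longitude"], [[57.45, 35.55], [64.15, 33.55]])

# Sample response copied from the notebook output above
response = {"predictions": [{"fields": ["prediction"], "values": [[9.992476463317871]]}]}
print(extract_predictions(response))
```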


## Save Params in WS Pipeline

In [None]:
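# deploy_done was never defined above; derive it from the deployment state.
# Assumption: the details returned by deployments.create expose entity.status.state.
deploy_done = deployment_details["entity"]["status"]["state"] == "ready"

deployment_done = {}
deployment_done['deployment_status'] = deploy_done
deployment_done['deployment_id'] = deployment_uid
deployment_done['model_id'] = model_uid
deployment_done['space_id'] = SPACE_ID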

In [None]:
pipelines_client = WSPipelines.from_apikey(apikey=CLOUD_API_KEY)
pipelines_client.store_results(deployment_done)