# Deploy the trained Model and invoke it using Exasol

In this part of the tutorial we show how to deploy the trained model and invoke it from Exasol. There are two ways to do this. You can either:

   * [Deploy the model using an AzureML online endpoint and then invoke it via an Exasol UDF](#deploy-the-model-in-an-azureml-endpoint)

Or:

   * [Load the model into Exasol's file system (BucketFS), and then deploy and invoke it via an Exasol UDF](#deploy-and-invoke-the-model-in-exasol)

Which version you choose is up to you.


## Deploy the model in an AzureML endpoint

In this section we explain how to deploy the model in an AzureML endpoint and then invoke it via an Exasol UDF.

### Prerequisites

 * An AzureML workspace
 * An AzureML compute instance
 * A trained and registered model
 * An Exasol database containing the dataset

### Create an online endpoint in AzureML

First we need to set up an online endpoint with our trained and registered model in AzureML, so we can use it for real-time inferencing. We will do this using Python in an AzureML notebook. You can find the AzureML tutorial for this [in the Microsoft documentation](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-online-endpoints?view=azureml-api-2&tabs=python).

Load this notebook into your AzureML Notebooks and start your compute. Then install the dependencies, if they are not already installed from previous steps.

In [None]:
!pip install azure-identity
!pip install azure-ai-ml==1.3.0

Import the required libraries

In [319]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

Now we can enter our Credentials to access our workspace.

In [None]:
credential = DefaultAzureCredential()
# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="<your subscription id>",               # change
    resource_group_name="<your resource group name>",       # change
    workspace_name="<your workspace name>",                 # change
)

Now we can set up the online endpoint in which we want to deploy our model. We set it up with key authentication, but you could use token authentication instead.

In [None]:
# Define an endpoint name
endpoint_name = "<your-endpoint-name>"                      # change

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name = endpoint_name,
    description="<some description>",                       # change
    auth_mode="key"
)

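Create the endpoint. The call starts a long-running operation, so we wait for it to finish before creating the deployment.

In [None]:
ml_client.online_endpoints.begin_create_or_update(endpoint).result()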

### Make a Deployment for the model

We now need to deploy our model to the endpoint via an AzureML deployment. For this we need an environment definition which can run our model. It needs to include the "azureml-inference-server-http" package. There are also ready-made images available from Microsoft, but here we make our own.
We take the conda file that is saved with our registered MLflow model (MLflow stores it as `conda.yaml` next to the `MLmodel` file in the model's artifacts), edit it to include the "azureml-inference-server-http" package, and write it to a file.

In [None]:
%%writefile ./conda.yml
channels:
- conda-forge
dependencies:
- python=3.8.16
- pip<=23.1.2
- pip:
  - azureml-inference-server-http
  - mlflow==1.26.1
  - cloudpickle==2.2.1
  - scikit-learn==1.0.2
name: endpoint-env
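
If you would rather make this edit in code than by hand, the following sketch adds a pip package to an already parsed conda environment. It assumes the conda file has been loaded into a plain dictionary (for example with a YAML parser). The helper name `add_pip_dependency` is hypothetical and not part of AzureML or MLflow.

```python
import copy


def add_pip_dependency(conda_env, package):
    """Return a copy of a parsed conda environment with `package` in its pip section.

    `conda_env` is the conda file loaded into a dict. Illustrative helper only.
    """
    env = copy.deepcopy(conda_env)
    dependencies = env.setdefault("dependencies", [])

    # The pip packages live in a nested {"pip": [...]} entry of the dependency list.
    pip_section = next(
        (dep for dep in dependencies if isinstance(dep, dict) and "pip" in dep),
        None,
    )
    if pip_section is None:
        pip_section = {"pip": []}
        dependencies.append(pip_section)

    if package not in pip_section["pip"]:
        pip_section["pip"].append(package)
    return env


# The environment without the inference server package (values taken from the file above).
mlflow_env = {
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.8.16",
        "pip<=23.1.2",
        {"pip": ["mlflow==1.26.1", "cloudpickle==2.2.1", "scikit-learn==1.0.2"]},
    ],
    "name": "endpoint-env",
}

endpoint_env = add_pip_dependency(mlflow_env, "azureml-inference-server-http")
print(endpoint_env["dependencies"][-1]["pip"])
```

The helper returns a copy, so the original dictionary stays unchanged and can be compared against the edited one.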

Now we get a handle to our registered model and create the Environment for our Deployment.

In [None]:
from azure.ai.ml.constants import AssetTypes

model = ml_client.models.get(name="<your registered model name>", version="<version of your model you want to use>")  # change

env = Environment(
    name="<name the Environment>",                                      # change
    description="Custom environment for azureML tutorial endpoint",
    conda_file="./conda.yml",                                           # change if necessary, path to the conda.yml file we created earlier
    image="mcr.microsoft.com/azureml/minimal-ubuntu20.04-py38-cpu-inference:latest", # base image from microsoft we use
)

With this we can now create our deployment.
The deployment uses a [scoring script](score.py). This script has an `init()` function which loads the model, and a `run()` function which takes the input data, feeds it to the model, and returns the classification results. Make sure you have the scoring script in your AzureML files.

In [320]:

cc = CodeConfiguration(code=".", scoring_script="score.py")     # change if necessary to point to your scoring script in AzureML
model_deployment = ManagedOnlineDeployment(
    name="<name your deployment>",          # change
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=cc,
    instance_type="Standard_DS1_v2",        # Type of Azure Instance. You can change this if you need more computing power
    instance_count=1,
)

Now we can create the Deployment on our endpoint.

In [None]:
ml_client.online_deployments.begin_create_or_update(model_deployment)

You can check the status of your deployment by getting the logs with this command. Alternatively, you can navigate to your deployment in AzureML in your browser and check the status and logs there.

In [None]:
ml_client.online_deployments.get_logs(
    name="<your deployment name>", endpoint_name=endpoint_name, lines=50
)

In order to access this endpoint we will need a key (or a token if you chose token authentication above). Get the key like this:

In [None]:
endpoint_key = ml_client.online_endpoints.get_keys(name=endpoint_name)
print(endpoint_key.primary_key)

### Invoke the Endpoint from Exasol

In order to invoke our deployed endpoint, we first need access to our Exasol SaaS database. We will use [pyexasol](https://github.com/exasol/pyexasol) for this.

In [7]:
!pip install pyexasol
import pyexasol
EXASOL_HOST = "<your>.clusters.exasol.com"      # change
EXASOL_PORT = "8563"                            # change if needed
EXASOL_USER = "<your-exasol-user>"              # change
EXASOL_PASSWORD = "exa_pat_<your_password>"     # change
EXASOL_SCHEMA = "IDA"                           # change if needed

EXASOL_CONNECTION = "{host}:{port}".format(host=EXASOL_HOST, port=EXASOL_PORT)

In [252]:
exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD, compression=True)

In [315]:
sql_open_schema = """OPEN SCHEMA "IDA";"""

sql_create_invoke_endpoint ="""
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT IDA.invoke_endpoint(...)
EMITS ("ID" DECIMAL(20,0), "result" VARCHAR(500), "df" VARCHAR(5000)) AS

import requests
import pandas as pd
import time

def run(ctx):
    endpoint_key = "<your endpoint key>"            # change, this is the primary key printed above
    deployment_name = "blue"                        # change to the name of your deployment
    headers = {'Content-Type': 'application/json',
               'Authorization': f'Bearer {endpoint_key}',
               'azureml-model-deployment': deployment_name}
    scoring_url = "https://azureml-tut-endpoint.westeurope.inference.ml.azure.com/score"   # change to the scoring URL of your endpoint

    # fetch the input as a DataFrame; column "0" holds the ROWID
    try:
        df = ctx.get_dataframe(10)
    except Exception:
        return

    if df is None:
        return
    id_column = df["0"]
    df = df.drop(columns="0")

    time.sleep(0.07)    # short pause between requests
    df = df.apply(pd.to_numeric, downcast='float', errors='ignore')
    # build the JSON request body; JSON has no NaN, so missing values become null
    df_str = str((df.values).tolist())
    df_str = df_str.replace("nan", 'null')
    if len(df_str) < 20:
        return
    PARAMS = '{"data": ' + df_str + '}'

    try:
        result = requests.post(url=scoring_url, data=PARAMS, headers=headers)
    except Exception:
        return
    ctx.emit(*id_column, result.text,  df_str)
/
"""
exasol.execute(sql_open_schema)
exasol.execute(sql_create_invoke_endpoint)


<ExaStatement session_id=1777553926653673474 stmt_idx=86>
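
The UDF above builds the request body by converting the values to a string and replacing `nan` with `null`. As an alternative, the sketch below builds the same kind of request with the `json` module and maps missing values explicitly. The function name `build_request` is hypothetical, and the rows are assumed to be plain Python lists, for example from `df.values.tolist()`.

```python
import json
import math


def build_request(rows, endpoint_key, deployment_name):
    """Build the headers and the JSON body for one scoring request."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {endpoint_key}",
        "azureml-model-deployment": deployment_name,
    }
    # JSON has no NaN literal, so missing values are sent as null.
    cleaned = [
        [None if isinstance(v, float) and math.isnan(v) else v for v in row]
        for row in rows
    ]
    body = json.dumps({"data": cleaned})
    return headers, body


headers, body = build_request(
    rows=[[32.0, float("nan"), 3490.0]],
    endpoint_key="<your endpoint key>",
    deployment_name="blue",
)
print(body)  # {"data": [[32.0, null, 3490.0]]}
```

Serializing with `json.dumps` avoids accidental replacements inside other strings and always produces valid JSON.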

In [174]:
column_names = ['AA_000', 'AG_005', 'AH_000', 'AL_000', 'AM_0', 'AN_000', 'AO_000', 'AP_000', 'AQ_000',
                    'AZ_004', 'BA_002', 'BB_000', 'BC_000', 'BD_000', 'BE_000',
                    'BF_000', 'BG_000', 'BH_000', 'BI_000', 'BJ_000', 'BS_000', 'BT_000', 'BU_000', 'BV_000',
                    'BX_000', 'BY_000', 'BZ_000', 'CA_000', 'CB_000', 'CC_000', 'CI_000', 'CN_004', 'CQ_000',
                    'CS_001', 'DD_000', 'DE_000', 'DN_000', 'DS_000', 'DU_000', 'DV_000', 'EB_000', 'EE_005'] #todo remove class pos, also from endpoint score script

In [316]:
res = exasol.export_to_pandas("""SELECT "CLASS", "result", "df" FROM  (
                           SELECT IDA.invoke_endpoint(ROWID, {columns!q}) FROM IDA.TEST t) r
                           JOIN IDA.TEST o ON r.ID = o.ROWID""", {"columns": column_names})

In [317]:
res

| | CLASS | result | df |
|---|---|---|---|
| 0 | neg | {"result": "0"} | [[32.0, 5752.0, 3490.0, 0.0, 0.0, 17204.0, 147... |
| 1 | neg | {"result": "1"} | [[601272.0, 32486.0, 17128568.0, 335184.0, 482... |
| 2 | neg | {"result": "0"} | [[16.0, 5406.0, 3032.0, 0.0, 0.0, 11154.0, 674... |
| 3 | neg | {"result": "0"} | [[45416.0, 2297994.0, 1786606.0, 0.0, 0.0, 472... |
| 4 | neg | {"result": "0"} | [[62714.0, 265216.0, 1826092.0, 0.0, 0.0, 3971... |
| ... | ... | ... | ... |
| 15828 | neg | {"result": "0"} | [[7460.0, 339610.0, 306840.0, 0.0, 0.0, 534406... |
| 15829 | neg | {"result": "0"} | [[40490.0, 1176332.0, 979148.0, 0.0, 0.0, 2050... |
| 15830 | pos | {"result": "1"} | [[62724.0, 4043362.0, 5003048.0, 373276.0, 486... |
| 15831 | neg | {"result": "0"} | [[30744.0, 245024.0, 1074244.0, 0.0, 0.0, 1927... |


In [318]:
import pandas as pd
# print( res.loc[res["prediction"] == 0])
pd.crosstab(index=res['CLASS'], columns=res["result"], rownames=['actuals'], colnames=['predictions'])

| actuals \ predictions | error response (no message) | error response ("upstream connect error or disconnect/reset before headers. reset reason: overflow") | {"result": "0"} | {"result": "1"} |
|---|---|---|---|---|
| neg | 1655 | 4 | 13106 | 697 |
| pos | 29 | 0 | 12 | 330 |

The first two columns count requests for which the endpoint returned an error message instead of a prediction. Both messages end with a pointer to the troubleshooting guide at https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-online-endpoints#http-status-codes.
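
Because the `result` column holds the raw response text, failed requests end up as their own categories in the crosstab. One way to separate predictions from failures is sketched below. The helper name `parse_prediction` is hypothetical, and the response format `{"result": "0"}` is taken from the output shown above.

```python
import json


def parse_prediction(response_text):
    """Return the predicted class as int, or None if the response is not a prediction."""
    try:
        payload = json.loads(response_text)
        return int(payload["result"])
    except (ValueError, KeyError, TypeError):
        # Error messages are not valid JSON or lack a usable "result" entry.
        return None


responses = [
    '{"result": "0"}',
    '{"result": "1"}',
    "upstream connect error or disconnect/reset before headers. reset reason: overflow",
]
parsed = [parse_prediction(text) for text in responses]
failed = sum(1 for p in parsed if p is None)
print(parsed, failed)  # [0, 1, None] 1
```

With pandas, the same helper could be applied to the whole column, for example with `res["result"].map(parse_prediction)`, before building the crosstab.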


When you no longer need the endpoint, delete it, then close the database connection.

In [None]:
ml_client.online_endpoints.begin_delete(name=endpoint_name)

In [None]:
exasol.close()

## Deploy and invoke the model in Exasol

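In this section we load the model into Exasol's file system (BucketFS), and then deploy and invoke it via an Exasol UDF.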

### Prerequisites

### TODO

In [None]:
!pip install pyexasol
import pyexasol
import pandas as pd

EXASOL_HOST = "<your>.clusters.exasol.com"      # change
EXASOL_PORT = "8563"                            # change if needed
EXASOL_USER = "<your-exasol-user>"              # change
EXASOL_PASSWORD = "exa_pat_<your_password>"     # change
EXASOL_SCHEMA = "IDA"                           # change if needed

# get the connection
EXASOL_CONNECTION = "{host}:{port}".format(host=EXASOL_HOST, port=EXASOL_PORT)
exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD, compression=True)

In [26]:
exasol.export_to_pandas("SELECT * FROM IDA.TRAIN LIMIT 4")

Unnamed: 0,CLASS,AA_000,AB_000,AC_000,AD_000,AE_000,AF_000,AG_000,AG_001,AG_002,...,EE_002,EE_003,EE_004,EE_005,EE_006,EE_007,EE_008,EE_009,EF_000,EG_000
0,neg,42434,,2130706000.0,,0.0,0.0,0,0,0,...,339654,149990,376978,289532,215998,144972,250588,1408,0.0,0.0
1,neg,852,,80.0,66.0,0.0,0.0,0,0,2992,...,4142,2110,9332,22310,936,222,0,0,0.0,0.0
2,neg,436,0.0,66.0,60.0,0.0,0.0,0,0,0,...,0,0,0,0,0,0,0,0,0.0,0.0
3,neg,21386,,,,,,0,0,0,...,229972,91158,95770,55132,70446,397896,1106,180,,


Next, move the model from AzureML to Exasol BucketFS. In Exasol SaaS:

 * open the SaaS portal and start your database
 * click "Manage UDF files"
 * create a new folder and upload the model file (this only works in the enterprise edition)
 * copy the path of the uploaded file

If you can reach BucketFS directly, you can upload an archive with curl instead. Replace the write password, host, port, bucket and file names with your own.

In [None]:
! curl -X PUT -T sklearn_model.tgz https://w:writepw@192.168.6.75:1234/bucket1/my_file.tgz

### Call and invoke the model using a UDF

In [2]:
sql_open_schema = """OPEN SCHEMA "IDA";"""
sql_ls = """ --/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT "IDA.LS" ("my_path" VARCHAR(100)) EMITS ("FILES" VARCHAR(100)) AS
import os
def run(ctx):
    for line in os.listdir(ctx.my_path):
        ctx.emit(line)
/
"""

sgl_run_ls = """SELECT "IDA.LS"('/buckets/uploads/default/testfolder');"""

exasol.execute(sql_open_schema)

<ExaStatement session_id=1774377255978401792 stmt_idx=1>

In [15]:
exasol.execute(sql_ls)

<ExaStatement session_id=1774293578104963073 stmt_idx=16>

In [3]:
exasol.export_to_pandas(sgl_run_ls)

Unnamed: 0,FILES
0,model.pkl


In [95]:
sql_create_inference_udf = """
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT IDA.use_model_for_inference(...)
EMITS ("ID" DECIMAL(20,0), "prediction" DOUBLE) AS

import numpy
import pickle
import pandas as pd

def load_model():
    model_path = "/buckets/uploads/default/testfolder/model.pkl"  # change to your model file path
    # deserialize the model file back into a sklearn model
    with open(model_path, 'rb') as model_file:
        return pickle.load(model_file)

def infer(data, model):
    # the model expects a 2D array with one row per sample
    data = numpy.array(data)
    result = model.predict(data)
    return result


def run(ctx):
    model = load_model()
    df = ctx.get_dataframe(num_rows=100)

    # column "0" holds the ROWID, the remaining columns are the features
    id_column = df["0"]
    df = df.drop(columns="0")
    response = infer(df, model)
    result = pd.DataFrame(response)
    ctx.emit(pd.concat([id_column,result],axis=1))

/
"""
exasol.execute(sql_create_inference_udf)


In [39]:
column_names = ['AA_000', 'AG_005', 'AH_000', 'AL_000', 'AM_0', 'AN_000', 'AO_000', 'AP_000', 'AQ_000',
                    'AZ_004', 'BA_002', 'BB_000', 'BC_000', 'BD_000', 'BE_000',
                    'BF_000', 'BG_000', 'BH_000', 'BI_000', 'BJ_000', 'BS_000', 'BT_000', 'BU_000', 'BV_000',
                    'BX_000', 'BY_000', 'BZ_000', 'CA_000', 'CB_000', 'CC_000', 'CI_000', 'CN_004', 'CQ_000',
                    'CS_001', 'DD_000', 'DE_000', 'DN_000', 'DS_000', 'DU_000', 'DV_000', 'EB_000', 'EE_005']

In [94]:
res = exasol.export_to_pandas("""SELECT "CLASS", "prediction" FROM  (
                           SELECT IDA.use_model_for_inference(ROWID, {columns_without_class!q}) FROM IDA.TEST t) r
                           JOIN IDA.TEST o ON r.ID = o.ROWID""", {"columns_without_class": column_names})


In [92]:
import pandas as pd
# print( res.loc[res["prediction"] == 0])
pd.crosstab(index=res['CLASS'], columns=res["prediction"], rownames=['actuals'], colnames=['predictions'])

| actuals \ predictions | 0 | 1 |
|---|---|---|
| neg | 14841 | 784 |
| pos | 13 | 362 |
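
To put these counts into perspective, the short calculation below derives accuracy, recall and precision from them. It assumes that a prediction of 1 corresponds to the class "pos". The variable names are chosen for illustration.

```python
# Counts copied from the crosstab above ("pos" is treated as the positive class).
true_negatives, false_positives = 14841, 784
false_negatives, true_positives = 13, 362

total = true_negatives + false_positives + false_negatives + true_positives
accuracy = (true_positives + true_negatives) / total
recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")
# accuracy=0.950 recall=0.965 precision=0.316
```

Under this assumption the model finds most of the "pos" rows, while a large share of its "pos" predictions are actually "neg".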


Finally, delete the UDFs again once you no longer need them.
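
A minimal sketch for this cleanup is shown below. The helper `drop_udfs` and the list `TUTORIAL_UDFS` are hypothetical names. The helper only assumes that the connection object has an `execute(sql)` method, as the pyexasol connection used in this tutorial does.

```python
def drop_udfs(connection, script_names):
    """Drop the given UDF scripts using any connection with an execute(sql) method."""
    for name in script_names:
        connection.execute(f"DROP SCRIPT IF EXISTS {name}")


# Script names as created in this tutorial; "IDA.LS" was created as a quoted identifier.
TUTORIAL_UDFS = ["IDA.invoke_endpoint", "IDA.use_model_for_inference", '"IDA.LS"']

# Usage with the connection from above:
# drop_udfs(exasol, TUTORIAL_UDFS)
# exasol.close()
```

`IF EXISTS` keeps the cleanup from failing when you only followed one of the two versions of this tutorial.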