# Deploy the trained Model and invoke it using Exasol

In this part of the tutorial we show how to deploy and invoke the trained model using Exasol. There are two ways to do this. You can either:

 * [Deploy the model in an AzureML online endpoint and then invoke it via an Exasol UDF](#deploy-the-model-in-an-azureml-endpoint), or
 * [Load the model into Exasol's file system (BucketFS), and then deploy and invoke it via an Exasol UDF](#deploy-and-invoke-the-model-in-exasol)

Which option you choose is up to you.


## Deploy the model in an AzureML endpoint

In this section we explain how to deploy the model in an AzureML endpoint and then invoke it via an Exasol UDF.

### Prerequisites

You completed the [previous part of this tutorial series](TrainModelInAzureML.ipynb) and therefore have:
 * A running AzureML compute instance
 * A scikit-learn MLflow model, trained on the data and registered in AzureML
 * The [Scania Trucks](https://archive.ics.uci.edu/ml/datasets/IDA2016Challenge) dataset loaded into an Exasol SaaS database

### Create an Online Endpoint in AzureML

First we need to set up an Online Endpoint in AzureML that serves our trained and registered model, so we can use it for real-time inference. We do this using Python in an AzureML Notebook. You can find the AzureML tutorial for this [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-online-endpoints?view=azureml-api-2&tabs=python).


Load this notebook into AzureML Notebooks and start your compute instance. Then install the dependencies, if they are not already installed from the previous steps.

In [None]:
!pip install azure-identity
!pip install azure-ai-ml==1.3.0

Import the required Python 3.8 libraries.

In [319]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

Next, enter the details of your workspace to get a handle to it.

In [None]:
credential = DefaultAzureCredential()
# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="<your subscription id>",               # change
    resource_group_name="<your resource group name>",       # change
    workspace_name="<your workspace name>",                 # change
)

Now we are ready to set up the Online Endpoint we want to deploy our model to. We use key authentication for the endpoint, but you could use token authentication instead.

In [None]:
# Define an endpoint name
endpoint_name = "<your-endpoint-name>"                      # change

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="<some description>",                       # change
    auth_mode="key"
)

Create the Endpoint. This can take a few minutes.

In [None]:
# .result() blocks until the endpoint has been created (or raises if creation fails)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

### Make a Deployment for the model

We now need to deploy our model to the endpoint via an AzureML Deployment. For this we need an Environment definition that can run our model, and it must include the "azureml-inference-server-http" package. Microsoft also provides ready-made images, but here we build our own.
We take the conda file saved with our registered MLflow model, add the "azureml-inference-server-http" package, and write the result to a file.

You can find your registered Model and its artifacts in AzureML Studio. Select "Models" in the menu on the left and click on the model you want to use. From there select the "Artifacts" tab to find the model files which include the conda.yaml.

![](img_src/registered_model.png)

![](img_src/conda_file_artifact.png)

In [None]:
%%writefile ./conda.yml
channels:
- conda-forge
dependencies:
- python=3.8.16
- pip<=23.1.2
- pip:
  - azureml-inference-server-http
  - mlflow==1.26.1
  - cloudpickle==2.2.1
  - scikit-learn==1.0.2
name: endpoint-env

Now we get a handle to our registered model and create the Environment for our Deployment using our new conda file.

In [None]:
from azure.ai.ml.constants import AssetTypes

model = ml_client.models.get(name="<your registered model name>", version="<version of your model you want to use>")  # change

env = Environment(
    name="<name the Environment>",                                      # change
    description="Custom environment for azureML tutorial endpoint",
    conda_file="./conda.yml",                                           # change if necessary: path to the conda.yml file written above
    image="mcr.microsoft.com/azureml/minimal-ubuntu20.04-py38-cpu-inference:latest", # base image from microsoft we use
)

With this we can create our Deployment.
The Deployment uses a [scoring script](score.py). This script has an "init()" function, which loads the model, and a "run()" function, which takes the input data, feeds it to the model, and returns the classification results. Make sure you have the scoring script in your AzureML files.

In [320]:

cc = CodeConfiguration(code=".", scoring_script="score.py")     # change if necessary to point to your scoring script in AzureML
model_deployment = ManagedOnlineDeployment(
    name="<name your deployment>",          # change
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=cc,
    instance_type="Standard_DS1_v2",        # Type of Azure Instance. You can change this if you need more computing power
    instance_count=1,
)

Now we can create the Deployment on our endpoint. This can also take a few minutes.

In [None]:
# .result() blocks until the deployment has finished (or raises if it fails)
ml_client.online_deployments.begin_create_or_update(model_deployment).result()

You can check the status of your Deployment by fetching its logs with the command below. AzureML Studio also sends you a notification when the Deployment is done. For a more detailed view, select "Endpoints" in the menu on the left of AzureML Studio and open your endpoint to check the status and logs of its Deployment there.

In [None]:
ml_client.online_deployments.get_logs(
    name="<your deployment name>", endpoint_name=endpoint_name, lines=50
)

In order to call this Endpoint we will need a key (or a token if you choose to go with token authentication). Here is how to access the key. You will need it in the UDF below.

In [None]:
endpoint_key = ml_client.online_endpoints.get_keys(name=endpoint_name)
print(endpoint_key.primary_key)

### Invoke the Endpoint from Exasol

In order to invoke our deployed endpoint, we first need access to our Exasol SaaS database. We use the [PyExasol](https://docs.exasol.com/db/latest/connect_exasol/drivers/python/pyexasol.htm) package for this. Install and import it, then enter your connection info. The connection details you used previously may no longer be valid. In that case you can find a more detailed explanation of how to allow a connection in Exasol SaaS in [the Introduction to this tutorial series](Introduction.ipynb). Make sure the IP address you are connecting from is whitelisted in Exasol SaaS and that the authentication token you use as a password is still valid. Generate a new one if necessary.

In [None]:
!pip install pyexasol
import pyexasol

In [7]:
EXASOL_HOST = "<your>.clusters.exasol.com"      # change
EXASOL_PORT = "8563"                            # change if needed
EXASOL_USER = "<your-exasol-user>"              # change
EXASOL_PASSWORD = "exa_pat_<your_password>"     # change
EXASOL_SCHEMA = "IDA"                           # change if needed

EXASOL_CONNECTION = f"{EXASOL_HOST}:{EXASOL_PORT}"

In [252]:
exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD, compression=True)

We will now use a Python User Defined Function (UDF) to call the AzureML Online Endpoint we just created and classify the test data we loaded into Exasol SaaS in the [first part of this tutorial series](ConnectAzureMLtoExasol.ipynb). You can find the documentation for Exasol UDFs [here](https://docs.exasol.com/saas/database_concepts/udf_scripts.htm).
Take care to use only the [supported](https://docs.exasol.com/saas/database_concepts/udf_scripts/python3.htm) packages, or build your own [Script-Language-Container](https://github.com/exasol/script-languages-container-tool). The UDF takes the data in the provided table, sends it to the endpoint as a REST request, and emits the returned results.

You will need to change the information for accessing your AzureML Online Endpoint: set the name of the Deployment we created previously and the endpoint key we retrieved earlier. The key changes each time you recreate the endpoint, so remember to update it.

You will also need the scoring URL of your deployment. You can find it by navigating to your endpoint in AzureML Studio and selecting the "Consume" tab, which also shows code snippets for calling the endpoint.

![](img_src/consume_endpoint.png)
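Before wiring the endpoint into Exasol, it can help to send a single test request from the notebook with the same headers the UDF uses. A minimal sketch using only the Python standard library; `build_scoring_request` and `send` are hypothetical helper names, and the scoring URL, key, and deployment name are the values collected above.

```python
import json
import urllib.request


def build_scoring_request(scoring_url, endpoint_key, deployment_name, rows):
    """Build (but do not send) a POST request shaped like the one the UDF sends."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {endpoint_key}",
        # Routes the request to this specific deployment behind the endpoint.
        "azureml-model-deployment": deployment_name,
    }
    return urllib.request.Request(scoring_url, data=body, headers=headers, method="POST")


def send(request, timeout=60):
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `print(send(build_scoring_request("<scoring url>", endpoint_key.primary_key, "<your deployment name>", rows)))`, where `rows` is a list of one or more rows of feature values in the same order as the training columns.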

Below you can see the UDF we use to call our endpoint. First we need to create the UDF in the Exasol database using our PyExasol connection.

In [194]:
sql_open_schema = """OPEN SCHEMA "IDA";"""

sql_create_invoke_endpoint ="""
CREATE OR REPLACE PYTHON3 SET SCRIPT IDA.invoke_endpoint(...)
EMITS ("ID" DECIMAL(20,0), "result" VARCHAR(200)) AS

import re

import pandas as pd
import requests


def run(ctx):
    # set up the info needed to send the request
    endpoint_key = "<your endpoint key>"                                # change
    deployment_name = "<your deployment name>"                          # change
    headers = {'Content-Type': 'application/json',
               'Authorization': f'Bearer {endpoint_key}',
               'azureml-model-deployment': deployment_name}
    scoring_url = "<scoring url>"                                       # change


    while True:
        # get the next batch of up to 500 rows
        try:
            df = ctx.get_dataframe(500)
        except:
            return 0
        if df is None:
            break

        # separate the ROWID (first input column) from the feature columns
        id_column = df["0"]
        df = df.drop(columns="0")

        # cast data to floats and switch "nan" to "null" so it will encode and decode as valid JSON
        df = df.apply(pd.to_numeric, downcast='float', errors='ignore')
        df_str = str((df.values).tolist())
        df_str = df_str.replace("nan", 'null')

        PARAMS = '{"data": ' + df_str + '}'

        # send the request
        try:
            result = requests.post(url=scoring_url, data=PARAMS, headers=headers)
        except:
            return 0

        # extract the 0/1 labels; "[01]" matches only the digits, not the commas between them
        str_l = re.findall('[01]', result.text)
        df_res = pd.DataFrame(str_l)
        ctx.emit(pd.concat([id_column, df_res], axis=1))
/
"""

exasol.execute(sql_open_schema)
exasol.execute(sql_create_invoke_endpoint)
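The UDF above builds the request body by replacing "nan" with "null" in the string form of a Python list, and extracts the labels from the response with a regular expression. A less fragile variant could use the standard json module for both steps. A minimal sketch; `build_payload` and `parse_predictions` are hypothetical helpers, and `parse_predictions` assumes the endpoint answers with a JSON list of labels, possibly wrapped a second time as a JSON string.

```python
import json
import math


def build_payload(rows):
    """Serialize rows of numbers to {"data": [...]}, mapping NaN to JSON null."""
    clean = [
        [None if isinstance(v, float) and math.isnan(v) else v for v in row]
        for row in rows
    ]
    # allow_nan=False guarantees the non-standard token NaN is never emitted.
    return json.dumps({"data": clean}, allow_nan=False)


def parse_predictions(text):
    """Turn a response such as '[0, 1, 0]' into a list of ints."""
    parsed = json.loads(text)
    # Unwrap once more if the list arrived encoded as a JSON string.
    if isinstance(parsed, str):
        parsed = json.loads(parsed)
    return [int(p) for p in parsed]
```

Inside the UDF, the request body would then be `build_payload(df.values.tolist())`, and the labels `parse_predictions(result.text)`.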


Now we need to select the columns we want to send to the endpoint. These should be the same columns we used to train the model in the [training part of the tutorial](TrainModelInAzureML.ipynb).

In [195]:
column_names = ['AA_000', 'AG_005', 'AH_000', 'AL_000', 'AM_0', 'AN_000', 'AO_000', 'AP_000', 'AQ_000',
                    'AZ_004', 'BA_002', 'BB_000', 'BC_000', 'BD_000', 'BE_000',
                    'BF_000', 'BG_000', 'BH_000', 'BI_000', 'BJ_000', 'BS_000', 'BT_000', 'BU_000', 'BV_000',
                    'BX_000', 'BY_000', 'BZ_000', 'CA_000', 'CB_000', 'CC_000', 'CI_000', 'CN_004', 'CQ_000',
                    'CS_001', 'DD_000', 'DE_000', 'DN_000', 'DS_000', 'DU_000', 'DV_000', 'EB_000', 'EE_005']

That is all we need to call our UDF. It calls the endpoint from within Exasol and returns the classification results to us. We also pass the ROWID of each input row, so that each result can be joined back to its input row and its true class. This way we can check how many rows were correctly classified by our endpoint.


In [196]:
res = exasol.export_to_pandas("""SELECT "CLASS", "result" FROM  (
                           SELECT IDA.invoke_endpoint(ROWID, {columns!q}) FROM IDA.TEST t) r
                           JOIN IDA.TEST o ON r.ID = o.ROWID""", {"columns": column_names})

And now we can create a confusion matrix from our results.

In [197]:
import pandas as pd
pd.crosstab(index=res['CLASS'], columns=res["result"], rownames=['actuals'], colnames=['predictions'])

| actuals | predicted 0 | predicted 1 |
|---|---|---|
| neg | 14841 | 784 |
| pos | 13 | 362 |
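The confusion matrix can be condensed into a few summary numbers. A small sketch that assumes prediction 1 corresponds to the "pos" class; `summarize` is a hypothetical helper, and the challenge cost uses the misclassification costs given for the IDA 2016 challenge on the dataset's UCI page (10 per false positive, i.e. an unnecessary check, and 500 per false negative, i.e. a missed fault).

```python
def summarize(tn, fp, fn, tp):
    """Derive common metrics from the four cells of a binary confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),      # share of actual positives that were caught
        "precision": tp / (tp + fp),   # share of positive predictions that were right
        # IDA 2016 challenge cost: 10 per false positive, 500 per false negative.
        "challenge_cost": 10 * fp + 500 * fn,
    }


# Counts from the table above: rows are actuals, columns are predictions.
metrics = summarize(tn=14841, fp=784, fn=13, tp=362)
```

With these counts, accuracy is about 0.950, recall about 0.965, precision about 0.316, and the challenge cost 14340.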


Don’t forget to delete the endpoint once you no longer need it, and to close your PyExasol connection.

In [None]:
ml_client.online_endpoints.begin_delete(name=endpoint_name)

In [None]:
exasol.close()

## Deploy and invoke the model in Exasol

Let's look at how to deploy and invoke the model directly on the Exasol SaaS cluster.

### Prerequisites

You completed the [previous part of this tutorial series](TrainModelInAzureML.ipynb), and you have:
 * A scikit-learn MLflow model, trained on the data and registered in AzureML
 * Exasol Database Enterprise Edition (needed for file system access)
 * The [Scania Trucks](https://archive.ics.uci.edu/ml/datasets/IDA2016Challenge) dataset loaded into an Exasol SaaS database

First we need to download the model from AzureML and upload it into a bucket in BucketFS (Exasol's built-in file system).
To do this, click "Models" in the menu on the left of AzureML Studio and select the model you want to use. From here you can either download all model files by clicking "Download all", or go to the "Artifacts" tab and download only the "model.pkl" file, which is all we need for this tutorial. MLflow is not currently supported in Exasol UDFs, so we do not need the other files created by MLflow.
![](img_src/download_all.png)
![](img_src/download_file_arifact.png)


Once you have downloaded the model, go to the online interface of your Exasol SaaS database. Click the three dots on the right and then click "Manage UDF files". This step only works in the Enterprise Edition of Exasol SaaS.
![](img_src/manage_udf_files.png)

This leads you to an interface where you can upload files to the internal file system of Exasol SaaS, the BucketFS. Click the "Upload files" button and upload your model.pkl file. The interface then shows the path where your file is saved. Note down that path, as we need it to access the file from within the UDF. (If the path does not work, try replacing "bucketfs" at the start of the path with "buckets".)
![](img_src/file_path_bucketfs.png)
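Following the tip about "bucketfs" and "buckets", a tiny helper could turn the path shown in the upload dialog into the path used inside UDFs, and also derive the folder to pass to the "ls" UDF. A sketch assuming the two paths differ only in that first segment; `to_udf_path` is a hypothetical helper, and the "/bucketfs/..." input below is an illustrative example, not a path copied from the interface.

```python
import posixpath


def to_udf_path(ui_path):
    """Map a path shown in the upload dialog to the path a UDF can open.

    Assumes the only difference is a leading "bucketfs" segment
    that has to become "buckets".
    """
    path = "/" + ui_path.lstrip("/")
    if path.startswith("/bucketfs/"):
        path = "/buckets/" + path[len("/bucketfs/"):]
    return path


model_path = to_udf_path("/bucketfs/uploads/default/testfolder/model.pkl")
model_folder = posixpath.dirname(model_path)  # folder to list with the "ls" UDF
```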


Next we use the [PyExasol](https://docs.exasol.com/db/latest/connect_exasol/drivers/python/pyexasol.htm) package to connect to our Exasol SaaS cluster. Install and import it, then enter your connection info. The connection details you used previously may no longer be valid. In that case you can find a more detailed explanation of how to allow a connection in Exasol SaaS in [the Introduction to this tutorial series](Introduction.ipynb). Make sure the IP address you are connecting from is whitelisted in Exasol SaaS and that the authentication token you use as a password is still valid. Generate a new one if necessary.

In [None]:
!pip install pyexasol
import pyexasol
import pandas as pd

EXASOL_HOST = "<your>.clusters.exasol.com"      # change
EXASOL_PORT = "8563"                            # change if needed
EXASOL_USER = "<your-exasol-user>"              # change
EXASOL_PASSWORD = "exa_pat_<your_password>"     # change
EXASOL_SCHEMA = "IDA"                           # change if needed

# get the connection
EXASOL_CONNECTION = f"{EXASOL_HOST}:{EXASOL_PORT}"

In [193]:
exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD, compression=True)

### Optional: Check model file access

We will now use a Python User Defined Function (UDF) to check whether the uploaded model file can be accessed from within a UDF. You can find the documentation for Exasol UDFs [here](https://docs.exasol.com/saas/database_concepts/udf_scripts.htm).
For this we use a short "ls" UDF, which takes a path as input and returns a table of the file names found at that path.
First we create the UDF in the Exasol database using our PyExasol connection.

In [None]:
sql_open_schema = """OPEN SCHEMA "IDA";"""
sql_ls = """ --/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT "IDA.LS" ("my_path" VARCHAR(100)) EMITS ("FILES" VARCHAR(100)) AS
import os
def run(ctx):
    for line in os.listdir(ctx.my_path):
        ctx.emit(line)
/
"""

exasol.execute(sql_open_schema)
exasol.execute(sql_ls)
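Exasol passes a context object to `run()`: the input columns are available as attributes (here `ctx.my_path`), and results are returned with `ctx.emit()`. To try the logic of a scalar UDF like this one locally before creating it in the database, you could use a small stand-in context. `FakeScalarContext` and `ls_run` are hypothetical test helpers, not part of any Exasol package.

```python
import os
import tempfile


class FakeScalarContext:
    """Hypothetical stand-in for the ctx object Exasol passes to run()."""

    def __init__(self, **columns):
        # Expose input columns as attributes, e.g. ctx.my_path.
        for name, value in columns.items():
            setattr(self, name, value)
        self.emitted = []

    def emit(self, *values):
        # Collect every emitted row as a tuple instead of sending it to the database.
        self.emitted.append(values)


def ls_run(ctx):
    # Same body as the run() function of the LS UDF above.
    for line in os.listdir(ctx.my_path):
        ctx.emit(line)


with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "model.pkl"), "wb").close()
    ctx = FakeScalarContext(my_path=folder)
    ls_run(ctx)
    print(ctx.emitted)  # [('model.pkl',)]
```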

Then we can call it with the path of the folder containing our uploaded file as input. It should return a table containing the name of the uploaded model file. If it does not work, remember to try replacing "bucketfs" at the start of the path with "buckets".

In [3]:
sql_run_ls = """SELECT "IDA.LS"('/buckets/uploads/default/<your folder>');"""     # change to the folder that contains your file
exasol.export_to_pandas(sql_run_ls)

| FILES |
|---|
| model.pkl |


### Call the trained model using a UDF

We will now use another [Python UDF](https://docs.exasol.com/saas/database_concepts/udf_scripts.htm) to load the model and use it to classify the test data we loaded into Exasol SaaS in the [first part of this tutorial series](ConnectAzureMLtoExasol.ipynb). Take care to use only the [supported](https://docs.exasol.com/saas/database_concepts/udf_scripts/python3.htm) packages, or build your own [Script-Language-Container](https://github.com/exasol/script-languages-container-tool). The UDF takes the data in the provided table, uses the model to classify it, and emits the results.

First we need to create the UDF using our PyExasol connection.


In [38]:
sql_create_inference_udf = """
--/
CREATE OR REPLACE PYTHON3 SET SCRIPT IDA.use_model_for_inference(...)
EMITS ("ID" DECIMAL(20,0), "prediction" DOUBLE) AS

import pickle

import numpy
import pandas as pd
import sklearn  # not used directly; pickle needs scikit-learn to rebuild the model

def load_model():
    model_path = "/buckets/uploads/default/testfolder/model.pkl"  # change to your model file path
    # deserialize the model file back into a sklearn model
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    return model

def infer(data, model):

    data = numpy.array(data)
    result = model.predict(data)
    return result


def run(ctx):
    model = load_model()
    while True:
        # get the next batch of up to 1000 rows
        df = ctx.get_dataframe(num_rows=1000)
        if df is None:
            break
        # separate the ROWID (first input column) from the feature columns
        id_column = df["0"]
        df = df.drop(columns="0")
        response = infer(df, model)
        result = pd.DataFrame(response)
        ctx.emit(pd.concat([id_column, result], axis=1))

/
"""
exasol.execute(sql_create_inference_udf)

Now we need to select the columns we want to pass to the model. These should be the same columns we used to train the model in the [training section of the tutorial series](TrainModelInAzureML.ipynb).

In [9]:
column_names = ['AA_000', 'AG_005', 'AH_000', 'AL_000', 'AM_0', 'AN_000', 'AO_000', 'AP_000', 'AQ_000',
                    'AZ_004', 'BA_002', 'BB_000', 'BC_000', 'BD_000', 'BE_000',
                    'BF_000', 'BG_000', 'BH_000', 'BI_000', 'BJ_000', 'BS_000', 'BT_000', 'BU_000', 'BV_000',
                    'BX_000', 'BY_000', 'BZ_000', 'CA_000', 'CB_000', 'CC_000', 'CI_000', 'CN_004', 'CQ_000',
                    'CS_001', 'DD_000', 'DE_000', 'DN_000', 'DS_000', 'DU_000', 'DV_000', 'EB_000', 'EE_005']
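Since the UDF passes exactly these columns to the model, it can be worth checking locally that the downloaded model.pkl expects the same number of features. A sketch under these assumptions: the file is a pickled scikit-learn estimator or pipeline (fitted scikit-learn estimators since version 0.24 record their input feature count in `n_features_in_`), and you unpickle it with the same scikit-learn version used for training. `load_pickled_model` and `check_model` are hypothetical helper names.

```python
import pickle


def load_pickled_model(path):
    """Unpickle a model file. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_model(model, column_names):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not hasattr(model, "predict"):
        problems.append("object has no predict() method")
    # Fitted scikit-learn estimators record how many features they were trained on.
    expected = getattr(model, "n_features_in_", None)
    if expected is not None and expected != len(column_names):
        problems.append(f"model expects {expected} features, got {len(column_names)}")
    return problems
```

For example, `check_model(load_pickled_model("model.pkl"), column_names)` should return an empty list if the model matches the column list above.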

We can now call the UDF. It predicts on our test data directly in Exasol, using the model we trained in AzureML. We also pass the ROWID of each input row, so that each prediction can be joined back to its input row and its true class. This way we can check how many rows were correctly classified by our model.

In [39]:
res = exasol.export_to_pandas("""SELECT "CLASS", "prediction" FROM  (
                           SELECT IDA.use_model_for_inference(ROWID, {columns_without_class!q}) FROM IDA.TEST t) r
                           JOIN IDA.TEST o ON r.ID = o.ROWID""", {"columns_without_class": column_names})

And now we can create a confusion matrix from our results.

In [25]:
import pandas as pd

pd.crosstab(index=res['CLASS'], columns=res["prediction"], rownames=['actuals'], colnames=['predictions'])

| actuals | predicted 0 | predicted 1 |
|---|---|---|
| neg | 14841 | 784 |
| pos | 13 | 362 |


Don't forget to close your connection.

In [None]:
exasol.close()