# Use an AWS SageMaker model from within Exasol

In this notebook we will use an AWS SageMaker model to make predictions from within Exasol queries.

For that, the Exasol database needs permission to invoke the SageMaker inference endpoint.
There are two ways to provide it:

* Provide AWS credentials to the UDF.
* Grant the permissions to the EC2 instance role of the database nodes.

In this guide we will use the second approach.

Grant the following permissions to your EC2 instance role:

* `sts:AssumeRole` with a resource filter for the EC2 role itself.
* `sagemaker:InvokeEndpoint` with a resource filter for your SageMaker endpoint.
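
The two permissions above could be expressed as an IAM policy document along the following lines. This is only a sketch: the helper name `build_instance_role_policy`, the role ARN, and the endpoint name are hypothetical placeholders, and it assumes the endpoint lives in the same AWS account as the role.

```python
import json


def build_instance_role_policy(role_arn: str, region: str, endpoint_name: str) -> dict:
    """Build an IAM policy document for the two permissions listed above."""
    # An IAM role ARN has the form arn:aws:iam::<account-id>:role/<role-name>.
    account_id = role_arn.split(":")[4]
    # Assumption: the endpoint is in the same account as the role.
    endpoint_arn = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # The role may assume itself.
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn},
            # The role may invoke exactly one endpoint.
            {"Effect": "Allow", "Action": "sagemaker:InvokeEndpoint", "Resource": endpoint_arn},
        ],
    }


policy = build_instance_role_policy(
    role_arn="arn:aws:iam::123456789012:role/my-exasol-ec2-role",  # hypothetical
    region="eu-central-1",
    endpoint_name="my-endpoint",  # hypothetical
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or attached with your usual infrastructure tooling.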

If you prefer the first approach, you can modify the UDF code below to use credentials instead.
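
As a rough sketch of what the credential-based variant might look like, the client in the UDF could be created with explicit keys instead of relying on the instance role. The helper names `sagemaker_client_kwargs` and `make_client` are hypothetical, and how the keys reach the UDF (for example through an Exasol connection object) is left open here.

```python
def sagemaker_client_kwargs(access_key_id: str, secret_access_key: str, region: str) -> dict:
    """Keyword arguments for boto3.client when passing static credentials."""
    return {
        "service_name": "sagemaker-runtime",
        "region_name": region,
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }


def make_client(access_key_id: str, secret_access_key: str, region: str):
    """Create a SageMaker runtime client from static credentials."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3
    return boto3.client(**sagemaker_client_kwargs(access_key_id, secret_access_key, region))


# Placeholder values for illustration only; never hard-code real keys.
kwargs = sagemaker_client_kwargs("AKIAEXAMPLE", "example-secret", "eu-central-1")
print(sorted(kwargs))
```

With this variant the AWS config file and the `sts:AssumeRole` permission would no longer be needed, but the keys must be stored and rotated safely.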

## Parameters

In [18]:
EXASOL_HOST = "3.125.52.226" # change
EXASOL_PORT = "8563" # change if needed
EXASOL_CONNECTION = "{host}:{port}".format(host=EXASOL_HOST, port=EXASOL_PORT)
EXASOL_USER = "sys" # change if needed
EXASOL_PASSWORD = "Yd1ElI0kzU60FzMNIcY6" # change
EXASOL_SCHEMA = "IDA" # change if needed
EXASOL_CLUSTER_ROLE = "arn:aws:iam::922177738768:role/sagemaker-guide-exasol-EC2RoleDBNode-JZ0ZXWV5KAB1" #change
EXASOL_REGION = "eu-central-1" #change if needed
ENDPOINT_NAME = "sagemaker-xgboost-2020-12-08-13-14-17-829" #change

## Setup

In [2]:
!pip install pyexasol

import pyexasol
import pandas as pd
exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD, compression=True)



## Install UDF

To use the SageMaker inference endpoint from within the Exasol database, we will create a Python UDF that calls the endpoint's API with the data from the query.
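
Before calling the endpoint, the UDF in the next cell writes a small AWS config file so that boto3 assumes the instance role using the EC2 instance metadata as the credential source. The sketch below renders the same file content outside the database to make its structure visible. The function name `render_aws_config` and the role ARN are hypothetical placeholders.

```python
import configparser


def render_aws_config(region: str, role_arn: str) -> str:
    """Return the text of an AWS config file with an assume-role default profile."""
    return (
        "[default]\n"
        f"region = {region}\n"
        f"role_arn = {role_arn}\n"
        "credential_source = Ec2InstanceMetadata\n"
    )


config_text = render_aws_config(
    region="eu-central-1",
    role_arn="arn:aws:iam::123456789012:role/my-exasol-ec2-role",  # hypothetical
)
print(config_text)

# Parse the text back to check that it is a well-formed INI file.
parser = configparser.ConfigParser()
parser.read_string(config_text)
```

Pointing the `AWS_CONFIG_FILE` environment variable at such a file, as the UDF does, makes boto3 pick up the profile without any keys in the code.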

In [19]:
exasol.execute("""
CREATE OR REPLACE PYTHON3 SET SCRIPT JB.PREDICT(...) EMITS(id DECIMAL(20,0), "result" BOOLEAN) AS
def run(ctx):
    import boto3
    import pandas as pd
    import os
    # Write an AWS config profile that assumes the EC2 instance role of the database node.
    with open("/tmp/.config", "w") as f:
        f.write(
            "[default]\\nregion = {region!r}\\nrole_arn = {role!r}\\ncredential_source = Ec2InstanceMetadata")
    os.environ['AWS_CONFIG_FILE'] = '/tmp/.config'
    # Create the client once and reuse it for every batch.
    client = boto3.client('sagemaker-runtime')
    endpoint_name = "{endpoint_name!r}"
    while True:
        # Read the input in batches of 1000 rows. None signals the end of the data.
        df = ctx.get_dataframe(1000)
        if df is None:
            break
        # The first argument is the row ID, the remaining columns are the model features.
        id_column = df["0"]
        df = df.drop("0", axis=1)
        response = client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='text/csv',
            Body=df.to_csv(header=False, index=False)
        )
        # The response body is read as comma-separated scores, each rounded to a boolean class.
        result_list = response['Body'].read().decode('ascii').split(",")
        rounded_result = map(lambda x: bool(round(float(x))),result_list)
        result = pd.DataFrame(list(rounded_result))
        ctx.emit(pd.concat([id_column,result],axis=1))
/
""", {
        "region": EXASOL_REGION,
        "role": EXASOL_CLUSTER_ROLE,
        "endpoint_name": ENDPOINT_NAME
})

<ExaStatement session_id=1685797548868304896 stmt_idx=24>
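
The post-processing step inside the UDF can be tried out in isolation. The sketch below mirrors how the UDF turns the endpoint's comma-separated scores into boolean predictions. The function name `parse_predictions` and the sample scores are made up for illustration.

```python
def parse_predictions(body: bytes) -> list:
    """Turn a comma-separated response body of scores into boolean predictions."""
    scores = body.decode("ascii").strip().split(",")
    # Scores are rounded to the nearest integer: 0 becomes False, 1 becomes True.
    return [bool(round(float(score))) for score in scores]


predictions = parse_predictions(b"0.03,0.97,0.42")
print(predictions)  # [False, True, False]
```

Rounding places the decision threshold at 0.5. A different threshold would need an explicit comparison instead of `round`.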

## Run Query

Now let's run predictions on the test data table in Exasol and compare them with the actual classes.

In [20]:
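# Fetch a single row just to discover the column names of the test table.
all_columns = exasol.export_to_pandas("SELECT * FROM {schema!i}.TEST LIMIT 1", {"schema": EXASOL_SCHEMA})
column_names = list(all_columns)
# CLASS is the label, so it must not be passed to the model.
column_names.remove("CLASS")
# Call the UDF with ROWID plus all feature columns, then join the predictions back to the labels.
result = exasol.export_to_pandas("""SELECT CLASS = 'pos' as "expected", "result" FROM  (
                                     SELECT JB.PREDICT(ROWID, {columns_without_class!q}) FROM {schema!i}.TEST t) r
                                    JOIN {schema!i}.TEST o ON r.ID = o.ROWID""",
                                 {"columns_without_class": column_names, "schema": EXASOL_SCHEMA})
pd.crosstab(index=result['expected'], columns=result["result"], rownames=['actuals'], colnames=['predictions'])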

| actuals \ predictions | 0 | 1 |
|---|---|---|
| 0 | 15597 | 28 |
| 1 | 99 | 276 |
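
To summarize the confusion matrix above, the usual metrics can be derived from its four cells. This is a minimal sketch, assuming that class 1 ("pos") is the positive class. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts taken from the crosstab above (rows: actuals, columns: predictions).
metrics = classification_metrics(tn=15597, fp=28, fn=99, tp=276)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the accuracy is about 0.992, the precision about 0.908, and the recall 0.736, so the model misses roughly a quarter of the positive cases even though the overall accuracy is high.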
