Here we will perform all the steps necessary to get the Transformer Extension up and running. Please refer to the Transformer Extension <a href="https://github.com/exasol/transformers-extension/blob/main/doc/user_guide/user_guide.md" target="_blank" rel="noopener">User Guide</a> for details on the required initialization steps. Note that the extension is installed together with this product, so the first step mentioned in the guide can be skipped.

We will be using a generic prediction UDF script. To execute queries and load data from the Exasol database we will be using the <a href="https://github.com/exasol/pyexasol" target="_blank" rel="noopener">`pyexasol`</a> module.

Prior to using this notebook one needs to complete the following steps:
1. [Create the database schema](../setup_db.ipynb).

In [2]:
# TODO: Move this to a separate configuration notebook. Here we just need to load this configuration from a store.
from dataclasses import dataclass

@dataclass
class SandboxConfig:
    EXTERNAL_HOST_NAME = "192.168.124.93"
    HOST_PORT = "8888"

    @property
    def EXTERNAL_HOST(self):
        return f"""{self.EXTERNAL_HOST_NAME}:{self.HOST_PORT}"""

    USER = "sys"
    PASSWORD = "exasol"
    BUCKETFS_PORT = "6666"
    BUCKETFS_USER = "w"
    BUCKETFS_PASSWORD = "write"
    BUCKETFS_USE_HTTPS = False
    BUCKETFS_SERVICE = "bfsdefault"
    BUCKETFS_BUCKET = "default"

    @property
    def EXTERNAL_BUCKETFS_HOST(self):
        return f"""{self.EXTERNAL_HOST_NAME}:{self.BUCKETFS_PORT}"""

    @property
    def BUCKETFS_URL_PREFIX(self):
        return "https://" if self.BUCKETFS_USE_HTTPS else "http://"

    @property
    def BUCKETFS_PATH(self):
        # Filesystem path to the read-only mounted BucketFS inside the running UDF container
        return f"/buckets/{self.BUCKETFS_SERVICE}/{self.BUCKETFS_BUCKET}"

    SCRIPT_LANGUAGE_NAME = "PYTHON3_60"
    UDF_FLAVOR = "python3-ds-EXASOL-6.0.0"
    UDF_RELEASE = "20190116"
    UDF_CLIENT = "exaudfclient"  # or exaudfclient_py3 for newer versions of the flavor
    SCHEMA = "IDA"

    @property
    def SCRIPT_LANGUAGES(self):
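        # Build the definition from adjacent string literals so that no newlines
        # or indentation end up inside the resulting value.
        return (
            f"{self.SCRIPT_LANGUAGE_NAME}=localzmq+protobuf:///{self.BUCKETFS_SERVICE}/"
            f"{self.BUCKETFS_BUCKET}/{self.UDF_FLAVOR}?lang=python#buckets/{self.BUCKETFS_SERVICE}/"
            f"{self.BUCKETFS_BUCKET}/{self.UDF_FLAVOR}/exaudf/{self.UDF_CLIENT}"
        )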

    @property
    def connection_params(self):
        # pyexasol.connect() expects the keyword "dsn"
        return {"dsn": self.EXTERNAL_HOST, "user": self.USER, "password": self.PASSWORD, "compression": True}

    @property
    def params(self):
        return {
            "script_languages": self.SCRIPT_LANGUAGES,
            "script_language_name": self.SCRIPT_LANGUAGE_NAME,
            "schema": self.SCHEMA,
            "BUCKETFS_PORT": self.BUCKETFS_PORT,
            "BUCKETFS_USER": self.BUCKETFS_USER,
            "BUCKETFS_PASSWORD": self.BUCKETFS_PASSWORD,
            "BUCKETFS_USE_HTTPS": self.BUCKETFS_USE_HTTPS,
            "BUCKETFS_BUCKET": self.BUCKETFS_BUCKET,
            "BUCKETFS_PATH": self.BUCKETFS_PATH
        }

    # Name of the BucketFS connection
    BFS_CONN = 'MyBFSConn'

    # Name of a sub-directory of the bucket root
    BFS_DIR = 'my_storage'

    # We will store all models in this sub-directory in the BucketFS
    TE_MODELS_DIR = 'models'
    
    # We will save cached models in this sub-directory relative to the current directory on the local machine.
    TE_MODELS_CACHE_DIR = 'models_cache'

conf = SandboxConfig()
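
To make the shape of a single script language definition easier to see, here is a minimal standalone sketch that assembles one alias entry in the same form as the `SCRIPT_LANGUAGES` property above. The helper name `script_language_definition` is hypothetical, and the values are just the ones from the sandbox configuration.

```python
def script_language_definition(alias, bucketfs_service, bucket, flavor, client="exaudfclient"):
    """Assemble one alias entry in the form used by SCRIPT_LANGUAGES above (hypothetical helper)."""
    # The container lives at <service>/<bucket>/<flavor> inside the BucketFS.
    container_path = f"{bucketfs_service}/{bucket}/{flavor}"
    return (
        f"{alias}=localzmq+protobuf:///{container_path}"
        f"?lang=python#buckets/{container_path}/exaudf/{client}"
    )


definition = script_language_definition(
    "PYTHON3_60", "bfsdefault", "default", "python3-ds-EXASOL-6.0.0"
)
print(definition)
```

Printing the value before using it is a cheap way to spot stray whitespace or a misspelled path segment.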

First, let's upload the required Script-Language-Container (SLC) into the BucketFS and activate it. The cell below builds and prints the deployment command; to actually run it, uncomment the `!{deploy_command}` line.

In [None]:
deploy_command = f"""
python -m exasol_transformers_extension.deploy language-container \
    --dsn {conf.EXTERNAL_HOST} \
    --db-user {conf.USER} \
    --db-pass {conf.PASSWORD} \
    --bucketfs-name {conf.BUCKETFS_SERVICE} \
    --bucketfs-host {conf.EXTERNAL_HOST_NAME} \
    --bucketfs-port {conf.BUCKETFS_PORT} \
    --bucketfs-user {conf.BUCKETFS_USER} \
    --bucketfs-password {conf.BUCKETFS_PASSWORD} \
    --bucketfs-use-https {conf.BUCKETFS_USE_HTTPS} \
    --bucket {conf.BUCKETFS_BUCKET} \
    --path-in-bucket . \
    --language-alias {conf.SCRIPT_LANGUAGE_NAME} \
    --version 0.5.0
"""

# !{deploy_command}
print(deploy_command)
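
Outside a notebook, the `!` shell escape is not available. As a rough sketch, a command string of this shape can be split into an argument list with the standard library and passed to `subprocess.run`. The command here is deliberately abbreviated to a few of the flags shown above, and the password value is a made-up example containing a space, to show why quoting matters.

```python
import shlex

dsn = "192.168.124.93:8888"        # same value as conf.EXTERNAL_HOST
password = "my secret"             # hypothetical value, not the sandbox password

# The backslash at the end of each line is a line continuation inside the
# string literal, so the command ends up on a single line.
command = f"""
python -m exasol_transformers_extension.deploy language-container \
    --dsn {dsn} \
    --db-pass {shlex.quote(password)} \
    --language-alias PYTHON3_60
"""

args = shlex.split(command)
print(args)

# To execute the command without going through a shell (not run here):
# import subprocess
# subprocess.run(args, check=True)
```

Passing a list rather than a single string avoids a second round of shell parsing, so values such as passwords arrive at the program unchanged.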

Now we shall upload all scripts into the database.
Note that the SLC must be uploaded first.

In [None]:
deploy_command = f"""
python -m exasol_transformers_extension.deploy scripts \
    --dsn {conf.EXTERNAL_HOST} \
    --db-user {conf.USER} \
    --db-pass {conf.PASSWORD} \
    --schema {conf.SCHEMA} \
    --language-alias {conf.SCRIPT_LANGUAGE_NAME} \
    --no-use-ssl-cert-validation
"""
print(deploy_command)

!{deploy_command}

Let's create a connection to the BucketFS where we are going to store all our models. We will use this connection hereafter in the queries.

Notice that we specify a sub-directory of the bucket root, e.g. "my_storage" (the name can be chosen arbitrarily). The BucketFS will create this sub-directory for us the first time we use the connection.

In [4]:
import pyexasol

sql = f"""
CREATE OR REPLACE CONNECTION [{conf.BFS_CONN}]
    TO '{conf.BUCKETFS_URL_PREFIX}{conf.EXTERNAL_BUCKETFS_HOST}/{conf.BUCKETFS_BUCKET}/{conf.BFS_DIR};{conf.BUCKETFS_SERVICE}'
    USER {{BUCKETFS_USER!s}}
    IDENTIFIED BY {{BUCKETFS_PASSWORD!s}}
"""

with pyexasol.connect(dsn=conf.EXTERNAL_HOST, user=conf.USER, password=conf.PASSWORD, compression=True) as conn:
    conn.execute(query=sql, query_params=conf.params)
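
The address in the `TO` clause is easy to get wrong, so it can help to look at it in isolation. The following sketch only rebuilds that string from its parts, following the pattern used in the statement above; the function name `bucketfs_connection_address` is hypothetical and nothing is sent to the database.

```python
def bucketfs_connection_address(host, port, bucket, directory, service, use_https=False):
    """Compose the connection address as <scheme>://<host>:<port>/<bucket>/<dir>;<service>."""
    scheme = "https://" if use_https else "http://"
    return f"{scheme}{host}:{port}/{bucket}/{directory};{service}"


address = bucketfs_connection_address(
    host="192.168.124.93",
    port="6666",
    bucket="default",
    directory="my_storage",
    service="bfsdefault",
)
print(address)
```

With the sandbox values used in this notebook, the address consists of the BucketFS URL, the bucket, the sub-directory and, after the semicolon, the BucketFS service name.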

Some models require the [Sacremoses tokenizer](https://github.com/alvations/sacremoses) to be installed in the local environment when they get downloaded. Let's make sure we have it installed by running the command below.

In [None]:
!pip install sacremoses
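
If you would like to confirm that the package can be found by the current Python environment, a small check along these lines may help. It is only a sketch: the helper `is_installed` is a hypothetical name, and it looks the module up without importing it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


print("sacremoses installed:", is_installed("sacremoses"))
```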