# Transformer Extension initialization

Here we will perform all the steps needed to get the Transformer Extension up and running. Please refer to the Transformer Extension <a href="https://github.com/exasol/transformers-extension/blob/main/doc/user_guide/user_guide.md" target="_blank" rel="noopener">User Guide</a> for details on the required initialization steps. The extension module should already have been installed together with this product, so the first step described in the guide can be skipped.

To execute queries and load data from the Exasol database we will be using the <a href="https://github.com/exasol/pyexasol" target="_blank" rel="noopener">`pyexasol`</a> module.

## Prerequisites

Before using this notebook, complete the following steps:
1. [Configure the sandbox](../sandbox_config.ipynb).

## Set up

### Access configuration

In [None]:
%run ../utils/access_store_ui.ipynb
display(get_access_store_ui('../'))

In [None]:
EXTERNAL_HOST = f"{sb_config.EXTERNAL_HOST_NAME}:{sb_config.HOST_PORT}"

We will add some new variables specific to the Transformer Extension.

In [None]:
%%capture

# Huggingface token required for downloading private models.
sb_config.save('TE_TOKEN', '-')

# Name of the connection encapsulating the Huggingface token. Leave it empty if the token is not used.
sb_config.save('TE_TOKEN_CONN', '')

# Name of the BucketFS connection.
sb_config.save('TE_BFS_CONN', 'MyBFSConn')

# Name of a sub-directory of the bucket root.
sb_config.save('TE_BFS_DIR', 'my_storage')

# We will store all models in this sub-directory in BucketFS.
sb_config.save('TE_MODELS_BFS_DIR', 'models')

# Downloaded models are cached in this sub-directory, relative to the current working directory on the local machine.
sb_config.save('TE_MODELS_CACHE_DIR', 'models_cache')

## Upload and activate the Script-Language-Container (SLC)

The Transformer Extension requires a special version of the Script Language Container (SLC), the platform for running UDF code. The <a href="https://github.com/exasol/transformers-extension/blob/main/doc/user_guide/user_guide.md#the-pre-built-language-container" target="_blank" rel="noopener">User Guide</a> describes in detail how to install it. Here we follow the Quick Installation procedure, which uses the command-line interface.

In [None]:
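# Upload the pre-built SLC to BucketFS and activate it under the configured language alias.
# Inside this (non-raw) string each trailing backslash joins the lines, so the shell receives a single command.
deploy_command = f"""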
python -m exasol_transformers_extension.deploy language-container \
    --dsn {EXTERNAL_HOST} \
    --db-user {sb_config.USER} \
    --db-pass {sb_config.PASSWORD} \
    --bucketfs-name {sb_config.BUCKETFS_SERVICE} \
    --bucketfs-host {sb_config.EXTERNAL_HOST_NAME} \
    --bucketfs-port {sb_config.BUCKETFS_PORT} \
    --bucketfs-user {sb_config.BUCKETFS_USER} \
    --bucketfs-password {sb_config.BUCKETFS_PASSWORD} \
    --bucketfs-use-https {sb_config.BUCKETFS_USE_HTTPS} \
    --bucket {sb_config.BUCKETFS_BUCKET} \
    --path-in-bucket . \
    --language-alias {sb_config.SCRIPT_LANGUAGE_NAME} \
    --version 0.5.0
"""

!{deploy_command}
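The cell above hands the whole command to the shell, so a password containing characters such as `$`, `&` or a space would have to be quoted by hand. As a sketch (not part of the extension), the hypothetical helper `build_deploy_args` assembles the same invocation as a list of arguments, which `subprocess.run` passes to the program without going through a shell. Only the subcommands and option names already used in this notebook are assumed to exist; the values in the example are placeholders.

```python
import subprocess
import sys
from typing import List, Mapping, Sequence


# Hypothetical helper, not part of exasol_transformers_extension: it only assembles
# the same command line that the notebook cells pass to the shell.
def build_deploy_args(
    subcommand: str, options: Mapping[str, object], flags: Sequence[str] = ()
) -> List[str]:
    """Return an argv list for `python -m exasol_transformers_extension.deploy <subcommand>`.

    options: option name (without the leading dashes) mapped to its value.
    flags:   option names that take no value, e.g. "no-use-ssl-cert-validation".
    """
    # sys.executable is the interpreter running this kernel, not whatever "python" is on PATH.
    args = [sys.executable, "-m", "exasol_transformers_extension.deploy", subcommand]
    for name, value in options.items():
        # Each value is a separate list item, so no shell quoting is needed.
        args += [f"--{name}", str(value)]
    args += [f"--{flag}" for flag in flags]
    return args


def run_deploy(args: List[str]) -> None:
    """Run the deployment without a shell; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)


# Example with placeholder values (nothing is executed here).
example_args = build_deploy_args(
    "scripts",
    {"dsn": "localhost:8563", "db-user": "sys", "db-pass": "p@ss word$", "schema": "MY_SCHEMA"},
    flags=("no-use-ssl-cert-validation",),
)
print(example_args[3:])
```

To use it in this notebook, pass the `sb_config` values from the deployment cells as `options` and call `run_deploy(...)` on the result. Using `sys.executable` also makes the deployment run with the kernel's own interpreter.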

## Create objects in the database
### Scripts
Once the SLC is installed, we can deploy all the required scripts into the database.

In [None]:
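# Deploy the Transformer Extension scripts into the configured schema, using the language alias activated above.
deploy_command = f"""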
python -m exasol_transformers_extension.deploy scripts \
    --dsn {EXTERNAL_HOST} \
    --db-user {sb_config.USER} \
    --db-pass {sb_config.PASSWORD} \
    --schema {sb_config.SCHEMA} \
    --language-alias {sb_config.SCRIPT_LANGUAGE_NAME} \
    --no-use-ssl-cert-validation
"""

!{deploy_command}
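After the deployment, a quick way to confirm that the scripts were created is to look them up in the system catalog. The sketch below is a hypothetical helper, `list_scripts`, assuming a pyexasol connection (whose `execute` accepts `query_params` and returns a statement with `fetchcol()`) and the `SYS.EXA_ALL_SCRIPTS` system view. It does not assume the exact script names created by the extension; it simply lists every script in the schema.

```python
from typing import Any, List

# pyexasol replaces {schema} with an escaped, single-quoted string literal.
LIST_SCRIPTS_SQL = (
    "SELECT SCRIPT_NAME FROM SYS.EXA_ALL_SCRIPTS "
    "WHERE UPPER(SCRIPT_SCHEMA) = UPPER({schema}) "
    "ORDER BY SCRIPT_NAME"
)


def list_scripts(conn: Any, schema: str) -> List[str]:
    """Return the names of the scripts in `schema` (hypothetical helper).

    `conn` is expected to behave like a pyexasol ExaConnection. The schema name is
    compared case-insensitively because unquoted identifiers are stored in upper case.
    """
    stmt = conn.execute(LIST_SCRIPTS_SQL, query_params={"schema": schema})
    # fetchcol() returns the values of the first result column as a list.
    return stmt.fetchcol()


# Usage in this notebook:
# with pyexasol.connect(dsn=EXTERNAL_HOST, user=sb_config.USER, password=sb_config.PASSWORD) as conn:
#     print(list_scripts(conn, sb_config.SCHEMA))
```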

### BucketFS connection
Let's create a connection to BucketFS, where we will store all our models. <a href="https://docs.exasol.com/db/latest/database_concepts/bucketfs/bucketfs.htm" target="_blank" rel="noopener">BucketFS</a> is a replicated file system available in the Exasol cluster. We will refer to this connection in the subsequent queries.

Notice that we specify a sub-directory of the bucket root: the `TE_BFS_DIR` setting, e.g. "my_storage" (the name can be chosen arbitrarily). BucketFS will create this sub-directory the first time we use the connection.

In [None]:
import pyexasol

bfs_host = f"{sb_config.EXTERNAL_HOST_NAME}:{sb_config.BUCKETFS_PORT}"
bfs_url_prefix = 'https://' if sb_config.BUCKETFS_USE_HTTPS.lower() == 'true' else 'http://'
bfs_dest = f"{bfs_url_prefix}{bfs_host}/{sb_config.BUCKETFS_BUCKET}/{sb_config.TE_BFS_DIR};{sb_config.BUCKETFS_SERVICE}"

sql = f"""
CREATE OR REPLACE CONNECTION [{sb_config.TE_BFS_CONN}]
    TO '{bfs_dest}'
    USER {{BUCKETFS_USER!s}}
    IDENTIFIED BY {{BUCKETFS_PASSWORD!s}}
"""
query_params = {
    "BUCKETFS_USER": sb_config.BUCKETFS_USER, 
    "BUCKETFS_PASSWORD": sb_config.BUCKETFS_PASSWORD
}
with pyexasol.connect(dsn=EXTERNAL_HOST, user=sb_config.USER, password=sb_config.PASSWORD, compression=True) as conn:
    conn.execute(query=sql, query_params=query_params)
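The connection address built in the cell above has the form `<scheme>://<host>:<port>/<bucket>/<sub-directory>;<BucketFS service>`. The hypothetical helper below isolates that string so it can be checked on its own. It also accepts the HTTPS flag either as a boolean or as the string stored in the configuration, and trims surrounding slashes from the sub-directory. The host, port, bucket and service names in the example are placeholders.

```python
def to_bool(value) -> bool:
    """Interpret a flag that may be a bool or a string such as 'true' / 'False'."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def make_bfs_connection_address(host, port, bucket, path_in_bucket, service, use_https) -> str:
    """Build the BucketFS connection address in the format used by this notebook (hypothetical helper)."""
    scheme = "https" if to_bool(use_https) else "http"
    # Avoid double slashes if the sub-directory was configured as "/my_storage/".
    path = str(path_in_bucket).strip("/")
    return f"{scheme}://{host}:{port}/{bucket}/{path};{service}"


# Placeholder values, not taken from any real configuration.
print(make_bfs_connection_address("localhost", 2580, "default", "my_storage", "bfsdefault", "false"))
```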

### Huggingface token connection

If we need a Huggingface token, for example to download private models, let's put it into an Exasol connection object. This step is skipped if `TE_TOKEN_CONN` is left empty.

In [None]:
import pyexasol

if sb_config.TE_TOKEN_CONN:
    sql = f"""
    CREATE OR REPLACE CONNECTION [{sb_config.TE_TOKEN_CONN}]
        TO ''
        IDENTIFIED BY {{TOKEN!s}}
    """
    query_params = {"TOKEN": sb_config.TE_TOKEN}
    with pyexasol.connect(dsn=EXTERNAL_HOST, user=sb_config.USER, password=sb_config.PASSWORD, compression=True) as conn:
        conn.execute(query=sql, query_params=query_params)
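The configuration cell stores `'-'` as a placeholder token and asks for `TE_TOKEN_CONN` to be left empty when no token is used. A hypothetical guard such as `check_token_settings` could catch the inconsistent case, a connection name combined with the placeholder token, before a useless connection is created. This rule is our assumption, drawn from those comments; it is not a check performed by the extension.

```python
PLACEHOLDER_TOKEN = "-"  # the default value saved for TE_TOKEN in the configuration cell


def check_token_settings(token: str, token_conn: str) -> bool:
    """Return True if a token connection should be created (hypothetical helper).

    Raises ValueError when a connection name is set but the token is empty
    or still the placeholder.
    """
    if not token_conn:
        return False  # no connection requested; the token value is not used
    if not token or token == PLACEHOLDER_TOKEN:
        raise ValueError(
            f"TE_TOKEN_CONN is set to {token_conn!r}, but TE_TOKEN holds no real token."
        )
    return True
```

In the token cell, the condition `if sb_config.TE_TOKEN_CONN:` could then become `if check_token_settings(sb_config.TE_TOKEN, sb_config.TE_TOKEN_CONN):`.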

## Dependencies

Some models require the [Sacremoses tokenizer](https://github.com/alvations/sacremoses) to be installed in the local environment when they are downloaded. Let's make sure it is installed by running the command below.

In [None]:
%pip install sacremoses
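To confirm that the package is visible to the running kernel, one can ask the import system whether the module can be found, without actually importing it. This is a small sketch using only the standard library; the helper name `is_module_available` is ours.

```python
import importlib
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by the current interpreter."""
    # Refresh the finders' cached directory listings so that packages installed
    # after the kernel started are taken into account.
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None


if not is_module_available("sacremoses"):
    print("sacremoses is not importable from this kernel; check the output of the install cell.")
```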