# SageMaker Extension initialization

Here we will perform all the necessary steps to get the SageMaker Extension functionality up and running. Please refer to the SageMaker Extension <a href="https://github.com/exasol/sagemaker-extension/blob/main/doc/user_guide/user_guide.md" target="_blank" rel="noopener">User Guide</a> for details on the required initialization steps. The extension module should already have been installed during the installation of this product, so the first step mentioned in the guide can be skipped.

We will be running SQL queries using the <a href="https://jupysql.ploomber.io/en/latest/quick-start.html" target="_blank" rel="noopener">JupySQL</a> SQL Magic and the <a href="https://github.com/exasol/pyexasol" target="_blank" rel="noopener">`pyexasol`</a> module.

## Prerequisites

Prior to using this notebook one needs to complete the following steps:
1. [Configure the sandbox](../sendbox_config.ipynb).

## Set up

In [None]:
from collections import UserDict

class Secrets(UserDict):
    """This class mimics the Secret Store we will start using soon."""

    def save(self, key: str, value: str) -> "Secrets":
        self[key] = value
        return self

def get_value_as_attribute(self, key):
    val = self.get(key)
    if val is None:
        raise AttributeError(f'{key} value is not defined')
    return val

Secrets.__getattr__ = get_value_as_attribute

# For now just hardcode the configuration.
sb_config = Secrets({    
    'EXTERNAL_HOST_NAME': '192.168.124.93',
    'HOST_PORT': '8888',
    'USER': 'sys',
    'PASSWORD': 'exasol',
    'BUCKETFS_PORT': '6666',
    'BUCKETFS_USER': 'w',
    'BUCKETFS_PASSWORD': 'write',
    'BUCKETFS_USE_HTTPS': 'False',
    'BUCKETFS_SERVICE': 'bfsdefault',
    'BUCKETFS_BUCKET': 'default',
    'SCRIPT_LANGUAGE_NAME': 'PYTHON3_SME',
    'UDF_FLAVOR': 'python3-ds-EXASOL-6.0.0',
    'UDF_RELEASE': '20190116',
    'UDF_CLIENT': 'exaudfclient_py3',
    'SCHEMA': 'IDA'
})

EXTERNAL_HOST = f"{sb_config.EXTERNAL_HOST_NAME}:{sb_config.HOST_PORT}"
SCRIPT_LANGUAGES = f"{sb_config.SCRIPT_LANGUAGE_NAME}=localzmq+protobuf:///{sb_config.BUCKETFS_SERVICE}/" \
    f"{sb_config.BUCKETFS_BUCKET}/{sb_config.UDF_FLAVOR}?lang=python#buckets/{sb_config.BUCKETFS_SERVICE}/" \
    f"{sb_config.BUCKETFS_BUCKET}/{sb_config.UDF_FLAVOR}/exaudf/{sb_config.UDF_CLIENT}"
WEBSOCKET_URL = f"exa+websocket://{sb_config.USER}:{sb_config.PASSWORD}" \
    f"@{EXTERNAL_HOST}/{sb_config.SCHEMA}?SSLCertificate=SSL_VERIFY_NONE"

We will add some new variables specific to the SageMaker Extension.

In [None]:
%%capture

# AWS access credentials
sb_config.save('AWS_KEY_ID', 'AKIASNN2LAKN3EYP2Y45')
sb_config.save('AWS_ACCESS_KEY', 'ezgUx1qb1jaPZFyL4DyNXfdnd67a1r31zuZBRkvA')
sb_config.save('AWS_REGION', 'eu-central-1')
sb_config.save('AWS_ROLE', 'arn:aws:iam::166283903643:role/sagemaker-role')

# S3 bucket, which must exist
sb_config.save('AWS_BUCKET', 'ida-dataset-bucket')
    
# Name of the AWS connection to be created in the database
sb_config.save('AWS_CONN', 'MyAWSConn')
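
As an alternative to typing credentials into a cell, the values could be read from the environment. The following is a minimal sketch, assuming the standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`) are set; the helper `load_aws_settings` is hypothetical and not part of the SageMaker Extension.

```python
import os
from typing import Dict, Mapping

# Standard environment variable names read by the AWS CLI and SDKs.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def load_aws_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: return the AWS settings from `env`, failing loudly if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # Keys on the left are the names used in this notebook's configuration.
        "AWS_KEY_ID": env["AWS_ACCESS_KEY_ID"],
        "AWS_ACCESS_KEY": env["AWS_SECRET_ACCESS_KEY"],
        "AWS_REGION": env["AWS_DEFAULT_REGION"],
    }


# Demonstration with an explicit mapping (placeholder values, not real credentials).
settings = load_aws_settings({
    "AWS_ACCESS_KEY_ID": "EXAMPLE_KEY_ID",
    "AWS_SECRET_ACCESS_KEY": "EXAMPLE_SECRET",
    "AWS_DEFAULT_REGION": "eu-central-1",
})
print(settings["AWS_REGION"])
```

In the notebook, each returned pair could then be passed to `sb_config.save(...)`.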

Let's bring up JupySQL and connect to the database via SQLAlchemy. Please refer to the sqlalchemy-exasol documentation for details on how to connect to the database using the Exasol SQLAlchemy driver.

In [None]:
from sqlalchemy import create_engine

engine = create_engine(WEBSOCKET_URL)

%load_ext sql
%sql engine
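
The connection URL above embeds the user name and password directly. If either contains characters such as `@` or `/`, the URL would no longer parse as intended. The following is a minimal sketch of assembling the same URL with percent-encoded credentials; the function `build_websocket_url` and the sample password are illustrative assumptions.

```python
from urllib.parse import quote_plus


def build_websocket_url(user: str, password: str, host: str, port: str, schema: str) -> str:
    """Assemble an exa+websocket SQLAlchemy URL, percent-encoding the credentials."""
    return (
        f"exa+websocket://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{schema}?SSLCertificate=SSL_VERIFY_NONE"
    )


# A made-up password containing characters that are special in URLs.
url = build_websocket_url("sys", "p@ss/word", "192.168.124.93", "8888", "IDA")
print(url)
```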

## Upload and activate the Script-Language-Container (SLC)

We will start with loading the Script Language Container (SLC) specially built for the SageMaker Extension. The latest release of both the Extension and its SLC can be found <a href="https://github.com/exasol/sagemaker-extension/releases" target="_blank" rel="noopener">here</a>. We will download the SLC and upload it to the BucketFS over http(s) using `curl`.

In [None]:
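import os
import tempfile
from stopwatch import Stopwatch

# Get a temporary file name for the SLC.
# mkstemp also opens the file; close the descriptor, since curl writes to the path.
fd, tmp_file = tempfile.mkstemp(suffix='.tar.gz')
os.close(fd)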

# Download SLC.
stopwatch = Stopwatch()
download_command = f'curl -L -o {tmp_file} https://github.com/exasol/sagemaker-extension/releases/download/0.5.0/' \
    f'exasol_sagemaker_extension_container-release-CYEVORMGO3X5JZJZTXFLS23FZYKIKDG7MVNUSSJK6FUST5WRPZUQ.tar.gz'
! {download_command}
print(f"Downloading the SLC took: {stopwatch}")

# Upload SLC into the BucketFS
stopwatch = Stopwatch()
bfs_url_prefix = "https://" if sb_config.BUCKETFS_USE_HTTPS.lower() == 'true' else "http://"
bfs_host = f'{sb_config.EXTERNAL_HOST_NAME}:{sb_config.BUCKETFS_PORT}'
upload_command = f'curl {bfs_url_prefix}{sb_config.BUCKETFS_USER}:{sb_config.BUCKETFS_PASSWORD}' \
    f'@{bfs_host}/{sb_config.BUCKETFS_BUCKET}/{sb_config.UDF_FLAVOR}.tar.gz --upload-file {tmp_file}'
! {upload_command}
print(f"Uploading the SLC took: {stopwatch}")

# Delete SLC file on the local drive.
! rm {tmp_file}
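
The upload target used in the cell above follows the pattern `scheme://user:password@host:port/bucket/file`. The following is a minimal sketch that isolates this URL construction so it can be inspected on its own; the function name `bucketfs_upload_url` is a hypothetical helper, and the values mirror the configuration defined earlier.

```python
def bucketfs_upload_url(host: str, port: str, bucket: str, file_name: str,
                        user: str, password: str, use_https: str) -> str:
    """Build the BucketFS upload URL in the same form as the curl command."""
    # The configuration stores the HTTPS flag as a string such as 'False' or 'True'.
    scheme = "https" if str(use_https).lower() == "true" else "http"
    return f"{scheme}://{user}:{password}@{host}:{port}/{bucket}/{file_name}"


upload_url = bucketfs_upload_url(
    host="192.168.124.93",
    port="6666",
    bucket="default",
    file_name="python3-ds-EXASOL-6.0.0.tar.gz",
    user="w",
    password="write",
    use_https="False",
)
print(upload_url)
```

Note that the archive name matches `UDF_FLAVOR`, which is also the directory name referenced in `SCRIPT_LANGUAGES`.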

We need to activate the uploaded SLC by updating the system parameter `SCRIPT_LANGUAGES`.

In [None]:
%%sql
ALTER SYSTEM SET SCRIPT_LANGUAGES='{{SCRIPT_LANGUAGES}}';
ALTER SESSION SET SCRIPT_LANGUAGES='{{SCRIPT_LANGUAGES}}';
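
`SCRIPT_LANGUAGES` holds a space-separated list of `ALIAS=URL` entries, and setting it replaces the whole value. If other language aliases should stay available, the new entry can be merged into the existing value first. The following is a minimal sketch of such a merge; the function `merge_script_languages` is hypothetical, the `current` value is only an illustrative example, and in practice the current value would need to be read from the database.

```python
def merge_script_languages(current: str, alias: str, url: str) -> str:
    """Add or replace one ALIAS=URL entry in a space-separated SCRIPT_LANGUAGES value."""
    entries = {}
    for item in current.split():
        # Split on the first '=' only, because the URL itself may contain '='.
        name, _, target = item.partition("=")
        entries[name] = target
    entries[alias] = url
    return " ".join(f"{name}={target}" for name, target in entries.items())


# Illustrative current value (an assumption, not read from a real database).
current = "PYTHON3=builtin_python3 R=builtin_r"
new_url = (
    "localzmq+protobuf:///bfsdefault/default/python3-ds-EXASOL-6.0.0"
    "?lang=python#buckets/bfsdefault/default/python3-ds-EXASOL-6.0.0/exaudf/exaudfclient_py3"
)
merged = merge_script_languages(current, "PYTHON3_SME", new_url)
print(merged)
```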

## Create objects in the database

### Scripts

Once the SLC is installed we can upload all the required scripts into the database.

In [None]:
deploy_command = f"""
python -m exasol_sagemaker_extension.deployment.deploy_cli \
    --host {sb_config.EXTERNAL_HOST_NAME} \
    --port {sb_config.HOST_PORT} \
    --user {sb_config.USER} \
    --pass {sb_config.PASSWORD} \
    --schema {sb_config.SCHEMA}
"""

print(deploy_command)
!{deploy_command}
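
The command above interpolates the configuration values into a shell string unquoted, which would break if, for example, the password contained spaces or quotes. The following is a minimal sketch of building the same command with shell quoting via `shlex.join` (Python 3.8+); the function `build_deploy_command` and the sample password are illustrative assumptions, while the flags are the ones used above.

```python
import shlex


def build_deploy_command(host: str, port: str, user: str, password: str, schema: str) -> str:
    """Return the deploy CLI invocation as a single, safely quoted shell string."""
    args = [
        "python", "-m", "exasol_sagemaker_extension.deployment.deploy_cli",
        "--host", host,
        "--port", str(port),
        "--user", user,
        "--pass", password,
        "--schema", schema,
    ]
    return shlex.join(args)


# A made-up password with a space and a quote, to show the effect of quoting.
cmd = build_deploy_command("192.168.124.93", "8888", "sys", "pa ss'word", "IDA")
print(cmd)
```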

Let's verify that the scripts have been created. We should see 4 new UDF scripts and 4 new Lua scripts.

In [None]:
%%sql
SELECT SCRIPT_NAME, SCRIPT_TYPE FROM SYS.EXA_ALL_SCRIPTS WHERE SCRIPT_SCHEMA='{{sb_config.SCHEMA}}'

### AWS connection

The SageMaker Extension needs to connect to AWS SageMaker and our AWS S3 bucket. For that, it needs AWS credentials with SageMaker execution permissions. The required credentials are an AWS access key, which consists of an access key ID and a secret access key (see how to <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey" target="_blank" rel="noopener">create an access key</a>).

For the SageMaker Extension to use the access key, we need to create an Exasol CONNECTION object, which securely stores the keys. For more information, see the Exasol documentation on how to <a href="https://docs.exasol.com/db/latest/sql/create_connection.htm?Highlight=connection" target="_blank" rel="noopener">create a connection</a>.

In [None]:
import pyexasol

sql = f"""
CREATE OR REPLACE CONNECTION [{sb_config.AWS_CONN}]
    TO 'https://{sb_config.AWS_BUCKET}.s3.{sb_config.AWS_REGION}.amazonaws.com/'
    USER {{AWS_KEY_ID!s}}
    IDENTIFIED BY {{AWS_ACCESS_KEY!s}}
"""
query_params = {
    "AWS_KEY_ID": sb_config.AWS_KEY_ID, 
    "AWS_ACCESS_KEY": sb_config.AWS_ACCESS_KEY
}
with pyexasol.connect(dsn=EXTERNAL_HOST, user=sb_config.USER, password=sb_config.PASSWORD, compression=True) as conn:
    conn.execute(query=sql, query_params=query_params)
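
The statement above is formatted in two stages: the Python f-string fills in the connection name and turns `{{AWS_KEY_ID!s}}` into `{AWS_KEY_ID!s}`, and `pyexasol` then substitutes the `query_params`. The following is a minimal sketch of that idea using only the standard library. The helper `quote_sql_literal` is a stand-in that assumes `!s` placeholders are rendered as quoted SQL string literals; it is not `pyexasol`'s own code, and the statement is abbreviated.

```python
def quote_sql_literal(value: str) -> str:
    """Stand-in for string quoting: wrap in single quotes and double any embedded ones."""
    return "'" + value.replace("'", "''") + "'"


conn_name = "MyAWSConn"

# Stage one: the f-string resolves {conn_name}; doubled braces survive as single braces.
stage_one = (
    f"CREATE OR REPLACE CONNECTION [{conn_name}] "
    f"USER {{AWS_KEY_ID!s}} IDENTIFIED BY {{AWS_ACCESS_KEY!s}}"
)

# Stage two: placeholders are filled with quoted values (placeholder credentials only).
params = {"AWS_KEY_ID": "EXAMPLE_KEY_ID", "AWS_ACCESS_KEY": "secret'with'quotes"}
stage_two = stage_one.format(**{k: quote_sql_literal(v) for k, v in params.items()})

print(stage_one)
print(stage_two)
```

Keeping the secrets out of the f-string means they are inserted by the driver's formatter and not by plain string interpolation.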

Now we are ready to start training a model. We will do this in the [following](sme_train_model.ipynb) notebook.