# Using the Script-Language Container

A [Script-Language Container](https://github.com/exasol/script-languages-release) for the Exasol database is a Linux container that holds a complete Linux distribution and all required libraries, including the script client. The script client is responsible for the communication with the database and for executing the script code.

## Prerequisites

To run this Notebook you need:
- Jupyter with Python3.6+ Kernel
- Docker 17.05+
- Your Notebook user needs permissions to run Docker
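The prerequisites above can be checked from Python before running the rest of the Notebook. The following is only a minimal sketch: the helper names `parse_docker_version` and `check_prerequisites` are hypothetical, and it assumes that the `docker` command line client is the way Docker is accessed on your machine.

```python
import re
import shutil
import subprocess
import sys


def parse_docker_version(version_output):
    """Extract (major, minor) from text such as 'Docker version 17.05.0-ce, build 89658be'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"No version number found in: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def _run(command):
    """Run a command and capture its output (written to also work on Python 3.6)."""
    return subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)


def check_prerequisites(min_python=(3, 6), min_docker=(17, 5)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    if shutil.which("docker") is None:
        problems.append("The docker command was not found on PATH")
        return problems
    # 'docker info' talks to the daemon, so it fails if the user lacks permissions.
    if _run(["docker", "info"]).returncode != 0:
        problems.append("Cannot talk to the Docker daemon (check permissions)")
    version = _run(["docker", "--version"])
    if version.returncode == 0 and parse_docker_version(version.stdout) < min_docker:
        problems.append(f"Docker {min_docker[0]}.{min_docker[1]:02d}+ is required")
    return problems
```

Calling `check_prerequisites()` in a cell and printing the result gives a quick hint about what is missing before a long build fails halfway through.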

## Preparing the Notebook

First we need to install and import a few Python packages required in the course of this Notebook.

In [73]:
!pip install -r requirements.txt



In [74]:
import bash_runner as bash # A helper to run bash with interactive output from python
import importlib
from pathlib import Path
import pyexasol
import requests
import textwrap

## Cloning the Git Repository

To use the [Script-Language Container](https://github.com/exasol/script-languages-release) we need to clone the Git Repository with

```
git clone https://github.com/exasol/script-languages-release --recursive
```

We need to use `--recursive` to also clone the sub-modules of the repository.

**Note:** You can use the following code snippet, which either clones the repository if it doesn't exist yet or resets it to the current `origin/master` branch, so that the rest of this Notebook always starts from a defined state.

In [75]:
slc_path="script-languages-release"
if not Path(slc_path).exists():
    bash.run("""
    git clone https://github.com/exasol/script-languages-release --recursive
    """)
else:
    bash.run(f"""
    cd {slc_path}
    git fetch
    git reset --hard origin/master 
    git submodule foreach git reset --hard origin/master 
    """)

HEAD is now at d08ab04 Merge pull request #304 from exasol/develop
Entering 'script-languages'
HEAD is now at d2c4f55 #293: Removed python-distutils-extra package from python-3.6-minimal-EXASOL-6.2.0 flavor (#201)


## Building and Exporting a Container

To build and export the container you can use `exaslct`. It first builds a series of Docker images and then exports the container as a `tar.gz` package. We provide several flavors of containers with different capabilities. You can find out more about the flavors in our [flavor documentation on GitHub](https://github.com/exasol/script-languages-release/blob/master/flavors/README.md). Each flavor is described by a flavor definition in the directory `flavors/`. Here is an overview of the available flavors:

In [76]:
bash.run(f"""
find {slc_path}/flavors/  -maxdepth 1 -name '*EXASOL*'
""")

script-languages-release/flavors/python-3.6-data-science-EXASOL-6.2.0
script-languages-release/flavors/python-3.6-minimal-EXASOL-6.2.0
script-languages-release/flavors/r-4-minimal-EXASOL-6.2.0
script-languages-release/flavors/standard-EXASOL-7.0.0
script-languages-release/flavors/standard-EXASOL-7.1.0-without-python2.7
script-languages-release/flavors/standard-EXASOL-7.1.0
script-languages-release/flavors/r-3.5-data-science-EXASOL-6.2.0
script-languages-release/flavors/standard-EXASOL-6.2.0
script-languages-release/flavors/python-3.6-data-science-cuda-EXASOL-6.2.0
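The same overview can also be produced without a shell. The following is a minimal sketch using `pathlib`; the function name `list_flavors` is hypothetical, and unlike the `find` call it only returns directories.

```python
from pathlib import Path


def list_flavors(slc_path="script-languages-release"):
    """Return the sorted names of all flavor directories whose name contains 'EXASOL'."""
    flavors_dir = Path(slc_path) / "flavors"
    return sorted(p.name for p in flavors_dir.glob("*EXASOL*") if p.is_dir())
```

For example, `for name in list_flavors(slc_path): print(name)` prints one flavor per line in a stable, sorted order, which makes it easier to pick a flavor programmatically.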


For this example, we use the `flavors/python-3.6-minimal-EXASOL-6.2.0` flavor and export it to the `containers` directory.

As described above, `exaslct` builds the container via a series of Docker images and then exports it as a `tar.gz` file.

In [77]:
bash.run(f"""
pushd {slc_path}
./exaslct export --flavor-path flavors/python-3.6-minimal-EXASOL-6.2.0 --export-path containers
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
INFO: Informed scheduler that task   ExportContainers_9fd9e5c7ba   has status   PENDING
INFO: Informed scheduler that task   ExportFlavorContainer_2badddf4c6   has status   PENDING
INFO: Informed scheduler that task   AnalyzeRelease_038e4aaba8   has status   PENDING
INFO: Informed scheduler that task   AnalyzeLanguageDeps_038e4aaba8   has status   PENDING
INFO: Informed scheduler that task   AnalyzeUDFClientDeps_038e4aaba8   has status   PENDING
INFO: Informed scheduler that task   AnalyzeBuildRun_038e4aaba8   has status   PENDING
INFO: Informed scheduler that task   AnalyzeBuildDeps_038e4aaba8   has status   PENDING
INFO: Informed scheduler that task   AnalyzeFlavorCustomization_038e4aaba8   has status   PENDING
INFO: Informed scheduler that task   AnalyzeFlavorBaseDeps_038e4aaba8   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker 

### What to do if something doesn't work?

External package repositories may be temporarily unavailable during the build, or something may be wrong on the machine where you run the build. For these cases, `exaslct` stores many logs that help you identify the problem.

#### Exaslct Log

The main log for `exaslct` is stored directly as `exaslct.log` in the build output of the job. With the following command you can find the main logs for all previous executions.

In [78]:
bash.run(f"""
pushd {slc_path}
find .build_output -name 'exaslct.log'
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
.build_output/jobs/2021_07_28_09_13_27_ExportContainers/outputs/ExportContainers_f872487cc6/logs/exaslct.log
.build_output/jobs/2021_07_28_09_57_13_ExportContainers/outputs/ExportContainers_4422ee621f/logs/exaslct.log
.build_output/jobs/2021_07_28_12_19_21_ExportContainers/outputs/ExportContainers_9fd9e5c7ba/logs/exaslct.log
.build_output/jobs/2021_07_28_09_21_01_ExportContainers/outputs/ExportContainers_a2d5a59019/logs/exaslct.log
.build_output/jobs/2021_07_28_09_57_25_ExportContainers/outputs/ExportContainers_57b192cb88/logs/exaslct.log
.build_output/jobs/2021_07_28_10_52_07_UploadContainers/outputs/UploadContainers_25e40d2aa0/logs/exaslct.log
.build_output/jobs/2021_07_28_09_04_49_ExportContainers/outputs/ExportContainers_113f1a1225/logs/exaslct.log
.build_output/jobs/2021_07_28_09_49_33_UploadContainers/outputs/UploadContainers_8819a4267a/logs/exasl

With the following command you can show the log file from the last execution.

In [79]:
bash.run(f"""
pushd {slc_path}
LAST_LOG="$(find .build_output -name 'exaslct.log' | sort | tail -n 1)"
tail -n 20 "$LAST_LOG"
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
    - 1 AnalyzeBuildDeps_038e4aaba8(flavor_path=flavors/python-3.6-minimal-EXASOL-6.2.0)
    - 1 AnalyzeBuildRun_038e4aaba8(flavor_path=flavors/python-3.6-minimal-EXASOL-6.2.0)
    - 1 AnalyzeFlavorBaseDeps_038e4aaba8(flavor_path=flavors/python-3.6-minimal-EXASOL-6.2.0)
    - 1 AnalyzeFlavorCustomization_038e4aaba8(flavor_path=flavors/python-3.6-minimal-EXASOL-6.2.0)
    - 1 AnalyzeLanguageDeps_038e4aaba8(flavor_path=flavors/python-3.6-minimal-EXASOL-6.2.0)
    ...

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

The command took 137.629877 s

Cached container under /home/jupyter/data-science-examples_slc/tutorials/script-languages/script-languages-release/.build_output/cache/exports/python-3.6-minimal-EXASOL-6.2.0-release-HT4GEK67BZJJ5KPNBYOVGYRVAGARJOFY43ALTG4UQTUNWSR5U2IQ.tar.gz
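If you prefer to inspect the logs from Python, the lookup above can be sketched as follows. The helper names `latest_exaslct_log` and `tail` are hypothetical. The sketch relies on the same assumption as the shell version: the job directories start with a timestamp, so sorting the paths also sorts them by time.

```python
from pathlib import Path


def latest_exaslct_log(build_output=".build_output"):
    """Return the exaslct.log of the most recent job, or None if there is none."""
    logs = sorted(Path(build_output).rglob("exaslct.log"))
    return logs[-1] if logs else None


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()[-n:]
```

A typical use would be `print("\n".join(tail(latest_exaslct_log(Path(slc_path) / ".build_output"))))`.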


#### Build Output Directory

More detailed information about the build or other operations can be found in the `.build_output/jobs/*/outputs` directories. Each run of `exaslct` creates its own directory under `.build_output/jobs`. Its `outputs` directory stores the outputs and log files (if any) that each executed task of `exaslct` produces. In particular, Docker tasks such as build, pull and push store the logs returned by the Docker API, which can be helpful for finding problems during the build.

In [80]:
bash.run(f"""
pushd {slc_path}
find .build_output/jobs/*/outputs -type f -name '*log' | tail
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
.build_output/jobs/2021_07_28_10_43_29_ExportContainers/outputs/ExportContainers_8e2fbaffd7/logs/exaslct.log
.build_output/jobs/2021_07_28_10_48_56_SpawnTestEnvironmentWithDockerDB/outputs/SpawnTestEnvironmentWithDockerDB_c53d105b13/WaitForTestDockerDatabase_1f56bd40a5/logs/startup.log
.build_output/jobs/2021_07_28_10_48_56_SpawnTestEnvironmentWithDockerDB/outputs/SpawnTestEnvironmentWithDockerDB_c53d105b13/PopulateEngineSmallTestDataToDatabase_e4fc873e65/logs/log
.build_output/jobs/2021_07_28_10_48_56_SpawnTestEnvironmentWithDockerDB/outputs/SpawnTestEnvironmentWithDockerDB_c53d105b13/UploadExaJDBC_e610c265a2/logs/log
.build_output/jobs/2021_07_28_10_48_56_SpawnTestEnvironmentWithDockerDB/outputs/SpawnTestEnvironmentWithDockerDB_c53d105b13/UploadVirtualSchemaJDBCAdapter_e610c265a2/logs/log
.build_output/jobs/2021_07_28_10_52_07_UploadContainers/outputs
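The job directory names in the output above appear to follow the pattern `<year>_<month>_<day>_<hour>_<minute>_<second>_<task name>`. Assuming this pattern holds (it is inferred from the listing, not from the `exaslct` documentation), a name can be split into a timestamp and a task name with the hypothetical helper below, for example to filter jobs by type or date.

```python
from datetime import datetime


def parse_job_dir(name):
    """Split e.g. '2021_07_28_09_13_27_ExportContainers' into (timestamp, task name)."""
    parts = name.split("_", 6)  # six timestamp fields, the rest is the task name
    if len(parts) != 7:
        raise ValueError(f"Unexpected job directory name: {name!r}")
    timestamp = datetime.strptime("_".join(parts[:6]), "%Y_%m_%d_%H_%M_%S")
    return timestamp, parts[6]
```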

## Customizing Script-Language Containers

Sometimes you need very specific dependencies or versions of dependencies in the Exasol UDFs. In such cases, you can customize a Script-Language Container.

### Flavor Definition

To customize a flavor, you need to change the flavor definition. A flavor consists of several build steps. The following image gives you an idea of how these build steps are connected with each other.

![Flavor Structure](slc_main_build_steps.svg)

When customizing a flavor, the `flavor_customization` build step is usually the most important one. It contains everything you need to add dependencies. The remaining build steps should only be changed with care. Note, however, that some dependencies are defined in other build steps because the script client depends on them. Here you see the directory structure of the flavor definition for our example flavor `flavors/python-3.6-minimal-EXASOL-6.2.0`.

In [81]:
bash.run(f""" 
pushd {slc_path}
find -L flavors/python-3.6-minimal-EXASOL-6.2.0 -maxdepth 2
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
flavors/python-3.6-minimal-EXASOL-6.2.0
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/Dockerfile
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/packages
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/base_test_build_run
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/release
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/testconfig
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/flavor_test_build_run
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/base_test_deps
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/language_definition
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/build_run
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_base/flavor_base_deps
flavors/python-3.6-minimal-EXAS

### Flavor Customization Build Step

The `flavor_customization` build step consists of a Dockerfile and several package lists which can be modified. We recommend using the package lists to add new packages to the flavor and only modifying the Dockerfile if you need very specific changes, like adding additional resources.

In [82]:
bash.run(f""" 
pushd {slc_path}
find -L flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization -type f
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/Dockerfile
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/packages/python3_pip_packages
flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/packages/apt_get_packages


#### Dockerfile

The Dockerfile consists of two parts. The first part installs the packages from the package lists and should only be changed with care. The second part is free for your changes. Read the description in the Dockerfile carefully to find out what you can and shouldn't do.

In [83]:
bash.run(f""" 
pushd {slc_path}
cat flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/Dockerfile
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
############################################################################################
############################################################################################
# This Dockerfile allows you to extend this flavor by installing packages or adding files. 
# IF you didn't change the lines below, you can add packages and their version to the  
# files in ./packages and they get automatically installed.                                
############################################################################################
############################################################################################

#######################################################################
#######################################################################
# Do not change the following lines unless you know what you are doing 
####

#### Package Lists

The package lists have a unified format. Each line consists of the package name and the package version separated by `|`, e.g. `xgboost|1.3.3`. You can comment out a whole line by adding `#` at the beginning. You can also add a trailing comment to a package definition by adding a `#` after the package definition. We recommend pinning the version, so that there are no surprises about which version gets installed.

In [84]:
bash.run(f""" 
pushd {slc_path}
cat flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/packages/python3_pip_packages
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
# This file specifies the package list which gets installed via pip for python3.
# You must specify the the package and its version separated by a |.
# We recommend here the usage of package versions, to ensure that the container 
# builds are reproducible. However, we allow also packages without version.
# As you can see, this file can contain comments which start with #.
# If a line starts with # the whole line is a comment, however you can
# also start a comment after the package definition.

#tensorflow-probability|0.9.0
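To make the format concrete, here is a minimal sketch of a parser for such a package list. It is not the parser used by `exaslct`. The function name `parse_package_list` is hypothetical, and the behavior is based only on the format description above.

```python
def parse_package_list(text):
    """Parse 'name|version' lines into a list of (name, version) tuples.

    The version is None when a line contains only a package name.
    """
    packages = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if not line:
            continue
        name, _, version = line.partition("|")
        packages.append((name.strip(), version.strip() or None))
    return packages
```

Applied to the file shown above, the result is an empty list, because the only package line is commented out.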


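We are now going to add the Python packages `xgboost` and `scikit-learn` to the `flavor_customization/packages/python3_pip_packages` file by writing the lines `xgboost|1.3.3` and `scikit-learn|0.24.2`. Note that the first `echo` uses `>`, which overwrites the file and thereby removes the explanatory comments, while the second `echo` uses `>>`, which appends.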

In [85]:
bash.run(f""" 
pushd {slc_path}
echo "xgboost|1.3.3" > flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/packages/python3_pip_packages
echo "scikit-learn|0.24.2" >> flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/packages/python3_pip_packages
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages


As you can see below, the package list now contains the two new package lines.

In [86]:
bash.run(f""" 
pushd {slc_path}
cat flavors/python-3.6-minimal-EXASOL-6.2.0/flavor_customization/packages/python3_pip_packages
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
xgboost|1.3.3
scikit-learn|0.24.2
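If you want to keep the comments in the package list and avoid duplicate entries when a cell is run several times, the packages can also be added from Python. The helper `add_package` below is a hypothetical sketch: it appends a line only if the package is not listed yet and leaves all existing lines untouched.

```python
from pathlib import Path


def add_package(package_file, name, version):
    """Append 'name|version' to the package list unless the package is already listed."""
    path = Path(package_file)
    lines = path.read_text().splitlines() if path.exists() else []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # ignore comments
        if entry.split("|", 1)[0].strip() == name:
            return False  # already present, leave the file untouched
    lines.append(f"{name}|{version}")
    path.write_text("\n".join(lines) + "\n")
    return True
```

The return value tells you whether the file was changed, which can be used to decide whether a rebuild is needed.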


#### Rebuilding the customized Flavor

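After changing the flavor you need to rebuild it by running `./exaslct export --flavor-path <flavor-path>` again. `exaslct` automatically recognizes that the flavor has changed and builds a new version of the container. The cell below additionally passes `--force-rebuild --force-rebuild-from flavor_customization`, which forces a rebuild starting at the `flavor_customization` build step.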

In [87]:
bash.run(f"""
pushd {slc_path}
./exaslct export --flavor-path flavors/python-3.6-minimal-EXASOL-6.2.0 --export-path containers --force-rebuild --force-rebuild-from flavor_customization
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
INFO: Informed scheduler that task   ExportContainers_fe899a658a   has status   PENDING
INFO: Informed scheduler that task   ExportFlavorContainer_14a40b0b65   has status   PENDING
INFO: Informed scheduler that task   AnalyzeRelease_de1ee08592   has status   PENDING
INFO: Informed scheduler that task   AnalyzeLanguageDeps_de1ee08592   has status   PENDING
INFO: Informed scheduler that task   AnalyzeUDFClientDeps_de1ee08592   has status   PENDING
INFO: Informed scheduler that task   AnalyzeBuildRun_de1ee08592   has status   PENDING
INFO: Informed scheduler that task   AnalyzeBuildDeps_de1ee08592   has status   PENDING
INFO: Informed scheduler that task   AnalyzeFlavorCustomization_de1ee08592   has status   PENDING
INFO: Informed scheduler that task   AnalyzeFlavorBaseDeps_de1ee08592   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker 

**Note:** Your old container doesn't get lost, because your container gets a new hash code whenever you change a flavor. If you revert your changes, the system automatically uses the existing cached container. Below you can see the content of the cache directory for the containers.

In [88]:
bash.run(f"""
pushd {slc_path}
ls -sh .build_output/cache/exports
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
total 3.8G
613M python-3.6-data-science-EXASOL-6.2.0-release-3JRV4LALOHLBV7SBYCIIJBTTUURJECI7I7NJNQ3MTOPXJL6SMPNA.tar.gz
4.0K python-3.6-data-science-EXASOL-6.2.0-release-3JRV4LALOHLBV7SBYCIIJBTTUURJECI7I7NJNQ3MTOPXJL6SMPNA.tar.gz.sha512sum
465M python-3.6-data-science-EXASOL-6.2.0-release-J4HDSRQRNTHQCMJOLAMKO66VVSALKODMS5FSQEYJUB6MLIBSI2JA.tar.gz
4.0K python-3.6-data-science-EXASOL-6.2.0-release-J4HDSRQRNTHQCMJOLAMKO66VVSALKODMS5FSQEYJUB6MLIBSI2JA.tar.gz.sha512sum
452M python-3.6-data-science-EXASOL-6.2.0-release-LTAV5RUNCBWDSNSCEZWZHU6WJPRV2UGBADMX3GLKUIUOQG6JMA6Q.tar.gz
4.0K python-3.6-data-science-EXASOL-6.2.0-release-LTAV5RUNCBWDSNSCEZWZHU6WJPRV2UGBADMX3GLKUIUOQG6JMA6Q.tar.gz.sha512sum
452M python-3.6-data-science-EXASOL-6.2.0-release-V2RLM6N2SE7VXL2ZCYBNIDDFISFPPUEERJNWONWYQE6ULHLBNOYA.tar.gz
4.0K python-3.6-data-science-EXASOL-6.2.0-release-V2RL
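Each cached archive has a `.sha512sum` file next to it. The sketch below checks an archive against that file, for example after copying it to another machine. It assumes that the checksum file uses the common `sha512sum` layout, where the first whitespace-separated token is the hex digest. We did not verify this layout for `exaslct`, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-512 hex digest of a file without loading it into memory at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_export(container_file):
    """Compare a container archive with its '<file>.sha512sum' sibling."""
    container_file = Path(container_file)
    checksum_file = container_file.with_name(container_file.name + ".sha512sum")
    # Assumption: the first whitespace-separated token is the hex digest.
    expected = checksum_file.read_text().split()[0]
    return sha512_of(container_file) == expected
```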

## Testing the new Script-Language Container

Now that we have an updated container, we need to check whether our changes were successful. To do so, we are going to upload the container to an Exasol database and have a look into it. In this example, we are going to use a local Docker-DB started by `exaslct`, which uses our [integration-test-docker-environment](https://github.com/exasol/integration-test-docker-environment) in the background.

**Note:** You could also use your own Exasol Database by changing the variables below. However, this Notebook must be able to access the BucketFS of your Exasol Database or you need to manually upload the container. 

In [89]:
DATABASE_HOST="localhost"
DATABASE_PORT=8888
DATABASE_USER="sys"
DATABASE_PASSWORD="exasol"
BUCKETFS_PORT=6666
BUCKETFS_USER="w"
BUCKETFS_PASSWORD="write"
BUCKETFS_NAME="bfsdefault"
BUCKET_NAME="default"
PATH_IN_BUCKET="container"

### Starting a local Docker-DB for Testing

The following command starts the environment and forwards the database and BucketFS ports to the specified host ports.

**Note:** The Exasol Integration-Test-Docker-Environment requires Docker with privileged mode.

**Note:** Starting the environment can take 3 to 5 minutes.

In [90]:
bash.run(f"""
pushd {slc_path}
./exaslct spawn-test-environment --environment-name test --database-port-forward {DATABASE_PORT} --bucketfs-port-forward {BUCKETFS_PORT}
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
INFO: Informed scheduler that task   SpawnTestEnvironmentWithDockerDB_4ee3400772   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 5 processes
INFO: [pid 15] Worker Worker(salt=822738088, workers=5, host=test-statsmodels, username=root, pid=10) running   SpawnTestEnvironmentWithDockerDB_4ee3400772(db_user=sys, environment_name=test, docker_db_image_name=exasol/docker-db, docker_db_image_version=7.0.10)
INFO: [pid 15] Worker Worker(salt=822738088, workers=5, host=test-statsmodels, username=root, pid=10) new requirements      SpawnTestEnvironmentWithDockerDB_4ee3400772(db_user=sys, environment_name=test, docker_db_image_name=exasol/docker-db, docker_db_image_version=7.0.10)
INFO: Informed scheduler that task   PrepareDockerNetworkForTestEnvironment_3a40fb4e74   has status   PENDING
INFO: Informed scheduler that task   SpawnTestE

### Upload the Container to the Database

To use our container we need to upload it to the BucketFS. If the build machine has access to the BucketFS, we can do this with the `exaslct upload` command. Otherwise, you need to export the container, transfer it to a machine that has access to the BucketFS, and upload it via `curl`, as described in our [documentation](https://docs.exasol.com/database_concepts/udf_scripts/adding_new_packages_script_languages.htm).

In [91]:
bash.run(f"""
pushd {slc_path}
./exaslct upload \
    --flavor-path flavors/python-3.6-minimal-EXASOL-6.2.0 \
    --database-host {DATABASE_HOST} \
    --bucketfs-port {BUCKETFS_PORT} \
    --bucketfs-username {BUCKETFS_USER} \
    --bucketfs-password {BUCKETFS_PASSWORD} \
    --bucketfs-name {BUCKETFS_NAME} \
    --bucket-name {BUCKET_NAME} \
    --path-in-bucket {PATH_IN_BUCKET} \
    --release-name current
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages
INFO: Informed scheduler that task   UploadContainers_8d425bb850   has status   PENDING
INFO: Informed scheduler that task   UploadFlavorContainers_b72df9c2b0   has status   PENDING
INFO: Informed scheduler that task   AnalyzeRelease_728d2b4891   has status   PENDING
INFO: Informed scheduler that task   AnalyzeLanguageDeps_728d2b4891   has status   PENDING
INFO: Informed scheduler that task   AnalyzeUDFClientDeps_728d2b4891   has status   PENDING
INFO: Informed scheduler that task   AnalyzeBuildRun_728d2b4891   has status   PENDING
INFO: Informed scheduler that task   AnalyzeBuildDeps_728d2b4891   has status   PENDING
INFO: Informed scheduler that task   AnalyzeFlavorCustomization_728d2b4891   has status   PENDING
INFO: Informed scheduler that task   AnalyzeFlavorBaseDeps_728d2b4891   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker
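If `exaslct upload` is not an option, the manual upload mentioned above can also be done from Python instead of `curl`. The following is only a sketch under assumptions: it assumes that the BucketFS accepts an HTTP `PUT` with basic authentication at `http://<host>:<port>/<bucket>/<path>/<file>`, as in the `curl` approach from the documentation. The function names are hypothetical, and the standard library is used to keep the example free of extra dependencies.

```python
import base64
import os
import urllib.request


def bucketfs_url(host, port, bucket, path_in_bucket, file_name):
    """Build the HTTP URL of a file inside a bucket."""
    path = "/".join(part.strip("/") for part in (bucket, path_in_bucket, file_name) if part)
    return f"http://{host}:{port}/{path}"


def upload_to_bucketfs(local_file, url, user, password):
    """PUT a local file to the given BucketFS URL using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    with open(local_file, "rb") as f:
        request = urllib.request.Request(url, data=f, method="PUT")
        request.add_header("Authorization", f"Basic {token}")
        # An explicit length lets the file be streamed instead of read into memory.
        request.add_header("Content-Length", str(os.path.getsize(local_file)))
        with urllib.request.urlopen(request) as response:
            return response.status
```

With the variables of this Notebook, the target URL would be built as `bucketfs_url(DATABASE_HOST, BUCKETFS_PORT, BUCKET_NAME, PATH_IN_BUCKET, "<container>.tar.gz")`, where the file name is a placeholder for your exported archive.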

### Getting the language container activation statement without upload

Sometimes you can't use the `upload` command to upload your container to the BucketFS. To get the language activation statement regardless of that, you can use the `generate-language-activation` command.

In [92]:
bash.run(f"""
pushd {slc_path}
./exaslct generate-language-activation \
    --flavor-path flavors/python-3.6-minimal-EXASOL-6.2.0 \
    --bucketfs-name {BUCKETFS_NAME} \
    --bucket-name {BUCKET_NAME} \
    --path-in-bucket {PATH_IN_BUCKET} \
    --container-name python-3.6-minimal-EXASOL-6.2.0-current  2>&1 | tail -n 15
""")

~/data-science-examples_slc/tutorials/script-languages/script-languages-release ~/data-science-examples_slc/tutorials/script-languages


In SQL, you can activate the languages supported by the python-3.6-minimal-EXASOL-6.2.0
flavor by using the following statements:


To activate the flavor only for the current session:

ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON3=localzmq+protobuf:///bfsdefault/default/container/python-3.6-minimal-EXASOL-6.2.0-current?lang=python#buckets/bfsdefault/default/container/python-3.6-minimal-EXASOL-6.2.0-current/exaudf/exaudfclient_py3';


To activate the flavor on the system:

ALTER SYSTEM SET SCRIPT_LANGUAGES='PYTHON3=localzmq+protobuf:///bfsdefault/default/container/python-3.6-minimal-EXASOL-6.2.0-current?lang=python#buckets/bfsdefault/default/container/python-3.6-minimal-EXASOL-6.2.0-current/exaudf/exaudfclient_py3';



### Connecting to the database and activate the container

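Once we have a connection to the database, we run the `ALTER SESSION` statement (or the `ALTER SYSTEM` statement, if you want to activate the container permanently and globally) that `exaslct` printed. In the statement below, the container is registered under the alias `PYTHON3_DS`, while `PYTHON3` keeps pointing to the built-in Python 3. Make sure that the container name in the statement matches the name of the uploaded container.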

In [93]:
def connect():
    con=pyexasol.connect(dsn=f"{DATABASE_HOST}:{DATABASE_PORT}",user=DATABASE_USER,password=DATABASE_PASSWORD)
    con.execute("ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON3=builtin_python3 PYTHON3_DS=localzmq+protobuf:///bfsdefault/default/container/python-3.6-minimal-EXASOL-6.2.0-release-current?lang=python#buckets/bfsdefault/default/container/python-3.6-minimal-EXASOL-6.2.0-release-current/exaudf/exaudfclient_py3';")
    con.execute("OPEN SCHEMA TEST")
    return con
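The activation string in `connect()` is hard-coded. As a sketch, it could instead be assembled from the configuration variables so that it stays consistent with the upload settings. The helper names below are hypothetical, and the URL layout is copied from the statement above rather than from a specification.

```python
def script_language_entry(alias, bucketfs_name, bucket_name, path_in_bucket, container_name):
    """Build one 'ALIAS=...' entry for the SCRIPT_LANGUAGES parameter."""
    container_path = f"{bucketfs_name}/{bucket_name}/{path_in_bucket}/{container_name}"
    return (
        f"{alias}=localzmq+protobuf:///{container_path}?lang=python"
        f"#buckets/{container_path}/exaudf/exaudfclient_py3"
    )


def alter_session_statement(entries):
    """Join several language entries into a single ALTER SESSION statement."""
    return "ALTER SESSION SET SCRIPT_LANGUAGES='" + " ".join(entries) + "';"
```

For this Notebook, the entries would be `"PYTHON3=builtin_python3"` and `script_language_entry("PYTHON3_DS", BUCKETFS_NAME, BUCKET_NAME, PATH_IN_BUCKET, "python-3.6-minimal-EXASOL-6.2.0-release-current")`.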

### Check whether your customization worked

We first create a helper UDF which allows us to run arbitrary shell commands inside of a UDF instance. With that we can easily inspect the container.

In [94]:
con = connect()

con.execute(textwrap.dedent("""
CREATE OR REPLACE PYTHON3_DS SCALAR SCRIPT execute_shell_command_py3(command VARCHAR(2000000), split_output boolean)
EMITS (lines VARCHAR(2000000)) AS
import subprocess

def run(ctx):
    p = None  # stays None if Popen itself raises, so the cleanup below is safe
    try:
        p = subprocess.Popen(ctx.command,
                             stdout    = subprocess.PIPE,
                             stderr    = subprocess.STDOUT,
                             close_fds = True,
                             shell     = True)
        out, err = p.communicate()  # err is always None, stderr is merged into stdout
        if isinstance(out,bytes):
            out=out.decode('utf8')
        if ctx.split_output:
            for line in out.strip().split('\\n'):
                ctx.emit(line)
        else:
            ctx.emit(out)
    finally:
        if p is not None:
            try:
                p.kill()
            except Exception:
                pass
/
"""))

<ExaStatement session_id=1706531947956600832 stmt_idx=3>

#### Check with "pip list" if the "xgboost" package is installed

We use our helper UDF to run `python3 -m pip list` directly in the container and get the list of currently available Python 3 packages.

In [95]:
con = connect()
rs=con.execute("""select execute_shell_command_py3('python3 -m pip list', true)""")
for r in rs: 
    print(r[0])

Package         Version
--------------- ---------------
joblib          1.0.1
numpy           1.19.5
pandas          1.1.5
pip             20.3.4
pygobject       3.26.1
python-apt      1.6.5+ubuntu0.6
python-dateutil 2.8.2
pytz            2021.1
scikit-learn    0.24.2
scipy           1.5.4
setuptools      57.4.0
six             1.16.0
threadpoolctl   2.2.0
wheel           0.36.2
xgboost         1.3.3


By running `pip list` directly in the container, we see what is currently available in the container. However, sometimes this might not be what we expected. For these cases, `exaslct` stores information about the flavor the container was built from within the container.

#### Embedded Build Information of the Container

Here we see an overview of the build information which `exaslct` embeds into the container. `exaslct` stores all package lists (as defined in the flavor and as actually installed), the final Dockerfiles and the image info. The image info describes how the underlying Docker images of the container were built. The build information is stored in the `/build_info` directory in the container. We can again use our helper UDF to inspect it.

In [96]:
con = connect()
rs=con.execute("""select execute_shell_command_py3('find /build_info', true)""")
for r in rs: 
    print(r[0])

/build_info
/build_info/image_info
/build_info/image_info/python-3.6-minimal-EXASOL-6.2.0-build_run
/build_info/image_info/python-3.6-minimal-EXASOL-6.2.0-release
/build_info/image_info/python-3.6-minimal-EXASOL-6.2.0-udfclient_deps
/build_info/image_info/python-3.6-minimal-EXASOL-6.2.0-language_deps
/build_info/image_info/python-3.6-minimal-EXASOL-6.2.0-flavor_base_deps
/build_info/image_info/python-3.6-minimal-EXASOL-6.2.0-build_deps
/build_info/image_info/python-3.6-minimal-EXASOL-6.2.0-flavor_customization
/build_info/dockerfiles
/build_info/dockerfiles/python-3.6-minimal-EXASOL-6.2.0-build_run
/build_info/dockerfiles/python-3.6-minimal-EXASOL-6.2.0-release
/build_info/dockerfiles/python-3.6-minimal-EXASOL-6.2.0-udfclient_deps
/build_info/dockerfiles/python-3.6-minimal-EXASOL-6.2.0-language_deps
/build_info/dockerfiles/python-3.6-minimal-EXASOL-6.2.0-flavor_base_deps
/build_info/dockerfiles/python-3.6-minimal-EXASOL-6.2.0-build_deps
/build_info/dockerfiles/python-3.6-minimal-EXASOL

The following command shows, for example, which Python 3 packages pip found directly after the build of the container image.

In [97]:
con = connect()
rs=con.execute("""select execute_shell_command_py3('cat /build_info/actual_installed_packages/release/python3_pip_packages', true)""")
for r in rs: 
    print(r[0])

joblib|1.0.1
numpy|1.19.5
pandas|1.1.5
pip|20.3.4
pygobject|3.26.1
python-apt|1.6.5+ubuntu0.6
python-dateutil|2.8.2
pytz|2021.1
scikit-learn|0.24.2
scipy|1.5.4
setuptools|57.4.0
six|1.16.0
threadpoolctl|2.2.0
wheel|0.36.2
xgboost|1.3.3


You could, for example, compare this to the package list of the `flavor_customization` build step and check whether all your requested packages got installed.

In [98]:
con = connect()
rs=con.execute("""select execute_shell_command_py3('cat /build_info/packages/flavor_customization/python3_pip_packages', true)""")
for r in rs:
    if r[0] is None:
        print()
    else:
        print(r[0])

xgboost|1.3.3
scikit-learn|0.24.2
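The comparison of the two lists can be automated. The following sketch takes the lines of the requested and the actually installed package lists (for example, the rows returned by the helper UDF) and reports every requested package that is missing or installed in a different version. The helper names are hypothetical, and the name normalization follows the usual pip convention of treating `-`, `_` and `.` as equivalent.

```python
import re


def normalize(name):
    """Normalize a package name the way pip compares names (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def to_dict(package_lines):
    """Turn 'name|version' lines into {normalized name: version or None}."""
    result = {}
    for line in package_lines:
        entry = line.split("#", 1)[0].strip()  # ignore comments
        if entry:
            name, _, version = entry.partition("|")
            result[normalize(name.strip())] = version.strip() or None
    return result


def find_mismatches(requested_lines, installed_lines):
    """Return {name: (requested version, installed version)} for every unmet request."""
    requested, installed = to_dict(requested_lines), to_dict(installed_lines)
    return {
        name: (version, installed.get(name))
        for name, version in requested.items()
        if name not in installed or (version is not None and installed[name] != version)
    }
```

An empty result means that every requested package is installed in the requested version.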


### Testing the new package

After we have made sure that the required packages are installed, we need to try importing and using them. Importing is usually a good first test of whether a package was installed successfully, because errors often show up already at this step. However, some errors only become visible when you actually use the package. We recommend having a test suite for each new package to check that it works properly before you start your UDF development. Problems are usually easier to debug if you have very narrow tests.

In [101]:
con = connect()

con.execute(textwrap.dedent("""
CREATE OR REPLACE PYTHON3_DS SET SCRIPT test_xgboost(i integer)
EMITS (o VARCHAR(2000000)) AS

def run(ctx):
    import xgboost
    import sklearn 
    
    ctx.emit("success")
/
"""))

rs = con.execute("select test_xgboost(1)")
rs.fetchall()

[('success',)]
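When several packages were added, a slightly more informative import test reports which imports failed and why, instead of stopping at the first error. The function `check_imports` below is a hypothetical sketch of such a check. Its body could also be placed inside a UDF that emits one row per failure.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {module name: error message} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as error:  # broken native libraries can raise more than ImportError
            failures[name] = f"{type(error).__name__}: {error}"
    return failures
```

For the packages of this Notebook, the call would be `check_imports(["xgboost", "sklearn"])`. Note that the import name (`sklearn`) can differ from the package name (`scikit-learn`).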

In [100]:
con = connect()

con.execute(textwrap.dedent("""
CREATE OR REPLACE PYTHON3_DS SET SCRIPT test_xgboost(i integer)
EMITS (o1 DOUBLE, o2 DOUBLE, o3 DOUBLE) AS

def run(ctx):
    import pandas as pd
    import xgboost as xgb
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    param = {
        'max_depth': 3,  # the maximum depth of each tree
        'eta': 0.3,  # the learning rate (step size shrinkage) of each boosting round
        'verbosity': 0,  # quiet logging; replaces the deprecated 'silent' parameter
        'objective': 'multi:softprob',  # multiclass objective that outputs class probabilities
        'num_class': 3  # the number of classes that exist in this dataset
        }
    num_round = 20  # the number of training iterations
    bst = xgb.train(param, dtrain, num_round)
    preds = bst.predict(dtest)
    
    ctx.emit(pd.DataFrame(preds))
/
"""))

con.export_to_pandas("select test_xgboost(1)")

Unnamed: 0,O1,O2,O3
0,0.006721661,8.975322e-18,1.392795e-22
1,2.01084e-22,1.084088e-20,0.007200644
2,5.305263e-21,0.006510653,2.1463e-17
3,3.593958e-21,0.003872168,8.624799e-13
4,0.006942478,7.210853e-19,1.447298e-22
5,8.427893e-22,0.007213292,2.786567e-21
6,2.8754660000000004e-17,2.757441e-12,0.003289814
7,1.3415739999999999e-20,0.004610729,8.480951e-14
8,8.427893e-22,0.007213292,2.786567e-21
9,6.325119e-17,6.13092e-12,0.002936984
