# Test the new Script-Languages-Container

This notebook shows how to:
- activate the new script-languages-container in the Exasol database
- create UDFs for the new script-languages-container
- run those UDFs

## Setup
### Open Secure Configuration Storage

In [None]:
%run ../utils/access_store_ui.ipynb
display(get_access_store_ui('../'))

### Instantiate SLCT Manager

We create an instance of the `SlctManager` class from the notebook connector. SLCT stands for "Script-Languages-Container-Tools".
This class provides utility functions that simplify the use of the exaslct API.

In [None]:
from exasol.nb_connector import slct_manager
slctmanager = slct_manager.SlctManager(ai_lab_config)

## Use the new Script-Languages-Container

### Connect to the database and activate the container
Once you have a connection to the database, you can run either the `ALTER SESSION` or the `ALTER SYSTEM` statement. `ALTER SESSION` activates the container only for the current session, whereas `ALTER SYSTEM` activates it permanently and globally.
The notebook-connector package provides a utility method that creates a `pyexasol` connection and applies the `ALTER SESSION` command for all registered languages:

In [None]:
from exasol.nb_connector.language_container_activation import open_pyexasol_connection_with_lang_definitions

conn = open_pyexasol_connection_with_lang_definitions(ai_lab_config, compression=True)
# IF NOT EXISTS keeps this cell re-runnable without a "schema already exists" error
conn.execute("CREATE SCHEMA IF NOT EXISTS SLC_TUTORIAL")
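
To make the difference between the two statements concrete, the following sketch builds the activation statement for either scope. The helper name `build_activation_sql` and the language definition string are hypothetical placeholders and not part of the notebook connector, which generates the real definitions for you.

```python
def build_activation_sql(language_definitions: str, scope: str = "SESSION") -> str:
    """Build an ALTER SESSION / ALTER SYSTEM statement (illustrative helper)."""
    scope = scope.upper()
    if scope not in ("SESSION", "SYSTEM"):
        raise ValueError("scope must be 'SESSION' or 'SYSTEM'")
    # Double any single quotes so the definition is a valid SQL string literal.
    escaped = language_definitions.replace("'", "''")
    return f"ALTER {scope} SET SCRIPT_LANGUAGES='{escaped}'"


# Placeholder definition; a real one names the alias and the container location.
example_definitions = "MY_ALIAS=<language-definition-url>"
print(build_activation_sql(example_definitions))            # current session only
print(build_activation_sql(example_definitions, "SYSTEM"))  # permanent and global
```

The session-scoped variant is the safer default while experimenting, because it does not affect other users of the database.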

### Check whether your customization worked

First, create a helper UDF that lets you run arbitrary shell commands inside a UDF instance. With it, you can easily inspect the container.

In [None]:
import textwrap

conn.execute(textwrap.dedent(f"""
CREATE OR REPLACE {slctmanager.language_alias} SCALAR SCRIPT execute_shell_command_py3(command VARCHAR(2000000), split_output boolean)
EMITS (lines VARCHAR(2000000)) AS
import subprocess

def run(ctx):
    p = None  # defined up front so the finally block works even if Popen raises
    try:
        p = subprocess.Popen(ctx.command,
                             stdout    = subprocess.PIPE,
                             stderr    = subprocess.STDOUT,
                             close_fds = True,
                             shell     = True)
        out, err = p.communicate()
        if isinstance(out,bytes):
            out=out.decode('utf8')
        if ctx.split_output:
            for line in out.strip().split('\\n'):
                ctx.emit(line)
        else:
            ctx.emit(out)
    finally:
        if p is not None:
            try:
                p.kill()
            except Exception:
                pass
/
"""))

#### Check with `pip list` whether the `xgboost` package is installed
We use our helper UDF to run `python3 -m pip list` directly in the container and get the list of currently available Python 3 packages.

In [None]:
rs=conn.execute("""select execute_shell_command_py3('python3 -m pip list', true)""")
for r in rs: 
    print(r[0])

Running `pip list` inside the container displays the available packages. If the results are unexpected, look at the information that `exaslct` stored inside the container at build time.
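
Because the same query pattern is used for every inspection command, it can be wrapped in a small function. This is a minimal sketch: `run_in_container` is a hypothetical name, and it assumes the helper UDF `execute_shell_command_py3` created above exists in the current schema and that `conn` behaves like the `pyexasol` connection used in this notebook.

```python
def run_in_container(conn, command: str) -> list:
    """Run a shell command through execute_shell_command_py3 and return its output lines."""
    # Double any single quotes so the command is a valid SQL string literal.
    escaped = command.replace("'", "''")
    rows = conn.execute(f"select execute_shell_command_py3('{escaped}', true)")
    # Empty output lines can arrive as NULL (None); normalize them to "".
    return ["" if row[0] is None else row[0] for row in rows]


# Usage with the connection from this notebook:
#   for line in run_in_container(conn, "python3 -m pip list"):
#       print(line)
```

Escaping the quotes matters as soon as a command contains quoted arguments, which the inline queries in this notebook do not handle.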

#### Embedded Build Information of the Container
Here we see an overview of the build information that `exaslct` embedded into the container. `exaslct` stores all package lists (both as defined in the flavor and as actually installed), the final Dockerfiles, and the image info. The image info describes how the underlying Docker images of the container were built. The build information is stored in the `/build_info` directory in the container. You can again use our helper UDF to inspect it.

In [None]:
rs=conn.execute("""select execute_shell_command_py3('find /build_info', true)""")
for r in rs: 
    print(r[0])

Now you can examine the Python 3 pip packages file, which `exaslct` created directly after building the container image.

In [None]:
rs=conn.execute("""select execute_shell_command_py3('cat /build_info/actual_installed_packages/release/python3_pip_packages', true)""")
for r in rs: 
    print(r[0])

All packages from your flavor-customization build step should be included. To double-check this, you can run:

In [None]:
rs=conn.execute("""select execute_shell_command_py3('cat /build_info/packages/flavor_customization/python3_pip_packages', true)""")
for r in rs:
    if r[0] is None:
        print()
    else:
        print(r[0])
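
Instead of comparing the two listings by eye, the check can be automated. The sketch below assumes that both files contain one package per line in the form `name|version` and that `#` starts a comment; this format is an assumption, so verify it against the output above. The function names and the sample data are made up for illustration.

```python
def parse_package_lines(lines):
    """Return {name: version} from lines assumed to look like 'name|version'."""
    packages = {}
    for line in lines:
        if line is None:  # empty lines arrive as None from the database
            continue
        text = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not text:
            continue
        name, _, version = text.partition("|")
        packages[name.strip().lower()] = version.strip()
    return packages


def missing_packages(requested_lines, installed_lines):
    """List requested package names that do not appear among the installed ones."""
    requested = parse_package_lines(requested_lines)
    installed = parse_package_lines(installed_lines)
    return sorted(name for name in requested if name not in installed)


# Made-up sample data; real input would be the rows printed by the two cells above.
requested = ["# my customization", None, "xgboost|1.0"]
installed = ["numpy|1.0", "xgboost|1.0"]
print(missing_packages(requested, installed))  # prints []
```

Names are compared case-insensitively only; pip also treats `-` and `_` as equivalent, which this sketch does not handle.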

### Testing the new package

After you have made sure that the required packages are installed, try importing and using them. Importing is usually a good first test of whether a package was installed successfully, because errors often show up at this step. However, some errors only become visible when you actually use the package. We recommend having a test suite for each new package to check that it works properly before you start your UDF development. Problems are usually easier to debug with narrowly focused tests.

In [None]:
conn.execute(textwrap.dedent(f"""
CREATE OR REPLACE {slctmanager.language_alias} SET SCRIPT test_xgboost(i integer)
EMITS (o VARCHAR(2000000)) AS

def run(ctx):
    import xgboost
    import sklearn 
    
    ctx.emit("success")
/
"""))

rs = conn.execute("select test_xgboost(1)")
rs.fetchall()
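
When several packages were added, it helps to see which imports fail rather than stopping at the first error. The following sketch shows one way to do that in plain Python; `check_imports` is a hypothetical helper, and the same loop could be placed inside the `run` function of a UDF to perform the check in the container.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # broken native libraries can raise more than ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# "json" is in the standard library; the second name is deliberately invalid.
print(check_imports(["json", "a_module_that_does_not_exist"]))
```

Catching `Exception` rather than only `ImportError` is a deliberate choice here, since a package with a missing shared library may fail with a different error type.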

Finally, import and use the new packages. The following UDF uses the `xgboost` and `sklearn` modules to solve a small machine learning problem.

In [None]:
conn.execute(textwrap.dedent(f"""
CREATE OR REPLACE {slctmanager.language_alias} SET SCRIPT test_xgboost(i integer)
EMITS (o1 DOUBLE, o2 DOUBLE, o3 DOUBLE) AS

def run(ctx):
    import pandas as pd
    import xgboost as xgb
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    param = {{
        'max_depth': 3,  # the maximum depth of each tree
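        'eta': 0.3,  # the learning rate (step size shrinkage) applied after each boosting round
        'verbosity': 0,  # logging mode - silent (replaces the deprecated 'silent' parameter)
        'objective': 'multi:softprob',  # multiclass objective that outputs one probability per class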
        'num_class': 3  # the number of classes that exist in this dataset
        }}
    num_round = 20  # the number of training iterations
    bst = xgb.train(param, dtrain, num_round)
    preds = bst.predict(dtest)
    
    ctx.emit(pd.DataFrame(preds))
/
"""))

conn.export_to_pandas("select test_xgboost(1)")
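
With the `multi:softprob` objective, each emitted row holds one probability per class, so the predicted class is the position of the largest value in the row. The sketch below illustrates this with made-up numbers; `predicted_classes` is a hypothetical helper, and it assumes the columns `o1`, `o2`, `o3` correspond to classes 0, 1, and 2 in that order.

```python
def predicted_classes(probability_rows):
    """Return, for each row of class probabilities, the index of the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in probability_rows]


# Made-up probabilities for two samples and three classes.
example = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
print(predicted_classes(example))  # prints [1, 0]
```

Applied to the DataFrame returned by `export_to_pandas`, the rows could be passed as `df.values.tolist()`.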