# Connect to Exasol from AzureML

In this tutorial we will:
 - Connect to Exasol SaaS from AzureML
 - Preprocess data
 - Export Exasol tables to an Azure Blobstore Container
 - Create a Datastore

## Prerequisites

You will need:
 - A running Exasol SaaS cluster with your data loaded into it
 - Authentication information for your Exasol SaaS cluster
 - An AzureML account and Azure Storage account
 - AzureML set up with a:
    - Workspace
    - Compute instance

## Why using Azure Blob Storage is necessary

In this tutorial we copy the data from an Exasol SaaS database into an Azure Blob Storage container. This is necessary because, while AzureML can import data directly from SQL databases, it does not support the Exasol SQL dialect at the time of writing.


## AzureML setup

If you do not know how to set up your AzureML studio, please refer to the [AzureML documentation](https://learn.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources).
Once you have set up a workspace and a compute instance, copy this notebook into your notebook files. Open it and select your compute instance in the drop-down menu at the top of the notebook. Now we can get started with connecting to the Exasol SaaS cluster.


### Connect to Exasol SaaS


We are going to use the [PyExasol](https://docs.exasol.com/db/latest/connect_exasol/drivers/python/pyexasol.htm) package to connect to the Exasol database and read the data. First, we need to install PyExasol with pip on the AzureML compute instance.

In [None]:
!pip install pyexasol

Then we connect with PyExasol to the Exasol SaaS cluster that holds the data. Change these values to reflect your cluster.
To check that the connection is working, we ask for 10 rows of the "IDA.TEST" table from the [Scania Trucks](https://archive.ics.uci.edu/ml/datasets/IDA2016Challenge) dataset.

In [None]:
import pyexasol
import pandas

EXASOL_HOST = "<your>.clusters.exasol.com"      # change
EXASOL_PORT = "8563"                            # change if needed
EXASOL_USER = "<your-exasol-user>"              # change
EXASOL_PASSWORD = "exa_pat_<your_password>"     # change
EXASOL_SCHEMA = "IDA"                           # change if needed

# get the connection
EXASOL_CONNECTION = "{host}:{port}".format(host=EXASOL_HOST, port=EXASOL_PORT)
exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD, compression=True)

# check if the connection is working
exasol.export_to_pandas("SELECT * FROM IDA.TEST LIMIT 10")

## Data Preprocessing
We preprocess the data in Exasol before exporting it. There are two things we need to do:

 - Split the training data into train and validation data
 - Replace the CLASS column by a column with boolean values

For the split, we add a column SPLIT that holds a random value between 0 and 1, so we can partition the data by a condition on that column.

In addition, we replace the CLASS column, which contains the text values pos and neg, by a new column CLASS_POS with boolean values. We apply the same CLASS replacement to the test table, which does not need a SPLIT column.
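
To make the effect of the SPLIT column concrete, here is a small sketch in plain Python that does not touch the database. It assumes the 0.8 threshold used in the export step further down; `assign_split` is a hypothetical helper, and `random.random()` only stands in for the `RANDOM()` call that Exasol evaluates per row.

```python
import random


def assign_split(split_value, threshold=0.8):
    """Return which export a row ends up in, given its SPLIT value."""
    return "train" if split_value <= threshold else "validation"


random.seed(0)  # fixed seed only so that the sketch is repeatable
split_values = [random.random() for _ in range(10_000)]  # stand-in for RANDOM()

counts = {"train": 0, "validation": 0}
for value in split_values:
    counts[assign_split(value)] += 1

train_share = counts["train"] / len(split_values)
print(f"train share: {train_share:.2f}")
```

Because the values are spread evenly between 0 and 1, roughly four out of five rows satisfy `SPLIT <= 0.8`, which gives an approximate 80/20 split without having to count rows.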

In [None]:
# fetch a single row just to read the column names of the table
all_columns = exasol.export_to_pandas("SELECT * FROM IDA.TRAIN LIMIT 1;")
column_names = list(all_columns)
column_names.remove("CLASS")
# the !q placeholder makes PyExasol insert the list as quoted, comma-separated identifiers
exasol.execute("""CREATE OR REPLACE TABLE IDA.TRAIN_PREPARED AS (
               SELECT RANDOM() AS SPLIT,
               (CLASS = 'pos') as CLASS_POS, {all_columns_except_class!q} FROM IDA.TRAIN)""",
               {"all_columns_except_class": column_names})



exasol.export_to_pandas("SELECT * FROM IDA.TRAIN_PREPARED LIMIT 4")

In [None]:
exasol.execute("""CREATE OR REPLACE TABLE IDA.TEST_PREPARED AS (
               SELECT
               (CLASS = 'pos') as CLASS_POS, {all_columns_except_class!q} FROM IDA.TEST)""",
               {"all_columns_except_class": column_names})



exasol.export_to_pandas("SELECT * FROM IDA.TEST_PREPARED LIMIT 4")


### Load data into AzureML Blobstore


For this step, we need access to the Azure storage account. Insert your Azure storage account name and access key below. To find your access key, navigate to your storage account in the Azure portal, click on "Access keys" under "Security + networking", and copy one of your access keys.

![](img_src/access_key_azure.png)


In [None]:
from azure.ai.ml.entities import AccountKeyConfiguration

my_storage_account_name = "your_storage_account_name"   # change
account_key = "your_storage_account_key"                # change

# AccountKeyConfiguration takes the key as a keyword argument
credentials = AccountKeyConfiguration(account_key=account_key)

Lastly, we use an "EXPORT" command for each of our data tables to export it into a CSV file in our Blob Storage using "INTO CSV AT CLOUD AZURE BLOBSTORAGE". You can find [the documentation for this export command](https://docs.exasol.com/db/latest/sql/export.htm) in the Exasol documentation.
If you choose the name of an existing Azure storage container, this command will save your files in that container. Otherwise, an Azure storage container with that name will be created automatically.
When you created your AzureML workspace, an Azure blob container was [created automatically](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data) and added to your workspace as a Datastore named "workspaceblobstore". You can use it here and then skip the "Create a Datastore" step below if you want. To do so, find its container name ("azureml-blobstore-some-ID") in the datastore info and insert it here.
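
The EXPORT statement described above has the same shape for every table, so it can be built by a small helper. The following is only a sketch: `build_export_sql` is a hypothetical function, the `ida/` folder mirrors the path used in this notebook, and it assumes PyExasol's `!s` placeholder (quoted string) alongside the `!q` placeholder (quoted identifier) that the notebook already uses.

```python
def build_export_sql(table, container, where=None, target_name=None, schema="IDA"):
    """Build an EXPORT statement template and its target path (hypothetical helper).

    The returned SQL still contains PyExasol placeholders that are
    filled in when it is passed to exasol.execute().
    """
    file_path = f"{container}/ida/{target_name or table}"
    condition = f" WHERE {where}" if where else ""
    sql = (
        "EXPORT (SELECT {column_names!q} "
        f"FROM {schema}.{table}{condition}) "
        "INTO CSV AT CLOUD AZURE BLOBSTORAGE "
        "'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' "
        "USER {account_name!s} IDENTIFIED BY {account_key!s} "
        "FILE {file_path!s} WITH COLUMN NAMES"
    )
    return sql, file_path


train_sql, train_path = build_export_sql(
    "TRAIN_PREPARED", "azureml-tutorial", where="SPLIT <= 0.8"
)
validate_sql, validate_path = build_export_sql(
    "TRAIN_PREPARED", "azureml-tutorial", where="SPLIT > 0.8",
    target_name="VALIDATE_PREPARED",
)
print(train_sql)

# With an open connection, the template would be executed like this:
# exasol.execute(train_sql, {"column_names": column_names,
#                            "account_name": my_storage_account_name,
#                            "account_key": account_key,
#                            "file_path": train_path})
```

Passing the account name, key and file path as query parameters leaves the quoting to PyExasol instead of to hand-written f-strings.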

Two things to keep in mind for the export cells below:
 - We export only a selection of the columns, listed in "column_names", instead of all columns of the tables.
 - Running an export more than once with the same file path appends the rows to the existing file instead of creating a new file.

In [None]:
table = "TEST_PREPARED"
column_names = ['CLASS_POS', 'AA_000', 'AG_005', 'AH_000', 'AL_000', 'AM_0', 'AN_000', 'AO_000', 'AP_000', 'AQ_000',
                    'AZ_004', 'BA_002', 'BB_000', 'BC_000', 'BD_000', 'BE_000',
                    'BF_000', 'BG_000', 'BH_000', 'BI_000', 'BJ_000', 'BS_000', 'BT_000', 'BU_000', 'BV_000',
                    'BX_000', 'BY_000', 'BZ_000', 'CA_000', 'CB_000', 'CC_000', 'CI_000', 'CN_004', 'CQ_000',
                    'CS_001', 'DD_000', 'DE_000', 'DN_000', 'DS_000', 'DU_000', 'DV_000', 'EB_000', 'EE_005']

blobstorage_name = "azureml-tutorial"   # change; remember that you might need to remove the "_datastore" suffix
save_path = f'{blobstorage_name}/ida/{table}'
# {column_names!q} is filled in by PyExasol; all other values are filled in by the f-strings
sql_export = "EXPORT (SELECT {column_names!q}" + f" FROM IDA.{table}) INTO CSV AT CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' "\
                f"USER '{my_storage_account_name}' IDENTIFIED BY '{credentials.account_key}' FILE '{save_path}' WITH COLUMN NAMES"
exasol.execute(sql_export, {"column_names": column_names})
print(f"saved {table} in file {save_path}")

In [None]:

table = "TRAIN_PREPARED"
# rows with SPLIT <= 0.8 become the training data
save_path = f'{blobstorage_name}/ida/{table}'
sql_export = "EXPORT (SELECT {column_names!q}" + f" FROM IDA.{table} WHERE SPLIT <= 0.8) INTO CSV AT CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' "\
                f"USER '{my_storage_account_name}' IDENTIFIED BY '{credentials.account_key}' FILE '{save_path}' WITH COLUMN NAMES"
exasol.execute(sql_export, {"column_names": column_names})
print(f"saved {table} in file {save_path}")

# the remaining rows (SPLIT > 0.8) become the validation data
save_path = f'{blobstorage_name}/ida/VALIDATE_PREPARED'
sql_export = "EXPORT (SELECT {column_names!q}" + f" FROM IDA.{table} WHERE SPLIT > 0.8) INTO CSV AT CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' "\
                f"USER '{my_storage_account_name}' IDENTIFIED BY '{credentials.account_key}' FILE '{save_path}' WITH COLUMN NAMES"
exasol.execute(sql_export, {"column_names": column_names})
print(f"saved validation split of {table} in file {save_path}")

In [None]:
for table in ["TRAIN_PREPARED", "TEST_PREPARED"]:
    exasol.execute(f"DROP TABLE IDA.{table};")

You can check the success of the command by navigating to your container in the Azure portal using your Azure storage account.
In the menu on the left, you can find "Containers" under "Data storage". Find the container with the name you set in "blobstorage_name" and click on it. Your files should be there.
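
If you prefer to check from the notebook instead of the portal, the uploaded files can also be listed with the `azure-storage-blob` package. This is a sketch under assumptions: the package has to be installed separately (`pip install azure-storage-blob`), both function names are hypothetical, and the `ida/` prefix matches the path used in the export cells.

```python
def make_container_client(account_name, account_key, container_name):
    """Create a client for one container (requires azure-storage-blob)."""
    # imported here so the listing helper below works without the package
    from azure.storage.blob import ContainerClient

    account_url = f"https://{account_name}.blob.core.windows.net"
    return ContainerClient(account_url, container_name, credential=account_key)


def list_exported_files(container_client, prefix="ida/"):
    """Return the sorted names of all blobs whose name starts with `prefix`."""
    blobs = container_client.list_blobs(name_starts_with=prefix)
    return sorted(blob.name for blob in blobs)


# Example usage with the variables defined earlier in this notebook:
# client = make_container_client(my_storage_account_name, account_key, blobstorage_name)
# print(list_exported_files(client))
```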


### Create a Datastore

We recommend that you create a connection between your Azure Storage Container and your AzureML Workspace. For this, enter your workspace in AzureML Studio and select "Data" under "Assets" in the menu on the left. Now select "Datastores" and click on "+Create".

![](img_src/create_datastore.png)

In the view that opens, enter the information for your datastore. Enter a name and select "Azure Blob Storage" as the type. Then select your Azure subscription and, from the drop-down menu, the blob container we loaded the data into. Use the authentication type "Account key" and enter your Azure storage account access key. Click "Create".

![](img_src/data_blobstore.png)

You can now see your data directly in AzureML by navigating to "Datastores" and clicking on <your_datastore_name>. If you then switch to the "Browse" view, you can open your files and have a look at them.
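
As an alternative to the steps in AzureML Studio, the datastore can probably also be registered from the notebook with the `azure-ai-ml` package that we already used for `AccountKeyConfiguration`. The sketch below is an assumption-laden outline, not a tested recipe: both function names and the description text are hypothetical, and `ml_client` is expected to be an `MLClient` connected to your workspace.

```python
def blob_datastore_settings(name, account_name, container_name):
    """Collect the non-secret settings of the datastore (hypothetical helper)."""
    return {
        "name": name,
        "description": "Exasol tables exported as CSV",
        "account_name": account_name,
        "container_name": container_name,
    }


def register_blob_datastore(ml_client, settings, account_key):
    """Create or update the datastore in the workspace behind `ml_client`."""
    # imported here so the settings helper works without azure-ai-ml
    from azure.ai.ml.entities import AccountKeyConfiguration, AzureBlobDatastore

    store = AzureBlobDatastore(
        credentials=AccountKeyConfiguration(account_key=account_key),
        **settings,
    )
    return ml_client.create_or_update(store)


settings = blob_datastore_settings(
    "exasol_export", "your_storage_account_name", "azureml-tutorial"
)

# On an AzureML compute instance a client can usually be created like this:
# from azure.ai.ml import MLClient
# from azure.identity import DefaultAzureCredential
# ml_client = MLClient.from_config(credential=DefaultAzureCredential())
# register_blob_datastore(ml_client, settings, account_key)
```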


Great, we successfully connected to our Exasol SaaS instance and loaded data from there into our Azure Blob Storage!

Now we move on to [working with the data in AzureML and training a model on it](TrainModelInAzureML.ipynb).