# Connect to Exasol from AzureML

In this tutorial we will:
 - Connect to Exasol SaaS from AzureML
 - Export Exasol tables to an Azure Blob Storage container
 - Create a Datastore


## Prerequisites

You will need:
 - your running Exasol SaaS cluster with your data loaded into it
 - authentication info for your Exasol SaaS cluster
 - an AzureML account and Azure Storage account
 - AzureML set up with a:
    - workspace
    - compute instance


## Why Azure Blob Storage is necessary

In this tutorial we copy the data from an Exasol SaaS database into an Azure Blob Storage container. This is necessary because, while AzureML has functionality to import directly from SQL databases, the Exasol SQL dialect is not supported by AzureML at the time of writing.


## AzureML setup

If you do not know how to set up your AzureML studio, please refer to the [AzureML documentation](https://learn.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources).
Once you are set up with a workspace and compute instance, you can copy this notebook into your notebook files. Open it and select your compute instance in the drop-down menu at the top of your notebook. Now we can get started with connecting to the Exasol SaaS cluster.


### Connect to Exasol SaaS


We are going to use the [PyExasol](https://docs.exasol.com/db/latest/connect_exasol/drivers/python/pyexasol.htm) package to connect to the Exasol database and read the data. First we need to install PyExasol.
Execute these steps inside your AzureML compute instance.

In [None]:
!pip install pyexasol

Then we need to connect with PyExasol to the Exasol SaaS cluster that holds our data. Change these values to reflect your cluster.
We request 10 rows of our "IDA.TEST" table from the [Scania Trucks](https://archive.ics.uci.edu/ml/datasets/IDA2016Challenge) dataset to check that the connection is working.

In [None]:
import pyexasol
import pandas

EXASOL_HOST = "<your>.clusters.exasol.com"      # change
EXASOL_PORT = "8563"                            # change if needed
EXASOL_USER = "<your-exasol-user>"              # change
EXASOL_PASSWORD = "exa_pat_<your_password>"     # change
EXASOL_SCHEMA = "IDA"                           # change if needed

# build the connection string and open the connection
EXASOL_CONNECTION = "{host}:{port}".format(host=EXASOL_HOST, port=EXASOL_PORT)
exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD,
                          schema=EXASOL_SCHEMA, compression=True)

# check if the connection is working by fetching 10 rows into a pandas DataFrame
data = exasol.export_to_pandas("SELECT * FROM IDA.TEST LIMIT 10")
print(data)
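
If you prefer not to keep the access token in the notebook itself, the connection values could be read from environment variables instead. The following is only a minimal sketch: the helper `load_exasol_settings` and the variable names `EXASOL_HOST`, `EXASOL_PORT`, `EXASOL_USER`, `EXASOL_PASSWORD` and `EXASOL_SCHEMA` are assumptions made for illustration and are not defined by PyExasol or AzureML.

```python
import os


def load_exasol_settings(env=os.environ):
    """Collect PyExasol connection settings from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    host = env["EXASOL_HOST"]                 # e.g. "<your>.clusters.exasol.com"
    port = env.get("EXASOL_PORT", "8563")     # same default port as in the cell above
    return {
        "dsn": f"{host}:{port}",
        "user": env["EXASOL_USER"],
        "password": env["EXASOL_PASSWORD"],
        "schema": env.get("EXASOL_SCHEMA", "IDA"),
        "compression": True,
    }


# Usage (requires the variables to be set and pyexasol to be installed):
# import pyexasol
# exasol = pyexasol.connect(**load_exasol_settings())
```

A missing required variable raises a `KeyError` right away, which is usually easier to diagnose than a failed login.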


### Load data into AzureML Blobstore


For this step, we need access to the Azure storage account, so you need to insert your Azure storage account name and access key below. To find your access key, navigate to your storage account in the Azure portal, click on "Access keys" under "Security + networking", and copy one of your access keys.

![](img_src/access_key_azure.png)


In [None]:
from azure.ai.ml.entities import AccountKeyConfiguration

my_storage_account_name = "your_storage_account_name"   # change
credentials = AccountKeyConfiguration(
    account_key="your_storage_account_key"              # change
)

Lastly, we use an "EXPORT TABLE" command for each of our data tables to export it into a CSV file in our Blob Storage using "INTO CSV AT CLOUD AZURE BLOBSTORAGE". You can find [the documentation for this export command](https://docs.exasol.com/db/latest/sql/export.htm) in the Exasol documentation.
If you choose an existing "azure_storage_container_name", this command will save your files in this container. Otherwise, an Azure storage container with that name will be created automatically.
When you created your AzureML workspace, an Azure blob container was [created automatically](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data) and added as a Datastore named "workspaceblobstore" to your workspace. You can use it here and then skip the "Create a Datastore" step below if you want. To do so, find its container name ("azureml-blobstore-some-ID") in the datastore info and insert it here.

In [None]:
azure_storage_container_name = "your-container-name"   # change; remember that you might need to remove the "_datastore" suffix

for table in ["TEST", "TRAIN"]:
    save_path = f'{azure_storage_container_name}/ida/{table}'
    # adjacent string literals are joined, so each part ends with a space to keep the SQL clauses separated
    sql_export = (
        f"EXPORT TABLE IDA.{table} INTO CSV AT CLOUD AZURE BLOBSTORAGE "
        f"'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' "
        f"USER '{my_storage_account_name}' IDENTIFIED BY '{credentials.account_key}' FILE '{save_path}'"
    )
    exasol.execute(sql_export)
    print(f"saved {table} in file {save_path}")


You can check the success of the command by navigating to your container in the Azure portal using your Azure storage account.
In the menu on the left, find "Containers" under "Data storage". Find the container named "your-container-name" and click on it. Your files should be there.
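
The same check can probably be done from the notebook as well. The sketch below assumes that the `azure-storage-blob` package is installed (`pip install azure-storage-blob`) and that the exported files live under the `ida/` prefix used in the export cell. The helper names `blob_account_url` and `list_exported_blobs` are made up for this example.

```python
def blob_account_url(storage_account_name):
    """Return the default public blob endpoint of a storage account."""
    return f"https://{storage_account_name}.blob.core.windows.net"


def list_exported_blobs(storage_account_name, account_key, container_name, prefix="ida/"):
    """Return (name, size in bytes) for every blob under `prefix` in the container."""
    # Imported here so the URL helper above works even without the package installed.
    from azure.storage.blob import ContainerClient

    container = ContainerClient(
        account_url=blob_account_url(storage_account_name),
        container_name=container_name,
        credential=account_key,          # the storage account access key
    )
    return [(blob.name, blob.size) for blob in container.list_blobs(name_starts_with=prefix)]


# Usage, with the variables defined in the cells above:
# for name, size in list_exported_blobs(my_storage_account_name,
#                                       credentials.account_key,
#                                       azure_storage_container_name):
#     print(name, size)
```

If the export worked, the list should contain one entry for each exported table.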


### Create a Datastore

We recommend that you create a connection between your Azure storage container and your AzureML workspace. To do so, enter your workspace in AzureML studio and select "Data" under "Assets" in the menu on the left. Now select "Datastores" and click on "+Create".

![](img_src/create_datastore.png)

In the view that opens, you need to enter the info for your datastore. Enter a name and select "Azure Blob Storage" as the type. Then select your Azure subscription and, from the drop-down menu, the blob container we loaded the data into. Use the authentication type "Account key" and enter your Azure storage account access key. Click "Create".

![](img_src/data_blobstore.png)
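
As an alternative to the studio dialog, the datastore can likely also be registered from the notebook with the `azure-ai-ml` SDK (v2). This is a hedged sketch rather than the method used in this tutorial: it assumes that `azure-ai-ml` and `azure-identity` are installed and that `MLClient.from_config` can find the workspace configuration, which is normally the case on an AzureML compute instance. The helper names and the datastore name `exasol_ida_datastore` are hypothetical.

```python
def datastore_settings(datastore_name, storage_account_name, container_name):
    """Collect the non-secret settings for an Azure Blob datastore."""
    return {
        "name": datastore_name,
        "description": "Exasol tables exported as CSV",
        "account_name": storage_account_name,
        "container_name": container_name,
    }


def register_blob_datastore(ml_client, settings, account_key):
    """Create or update the datastore in the workspace behind `ml_client`."""
    # Imported here so the settings helper above works without the SDK installed.
    from azure.ai.ml.entities import AccountKeyConfiguration, AzureBlobDatastore

    store = AzureBlobDatastore(
        credentials=AccountKeyConfiguration(account_key=account_key),
        **settings,
    )
    return ml_client.create_or_update(store)


# Usage, with the variables defined in the cells above:
# from azure.ai.ml import MLClient
# from azure.identity import DefaultAzureCredential
# ml_client = MLClient.from_config(credential=DefaultAzureCredential())
# settings = datastore_settings("exasol_ida_datastore",      # hypothetical name
#                               my_storage_account_name,
#                               azure_storage_container_name)
# register_blob_datastore(ml_client, settings, credentials.account_key)
```

Keeping the access key out of the settings dictionary makes it easier to print or log the settings without exposing the secret.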

You can now see your data directly in AzureML by navigating to Data -> Datastores -> <your_datastore_name>. If you then switch to the "Browse" view, you can open your files and have a look at them.


Great, we successfully connected to our Exasol SaaS instance and loaded data from there into our Azure Blob Storage!

Now we move on to [working with the data in AzureML and training a model on it](TrainModelInAzureML.ipynb).