# First Steps

This notebook guides you through the first steps using Jupyter notebooks with Exasol.

The notebook demonstrates connecting to an Exasol database instance and using some of its features. 

## 1. Open Secure Configuration Storage

First we need to open the Secure Configuration Storage (SCS) containing the connection information such as the database host, user, password, etc.

In [None]:
%run utils/access_store_ui.ipynb
display(get_access_store_ui())

## 2. Using JupySQL

First we will activate the [JupySQL](https://jupysql.ploomber.io) magics:

In [None]:
%run utils/jupysql_init.ipynb

In the background, JupySQL uses SQLAlchemy; see also the [demo section on SQLAlchemy](#4.-SQLAlchemy) below.

### 2.1 Create Database Tables

We will use JupySQL to create two database tables. Later sections of this notebook will use these tables, too:

In [None]:
%%sql
CREATE
OR REPLACE TABLE US_AIRLINES (
  OP_CARRIER_AIRLINE_ID DECIMAL(10, 0) IDENTITY PRIMARY KEY,
  CARRIER_NAME VARCHAR(1000)
)

In [None]:
%%sql
CREATE
OR REPLACE TABLE US_FLIGHTS (
  FL_DATE TIMESTAMP, -- was DATE
  OP_CARRIER_AIRLINE_ID DECIMAL(10, 0),
  ORIGIN_AIRPORT_SEQ_ID DECIMAL(10, 0),
  ORIGIN_STATE_ABR CHAR(2),
  DEST_AIRPORT_SEQ_ID DECIMAL(10, 0),
  DEST_STATE_ABR CHAR(2),
  CRS_DEP_TIME CHAR(4),
  DEP_DELAY DOUBLE, -- was DECIMAL(6,2)
  CRS_ARR_TIME CHAR(4),
  ARR_DELAY DOUBLE, -- was DECIMAL(6,2)
  CANCELLED BOOLEAN,
  CANCELLATION_CODE CHAR(1),
  DIVERTED BOOLEAN,
  CRS_ELAPSED_TIME DOUBLE, -- was DECIMAL(6,2)
  ACTUAL_ELAPSED_TIME DOUBLE, -- was DECIMAL(6,2)
  DISTANCE DOUBLE, -- was DECIMAL(6,2)
  CARRIER_DELAY DOUBLE, -- was DECIMAL(6,2)
  WEATHER_DELAY DOUBLE, -- was DECIMAL(6,2)
  NAS_DELAY DOUBLE, -- was DECIMAL(6,2)
  SECURITY_DELAY DOUBLE, -- was DECIMAL(6,2)
  LATE_AIRCRAFT_DELAY DOUBLE -- was DECIMAL(6,2)
)

### 2.2 Importing CSV Files From Remote

This section demonstrates how to import CSV files from a remote source into the database.

First we will import a list of US airlines. The data is publicly accessible at the [Bureau of Transportation Statistics](https://www.transtats.bts.gov/Homepage.asp) of the US Department of Transportation.

In [None]:
%%sql
IMPORT INTO US_AIRLINES FROM
    CSV AT 'https://dut5tonqye28.cloudfront.net/ai_lab/flight-info/' 
    FILE 'US_AIRLINES.csv' 
    COLUMN SEPARATOR = ',' 
    ROW SEPARATOR = 'CRLF'
    COLUMN DELIMITER = '"' 
    SKIP = 1

Next, we will import data about flights in February 2024:

In [None]:
%%sql
IMPORT INTO US_FLIGHTS 
    FROM CSV AT 'https://dut5tonqye28.cloudfront.net/ai_lab/first_steps/' 
    FILE 'US_FLIGHTS_FEB_2024-fixed-booleans.csv'
    (1 FORMAT = 'MM/DD/YYYY HH12:MI:SS AM', 2..21)    
    SKIP = 1
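
The `FORMAT` mask on column 1 tells the database how to read the timestamp strings in the CSV file. As a rough illustration only, the following sketch parses a value of the same shape in plain Python. Both the sample value and the `strptime` pattern are assumptions chosen to mirror the mask `MM/DD/YYYY HH12:MI:SS AM`; they are not taken from the data file.

```python
from datetime import datetime

# Assumed Python counterpart of the Exasol mask 'MM/DD/YYYY HH12:MI:SS AM'
PY_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical sample value in the shape the mask expects
sample = "02/01/2024 12:00:00 AM"

# 12-hour clock: "12:00:00 AM" is midnight
parsed = datetime.strptime(sample, PY_FORMAT)
print(parsed.isoformat())
```

Checking a few values this way can help spot a mismatch between mask and data before running a large import.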

Let's find out which airline has the highest delay per flight:

In [None]:
%%sql --save udf_output
SELECT
  CARRIER_NAME "Airline",
  SUM(CARRIER_DELAY) "Combined Delay",
  COUNT(CARRIER_DELAY) "Delayed Flights",
  COUNT(F.OP_CARRIER_AIRLINE_ID) "Total flights",
  ROUND( SUM(CARRIER_DELAY) / COUNT(F.OP_CARRIER_AIRLINE_ID), 1 ) "Delay per flight"
FROM US_FLIGHTS F
  JOIN US_AIRLINES A ON A.OP_CARRIER_AIRLINE_ID = F.OP_CARRIER_AIRLINE_ID
WHERE NOT (CANCELLED OR DIVERTED)
GROUP BY CARRIER_NAME
ORDER BY "Delay per flight" DESC
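
To make the aggregation in this query easier to follow, here is a minimal sketch of the same calculation in plain Python. The rows are hypothetical, and the sketch assumes that `CARRIER_DELAY` is NULL (here `None`) for flights without a carrier delay, so that `SUM` and `COUNT` on that column skip those rows while the total still counts every flight.

```python
# Hypothetical rows: (airline, carrier delay in minutes, or None if NULL)
flights = [
    ("Airline A", 30.0),
    ("Airline A", None),
    ("Airline A", 15.0),
    ("Airline B", None),
    ("Airline B", 10.0),
]

stats = {}
for airline, delay in flights:
    s = stats.setdefault(airline, {"combined": 0.0, "delayed": 0, "total": 0})
    s["total"] += 1          # COUNT(F.OP_CARRIER_AIRLINE_ID): every joined row
    if delay is not None:    # SUM and COUNT on CARRIER_DELAY skip NULL values
        s["combined"] += delay
        s["delayed"] += 1

# "Delay per flight": combined delay divided by all flights, rounded to 1 digit
for s in stats.values():
    s["per_flight"] = round(s["combined"] / s["total"], 1)

# ORDER BY "Delay per flight" DESC
ranking = sorted(stats.items(), key=lambda kv: kv[1]["per_flight"], reverse=True)
for airline, s in ranking:
    print(airline, s)
```

Note that the divisor is the total number of flights, not only the delayed ones, so an airline with few but long delays can still rank low.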

### 2.3 Importing a Parquet File From an AWS S3 Bucket

This demo uses a file already uploaded to the S3 bucket `ai-lab-example-data-s3`.

**Please note**: Parquet import requires using **<span style="color: #40a">Exasol version 2025 or higher</span>**, see [docs.exasol.com](https://docs.exasol.com/db/latest/loading_data/load_data_parquet.htm).

First we will define a connection pointing to the S3 bucket:

In [None]:
%%sql
CREATE
OR REPLACE CONNECTION AI_LAB_FIRST_STEPS_S3 TO 'https://ai-lab-example-data-s3.s3.eu-central-1.amazonaws.com'

Alternatively, the connection can use the following URL syntax; see also "_Load data from Parquet files_" on [docs.exasol.com](https://docs.exasol.com/db/latest/loading_data/load_data_parquet.htm#Overview):

In [None]:
%%sql
CREATE
OR REPLACE CONNECTION AI_LAB_FIRST_STEPS_S3 TO 's3://ai-lab-example-data-s3'

Then we will remove the data imported before:

In [None]:
%%sql
TRUNCATE TABLE US_FLIGHTS

Now we can import the Parquet file from S3 into the database:

In [None]:
%%sql 
IMPORT INTO US_FLIGHTS FROM PARQUET AT AI_LAB_FIRST_STEPS_S3 
    FILE 'first_steps/US_FLIGHTS_FEB_2024.parquet'

We will query table `US_FLIGHTS` again to display the imported data:

In [None]:
%%sql SELECT * FROM US_FLIGHTS

## 3. PyExasol

Please note:
* Accessing Exasol database versions `2025` and higher requires pyexasol version ≥ `1.2`.
* AI Lab currently ships with pyexasol version `0.27.0`, so the pyexasol examples can only be executed with Exasol database versions < `2025`.

### 3.1 Importing a CSV File From the Local Filesystem

This section demonstrates how to import a CSV file from the local file system into the database using pyexasol.

Function `open_pyexasol_connection()` opens a connection, using the configuration from the SCS.

In [None]:
from pathlib import Path
from exasol.nb_connector.connections import open_pyexasol_connection

with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    path = Path("first_steps/pyexasol.csv")
    import_params = {
        "column_delimiter": '"',
        "column_separator": ",",
        "row_separator": "CRLF",
        "skip": 1,
    }
    conn.import_from_file(path, (ai_lab_config.db_schema, "US_AIRLINES"), import_params)
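
The import parameters above have to match the layout of the CSV file. The following sketch writes a small file of that layout into memory using Python's `csv` module. The rows are hypothetical and the actual content of `first_steps/pyexasol.csv` may differ; the sketch only illustrates how `column_separator`, `column_delimiter`, `row_separator` and `skip` relate to the file.

```python
import csv
import io

# Hypothetical rows; the real file first_steps/pyexasol.csv may differ
rows = [
    (90001, "Example Airline One"),
    (90002, "Example Airline Two"),
]

buffer = io.StringIO(newline="")
writer = csv.writer(
    buffer,
    delimiter=",",           # corresponds to column_separator
    quotechar='"',           # corresponds to column_delimiter
    quoting=csv.QUOTE_ALL,   # wrap every field in the delimiter
    lineterminator="\r\n",   # corresponds to row_separator "CRLF"
)
# Header row, which the import skips via "skip": 1
writer.writerow(["OP_CARRIER_AIRLINE_ID", "CARRIER_NAME"])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(repr(csv_text))
```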

Let's verify successful import:

In [None]:
%%sql
SELECT * FROM US_AIRLINES WHERE CARRIER_NAME LIKE '% local CSV file via pyexasol'

### 3.2 Importing a CSV File From Remote

This section demonstrates how to import a CSV file from a remote source into the database using pyexasol.

First we will truncate table `US_FLIGHTS` to be able to import the flight data again:

In [None]:
%%sql
TRUNCATE TABLE US_FLIGHTS

Now let's run the import via pyexasol. This example uses a query with separate params and [pyexasol SQL formatting](https://exasol.github.io/pyexasol/master/user_guide/exploring_features/formatting_sql.html): 

In [None]:
from exasol.nb_connector.connections import open_pyexasol_connection

query = """
    IMPORT INTO {flights_table!q} FROM CSV AT {url!s} FILE {file!s}
    (1 FORMAT = {date_format!s}, 2..21) 
    SKIP = 1
"""

params = {
    "flights_table": (ai_lab_config.db_schema, "US_FLIGHTS"),
    "url": "https://dut5tonqye28.cloudfront.net/ai_lab/first_steps/",
    "file": "US_FLIGHTS_FEB_2024-fixed-booleans.csv",
    "date_format": "MM/DD/YYYY HH12:MI:SS AM",
}

with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    result = conn.execute(query, params)

print(f"Imported {result.rowcount()} rows.")

We will assign the formatted SQL query to a variable for reusing it later on:

In [None]:
import_remote_csv_sql = result.query
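
The placeholders in the query use conversion flags: `!q` for identifiers and `!s` for string literals. The sketch below imitates the effect of these two flags with plain Python helpers. The helpers `quote_identifier()` and `quote_literal()` and the schema name `MY_SCHEMA` are hypothetical and only approximate what pyexasol does; the linked pyexasol documentation describes the actual behavior.

```python
def quote_literal(value):
    """Render a value as a single-quoted SQL string literal (similar to `!s`)."""
    return "'" + str(value).replace("'", "''") + "'"


def quote_identifier(value):
    """Render a name or a (schema, table) tuple as a quoted identifier (similar to `!q`)."""
    parts = value if isinstance(value, tuple) else (value,)
    return ".".join('"' + str(p).replace('"', '""') + '"' for p in parts)


# Hypothetical schema name; the notebook takes it from ai_lab_config.db_schema
table = quote_identifier(("MY_SCHEMA", "US_FLIGHTS"))
url = quote_literal("https://dut5tonqye28.cloudfront.net/ai_lab/first_steps/")
file = quote_literal("US_FLIGHTS_FEB_2024-fixed-booleans.csv")

print(f"IMPORT INTO {table} FROM CSV AT {url} FILE {file}")
```

Keeping values out of the SQL string and letting the formatter quote them avoids broken statements when a value contains a quote character.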

## 4. SQLAlchemy

Function `open_sqlalchemy_connection()` returns a SQLAlchemy engine, again using the configuration from the SCS.
This engine will be used by the examples based on SQLAlchemy.

In [None]:
from exasol.nb_connector.connections import open_sqlalchemy_connection
engine = open_sqlalchemy_connection(ai_lab_config)

### 4.1 Importing a CSV File From Remote

This section demonstrates how to import a CSV file from a remote source into the database using SQLAlchemy.

First we will truncate table `US_FLIGHTS` again:

In [None]:
%%sql
TRUNCATE TABLE US_FLIGHTS

Next we will import the flight data once again, now using SQLAlchemy:

In [None]:
# reusing variable import_remote_csv_sql defined in pyexasol example above

with engine.connect() as conn:
    result = conn.execute(import_remote_csv_sql)
print(f"Imported {result.rowcount} rows.")

### 4.2 Importing a Parquet File from an AWS S3 Bucket

This section demonstrates how to import a Parquet file from an AWS S3 bucket into the database using SQLAlchemy.

**Please note**: Parquet import requires using **<span style="color: #40a">Exasol version 2025 or higher</span>**, see [docs.exasol.com](https://docs.exasol.com/db/latest/loading_data/load_data_parquet.htm).

First we will truncate table `US_FLIGHTS` again:

In [None]:
%%sql
TRUNCATE TABLE US_FLIGHTS

Now we can import the Parquet file from S3 into the database:

In [None]:
sql = """
IMPORT INTO US_FLIGHTS
    FROM PARQUET AT AI_LAB_FIRST_STEPS_S3
    FILE 'first_steps/US_FLIGHTS_FEB_2024.parquet'
"""

with engine.connect() as conn:
    result = conn.execute(sql)
print(f"Imported {result.rowcount} rows.")

### 4.3 SQLAlchemy Query Builders

This section demonstrates using SQLAlchemy features [text expression](https://docs.sqlalchemy.org/en/20/core/sqlelement.html#sqlalchemy.sql.expression.text) and [TextClause](https://docs.sqlalchemy.org/en/20/core/sqlelement.html#sqlalchemy.sql.expression.TextClause) to build a query, execute it and iterate the result set:

In [None]:
from sqlalchemy import text
from datetime import datetime
from sqlalchemy.types import DateTime, String

t = (
    text(
        """
        SELECT FL_DATE, CRS_DEP_TIME 
        FROM US_FLIGHTS 
        WHERE OP_CARRIER_AIRLINE_ID =:carrier_id
        AND ORIGIN_STATE_ABR=:origin
        AND DEST_STATE_ABR=:dest
        """
    )
    .bindparams(carrier_id=20452, origin="TX", dest="NJ")
    .columns(FL_DATE=DateTime, CRS_DEP_TIME=String)
)

with engine.connect() as conn:
    for dt, departure in conn.execute(t):
        print(datetime.fromisoformat(dt).date(), departure)
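
The named placeholders `:carrier_id`, `:origin` and `:dest` can also be tried without an Exasol instance. The sketch below swaps in Python's built-in `sqlite3` module, which accepts the same `:name` placeholder style, and uses a few hypothetical rows. It is not SQLAlchemy and not Exasol; it only illustrates how bound parameters keep values separate from the SQL text.

```python
import sqlite3
from datetime import datetime

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE US_FLIGHTS ("
    "FL_DATE TEXT, CRS_DEP_TIME TEXT, OP_CARRIER_AIRLINE_ID INTEGER, "
    "ORIGIN_STATE_ABR TEXT, DEST_STATE_ABR TEXT)"
)
# Hypothetical rows, not taken from the flight data
demo_conn.executemany(
    "INSERT INTO US_FLIGHTS VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-02-01 00:00:00", "0615", 20452, "TX", "NJ"),
        ("2024-02-02 00:00:00", "1830", 20452, "TX", "CA"),
        ("2024-02-03 00:00:00", "0900", 19805, "TX", "NJ"),
    ],
)

query = """
    SELECT FL_DATE, CRS_DEP_TIME
    FROM US_FLIGHTS
    WHERE OP_CARRIER_AIRLINE_ID = :carrier_id
      AND ORIGIN_STATE_ABR = :origin
      AND DEST_STATE_ABR = :dest
"""
params = {"carrier_id": 20452, "origin": "TX", "dest": "NJ"}

# The driver binds the values; they are never pasted into the SQL string
rows = demo_conn.execute(query, params).fetchall()
demo_conn.close()

for fl_date, departure in rows:
    print(datetime.fromisoformat(fl_date).date(), departure)
```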

## 5. Using the Exasol Bucket File System

The [Exasol Bucket File System](https://docs.exasol.com/db/latest/database_concepts/bucketfs/bucketfs.htm) (BucketFS) is a powerful feature for exchanging non-relational data with the database nodes in an Exasol cluster.

Such data can be arbitrary files including 
* Data to be processed by [User Defined Scripts](https://docs.exasol.com/db/latest/database_concepts/udf_scripts.htm) (UDFs)
* [Script-Language Containers](https://github.com/exasol/script-languages-release) (SLCs)
* Pretrained Large Language AI Models

### 5.1 Uploading a File to the BucketFS

First we will create a sample file:

In [None]:
%%writefile first_steps/text_file.txt
Hello World!

And now, let's upload the file into the BucketFS.

Function `open_bucketfs_location()` returns a cursor into Exasol's BucketFS, again using the configuration from the SCS.

In [None]:
from exasol.nb_connector.connections import open_bucketfs_location
from pathlib import Path 

file = Path("first_steps/text_file.txt")
bfs = open_bucketfs_location(ai_lab_config)
remote = bfs / file.name
remote.write(file.read_bytes())

### 5.2 Listing the Files in the BucketFS

We can also list all the files currently available in the BucketFS:

In [None]:
bfs = open_bucketfs_location(ai_lab_config)
for p in bfs.iterdir():
    print(f'- {p.name}')

### 5.3 Reading a File in the BucketFS

We can also read the contents of a file in the BucketFS:

In [None]:
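from exasol.bucketfs import as_string

# `remote` was created in section 5.1. Importing only `as_string`
# avoids rebinding the name `bfs` used for the BucketFS location above.
content = as_string(remote.read())
print(f'The file in the BucketFS contains:\n{content}')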
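
The file content is read in chunks of bytes, which are then combined into a single string. As a minimal sketch of that last step, assuming the chunks are UTF-8 encoded, the hypothetical helper `join_chunks()` below does the same in plain Python. It is an illustration, not the library's implementation.

```python
def join_chunks(chunks, encoding="utf-8"):
    """Concatenate an iterable of byte chunks and decode the result."""
    return b"".join(chunks).decode(encoding)


# Hypothetical chunks, as they might arrive when reading the sample file
chunks = [b"Hello ", b"World!\n"]
text_content = join_chunks(chunks)
print(text_content)
```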

### 5.4 Reading the File in the BucketFS Using a User Defined Function (UDF)

[TODO]