## Loading package dependencies

`%pip install` commands need only be run once per JupyterHub session.
If you restart your JupyterHub server, you will need to run them again.

If you want to ensure you are running the latest version of a package, use `%pip install --upgrade ...`

Notebook dependencies may be pre-installed on custom notebook images in future iterations.

The cell below uses `%%capture` to store the (usually long) output of the pip installs in the variable `pipoutput`,
so it does not fill your console with large output cells.

In [None]:
%%capture pipoutput
%pip install trino python-dotenv pandas
%pip install sqlalchemy sqlalchemy-trino
%pip install osc-ingest-tools
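
Because `%%capture` hides the pip output, it can be useful to confirm afterwards which versions were actually installed. The sketch below is one way to do that with the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical and is not part of any OSC tooling.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the %pip install lines above.
wanted = [
    "trino",
    "python-dotenv",
    "pandas",
    "sqlalchemy",
    "sqlalchemy-trino",
    "osc-ingest-tools",
]
for name, ver in installed_versions(wanted).items():
    print(f"{name}: {ver or 'not installed'}")
```

If an install fails, the captured output is still available: running `pipoutput.show()` in a later cell replays it.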

## Load credentials
OS-Climate convention is to store credentials in a `dotenv` file named `credentials.env`.

This cell loads the environment variables defined in `credentials.env` and sets them in Python's
standard `os.environ` environment variable dictionary.

You can obtain your own `credentials.env` file as part of the OS-Climate onboarding process.
You can upload this file from your computer using the 'upload' button on the left of the Jupyter console.

Here we are using a standard OSC utility `load_credentials_dotenv()` that loads `credentials.env`.
It expects to find this file in the top-level directory of the file browser on the left-hand side
of the Jupyter console.

In [None]:
import osc_ingest_trino as osc
osc.load_credentials_dotenv()
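
To make the file format concrete, here is a minimal sketch of what a `dotenv` file looks like and how its `KEY=VALUE` lines map to variables. It is a simplified illustration only, not the implementation used by `python-dotenv` or by `load_credentials_dotenv()`. The function `parse_dotenv_text` and all values shown are hypothetical placeholders.

```python
def parse_dotenv_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Placeholder content; real values come from your own credentials.env.
example = """
# Trino connection settings (placeholders)
TRINO_USER=example-user
TRINO_HOST=trino.example.org
TRINO_PORT=443
"""

parsed = parse_dotenv_text(example)
# A real loader would then copy these pairs into os.environ.
print(sorted(parsed))
```

Note that every value is read as a string, so a numeric setting such as the port must be converted by whatever code consumes it.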

## Connect to Trino with a sqlalchemy engine

The following cell establishes an `sqlalchemy` connection to Trino.

By default, `attach_trino_engine()` expects the following environment variables,
which should be supplied by the `credentials.env` file above:

```
TRINO_USER
TRINO_PASSWD
TRINO_HOST
TRINO_PORT
```
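
Before connecting, it can help to confirm that these variables were actually loaded. The following is a small sketch using only `os.environ`. The helper `missing_env_vars` is a hypothetical name, and the check assumes that an empty value should count as missing.

```python
import os

# Variable names as listed above.
REQUIRED_VARS = ["TRINO_USER", "TRINO_PASSWD", "TRINO_HOST", "TRINO_PORT"]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All expected Trino variables are set.")
```

The check prints only variable names, never their values, so it is safe to leave in a shared notebook.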

In [None]:
# connect to the Trino DB and return a sqlalchemy engine
engine = osc.attach_trino_engine()

## Obtain query results directly from sqlalchemy

The following cell shows an example of running a simple SQL query,
and obtaining the results directly from the `sqlalchemy` engine.

In [None]:
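from sqlalchemy import text

sql = """
show catalogs
"""
# engine.execute() was removed in SQLAlchemy 2.0; a connection works on 1.4 and 2.x
with engine.connect() as conn:
    qres = conn.execute(text(sql))
    rows = qres.fetchall()
rows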
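
The rows returned by `fetchall()` behave like tuples, one per result row, even when the query has a single column. A short sketch of flattening such a result into a plain list follows. The helper `first_column` is hypothetical, and the sample rows are made up, so real catalog names will differ.

```python
def first_column(rows):
    """Flatten single-column result rows into a plain list of values."""
    return [row[0] for row in rows]


# Made-up rows standing in for the result of fetchall().
sample_rows = [("catalog_a",), ("catalog_b",), ("system",)]
print(first_column(sample_rows))
```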

## Load an SQL query into pandas
The `pandas` library can read an SQL query directly into a DataFrame
using an `sqlalchemy` engine, as shown in the following cell.

Note the use of `convert_dtypes()`, which converts each column to the best-fitting pandas dtype that supports missing values (`pd.NA`).

In [None]:
import pandas as pd

sql = """
show catalogs
"""

# execute the SQL query and load it into a pandas dataframe
df = pd.read_sql(sql, engine)

# assess the column data types from their contents:
df = df.convert_dtypes()

# display the dataframe
df
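
To see what `convert_dtypes()` changes, here is a small standalone sketch that does not need a Trino connection. The DataFrame contents are made up for illustration.

```python
import pandas as pd

# Made-up data: the None values force pandas to fall back to float64 and object.
raw = pd.DataFrame({"count": [1, 2, None], "flag": [True, False, None]})
print(raw.dtypes)

# convert_dtypes() switches to nullable dtypes that can hold pd.NA.
converted = raw.convert_dtypes()
print(converted.dtypes)
```

After conversion, the integer column no longer has to be stored as floating point just because it contains a missing value.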

## Check the column data types

You can check the column types returned for your query using the `info` DataFrame method:

In [None]:
df.info(verbose=True)