<img src="https://assets-global.website-files.com/620d42e86cb8ecb3f739e579/620d44bba9bc9541593ef7bc_website%20header.png" alt="HeavyAI" width="250"/>

<big><big><h1>+</h1></big></big>

![SQLAlchemy](https://www.sqlalchemy.org/img/sqla_logo.png)


# Getting started with SQLAlchemy HeavyAI


**HeavyDB** is the world's fastest open source **SQL** engine,
equally powerful at the heart of the HeavyAI platform as it is accelerating
third-party analytic apps. It optimizes the memory and compute layers to deliver
unprecedented performance. **HeavyDB** was designed to keep hot data in **GPU**
memory for the fastest access possible. Other **GPU** database systems have taken
the approach of storing the data in **CPU** memory and only moving it to the
**GPU** at query time, trading the gains from **GPU** parallelism for transfer
overhead over the **PCIe** bus.

**HeavyDB** avoids this transfer inefficiency by caching the most recently
touched data in High Bandwidth Memory on the **GPU**, which offers up to 10x 
the bandwidth of **CPU DRAM** and far lower latency. **HeavyDB** is also
designed to exploit efficient inter-GPU communication infrastructure such as 
**NVIDIA NVLink** when available.

For data manipulation, HeavyAI provides **heavyai**/**pyheavydb**, but you may
prefer a more common, higher-level tool for your data workflow, such as
[Ibis](https://github.com/ibis-project/ibis) or
[SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy). The great news is that
HeavyAI also provides a backend/dialect for both!

For more information about `ibis-heavyai`, check its
[repository](https://github.com/heavyai/ibis-heavyai).

This tutorial is about the first steps with **SQLAlchemy HeavyAI**!

## Installation

**sqlalchemy-heavyai** is available on **PyPI** and **conda-forge** and you can
install it using one of the following commands:

```bash
# if you are a conda user (also works with mamba)
$ conda install -y -c conda-forge sqlalchemy-heavyai
```

or

```bash
# if you are a pip user
$ pip install sqlalchemy-heavyai
```

`sqlalchemy-heavyai` is a `sqlalchemy` dialect, so you don't need to import
`sqlalchemy-heavyai` directly. Just import `sqlalchemy` and create a connection
using the following structure:

`heavydb://<user>:<pass>@<host>:<port>/<db>?protocol=<protocol>`
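If the user name or password contains characters such as `@` or `/`, they have to be URL-escaped before being placed in the connection string. The sketch below shows one way to assemble the URL with the standard library; `build_heavydb_url` is a hypothetical helper written for this tutorial, not part of `sqlalchemy-heavyai`, and the default `protocol` value is simply the one used in the example further down.

```python
from urllib.parse import quote_plus


def build_heavydb_url(user, password, host, port, db, protocol="binary"):
    """Hypothetical helper: fill in the heavydb:// URL template.

    The user name and password are URL-escaped so that characters such
    as "@" or "/" do not break the URL structure.
    """
    return (
        f"heavydb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}?protocol={protocol}"
    )


url = build_heavydb_url("admin", "HyperInteractive", "localhost", 6274, "heavyai")
print(url)
# prints: heavydb://admin:HyperInteractive@localhost:6274/heavyai?protocol=binary
```

The resulting string can be passed to `sqlalchemy.create_engine` as usual.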


```python
import sqlalchemy
from sqlalchemy import create_engine
import pandas as pd

sqlalchemy.__version__
```

```
'1.4.36'
```

```python
engine = create_engine(
    "heavydb://admin:HyperInteractive@localhost:6274/heavyai?protocol=binary"
)

con = engine.connect()
```
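A connection opened with `engine.connect()` stays open until it is closed explicitly. A common SQLAlchemy pattern is to use the connection as a context manager so it is released automatically. The sketch below uses an in-memory SQLite engine purely as a stand-in so that it runs without a HeavyDB server; with HeavyDB you would pass the `heavydb://` URL instead.

```python
from sqlalchemy import create_engine, text

# Stand-in engine (assumption): in-memory SQLite, used only so this
# sketch runs without a HeavyDB server.
engine = create_engine("sqlite://")

# The connection is closed automatically when the block exits.
with engine.connect() as con:
    value = con.execute(text("SELECT 1")).scalar()

print(value)
```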

As an example, let's work on the first table we can find in our database.

```python
# SHOW TABLES lists the tables; take the name of the first one.
table_name = con.execute("SHOW TABLES").first()[0]
print(table_name)
```

```
lidar
```


In this database, the first table is called `lidar`, so that is the table used in the rest of this tutorial.

```python
metadata = sqlalchemy.MetaData()

# Reflect the table definition (column names and types) from the database.
lidar = sqlalchemy.Table(
    table_name,
    metadata,
    autoload_with=engine,
)
```

Inspecting the variable `lidar` shows that the reflection worked:

```python
lidar
```

```
Table('lidar', MetaData(), Column('x', FLOAT(), table=<lidar>), Column('y', FLOAT(), table=<lidar>), Column('z', FLOAT(), table=<lidar>), schema=None)
```
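If you want to experiment without a running server, a table with the same shape can also be declared by hand instead of being reflected. The sketch below assumes the three `FLOAT` columns shown in the reflected output above and maps them to SQLAlchemy's generic `Float` type; it then lists each column name with its type.

```python
from sqlalchemy import Column, Float, MetaData, Table

metadata = MetaData()

# Hand-written equivalent of the reflected table (assumed columns: x, y, z).
lidar = Table(
    "lidar",
    metadata,
    Column("x", Float),
    Column("y", Float),
    Column("z", Float),
)

# Columns are available through `Table.columns` (or the shorter `Table.c`).
for column in lidar.columns:
    print(column.name, column.type)
```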

Now, let's try a simple query using the `sqlalchemy` API:

```python
# Select every column of the reflected table and keep a single row.
query = sqlalchemy.select(lidar).limit(1)
str(query.compile())
```

```
'SELECT lidar.x, lidar.y, lidar.z \nFROM lidar\n LIMIT :param_1'
```

```python
results = con.execute(query).fetchall()
results
```

```
[(-122.42375, 37.78875, 65.15439)]
```

We can also use **Pandas** to manipulate this result!

```python
df = pd.DataFrame(results)
# Take the column names from the first returned row.
df.columns = results[0]._fields
df
```

|   | x | y | z |
|---|---|---|---|
| 0 | -122.42375 | 37.78875 | 65.15439 |


Also, if you are familiar with **SQL**, you can use it directly:

```python
results = con.execute(f"SELECT * FROM {table_name} LIMIT 1").fetchall()
results
```

```
[(-122.42375, 37.78875, 65.15439)]
```

### Filtering

The **sqlalchemy** API closely mirrors the structure of **SQL**. For example,
to execute a SQL `SELECT` you can use `sqlalchemy.select`, which returns an object
that accepts other **SQL** clauses, such as `limit`, `where`, etc.
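As an illustration of chaining such clauses, the sketch below filters and sorts the `lidar` table. The table is declared by hand so the example runs without a server, and the threshold `50` is an arbitrary value chosen for the example, not something taken from the dataset.

```python
from sqlalchemy import Column, Float, MetaData, Table, select

# Hand-written stand-in for the reflected table (assumed columns: x, y, z).
lidar = Table(
    "lidar",
    MetaData(),
    Column("x", Float),
    Column("y", Float),
    Column("z", Float),
)

# Keep points whose z value is above an arbitrary threshold,
# highest first, and return at most 10 rows.
filtered = (
    select(lidar.c.x, lidar.c.y, lidar.c.z)
    .where(lidar.c.z > 50)
    .order_by(lidar.c.z.desc())
    .limit(10)
)

print(filtered)
```

Each method returns a new `select` object, so clauses can be added step by step before the query is executed with `con.execute(filtered)`.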

The `query` object built earlier already uses one of these clauses, `limit`,
and it is translated to the following **SQL**:

```python
print(query.compile())
```

```
SELECT lidar.x, lidar.y, lidar.z
FROM lidar
 LIMIT :param_1
```


As you can see in the output above, the compiled query uses placeholders ("variables") instead of the real values. If you want to see the real **SQL**, use the following command:

```python
print(query.compile(engine, compile_kwargs={"literal_binds": True}))
```

```
SELECT lidar.x, lidar.y, lidar.z
FROM lidar
 LIMIT 1
```
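The placeholders, such as `:param_1`, are bound parameters: the values are kept separately from the SQL text and sent along with it at execution time. The sketch below, which again uses a hand-written stand-in for the `lidar` table and SQLAlchemy's default compiler rather than the HeavyDB dialect, shows how to inspect those values on the compiled object.

```python
from sqlalchemy import Column, Float, MetaData, Table, select

# Hand-written stand-in for the reflected table (assumed columns: x, y, z).
lidar = Table(
    "lidar",
    MetaData(),
    Column("x", Float),
    Column("y", Float),
    Column("z", Float),
)

query = select(lidar).limit(1)
compiled = query.compile()

# The SQL text contains the placeholder; the value is stored in `params`.
print(str(compiled))
print(compiled.params)
```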


### Using with Pandas

If you have read this far, you have probably already heard of [Pandas](https://pandas.pydata.org/),
the most popular **Data Frame** library for **Python**. As **sqlalchemy-heavyai** is a **sqlalchemy** dialect,
you can also use it directly with **Pandas**:


```python
pd.read_sql(query, engine)
```

|   | x | y | z |
|---|---|---|---|
| 0 | -122.42375 | 37.78875 | 65.15439 |


Or, if you want to use **SQL** directly:

```python
sql = f"SELECT * FROM {table_name} LIMIT 1"
pd.read_sql(sql, engine)
```

|   | x | y | z |
|---|---|---|---|
| 0 | -122.42375 | 37.78875 | 65.15439 |


## Conclusions

This document aims to help users get started with **SQLAlchemy HeavyAI**; it does not provide an exhaustive list of functions or
possibilities.

For more information about **SQLAlchemy**, check its [official tutorials](https://docs.sqlalchemy.org/en/14/orm/tutorial.html).