![OmniSci](https://assets-global.website-files.com/5deb974b5176872b2c106aba/5dee79635d7b1979b584c100_24px%20blue-p-500.png)

<big><big><h1>+</h1></big></big>

![SQLAlchemy](https://www.sqlalchemy.org/img/sqla_logo.png)


# Getting started with SQLAlchemy OmniSci


**OmniSciDB** is the world's fastest open source **SQL** engine, 
equally powerful at the heart of the OmniSci platform as it is accelerating 
third-party analytic apps. It optimizes the memory and compute layers to deliver unprecedented 
performance. **OmniSciDB** was designed to keep hot data in **GPU** memory for the 
fastest access possible. Other **GPU** database systems have taken the approach 
of storing the data in **CPU** memory, only moving it to **GPU** at query time, 
trading the gains they receive from **GPU** parallelism for transfer overheads 
over the **PCIe** bus.

**OmniSciDB** avoids this transfer inefficiency by caching the most recently 
touched data in High Bandwidth Memory on the **GPU**, which offers up to 10x 
the bandwidth of **CPU DRAM** and far lower latency. **OmniSciDB** is also 
designed to exploit efficient inter-GPU communication infrastructure such as 
**NVIDIA NVLink** when available.

For data manipulation, OmniSci provides **pyomnisci**/**pyomniscidb**, but
you may prefer a more common, higher-level tool for your data workflow, such as
[Ibis](https://github.com/ibis-project/ibis) or 
[SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy). The great news is that
OmniSci also provides a backend/dialect for both!

For more information about `ibis-omniscidb`, check its 
[repository](https://github.com/omnisci/ibis-omniscidb).

This tutorial is about the first steps with **SQLAlchemy OmniSci**!

## Installation

**sqlalchemy-omnisci** is available on **PyPI** and **conda-forge** and you can 
install it using one of the following commands:

```bash
# if you are a conda user
$ conda install -y sqlalchemy-omnisci
```

or

```bash
# if you are a pip user
$ pip install sqlalchemy-omnisci
```

`sqlalchemy-omnisci` is a `sqlalchemy` dialect, so you don't need to import 
`sqlalchemy-omnisci` directly. Just import `sqlalchemy` and create a connection 
using the following structure:

`omnisci://<user>:<pass>@<host>:<port>/<db>?protocol=<protocol>`
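
As a sketch of how the pieces of that URL fit together, the snippet below assembles the same connection string with `sqlalchemy.engine.URL.create` (available in SQLAlchemy 1.4 and later). The credentials are the demo values used later in this tutorial. Rendering the URL should not need the dialect to be installed or a server to be reachable.

```python
from sqlalchemy.engine import URL

# Each keyword maps onto one placeholder of the URL structure above.
url = URL.create(
    drivername="omnisci",
    username="demouser",
    password="HyperInteractive",
    host="metis.mapd.com",
    port=443,
    database="mapd",
    query={"protocol": "https"},
)

# hide_password=False renders the password instead of masking it.
connection_string = url.render_as_string(hide_password=False)
print(connection_string)
```

The resulting `url` object can be passed to `create_engine` in place of the string form.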


In [1]:
import sqlalchemy
from sqlalchemy import create_engine
import pandas as pd

sqlalchemy.__version__

'1.4.25'

In [2]:
engine = create_engine(
    "omnisci://demouser:HyperInteractive@"
    "metis.mapd.com:443/mapd?protocol=https"
)

con = engine.connect()

For this first tutorial, we are going to use a table called `github`.

In [3]:
metadata = sqlalchemy.MetaData()

# autoload_with alone triggers reflection; autoload=True is redundant
github = sqlalchemy.Table(
    'github',
    metadata,
    autoload_with=engine
)

Inspecting the `github` variable shows that the table was reflected correctly:

In [4]:
github

Table('github', MetaData(), Column('type', VARCHAR(length=52), table=<github>), Column('public_', BOOLEAN(), table=<github>), Column('repo_id', BIGINT(), table=<github>), Column('repo_name', VARCHAR(length=52), table=<github>), Column('repo_url', VARCHAR(length=52), table=<github>), Column('actor_id', BIGINT(), table=<github>), Column('actor_login', BIGINT(), table=<github>), Column('actor_gravatar_id', VARCHAR(length=52), table=<github>), Column('actor_avatar_url', VARCHAR(length=52), table=<github>), Column('actor_url', VARCHAR(length=52), table=<github>), Column('org_id', BIGINT(), table=<github>), Column('org_login', VARCHAR(length=52), table=<github>), Column('org_gravatar_id', VARCHAR(length=52), table=<github>), Column('org_avatar_url', VARCHAR(length=52), table=<github>), Column('org_url', VARCHAR(length=52), table=<github>), Column('created_at', TIMESTAMP(), table=<github>), Column('id', VARCHAR(length=52), table=<github>), schema=None)
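
Reflection can be tried without access to an OmniSci server. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in (an assumption made only so the example runs anywhere) with a hypothetical three-column subset of the `github` table. It shows that `autoload_with` reads the column definitions from the database.

```python
from sqlalchemy import MetaData, Table, create_engine, text

# In-memory SQLite stands in for the OmniSci server in this sketch.
engine = create_engine("sqlite://")

with engine.begin() as con:
    # Hypothetical subset of the real table's columns.
    con.execute(
        text(
            "CREATE TABLE github ("
            "type VARCHAR(52), repo_id BIGINT, created_at TIMESTAMP)"
        )
    )
    # Reflect the table definition from the database.
    github = Table("github", MetaData(), autoload_with=con)

column_names = [column.name for column in github.columns]
print(column_names)
```

Reflected columns are then reachable as attributes, for example `github.c.repo_id`.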

Now, let's try a simple query using the `sqlalchemy` API:

In [5]:
query = sqlalchemy.select([github]).limit(1)
str(query.compile())

'SELECT github.type, github.public_, github.repo_id, github.repo_name, github.repo_url, github.actor_id, github.actor_login, github.actor_gravatar_id, github.actor_avatar_url, github.actor_url, github.org_id, github.org_login, github.org_gravatar_id, github.org_avatar_url, github.org_url, github.created_at, github.id \nFROM github\n LIMIT :param_1'

In [6]:
results = con.execute(query).fetchall()
results

[('PushEvent', True, 13599170, 'ile/ile.github.io', 'https://api.github.com/repos/ile/ile.github.io', 433707, None, None, 'https://avatars.githubusercontent.com/u/433707?', 'https://api.github.com/users/ile', None, None, None, None, None, datetime.datetime(2015, 1, 1, 14, 23, 48), '2489636048')]

We can also use **Pandas** to manipulate this result!

In [7]:
df = pd.DataFrame(results)
df.columns = results[0].keys()
df

| | type | public_ | repo_id | repo_name | repo_url | actor_id | actor_login | actor_gravatar_id | actor_avatar_url | actor_url | org_id | org_login | org_gravatar_id | org_avatar_url | org_url | created_at | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PushEvent | True | 13599170 | ile/ile.github.io | https://api.github.com/repos/ile/ile.github.io | 433707 | | | https://avatars.githubusercontent.com/u/433707? | https://api.github.com/users/ile | | | | | | 2015-01-01 14:23:48 | 2489636048 |
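
A variation on the cell above, sketched under the assumption that an in-memory SQLite database is an acceptable stand-in for OmniSci: the column names are taken from the result object with `keys()` rather than from the first row. The two-column table is hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the OmniSci server in this sketch.
engine = create_engine("sqlite://")

with engine.connect() as con:
    con.execute(text("CREATE TABLE github (type TEXT, repo_id INTEGER)"))
    con.execute(text("INSERT INTO github VALUES ('PushEvent', 13599170)"))

    result = con.execute(text("SELECT type, repo_id FROM github"))
    # Column names come from the result itself, not from a row.
    column_names = list(result.keys())
    rows = [tuple(row) for row in result.fetchall()]

df = pd.DataFrame(rows, columns=column_names)
print(df)
```

Because the names do not depend on `results[0]`, this form also copes with a query that returns no rows.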


Also, if you are familiar with **SQL**, you can use it directly:

In [8]:
# wrapping the string in text() also keeps this working on SQLAlchemy 2.0
results = con.execute(sqlalchemy.text("SELECT * FROM github LIMIT 1")).fetchall()
results

[('IssueCommentEvent', 1, 16635032, 'start-jsk/rtmros_hironx', 'https://api.github.com/repos/start-jsk/rtmros_hironx', 1840401, 130, None, 'https://avatars.githubusercontent.com/u/1840401?', 'https://api.github.com/users/130s', 2988053, 'start-jsk', None, 'https://avatars.githubusercontent.com/u/2988053?', 'https://api.github.com/orgs/start-jsk', datetime.datetime(2015, 1, 1, 0, 30, 4), '2489383075')]
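
When a raw **SQL** statement needs a value supplied at run time, `sqlalchemy.text` accepts named bound parameters instead of string formatting. The sketch below runs against an in-memory SQLite database with a hypothetical two-column table. Whether the OmniSci dialect handles bound parameters the same way is an assumption that is not verified here.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the OmniSci server in this sketch.
engine = create_engine("sqlite://")

with engine.connect() as con:
    con.execute(text("CREATE TABLE github (type TEXT, repo_name TEXT)"))
    con.execute(
        text("INSERT INTO github (type, repo_name) VALUES (:type, :repo_name)"),
        [
            {"type": "PushEvent", "repo_name": "ile/ile.github.io"},
            {"type": "IssueCommentEvent", "repo_name": "start-jsk/rtmros_hironx"},
        ],
    )

    # :event_type is filled in from the dictionary passed to execute().
    stmt = text("SELECT type, repo_name FROM github WHERE type = :event_type")
    rows = con.execute(stmt, {"event_type": "PushEvent"}).fetchall()

matches = [tuple(row) for row in rows]
print(matches)
```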

### Filtering

The **sqlalchemy** API closely mirrors the structure of **SQL**. For example,
to execute a SQL `SELECT` you can use `sqlalchemy.select`, which returns an object
that accepts further **SQL** clauses, such as `limit`, `where`, etc.

For instance, chaining `where` and `limit` selects the first 10 records whose `type` is `PushEvent`.
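
A filter for the first 10 `PushEvent` records could look like the sketch below. To keep it runnable without a server, the table is declared by hand with a hypothetical subset of the reflected columns and compiled with SQLAlchemy's default dialect, so the **SQL** text may differ slightly from what the OmniSci dialect emits. It uses the `select(table)` calling form, which works on SQLAlchemy 1.4 and later.

```python
from sqlalchemy import Column, MetaData, String, Table, select

# Hand-declared stand-in for the reflected table (subset of columns).
github = Table(
    "github",
    MetaData(),
    Column("type", String(52)),
    Column("repo_name", String(52)),
)

push_events = (
    select(github)
    .where(github.c.type == "PushEvent")
    .limit(10)
)

# literal_binds inlines the values so the full statement is visible.
filtered_sql = str(push_events.compile(compile_kwargs={"literal_binds": True}))
print(filtered_sql)
```

With the reflected `github` table and a live connection, the same `push_events` object could be passed to `con.execute`.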

A query object is translated to **SQL** when it is compiled. For the `query` object defined earlier:

In [9]:
print(query.compile())

SELECT github.type, github.public_, github.repo_id, github.repo_name, github.repo_url, github.actor_id, github.actor_login, github.actor_gravatar_id, github.actor_avatar_url, github.actor_url, github.org_id, github.org_login, github.org_gravatar_id, github.org_avatar_url, github.org_url, github.created_at, github.id 
FROM github
 LIMIT :param_1


As you can see in the output above, it uses bound-parameter placeholders (such as `:param_1`) instead of the real values. If you want to see the **SQL** with the values inlined, use the following command:

In [10]:
print(query.compile(engine, compile_kwargs={"literal_binds": True}))

SELECT github.type, github.public_, github.repo_id, github.repo_name, github.repo_url, github.actor_id, github.actor_login, github.actor_gravatar_id, github.actor_avatar_url, github.actor_url, github.org_id, github.org_login, github.org_gravatar_id, github.org_avatar_url, github.org_url, github.created_at, github.id 
FROM github 
 LIMIT 1
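
The values behind those placeholders are kept on the compiled object. The sketch below, which again uses a hand-declared one-column stand-in for the table and the default dialect, reads them from the `params` attribute.

```python
from sqlalchemy import Column, MetaData, String, Table, select

# Hand-declared stand-in for the reflected table.
github = Table("github", MetaData(), Column("type", String(52)))

query = select(github).limit(1)
compiled = query.compile()

sql_text = str(compiled)              # statement with :param_1 placeholder
bound_values = dict(compiled.params)  # values sent alongside the statement

print(sql_text)
print(bound_values)
```

Keeping values separate from the statement text is what lets the database driver handle quoting, so `literal_binds` is best reserved for inspection and debugging.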


### Using with Pandas

If you have read this far, you have probably already heard of [Pandas](https://pandas.pydata.org/),
the most popular **Data Frame** library for **Python**. As **sqlalchemy-omnisci** is a **sqlalchemy** dialect,
you can also use it directly with **Pandas**:


In [11]:
pd.read_sql(query, engine)

| | type | public_ | repo_id | repo_name | repo_url | actor_id | actor_login | actor_gravatar_id | actor_avatar_url | actor_url | org_id | org_login | org_gravatar_id | org_avatar_url | org_url | created_at | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PushEvent | True | 28684236 | toopay/toopay.github.io | https://api.github.com/repos/toopay/toopay.git... | 534245 | | | https://avatars.githubusercontent.com/u/534245? | https://api.github.com/users/toopay | | | | | | 2015-01-01 10:52:04 | 2489561906 |


Or, if you want to use **SQL** directly:

In [12]:
sql = "SELECT * FROM github LIMIT 1"
pd.read_sql(sql, engine)

| | type | public_ | repo_id | repo_name | repo_url | actor_id | actor_login | actor_gravatar_id | actor_avatar_url | actor_url | org_id | org_login | org_gravatar_id | org_avatar_url | org_url | created_at | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PushEvent | 1 | 28669269 | colbycheeze/kittens-api | https://api.github.com/repos/colbycheeze/kitte... | 8884298 | | | https://avatars.githubusercontent.com/u/8884298? | https://api.github.com/users/colbycheeze | | | | | | 2015-01-01 01:19:40 | 2489403456 |


## Conclusions

This document aims to help users get started with **SQLAlchemy OmniSci**; it doesn't provide an exhaustive list of functions or 
possibilities. 

For more information about **SQLAlchemy**, check its [official tutorials](https://docs.sqlalchemy.org/en/14/orm/tutorial.html).