# ODSC Webinar: OmniSci and RAPIDS
## An End-to-End GPU Data Science Workflow

May 30, 2019

## 1. Connecting to OmniSciDB (Open Source)
https://github.com/omnisci/omniscidb

In [None]:
# pymapd provides the connection to OmniSci
# pandas is imported for the convenience of the pd.read_sql method
import pymapd
import pandas as pd
from credentials import credentials

In [None]:
# Connect to OmniSciDB, get list of tables in database
conn = pymapd.connect(host="localhost", 
                      dbname=credentials["dbname"], 
                      user=credentials["user"], 
                      password=credentials["password"])

conn.get_tables()
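
The `credentials` module imported above is not shown in this notebook. A minimal sketch of what such a module might look like, assuming it only needs to expose a dict with the `dbname`, `user`, and `password` keys used in the connect call. The function name `load_credentials`, the environment variable names, and the fallback values are all hypothetical placeholders, not part of the original webinar material.

```python
# credentials.py (hypothetical layout; the original module is not shown)
import os


def load_credentials(env=None):
    """Build the dict consumed by the connect call from environment variables."""
    # Fall back to the real process environment when no mapping is passed in
    env = os.environ if env is None else env
    return {
        # Variable names and defaults are placeholders; replace with your own
        "dbname": env.get("OMNISCI_DBNAME", "omnisci"),
        "user": env.get("OMNISCI_USER", "admin"),
        "password": env.get("OMNISCI_PASSWORD", ""),
    }


# Module-level dict so `from credentials import credentials` works
credentials = load_credentials()
```

Reading the values from the environment keeps the password out of the notebook itself, which matters when the notebook is shared after a webinar.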

## 2. Simple query demonstrating data is streaming into OmniSciDB

In [None]:
import datetime
from datetime import timedelta

# Create start and end timestamps for substitution into the query
# Read the clock once so both bounds share the same reference time
now_ = datetime.datetime.now()
xminsago_ = now_ - timedelta(minutes=5)

# Query counts the records inserted per minute over the last 5 minutes
query = f"""
SELECT 
date_trunc(minute, accessed_on) accessed_on,
COUNT(*) AS records
FROM free_bike_status 
WHERE accessed_on BETWEEN '{xminsago_}' AND '{now_}' 
GROUP BY 1
ORDER BY 1 DESC
"""

# OmniSci gets its speed in two ways: high GPU bandwidth and core density, and compiling queries with LLVM
# The first run is slower because the query is compiled and/or the data is streamed to the GPU
# Later runs are fast because the data is already in GPU memory and the query is already compiled, NOT because the result is cached
%time df = pd.read_sql(query, conn)

In [None]:
df
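
The timestamp substitution in the query above can be isolated into a small helper that derives both bounds from a single clock reading and formats them to whole seconds. This is only a sketch: `minute_window` and `build_count_query` are hypothetical names, and the table and column names are simply reused from the query in this section.

```python
from datetime import datetime, timedelta


def minute_window(end, minutes):
    """Return (start, end) as 'YYYY-MM-DD HH:MM:SS' strings."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = end - timedelta(minutes=minutes)
    return start.strftime(fmt), end.strftime(fmt)


def build_count_query(start, end):
    """Build the per-minute record count query for the given window."""
    return (
        "SELECT date_trunc(minute, accessed_on) accessed_on, "
        "COUNT(*) AS records "
        "FROM free_bike_status "
        f"WHERE accessed_on BETWEEN '{start}' AND '{end}' "
        "GROUP BY 1 ORDER BY 1 DESC"
    )


# Example: a 5-minute window ending at a fixed, known time
start, end = minute_window(datetime(2019, 5, 30, 12, 0, 0), 5)
query_text = build_count_query(start, end)
```

The resulting `query_text` could then be passed to `pd.read_sql` in the same way as `query` above.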

## 3. Using Ibis for a pandas-like API over billions of records
https://www.omnisci.com/blog/scaling-pandas-to-the-billions-with-ibis-and-mapd/

In [None]:
# Ibis is an alternative way to query, using a pandas-like API
import ibis

In [None]:
# Connection arguments are similar to those of pymapd
ibiscon = ibis.mapd.connect(host="localhost", 
                            database=credentials["dbname"], 
                            user=credentials["user"], 
                            password=credentials["password"],
                            port=6274)

# Tables can be listed in a similar manner as well
ibiscon.list_tables()

In [None]:
# Create a table reference
# This doesn't bring the data local; Ibis is a lazy-execution engine
free_bike_status = ibiscon.table('free_bike_status')
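
To illustrate what lazy execution means here, the following toy class builds up a query description step by step and only produces SQL when asked. It is a conceptual sketch, not the Ibis API: `LazyTable` and its methods are hypothetical, and a real engine would send the compiled SQL to the database at that final step.

```python
class LazyTable:
    """Toy stand-in for a lazy table expression; NOT the Ibis API."""

    def __init__(self, name, filters=(), limit=None):
        self.name = name
        self._filters = tuple(filters)
        self._limit = limit

    def filter(self, condition):
        # Return a new expression; nothing is executed or fetched here
        return LazyTable(self.name, self._filters + (condition,), self._limit)

    def limit(self, n):
        # Again only records the intent to limit the result
        return LazyTable(self.name, self._filters, n)

    def compile(self):
        # SQL text is produced only when explicitly requested
        sql = f"SELECT * FROM {self.name}"
        if self._filters:
            sql += " WHERE " + " AND ".join(self._filters)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql


expr = LazyTable("free_bike_status").filter("accessed_on > '2019-05-30'").limit(5)
sql_text = expr.compile()
```

Because each step returns a new description rather than data, a table with billions of rows costs nothing to reference until a result is actually requested.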