## omega-ml - snowflake demo

This demo shows how to use Snowflake data sources directly from omega-ml:

* store and retrieve Snowflake connections, then query them with SQL supplied at runtime (dynamic queries)
* store and retrieve Snowflake views, i.e. a connection plus a static SQL query
* copy data from Snowflake into omega-ml for further processing
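The three modes can be modelled in a few lines. This sketch is purely illustrative and assumes nothing about omegaml internals. `ViewRegistry` is a hypothetical stand-in for `om.datasets`, and an SQLite file stands in for Snowflake. A stored connection needs SQL at `get()` time, a view re-runs its stored SQL on every `get()`, and a copy is materialized once at `put()` time. Letting runtime SQL take precedence over stored SQL is a choice made for this sketch only.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


class ViewRegistry:
    """Hypothetical stand-in for om.datasets (illustration only, not omegaml's code).

    Each entry maps a name to a connection URL, an optional stored SQL query,
    and, for copy=True, the rows captured when the entry was stored."""

    def __init__(self):
        self._entries = {}

    def put(self, url, name, sql=None, copy=False):
        rows = self._run(url, sql) if copy else None  # copy: materialize now
        self._entries[name] = {"url": url, "sql": sql, "rows": rows}

    def get(self, name, sql=None):
        entry = self._entries[name]
        if entry["rows"] is not None:
            return entry["rows"]  # copy: served locally, source not queried
        query = sql or entry["sql"]  # runtime SQL wins over stored SQL (sketch choice)
        if query is None:
            raise ValueError(f"{name!r} has no stored SQL; pass sql= to get()")
        return self._run(entry["url"], query)

    @staticmethod
    def _run(url, sql):
        # this sketch only understands sqlite:///<path> URLs
        if sql is None:
            raise ValueError("copy=True requires sql=")
        path = url[len("sqlite:///"):]
        with closing(sqlite3.connect(path)) as conn:
            return conn.execute(sql).fetchall()


# an SQLite file standing in for the Snowflake sample table
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with closing(sqlite3.connect(db_path)) as conn:
    conn.execute("create table lineitem (qty integer)")
    conn.executemany("insert into lineitem values (?)", [(1,), (2,), (3,)])
    conn.commit()

url = "sqlite:///" + db_path
datasets = ViewRegistry()
datasets.put(url, "conn")                                       # connection only
datasets.put(url, "view", sql="select count(*) from lineitem")  # connection + static SQL
datasets.put(url, "copy", sql="select count(*) from lineitem", copy=True)

# add a row at the source after the entries were stored
with closing(sqlite3.connect(db_path)) as conn:
    conn.execute("insert into lineitem values (4)")
    conn.commit()

print(datasets.get("conn", sql="select sum(qty) from lineitem"))  # [(10,)]
print(datasets.get("view"))  # [(4,)]
print(datasets.get("copy"))  # [(3,)]
```

After the extra insert, the view reflects the new row while the copy still returns the count captured at `put()` time. That is the trade-off between freshness and independence from the source database.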

### Installation

1. install dependencies: `pip install --user -U snowflake-sqlalchemy`
2. register the Snowflake dialect with SQLAlchemy:

       from sqlalchemy.dialects import registry
       registry.register('snowflake', 'snowflake.sqlalchemy', 'dialect')
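`registry.register()` does not import anything by itself. It records a module path and an attribute name so that SQLAlchemy can load the dialect lazily when a `snowflake://` URL is first used. The stdlib-only sketch below mimics that idea. `LazyDialectRegistry` and `load_for_url` are hypothetical names, `json.dumps` is only a stand-in target, and real SQLAlchemy also handles `dialect+driver` schemes, which this sketch ignores.

```python
import importlib
import json
from urllib.parse import urlsplit


class LazyDialectRegistry:
    """Hypothetical lazy plugin registry: name -> (module path, attribute name)."""

    def __init__(self):
        self._targets = {}

    def register(self, name, modulepath, objname):
        # only remember where the object lives; nothing is imported yet
        self._targets[name] = (modulepath, objname)

    def load(self, name):
        modulepath, objname = self._targets[name]
        module = importlib.import_module(modulepath)  # imported on first use
        return getattr(module, objname)

    def load_for_url(self, url):
        # the URL scheme (e.g. "snowflake" in snowflake://...) selects the entry
        return self.load(urlsplit(url).scheme)


plugins = LazyDialectRegistry()
plugins.register("demo", "json", "dumps")
loaded = plugins.load_for_url("demo://user:password@account/")
print(loaded is json.dumps)  # True
```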

### Usage

`om.datasets.put('snowflake://user:password@account', 'omega-dataset-name', sql='select ...', copy=True)`

For details, see `help(omx_snowflake)`.

### Version history

- 0.1.0 - initial version (without support for copying data)
- 0.1.1 - support for copying data
- 0.1.2 - more robust parallel inserts when copying
- 0.1.3 - simplified by using the omegaml SQLAlchemy plugin

In [5]:
# install dependencies
!pip install --user -U snowflake-sqlalchemy
# register the Snowflake dialect with SQLAlchemy
from sqlalchemy.dialects import registry
registry.register('snowflake', 'snowflake.sqlalchemy', 'dialect')

In [4]:
import omegaml as om

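# load the stored credentials: the first document of the 'secrets' dataset,
# expected to hold the keys user, password and account
secrets = om.datasets.get('secrets')[0]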

In [5]:
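# build the connection string from the stored secrets
# (alternatively, uncomment the prompts below to enter credentials interactively)
from getpass import getpass
#user = input('snowflake user name> ')
#password = getpass('snowflake password> ')
#account = input('snowflake account (without .snowflakecomputing.com)> ')
#secrets = dict(user=user, password=password, account=account)
snowflake_cxstr = 'snowflake://{user}:{password}@{account}/'.format(**secrets)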
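The connection string above interpolates the password verbatim, so characters such as `@`, `:` or `/` in a password would corrupt the URL. A more defensive variant percent-encodes the user name and password with `urllib.parse.quote_plus`, which is what the SQLAlchemy documentation recommends for special characters, and checks the required keys up front. `build_snowflake_url` is a hypothetical helper, not part of omegaml. The optional `/<database>/<schema>` path follows the URL layout documented for snowflake-sqlalchemy.

```python
from urllib.parse import quote_plus

REQUIRED_KEYS = ("user", "password", "account")


def build_snowflake_url(secrets, database=None, schema=None):
    """Hypothetical helper: build a snowflake:// URL from a credentials dict.

    user and password are percent-encoded so that characters such as '@',
    ':' or '/' cannot break URL parsing. schema is only used with database."""
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    if missing:
        raise KeyError(f"secrets is missing: {', '.join(missing)}")
    url = "snowflake://{}:{}@{}/".format(
        quote_plus(secrets["user"]),
        quote_plus(secrets["password"]),
        secrets["account"],
    )
    if database:
        url += database
        if schema:
            url += "/" + schema
    return url


creds = {"user": "demo", "password": "p@ss:word", "account": "xy12345"}
print(build_snowflake_url(creds))
# snowflake://demo:p%40ss%3Aword@xy12345/
print(build_snowflake_url(creds, database="snowflake_sample_data", schema="tpch_sf1"))
# snowflake://demo:p%40ss%3Aword@xy12345/snowflake_sample_data/tpch_sf1
```

Without special characters the result matches the original string, so `snowflake_cxstr = build_snowflake_url(secrets)` can be used as a drop-in replacement.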

In [6]:
# store only the connection (no SQL); raw=True returns the SQLAlchemy connection itself
om.datasets.drop('mysnowflake', force=True)
om.datasets.put(snowflake_cxstr, 'mysnowflake')
om.datasets.get('mysnowflake', raw=True)

<sqlalchemy.engine.base.Connection at 0x7fc406afab38>

In [7]:
# store the connection together with a static SQL query (a view); get() runs the query
om.datasets.drop('mysnowflake', force=True)
om.datasets.put(snowflake_cxstr, 'mysnowflake', 
                sql='select count(*) from snowflake_sample_data.tpch_sf1.lineitem')
om.datasets.get('mysnowflake')

|   | COUNT(*) |
|---|----------|
| 0 | 6001215  |


In [8]:
# store only the connection, then pass the SQL at query time; get() returns a pandas DataFrame
om.datasets.drop('mysnowflake', force=True)
om.datasets.put(snowflake_cxstr, 'mysnowflake')
om.datasets.get('mysnowflake', 
                sql='select count(*) from snowflake_sample_data.tpch_sf1.lineitem')

|   | COUNT(*) |
|---|----------|
| 0 | 6001215  |


In [10]:
# copy the query result into a native omegaml dataset (copy=True)
om.datasets.drop('mysnowflake', force=True)
om.datasets.put(snowflake_cxstr, 
                'mysnowflake', 
                sql='select count(*) from snowflake_sample_data.tpch_sf1.lineitem',
                copy=True)
om.datasets.get('mysnowflake')

1rows [00:00,  5.00rows/s]


|   | COUNT(*) |
|---|----------|
| 0 | 6001215  |


In [11]:
# copy a larger result into a native omegaml dataset in chunks, parsing date columns
om.datasets.drop('mysnowflake', force=True)
om.datasets.put(snowflake_cxstr, 
                'mysnowflake', 
                sql='select * from snowflake_sample_data.tpch_sf1.lineitem limit 100000',
                parse_dates=['l_shipdate', 'l_receiptdate', 'l_commitdate'],  # read as dates
                chunksize=100,  # copy in chunks of 100 rows
                append=False,   # do not append to an existing dataset
                copy=True)
len(om.datasets.getl('mysnowflake'))

100000rows [00:08, 12883.57rows/s]


100000
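The last cell copies in chunks and parses date columns along the way. The general pattern is to read a result set with DB-API `cursor.fetchmany()` so that the source result is never fetched all at once, and it can be sketched with the standard library alone. This is not omegaml's copy implementation (which, per the version history, also performs parallel inserts). `copy_in_chunks` is hypothetical, an in-memory SQLite table stands in for `lineitem`, and a Python list stands in for the target dataset.

```python
import sqlite3
from contextlib import closing
from datetime import date


def copy_in_chunks(source, sql, chunksize, parse_dates=(), sink=None):
    """Hypothetical helper: copy the result of `sql` into `sink` chunk by chunk.

    Columns listed in `parse_dates` are converted from ISO strings to datetime.date.
    Returns the sink and the number of chunks read."""
    sink = [] if sink is None else sink
    n_chunks = 0
    with closing(source.cursor()) as cursor:
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        date_positions = [columns.index(name) for name in parse_dates]
        while True:
            rows = cursor.fetchmany(chunksize)  # at most `chunksize` rows per fetch
            if not rows:
                break
            n_chunks += 1
            for row in rows:
                values = list(row)
                for pos in date_positions:
                    values[pos] = date.fromisoformat(values[pos])
                sink.append(dict(zip(columns, values)))
    return sink, n_chunks


# in-memory stand-in for snowflake_sample_data.tpch_sf1.lineitem
src = sqlite3.connect(":memory:")
src.execute("create table lineitem (l_orderkey integer, l_shipdate text)")
src.executemany(
    "insert into lineitem values (?, ?)",
    [(i, f"1996-01-{i % 28 + 1:02d}") for i in range(1000)],
)

rows, n_chunks = copy_in_chunks(
    src,
    "select * from lineitem order by l_orderkey limit 1000",
    chunksize=100,
    parse_dates=["l_shipdate"],
)
print(len(rows), n_chunks)  # 1000 10
print(rows[0])  # {'l_orderkey': 0, 'l_shipdate': datetime.date(1996, 1, 1)}
```

A small `chunksize` keeps memory bounded but costs more round trips, so it is worth tuning to the row width and the network latency to the source.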