# Preamble

In [1]:
%%time

import snowflake.snowpark.modin.plugin
import modin.pandas as pd
import numpy as np
import datetime
import pandas as native_pd
from snowflake.snowpark.session import Session

session = Session.builder.create()

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...

# Use case 1: working with a single small table

In this example, we read the Snowhouse table `SAMPLE_DATA.TPCH_SF1.CUSTOMER`, which has 150k rows and takes up 10.3 MB in Snowflake.

In [2]:
%%time

df = pd.read_snowflake("SAMPLE_DATA.TPCH_SF1.CUSTOMER")

Snapshot source table/view 'SAMPLE_DATA.TPCH_SF1.CUSTOMER' failed due to reason: `003029 (0A000): SQL compilation error:
Cannot clone from a table that was imported from a share.'. Data from source table/view 'SAMPLE_DATA.TPCH_SF1.CUSTOMER' is being copied into a new temporary table 'SNOWPARK_TEMP_TABLE_I56XTZTLZY' for snapshotting. DataFrame creation might take some time.


CPU times: user 31.2 ms, sys: 15.7 ms, total: 46.9 ms
Wall time: 4.77 s


Just printing the data is visibly slow...

In [3]:
df

Unnamed: 0,C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT
0,30001,Customer#000030001,"Ui1b,3Q71CiLTJn4MbVp,,YCZARIaNTelfst",4,14-526-204-4500,8848.47,MACHINERY,frays wake blithely enticingly ironic asymptote
1,30002,Customer#000030002,UVBoMtILkQu1J3v,11,21-340-653-9800,5221.81,MACHINERY,he slyly ironic pinto beans wake slyly above t...
2,30003,Customer#000030003,CuGi9fwKn8JdR,21,31-757-493-7525,3014.89,BUILDING,e furiously alongside of the requests. evenly ...
3,30004,Customer#000030004,tkR93ReOnf9zYeO,23,33-870-136-4375,3308.55,AUTOMOBILE,ssly bold deposits. final req
4,30005,Customer#000030005,pvq4uDoD8pEwpAE01aesCtbD9WU8qmlsvoFav5,9,19-144-468-5416,-278.54,MACHINERY,ructions behind the pinto beans x-ra
...,...,...,...,...,...,...,...,...
149995,29996,Customer#000029996,BnZVGZiAgcEImNm9iD,7,17-536-308-8025,4035.17,FURNITURE,"ual instructions. bold, silent foxes nag blith..."
149996,29997,Customer#000029997,lTbDYXdQ74JctD UbRbXCqF2b8,9,19-631-777-4123,2015.90,HOUSEHOLD,eodolites detect slyly alongside of the quickl...
149997,29998,Customer#000029998,ZxxiuDruzi98CcymR,23,33-619-315-9722,-810.56,FURNITURE,xpress packages. accounts sleep carefully iron...
149998,29999,Customer#000029999,CuPA4UpgTCYiXrBrpiSO D,12,22-824-951-8333,3865.14,FURNITURE,eposits-- accounts haggle across the slyly per...


And certain complex transformations, such as this groupby-apply, are very slow.

In [4]:
%%time

result = df.groupby('C_NATIONKEY').apply(lambda group: group.C_CUSTKEY.iloc[0] + group.C_CUSTKEY.mean())

CPU times: user 158 ms, sys: 33.8 ms, total: 192 ms
Wall time: 34.1 s
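To make clear what the lambda above computes, here is a plain-Python sketch of the same per-group logic: the first `C_CUSTKEY` in each group plus that group's mean. The helper name `first_plus_mean` and the toy rows are made up for illustration; this is not how pandas or the Snowflake backend implements `groupby(...).apply(...)`.

```python
from collections import defaultdict


def first_plus_mean(rows, key, value):
    """For each group, return the first value seen plus the group's mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Sort by group key, mirroring the sorted index of the groupby result.
    return {k: vals[0] + sum(vals) / len(vals) for k, vals in sorted(groups.items())}


# Toy rows (hypothetical, not taken from the CUSTOMER table).
toy = [
    {"C_NATIONKEY": 0, "C_CUSTKEY": 10},
    {"C_NATIONKEY": 1, "C_CUSTKEY": 7},
    {"C_NATIONKEY": 0, "C_CUSTKEY": 20},
    {"C_NATIONKEY": 1, "C_CUSTKEY": 9},
]
result = first_plus_mean(toy, "C_NATIONKEY", "C_CUSTKEY")
print(result)
```

Note that the "first value" part makes the result depend on the row order within each group.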


## Let's switch the backend to Python!

We pay a one-time cost of a few seconds to load the data into memory.

In [5]:
%%time

df.set_backend('python', inplace=True)

Transferring data from Snowflake to Python ...:   0%|          | 0/1 [00:00<?, ?it/s]

CPU times: user 372 ms, sys: 67.7 ms, total: 440 ms
Wall time: 1.97 s




But now printing is extremely fast...

In [6]:
df

Unnamed: 0,C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT
0,30001,Customer#000030001,"Ui1b,3Q71CiLTJn4MbVp,,YCZARIaNTelfst",4,14-526-204-4500,8848.47,MACHINERY,frays wake blithely enticingly ironic asymptote
1,30002,Customer#000030002,UVBoMtILkQu1J3v,11,21-340-653-9800,5221.81,MACHINERY,he slyly ironic pinto beans wake slyly above t...
2,30003,Customer#000030003,CuGi9fwKn8JdR,21,31-757-493-7525,3014.89,BUILDING,e furiously alongside of the requests. evenly ...
3,30004,Customer#000030004,tkR93ReOnf9zYeO,23,33-870-136-4375,3308.55,AUTOMOBILE,ssly bold deposits. final req
4,30005,Customer#000030005,pvq4uDoD8pEwpAE01aesCtbD9WU8qmlsvoFav5,9,19-144-468-5416,-278.54,MACHINERY,ructions behind the pinto beans x-ra
...,...,...,...,...,...,...,...,...
149995,29996,Customer#000029996,BnZVGZiAgcEImNm9iD,7,17-536-308-8025,4035.17,FURNITURE,"ual instructions. bold, silent foxes nag blith..."
149996,29997,Customer#000029997,lTbDYXdQ74JctD UbRbXCqF2b8,9,19-631-777-4123,2015.90,HOUSEHOLD,eodolites detect slyly alongside of the quickl...
149997,29998,Customer#000029998,ZxxiuDruzi98CcymR,23,33-619-315-9722,-810.56,FURNITURE,xpress packages. accounts sleep carefully iron...
149998,29999,Customer#000029999,CuPA4UpgTCYiXrBrpiSO D,12,22-824-951-8333,3865.14,FURNITURE,eposits-- accounts haggle across the slyly per...


And so are most things we can imagine doing with the data.

In [7]:
%%time

result = df.groupby('C_NATIONKEY').apply(lambda group: group.C_CUSTKEY.iloc[0] + group.C_CUSTKEY.mean())
print(result)

C_NATIONKEY
0     104809.648945
1     104849.461925
2     105879.897816
3     104446.823588
4     104679.314262
5     104906.427251
6     105917.388361
7     104694.312288
8     105173.743462
9     105037.669210
10    105448.956232
11    104112.225725
12    104829.006557
13    106009.666832
14    104419.615654
15    105264.778247
16    104199.408269
17    105125.565188
18    104305.006972
19    105746.410656
20    104615.581809
21    105924.703395
22    105530.506581
23    105457.153718
24    104223.723884
dtype: float64
CPU times: user 26.3 ms, sys: 3.84 ms, total: 30.1 ms
Wall time: 29.2 ms
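The wall times reported above can be turned into a rough speedup figure. The sketch below parses the simple `value unit` strings printed by `%%time` and compares the Snowflake run (34.1 s) with the Python run (29.2 ms), with and without the 1.97 s transfer. The helper `wall_seconds` is hypothetical and only handles the `ns`, `us`, `ms`, and `s` units; the timings come from a single run each, so treat the ratios as indicative only.

```python
import re

# Seconds per unit for the simple "value unit" wall-time strings.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def wall_seconds(text):
    """Convert a string such as '29.2 ms' into seconds."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(ns|us|ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unsupported wall time: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


snowflake_s = wall_seconds("34.1 s")   # groupby-apply on the Snowflake backend
python_s = wall_seconds("29.2 ms")     # same operation on the Python backend
transfer_s = wall_seconds("1.97 s")    # one-time set_backend('python') cost

speedup = snowflake_s / python_s
speedup_with_transfer = snowflake_s / (transfer_s + python_s)
print(f"{speedup:.0f}x, {speedup_with_transfer:.1f}x including transfer")
```

Even when the one-time transfer is charged entirely to this single operation, the Python backend comes out well ahead in this run.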


When we're done with our transformations, we can write the results to Snowflake.

In [8]:
%%time

snow_result = result.rename('result').set_backend('snowflake')

Transferring data from Python to Snowflake ...:   0%|          | 0/1 [00:00<?, ?it/s]

CPU times: user 8.16 ms, sys: 2.85 ms, total: 11 ms
Wall time: 9.82 ms


In [9]:
%%time

snow_result.to_snowflake('TEMP.MVASHISHTHA.RESULT', if_exists='replace', index=False)

CPU times: user 8.34 ms, sys: 4.33 ms, total: 12.7 ms
Wall time: 886 ms


# Use case 2: filtering out most of a large table, then using Python

## Setup

In this example, we read the table `SAMPLE_DATA.TPCH_SF10.LINEITEM`, which has 60M rows and takes up 1.3 GB.

In [10]:
%%time

df = pd.read_snowflake("SAMPLE_DATA.TPCH_SF10.LINEITEM")

Snapshot source table/view 'SAMPLE_DATA.TPCH_SF10.LINEITEM' failed due to reason: `003029 (0A000): SQL compilation error:
Cannot clone from a table that was imported from a share.'. Data from source table/view 'SAMPLE_DATA.TPCH_SF10.LINEITEM' is being copied into a new temporary table 'SNOWPARK_TEMP_TABLE_6MERBO21WZ' for snapshotting. DataFrame creation might take some time.


CPU times: user 16.1 ms, sys: 5.76 ms, total: 21.8 ms
Wall time: 8.31 s


The data is large, and pulling it all into pandas would take about 2.5 minutes...

In [11]:
df.set_backend('python')

Transferring data from Snowflake to Python ...:   0%|          | 0/1 [00:00<?, ?it/s]



Unnamed: 0,L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT
0,32915745,1047474,72485,3,15.0,21321.30,0.03,0.07,A,F,1992-09-17,1992-07-08,1992-09-21,NONE,TRUCK,its wake blithel
1,32915745,882169,57194,4,36.0,41440.32,0.01,0.04,A,F,1992-08-13,1992-07-13,1992-08-14,NONE,MAIL,"about the express, final"
2,32915745,612588,12589,5,35.0,52519.25,0.06,0.05,R,F,1992-08-02,1992-08-05,1992-08-06,DELIVER IN PERSON,SHIP,e accounts. fluffily ironic packages nag
3,32915745,1761583,61584,6,35.0,57557.50,0.05,0.02,R,F,1992-09-14,1992-08-21,1992-10-07,NONE,FOB,sts wake! dependencies dete
4,32915745,919773,19774,7,26.0,46610.98,0.00,0.06,R,F,1992-06-23,1992-08-17,1992-07-05,DELIVER IN PERSON,AIR,st the furiously ironic depos
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59986047,55014720,945502,95521,1,31.0,47971.26,0.06,0.03,R,F,1993-09-08,1993-08-13,1993-09-21,NONE,MAIL,ding deposits.
59986048,55014720,1042416,67427,2,14.0,19017.04,0.01,0.08,R,F,1993-06-17,1993-08-10,1993-06-19,TAKE BACK RETURN,AIR,s! carefully ir
59986049,55014720,1877609,2628,3,21.0,33316.71,0.09,0.08,A,F,1993-08-25,1993-07-26,1993-09-06,COLLECT COD,SHIP,boost silent requests. furiously bold
59986050,55014720,971153,96163,4,26.0,31826.86,0.06,0.05,R,F,1993-08-21,1993-09-11,1993-09-07,NONE,FOB,arefully pending pearls. fluffil


But we only need to work with a subset of the data, so we take a 2% random sample.

In [12]:
%%time

filtered = df.sample(frac=0.02)



CPU times: user 43.8 ms, sys: 20.6 ms, total: 64.3 ms
Wall time: 2.89 s
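As a back-of-the-envelope check on how much the sample shrinks the data, the sketch below scales the table's stated size (60M rows, 1.3 GB) by the sampling fraction. The helper `estimate_sample` is hypothetical, and it assumes that size scales linearly with row count and that 1 GB is 10^9 bytes; the size of the data once loaded into memory may differ from its size in Snowflake.

```python
def estimate_sample(total_rows, total_bytes, frac):
    """Estimate the row count and size of a uniform random sample."""
    return round(total_rows * frac), total_bytes * frac


# Figures quoted above for SAMPLE_DATA.TPCH_SF10.LINEITEM.
rows, size_bytes = estimate_sample(60_000_000, 1.3e9, 0.02)
print(f"~{rows:,} rows, ~{size_bytes / 1e6:.0f} MB")
```

Under these assumptions the sample holds about 1.2 million rows and about 26 MB.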


Now it's easier to fetch the data.

In [13]:
%%time

python_filtered = filtered.set_backend('python')

Transferring data from Snowflake to Python ...:   0%|          | 0/1 [00:00<?, ?it/s]

CPU times: user 1.41 s, sys: 311 ms, total: 1.72 s
Wall time: 5.27 s




Now we can do a complex operation on the filtered subset of the data.

In [14]:
%%time

python_filtered.groupby('L_SHIPMODE').apply(lambda group: group.L_ORDERKEY.iloc[0] + group.L_ORDERKEY.mean())

CPU times: user 237 ms, sys: 43.3 ms, total: 280 ms
Wall time: 279 ms


L_SHIPMODE
AIR        6.288809e+07
FOB        6.291766e+07
MAIL       6.290736e+07
RAIL       6.297100e+07
REG AIR    6.289805e+07
SHIP       6.287267e+07
TRUCK      6.282901e+07
dtype: float64

Doing the same operation on the filtered data in Snowflake takes much longer: about 11 s, compared with 279 ms on the Python backend.

In [15]:
%%time

filtered.groupby('L_SHIPMODE').apply(lambda group: group.L_ORDERKEY.iloc[0] + group.L_ORDERKEY.mean())

CPU times: user 201 ms, sys: 45.3 ms, total: 246 ms
Wall time: 10.9 s


L_SHIPMODE
AIR        6.288809e+07
FOB        6.291766e+07
MAIL       6.290736e+07
RAIL       6.297100e+07
REG AIR    6.289805e+07
SHIP       6.287267e+07
TRUCK      6.282901e+07
dtype: float64
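One way to read the timings in both use cases is as a break-even question: how many times must an operation run before the one-time transfer to Python pays for itself? The sketch below is a simple model, assuming each run costs the same as the single measurement shown above and ignoring any cost shared by both paths (such as sampling). The function name `break_even_runs` is made up for illustration.

```python
import math


def break_even_runs(transfer_s, remote_s, local_s):
    """Smallest run count n with transfer_s + n * local_s < n * remote_s.

    Returns None when the local run is not faster, so transfer never pays off.
    """
    if local_s >= remote_s:
        return None
    # n must be strictly greater than transfer_s / (remote_s - local_s).
    return math.floor(transfer_s / (remote_s - local_s)) + 1


# Use case 1: 1.97 s transfer, 34.1 s in Snowflake, 29.2 ms in Python.
use_case_1 = break_even_runs(1.97, 34.1, 0.0292)
# Use case 2: 5.27 s transfer, 10.9 s in Snowflake, 279 ms in Python.
use_case_2 = break_even_runs(5.27, 10.9, 0.279)
print(use_case_1, use_case_2)
```

With the measurements from this notebook, the transfer already pays off on the first run in both use cases.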