# Preamble

In [1]:
import snowflake.snowpark.modin.plugin
import modin.pandas as pd
import numpy as np
import datetime
import pandas as native_pd
from snowflake.snowpark.session import Session
session = Session.builder.create()


# Use case 1: working with a single small table

In this example, we read the Snowhouse table `SAMPLE_DATA.TPCH_SF1.CUSTOMER`, which has 150k rows and takes up 10.3 MB in Snowflake.

In [2]:
df = pd.read_snowflake("SAMPLE_DATA.TPCH_SF1.CUSTOMER")

Snapshot source table/view 'SAMPLE_DATA.TPCH_SF1.CUSTOMER' failed due to reason: `003029 (0A000): SQL compilation error:
Cannot clone from a table that was imported from a share.'. Data from source table/view 'SAMPLE_DATA.TPCH_SF1.CUSTOMER' is being copied into a new temporary table 'SNOWPARK_TEMP_TABLE_HM0ZEPCT5X' for snapshotting. DataFrame creation might take some time.


Even just printing the data is noticeably slow...

In [None]:
df

And certain complex transformations, such as a `groupby().apply()` with a Python lambda, are very slow. We interrupted this cell before it finished.

In [3]:
result = df.groupby('C_NATIONKEY').apply(lambda group: group.C_CUSTKEY.iloc[0] + group.C_CUSTKEY.mean())

KeyboardInterrupt: 

## Let's switch the backend to Python!

We pay a one-time cost of a few seconds to load the data into memory.
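
Whether the switch pays off depends on how many operations follow it. A minimal sketch of the break-even arithmetic, assuming a fixed one-time transfer cost and a constant per-operation cost on each backend; the helper `break_even_ops` and the timings in the usage line are hypothetical, not measurements from this notebook:

```python
import math
from typing import Optional


def break_even_ops(transfer_s: float, remote_op_s: float, local_op_s: float) -> Optional[int]:
    """Hypothetical helper: smallest number of operations n such that
    transfer_s + n * local_op_s < n * remote_op_s.
    Returns None when local operations are not faster than remote ones."""
    saving_per_op = remote_op_s - local_op_s
    if saving_per_op <= 0:
        return None  # switching backends never pays off
    # Strict inequality n > transfer_s / saving_per_op, so round down and add one
    return math.floor(transfer_s / saving_per_op) + 1


# Hypothetical timings: 5 s transfer, 2 s per Snowflake op, 0.1 s per local op
print(break_even_ops(5.0, 2.0, 0.1))  # 3
```

With these assumed numbers, the transfer is paid back from the third operation onward; a single operation that is much slower remotely shifts the break-even point down accordingly.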

In [4]:
df.set_backend('python', inplace=True)




But now printing is extremely fast...

In [5]:
df

| | C_CUSTKEY | C_NAME | C_ADDRESS | C_NATIONKEY | C_PHONE | C_ACCTBAL | C_MKTSEGMENT | C_COMMENT |
|---|---|---|---|---|---|---|---|---|
| 0 | 30001 | Customer#000030001 | Ui1b,3Q71CiLTJn4MbVp,,YCZARIaNTelfst | 4 | 14-526-204-4500 | 8848.47 | MACHINERY | frays wake blithely enticingly ironic asymptote |
| 1 | 30002 | Customer#000030002 | UVBoMtILkQu1J3v | 11 | 21-340-653-9800 | 5221.81 | MACHINERY | he slyly ironic pinto beans wake slyly above t... |
| 2 | 30003 | Customer#000030003 | CuGi9fwKn8JdR | 21 | 31-757-493-7525 | 3014.89 | BUILDING | e furiously alongside of the requests. evenly ... |
| 3 | 30004 | Customer#000030004 | tkR93ReOnf9zYeO | 23 | 33-870-136-4375 | 3308.55 | AUTOMOBILE | ssly bold deposits. final req |
| 4 | 30005 | Customer#000030005 | pvq4uDoD8pEwpAE01aesCtbD9WU8qmlsvoFav5 | 9 | 19-144-468-5416 | -278.54 | MACHINERY | ructions behind the pinto beans x-ra |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 149995 | 29996 | Customer#000029996 | BnZVGZiAgcEImNm9iD | 7 | 17-536-308-8025 | 4035.17 | FURNITURE | ual instructions. bold, silent foxes nag blith... |
| 149996 | 29997 | Customer#000029997 | lTbDYXdQ74JctD UbRbXCqF2b8 | 9 | 19-631-777-4123 | 2015.90 | HOUSEHOLD | eodolites detect slyly alongside of the quickl... |
| 149997 | 29998 | Customer#000029998 | ZxxiuDruzi98CcymR | 23 | 33-619-315-9722 | -810.56 | FURNITURE | xpress packages. accounts sleep carefully iron... |
| 149998 | 29999 | Customer#000029999 | CuPA4UpgTCYiXrBrpiSO D | 12 | 22-824-951-8333 | 3865.14 | FURNITURE | eposits-- accounts haggle across the slyly per... |


And so are most other operations, including the `groupby().apply()` that we had to interrupt on the Snowflake backend.

In [6]:
result = df.groupby('C_NATIONKEY').apply(lambda group: group.C_CUSTKEY.iloc[0] + group.C_CUSTKEY.mean())
result

C_NATIONKEY
0     104809.648945
1     104849.461925
2     105879.897816
3     104446.823588
4     104679.314262
5     104906.427251
6     105917.388361
7     104694.312288
8     105173.743462
9     105037.669210
10    105448.956232
11    104112.225725
12    104829.006557
13    106009.666832
14    104419.615654
15    105264.778247
16    104199.408269
17    105125.565188
18    104305.006972
19    105746.410656
20    104615.581809
21    105924.703395
22    105530.506581
23    105457.153718
24    104223.723884
dtype: float64
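
The lambda adds each group's first `C_CUSTKEY` to the group's mean `C_CUSTKEY`. When the column has no nulls, the same values can be computed with built-in aggregations instead of a Python function, which may avoid the slow `apply` path on the Snowflake backend (an assumption; this notebook does not time it). A native pandas sketch on made-up rows, checking that the two forms agree:

```python
import pandas as native_pd

# Made-up rows standing in for the CUSTOMER table
df = native_pd.DataFrame({
    "C_NATIONKEY": [0, 0, 1, 1, 1],
    "C_CUSTKEY": [10, 20, 5, 7, 9],
})

# Form used in the notebook: a Python lambda evaluated once per group
via_apply = df.groupby("C_NATIONKEY").apply(
    lambda group: group.C_CUSTKEY.iloc[0] + group.C_CUSTKEY.mean()
)

# Same values from built-in aggregations. first() skips nulls while iloc[0]
# does not, so the two forms only agree when C_CUSTKEY has no missing values.
custkeys = df.groupby("C_NATIONKEY")["C_CUSTKEY"]
via_agg = custkeys.first() + custkeys.mean()

print(via_apply.tolist())  # [25.0, 12.0]
print(via_agg.tolist())    # [25.0, 12.0]
```

Both forms also depend on row order for the "first" value, so on a table without a defined order the first row of each group is not guaranteed to be the same across runs.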

When we're done with our transformations, we can move the result back to the Snowflake backend and write it to a Snowflake table.

In [7]:
snow_result = result.rename('result').set_backend('snowflake')


In [8]:
snow_result.to_snowflake('TEMP.MVASHISHTHA.RESULT', if_exists='replace', index=True)  # index=True keeps the C_NATIONKEY index next to each result

# Use case 2: Filtering out most of a large table, then using Python

## Setup

In this example, we read the table `SAMPLE_DATA.TPCH_SF10.LINEITEM`, which has about 60M rows and takes up 1.3 GB.

In [9]:
df = pd.read_snowflake("SAMPLE_DATA.TPCH_SF10.LINEITEM")

Snapshot source table/view 'SAMPLE_DATA.TPCH_SF10.LINEITEM' failed due to reason: `003029 (0A000): SQL compilation error:
Cannot clone from a table that was imported from a share.'. Data from source table/view 'SAMPLE_DATA.TPCH_SF10.LINEITEM' is being copied into a new temporary table 'SNOWPARK_TEMP_TABLE_6TTDK4MAYZ' for snapshotting. DataFrame creation might take some time.


In [10]:
df

| | L_ORDERKEY | L_PARTKEY | L_SUPPKEY | L_LINENUMBER | L_QUANTITY | L_EXTENDEDPRICE | L_DISCOUNT | L_TAX | L_RETURNFLAG | L_LINESTATUS | L_SHIPDATE | L_COMMITDATE | L_RECEIPTDATE | L_SHIPINSTRUCT | L_SHIPMODE | L_COMMENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 55599808 | 1941286 | 91325 | 2 | 21.0 | 27870.99 | 0.09 | 0.01 | A | F | 1994-01-19 | 1994-02-05 | 1994-01-31 | COLLECT COD | TRUCK | sy excuses. ca |
| 1 | 55599808 | 1189223 | 64257 | 3 | 15.0 | 19682.55 | 0.02 | 0.08 | A | F | 1994-03-02 | 1994-02-08 | 1994-03-09 | TAKE BACK RETURN | SHIP | egular, express dolphin |
| 2 | 55599808 | 1260872 | 35909 | 4 | 2.0 | 3665.62 | 0.08 | 0.06 | A | F | 1994-02-19 | 1994-03-22 | 1994-03-12 | TAKE BACK RETURN | RAIL | final packages integrate fluffily at |
| 3 | 55599808 | 1471392 | 21421 | 5 | 41.0 | 55896.12 | 0.02 | 0.08 | A | F | 1994-01-08 | 1994-02-18 | 1994-01-26 | NONE | RAIL | slyly ironic deposits haggle. iron |
| 4 | 55599808 | 1685045 | 10062 | 6 | 39.0 | 40168.44 | 0.06 | 0.00 | R | F | 1994-01-18 | 1994-03-04 | 1994-02-08 | TAKE BACK RETURN | FOB | le quickly after the fluffily pending accou |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59986047 | 38886023 | 1285241 | 60278 | 1 | 31.0 | 38011.58 | 0.03 | 0.05 | R | F | 1993-02-28 | 1993-03-16 | 1993-03-19 | TAKE BACK RETURN | TRUCK | ously across the quick |
| 59986048 | 38886048 | 355803 | 80807 | 1 | 21.0 | 39034.59 | 0.02 | 0.06 | N | O | 1997-10-26 | 1997-09-08 | 1997-11-03 | COLLECT COD | TRUCK | into beans use-- |
| 59986049 | 38886048 | 1977969 | 77970 | 2 | 43.0 | 88015.41 | 0.05 | 0.07 | N | O | 1997-08-02 | 1997-09-30 | 1997-08-09 | NONE | REG AIR | out the even exc |
| 59986050 | 38886048 | 1713167 | 38185 | 3 | 26.0 | 30682.08 | 0.00 | 0.05 | N | O | 1997-10-02 | 1997-09-03 | 1997-11-01 | NONE | MAIL | is boost slyly. packages cajole regul |


The data is large, so pulling all of it into pandas would take over a minute. We interrupted the transfer...

In [11]:
df.set_backend('python')


KeyboardInterrupt: 

But we only need a sample of the data, so we keep a random 2% of the rows. The sampling happens in Snowflake, so no data is transferred yet.

In [12]:
filtered = df.sample(frac=0.02)
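
With about 60M rows and 1.3 GB, a 2% sample should be on the order of 1.2M rows and a few tens of MB, assuming bytes scale evenly with rows. The exact row count can vary: if each row is kept independently with probability `frac` (as Snowflake's Bernoulli `SAMPLE` does; whether `sample(frac=...)` maps to it on the Snowflake backend is an assumption here), the sample size is random around `frac * n`. A stdlib sketch of that behavior, using a hypothetical `bernoulli_sample` helper:

```python
import random


def bernoulli_sample(rows, frac, seed=None):
    """Hypothetical helper: keep each row independently with probability frac."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < frac]


n_rows = 60_000_000  # approximate LINEITEM row count from the text
size_gb = 1.3        # approximate table size from the text
frac = 0.02

# Expected size of the sample, assuming size scales linearly with rows
print(f"~{frac * n_rows:,.0f} rows, ~{frac * size_gb * 1000:.0f} MB")  # ~1,200,000 rows, ~26 MB

# Simulate on a smaller table: the count lands near, not exactly at, frac * n
sample = bernoulli_sample(range(100_000), frac, seed=0)
print(len(sample))
```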



Now fetching the data is much quicker, since only about 2% of the rows are transferred.

In [13]:
python_filtered = filtered.set_backend('python')




Now we can run the complex `groupby().apply()` operation on the sampled subset of the data.

In [14]:
python_filtered.groupby('L_SHIPMODE').apply(lambda group: group.L_ORDERKEY.iloc[0] + group.L_ORDERKEY.mean())

L_SHIPMODE
AIR        8.560137e+07
FOB        8.562438e+07
MAIL       8.561297e+07
RAIL       8.555800e+07
REG AIR    8.557071e+07
SHIP       8.555691e+07
TRUCK      8.558179e+07
dtype: float64

Running the same operation on the sampled data with the Snowflake backend takes much longer, even though it returns the same result...

In [15]:
filtered.groupby('L_SHIPMODE').apply(lambda group: group.L_ORDERKEY.iloc[0] + group.L_ORDERKEY.mean())


L_SHIPMODE
AIR        8.560137e+07
FOB        8.562438e+07
MAIL       8.561297e+07
RAIL       8.555800e+07
REG AIR    8.557071e+07
SHIP       8.555691e+07
TRUCK      8.558179e+07
dtype: float64