# Snowpark for Python Available for Public Release (June 14, 2022)

#### Prerequisites

- Python 3.8.x
- An existing Snowflake account/environment
- `pip install snowflake-snowpark-python` or, to include pandas support, `pip install "snowflake-snowpark-python[pandas]"`

## Reference Material

- [Developer Guide](https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html)
- [Snowpark Python API](https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/_autosummary/snowflake.snowpark.html)
- [Quick Start Guide](https://quickstarts.snowflake.com/guide/getting_started_with_snowpark_python/index.html?index=..%2F..index#0) - NOTE: You can `pip install` the package instead of using the .whl file from their GitHub repo
- [Article](https://medium.com/snowflake/migrating-from-pyspark-to-snowpark-python-series-part-1-a75058c1e579) on migrating to Snowpark from PySpark
- Working with dataframes, from the official [docs](https://docs.snowflake.com/en/developer-guide/snowpark/python/working-with-dataframes.html#)

<a id="top"></a>

## Table of Contents

- [Create session and create a basic dataframe](#session)
- [Convert Snowpark dataframe to pandas dataframe](#to_pandas)
- [Create dataframe from SQL query](#sql)
- [List contents of a staging directory](#list)
- [Read csv file from staging](#read_csv)

In [1]:
from pathlib import Path
from snowflake.snowpark import Session
import configparser

#### Obtain Snowflake credentials from a config file so that we don't accidentally expose them to others

In [2]:
config = configparser.ConfigParser()
config.read(Path.home() / '.config' / 'config.ini')
SF_USERNAME = config['snowflake']['username']
# SF_PASSWORD = config['snowflake']['password']  # Not needed when using browser authenticator
SF_ROLE = config['snowflake']['role']
SF_WAREHOUSE = config['snowflake']['warehouse']
SF_DATABASE = config['snowflake']['database']
SF_SCHEMA = config['snowflake']['schema']
SF_ACCOUNT = config['snowflake']['account']
SF_AUTHENTICATOR = config['snowflake']['authenticator']  # Using browser authenticator, not Okta
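
The layout of `config.ini` is not shown in this notebook. A minimal sketch, assuming a single `[snowflake]` section whose keys match the lookups in the cell above: every value below is a placeholder (including `externalbrowser`, which is assumed here as the browser-based authenticator setting), and the file is written to a temporary directory purely for illustration rather than to `~/.config`.

```python
import configparser
import tempfile
from pathlib import Path

# Placeholder values only; the key names mirror the lookups in the notebook cell.
SAMPLE_CONFIG = """\
[snowflake]
username = jane.doe@example.com
role = ANALYST
warehouse = COMPUTE_WH
database = DEMO_DB
schema = PUBLIC
account = xy12345.us-east-1
authenticator = externalbrowser
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.ini"
    config_path.write_text(SAMPLE_CONFIG)

    config = configparser.ConfigParser()
    # read() returns the list of files it parsed; an empty list means nothing was found.
    parsed_files = config.read(config_path)
    if not parsed_files:
        raise FileNotFoundError(f"Could not read {config_path}")

    # Copy the section into a plain dict so it outlives the temporary file.
    snowflake_settings = dict(config["snowflake"])

print(snowflake_settings["authenticator"])
```

Checking the return value of `read()` is worthwhile because `configparser` silently skips missing files, and the failure would otherwise surface later as a `KeyError` on `config['snowflake']`.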

#### Pass Credentials to a Python Dictionary

In [3]:
connection_parameters = {
  "account": SF_ACCOUNT,
  "user": SF_USERNAME,
#  "password": SF_PASSWORD,
  "role": SF_ROLE,
  "warehouse": SF_WAREHOUSE,
  "database": SF_DATABASE,
  "schema": SF_SCHEMA,
  "authenticator": SF_AUTHENTICATOR
}

<a id="session"></a>

#### Create Session and Start Doing PySpark-y Things

[[back to top](#top)]

In [4]:
session = Session.builder.configs(connection_parameters).create()
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
df.show()

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
-------------
|"A"  |"B"  |
-------------
|3    |4    |
-------------



<a id="to_pandas"></a>

#### Convert Snowpark dataframe to Pandas dataframe

[[back to top](#top)]

In [5]:
pandas_df = df.to_pandas()  # this requires pandas installed in the Python environment

In [6]:
pandas_df

Unnamed: 0,A,B
0,3,4


#### Don't forget to close the session when you're done with it.

In [7]:
session.close()

<a id="sql"></a>

#### Create dataframe from a SQL query

[[back to top](#top)]

In [8]:
connection_parameters = {
  "account": SF_ACCOUNT,
  "user": SF_USERNAME,
#  "password": SF_PASSWORD,
  "role": SF_ROLE,
  "warehouse": SF_WAREHOUSE,
  "database": SF_DATABASE,
  "schema": "NHTSA",
  "authenticator": SF_AUTHENTICATOR
}

session = Session.builder.configs(connection_parameters).create()
sql = """
select
    modelyear
    , count(*) as qty
from nhtsa.vw_nhtsa_wide_pl
group by
    modelyear
order by
    modelyear
"""
df = session.sql(sql)

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
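
The dictionary in the cell above repeats every entry of the earlier `connection_parameters` and changes only `schema`. One way to express that without retyping each key, sketched here with placeholder values standing in for the real settings, is to copy the dictionary and override a single entry. The name `nhtsa_parameters` is hypothetical.

```python
# Placeholder values standing in for the settings loaded from config.ini.
connection_parameters = {
    "account": "xy12345.us-east-1",
    "user": "jane.doe@example.com",
    "role": "ANALYST",
    "warehouse": "COMPUTE_WH",
    "database": "DEMO_DB",
    "schema": "PUBLIC",
    "authenticator": "externalbrowser",
}

# Unpacking makes a shallow copy; a key listed afterwards replaces the copied value.
nhtsa_parameters = {**connection_parameters, "schema": "NHTSA"}

print(nhtsa_parameters["schema"])       # NHTSA
print(connection_parameters["schema"])  # PUBLIC (the original dict is unchanged)
```

The copy can then be passed to `Session.builder.configs(...)` in place of the retyped dictionary, leaving the original parameters available for sessions against the default schema.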


`show()` displays the first 10 rows of data by default.

In [9]:
df.show()

-----------------------
|"MODELYEAR"  |"QTY"  |
-----------------------
|2000         |585    |
|2001         |611    |
|2002         |633    |
|2003         |640    |
|2004         |636    |
|2005         |645    |
|2006         |659    |
|2007         |655    |
|2008         |661    |
|2009         |574    |
-----------------------



In [10]:
pandas_df = df.to_pandas()

In [11]:
pandas_df

Unnamed: 0,MODELYEAR,QTY
0,2000,585
1,2001,611
2,2002,633
3,2003,640
4,2004,636
5,2005,645
6,2006,659
7,2007,655
8,2008,661
9,2009,574


In [12]:
session.close()

#### A more Pythonic way is to use a context manager (a `with` statement), which closes the session for you.

In [13]:
with Session.builder.configs(connection_parameters).create() as session:
    df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
    df = df.filter(df.a > 1)
    pandas_df = df.to_pandas()
    df.show()

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
-------------
|"A"  |"B"  |
-------------
|3    |4    |
-------------
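
The main benefit of the `with` form is that cleanup still happens when the body raises. The sketch below illustrates the general context-manager protocol with a hypothetical stand-in class, `FakeSession`; it is not Snowpark's implementation and only tracks whether `close()` was called.

```python
class FakeSession:
    """Hypothetical stand-in for a session; it only records whether close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # The value returned here is what "as session" binds to.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions alike.
        self.close()
        return False  # do not swallow the exception


session = FakeSession()
try:
    with session:
        raise RuntimeError("query failed")
except RuntimeError:
    pass

print(session.closed)  # True
```

With a manual `session.close()` at the end of a cell, the same failure would skip the call and leave the session open until it is closed explicitly.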



#### Snowpark dataframe converted to pandas dataframe

In [14]:
pandas_df

Unnamed: 0,A,B
0,3,4


In [15]:
session.close()  # Redundant here: the with block above already closed this session

<a id="list"></a>

#### List contents of "user" or personal staging directory

[[back to top](#top)]

In [16]:
session = Session.builder.configs(connection_parameters).create()

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...


In [17]:
session.sql("list @~/test/").show(max_width=100)

-----------------------------------------------------------------------------------------------
|"name"            |"size"  |"md5"                             |"last_modified"               |
-----------------------------------------------------------------------------------------------
|test/cars.csv.gz  |6784    |9bc71daabf25cb048925677469f0a59e  |Sat, 5 Feb 2022 19:42:06 GMT  |
-----------------------------------------------------------------------------------------------



<a id="read_csv"></a>

#### Read a CSV file from staging

[[back to top](#top)]

Currently, Snowpark does not automatically infer the schema of staged CSV or JSON files, so you have to define the schema manually ([reference](https://docs.snowflake.com/en/user-guide/data-load-overview.html#detection-of-column-definitions-in-staged-semi-structured-data-files)).

Snowpark also cannot currently read a local CSV file directly. To read a local CSV file, first upload it to a stage using the `put` function, per the [docs](https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.FileOperation.put.html).
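
The upload step itself is not shown in this notebook. A minimal sketch of it, assuming an open Snowpark `session` and the `@~/test` user-stage folder listed earlier: the helper name `upload_to_user_stage` is hypothetical, and choosing `auto_compress=True` and `overwrite=True` is an assumption about how the file was staged, not something the notebook confirms.

```python
from pathlib import Path


def upload_to_user_stage(session, local_path, stage_location="@~/test"):
    """Upload one local file to a stage through session.file.put.

    `session` is expected to be an open Snowpark Session. Returns whatever
    put() returns (a list of per-file upload results).
    """
    local_path = Path(local_path)
    # Fail early with a clear error instead of sending a bad path to Snowflake.
    if not local_path.is_file():
        raise FileNotFoundError(local_path)

    return session.file.put(
        str(local_path),
        stage_location,
        auto_compress=True,  # gzip the file during upload
        overwrite=True,      # replace a staged file of the same name
    )
```

A call such as `upload_to_user_stage(session, "cars.csv")` would stage the file under `@~/test/`. Compressing on upload would be consistent with the `test/cars.csv.gz` entry in the stage listing above.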

In [18]:
from snowflake.snowpark.types import FloatType, IntegerType, StringType, StructField, StructType

In [19]:
cars_schema = StructType(
    [
        StructField("Car", StringType()),
        StructField("MPG", FloatType()),
        StructField("Cylinders", IntegerType()),
        StructField("Displacement", IntegerType()),
        StructField("Horsepower", IntegerType()),
        StructField("Weight", IntegerType()),
        StructField("Acceleration", FloatType()),
        StructField("Model", IntegerType()),
        StructField("Origin", StringType())
    ]
)

In [20]:
df_cars = (
    session.read
    .option("skip_header", 1)
    .option("field_delimiter", ";")
    .option("FIELD_OPTIONALLY_ENCLOSED_BY", '"')
    .schema(cars_schema)
    .csv("@~/test/cars.csv")
)

In [21]:
df_cars.show()

------------------------------------------------------------------------------------------------------------------------------------
|"CAR"                      |"MPG"  |"CYLINDERS"  |"DISPLACEMENT"  |"HORSEPOWER"  |"WEIGHT"  |"ACCELERATION"  |"MODEL"  |"ORIGIN"  |
------------------------------------------------------------------------------------------------------------------------------------
|Chevrolet Chevelle Malibu  |18.0   |8            |307             |130           |3504      |12.0            |70       |US        |
|Buick Skylark 320          |15.0   |8            |350             |165           |3693      |11.5            |70       |US        |
|Plymouth Satellite         |18.0   |8            |318             |150           |3436      |11.0            |70       |US        |
|AMC Rebel SST              |16.0   |8            |304             |150           |3433      |12.0            |70       |US        |
|Ford Torino                |17.0   |8            |302             |1

In [22]:
df_cars_pandas = df_cars.to_pandas()

In [23]:
df_cars_pandas

Unnamed: 0,CAR,MPG,CYLINDERS,DISPLACEMENT,HORSEPOWER,WEIGHT,ACCELERATION,MODEL,ORIGIN
0,Chevrolet Chevelle Malibu,18.0,8,307,130,3504,12.0,70,US
1,Buick Skylark 320,15.0,8,350,165,3693,11.5,70,US
2,Plymouth Satellite,18.0,8,318,150,3436,11.0,70,US
3,AMC Rebel SST,16.0,8,304,150,3433,12.0,70,US
4,Ford Torino,17.0,8,302,140,3449,10.5,70,US
...,...,...,...,...,...,...,...,...,...
401,Ford Mustang GL,27.0,4,140,86,2790,15.6,82,US
402,Volkswagen Pickup,44.0,4,97,52,2130,24.6,82,Europe
403,Dodge Rampage,32.0,4,135,84,2295,11.6,82,US
404,Ford Ranger,28.0,4,120,79,2625,18.6,82,US


In [24]:
session.close()