# Quick Reference: Python Functions in Snowpark

This quick reference provides an overview of essential Python functions available in Snowpark. Snowpark enables developers to process and analyze data efficiently within Snowflake using Python. 

Below, you'll find key operations categorized by functionality, along with examples demonstrating their usage.

## 1. Initializing Snowpark Session
Before using Snowpark, you need a session connected to your Snowflake environment. The session is the entry point for creating DataFrames and running queries against data stored in Snowflake. The code below retrieves the session that is already active in the current environment.

In [None]:
# Import Snowpark types and functions under short aliases
import snowflake.snowpark.types as T
import snowflake.snowpark.functions as F

# Retrieve the Snowpark session that is already active in this environment
from snowflake.snowpark.context import get_active_session
session = get_active_session()
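
`get_active_session()` only returns a session that already exists. When running elsewhere, for example in a local script, a session can be built with `Session.builder.configs(...)`. The following is a minimal sketch, assuming the connection settings live in environment variables; the variable names (`SNOWFLAKE_ACCOUNT` and so on) and both helper functions are hypothetical and not part of Snowpark.

```python
import os

# Hypothetical environment variable names mapped to Snowpark connection
# parameter keys; adjust them to match your own setup.
ENV_TO_PARAM = {
    "SNOWFLAKE_ACCOUNT": "account",
    "SNOWFLAKE_USER": "user",
    "SNOWFLAKE_PASSWORD": "password",
    "SNOWFLAKE_ROLE": "role",
    "SNOWFLAKE_WAREHOUSE": "warehouse",
    "SNOWFLAKE_DATABASE": "database",
    "SNOWFLAKE_SCHEMA": "schema",
}


def connection_parameters_from_env(env=None):
    """Collect the connection settings that are set (and non-empty) in env."""
    env = os.environ if env is None else env
    return {param: env[var] for var, param in ENV_TO_PARAM.items() if env.get(var)}


def create_session(params):
    """Open a new Snowpark session from a dict of connection parameters."""
    # Imported lazily so the helper above works without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()
```

With the variables set, `session = create_session(connection_parameters_from_env())` would take the place of the `get_active_session()` call.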

## 2. Creating DataFrames
Snowpark DataFrames let you work with data stored in Snowflake using a pandas-like syntax. Operations are evaluated lazily and run inside Snowflake, so they benefit from Snowflake’s performance optimizations.

### Create a dataframe with defined column types

In [None]:
schema = T.StructType([
    T.StructField("R_REGIONKEY", T.LongType(), nullable=False),
    T.StructField("R_NAME", T.StringType(25), nullable=False),
    T.StructField("R_COMMENT", T.StringType(152), nullable=False)
])
data = [
    [0, "AFRICA", ""],
    [1, "AMERICA", ""],
    [2, "ASIA", ""],
    [3, "EUROPE", ""],
    [4, "MIDDLE EAST", ""]
]
df_region = session.create_dataframe(data, schema)
df_region.limit(10).show()

### Create a dataframe from a table
Note: This template requires the Snowflake-provided `SNOWFLAKE_SAMPLE_DATA` database. If your account does not have this database yet, add it by following these instructions: https://docs.snowflake.net/manuals/user-guide/sample-data-using.html

In [None]:
df_nation = session.table('SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION')
df_nation.limit(10).show()

### Create a dataframe from a SQL query

In [None]:
df_customer = session.sql("SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER")
df_customer.limit(10).show()

## 3. Basic Queries for Selecting and Filtering
You can retrieve specific columns and apply filters using Snowpark’s DataFrame functions.

### Get a count of records in the dataframe

In [None]:
print(df_customer.count())

### Display the top 10 records in the dataframe

In [None]:
df_customer.limit(10).show()

### Select a subset of the columns in the dataframe to display

In [None]:
df_customer.select("C_NAME", "C_NATIONKEY").limit(10).show()

### Add a calculated conditional column (`when` clause) based on account balance

In [None]:
df_customer.select("C_NAME", F.when(df_customer.c_acctbal < 2000, 1).otherwise(0).alias("BAL_LT_2000")).limit(10).show()
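
The `F.when(...).otherwise(...)` expression above is evaluated per row, like a SQL `CASE` expression. The sketch below mirrors that logic in plain Python (it is not Snowpark code, and `bal_lt_2000` is a hypothetical helper) to make the handling of missing balances explicit: a NULL balance does not satisfy the condition, so it receives the `otherwise` value. The sample balances are made up.

```python
from typing import Optional


def bal_lt_2000(c_acctbal: Optional[float]) -> int:
    """Per-row equivalent of F.when(c_acctbal < 2000, 1).otherwise(0)."""
    # A NULL balance (None here) does not satisfy the condition,
    # so it falls through to the otherwise() value.
    if c_acctbal is not None and c_acctbal < 2000:
        return 1
    return 0


# Made-up balances, including the boundary value and a missing value.
flags = [bal_lt_2000(b) for b in [711.56, 2000.00, 7498.12, None]]
```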

### Add a calculated column taking a substring of the segment name

In [None]:
df_customer.select("C_NAME", df_customer.c_mktsegment.substr(1,3).alias("SEGMENT_FIRST_THREE")).show()
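
Note that `substr(1, 3)` uses SQL-style positions, which start at 1, whereas Python slices start at 0. The following plain-Python sketch shows the correspondence; `sql_substr` is a hypothetical helper that only covers positive start positions, and the segment names are sample values.

```python
def sql_substr(value: str, start: int, length: int) -> str:
    """Mimic substr(start, length) for positive, 1-based start positions."""
    begin = start - 1  # convert the 1-based SQL position to a 0-based index
    return value[begin:begin + length]


segments = ["BUILDING", "AUTOMOBILE", "MACHINERY"]
first_three = [sql_substr(s, 1, 3) for s in segments]
```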

### Sorting the rows in a dataframe

In [None]:
df_customer.sort(F.col("C_MKTSEGMENT").asc()).limit(10).collect()

### Basic filtering on a dataframe

In [None]:
df_customer.filter(F.col("C_ACCTBAL") < 2000).limit(10).show()

## 4. Grouping and Aggregations
Snowpark allows you to perform aggregations such as counting, summing, and averaging data.

### Count the number of rows in each group

In [None]:
df_customer.group_by(F.col("C_NATIONKEY")).count().show()

### Sum the account balance in each group

In [None]:
df_customer.group_by(F.col("C_NATIONKEY")).agg(F.sum("C_ACCTBAL").alias("ACCTBAL_SUM")).show()
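
To make the result of the two aggregations above concrete, here is a plain-Python sketch of what `group_by(...).count()` and `group_by(...).agg(F.sum(...))` compute. It is an illustration only, not Snowpark code, and the sample rows are made up. `Decimal` is used so the sums are exact, in the spirit of a fixed-point balance column.

```python
from collections import defaultdict
from decimal import Decimal

# Made-up sample rows: (C_NATIONKEY, C_ACCTBAL)
rows = [
    (15, Decimal("711.56")),
    (13, Decimal("121.65")),
    (15, Decimal("2866.83")),
    (1, Decimal("7498.12")),
    (13, Decimal("794.47")),
]

counts = defaultdict(int)    # rows per nation key
sums = defaultdict(Decimal)  # total balance per nation key (Decimal() is 0)
for nation_key, acctbal in rows:
    counts[nation_key] += 1
    sums[nation_key] += acctbal
```

In Snowpark, several aggregates can be requested together by passing more than one expression to `agg()`, for example `agg(F.count("C_CUSTKEY").alias("N"), F.sum("C_ACCTBAL").alias("ACCTBAL_SUM"))`.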

## 5. Joins
You can join multiple DataFrames to combine related data from different sources.

In [None]:
# Join two tables and define the join condition
df_nation_region = df_nation.join(df_region, df_nation['N_REGIONKEY'] == df_region['R_REGIONKEY'])
df_nation_region.limit(10).show()

# Join to a third table
df_combined = df_customer.join(df_nation_region, df_customer['C_NATIONKEY'] == df_nation_region['N_NATIONKEY'])
df_combined.limit(10).show()
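
When no join type is given, `join()` performs an inner join: only rows with a match on both sides are kept. The plain-Python sketch below illustrates that behavior; the `inner_join` helper and the sample rows (including the unmatched `NOWHERE` nation) are made up for illustration and are not Snowpark code.

```python
# Made-up sample rows; not taken from the sample database.
nations = [
    {"N_NATIONKEY": 0, "N_NAME": "ALGERIA", "N_REGIONKEY": 0},
    {"N_NATIONKEY": 1, "N_NAME": "ARGENTINA", "N_REGIONKEY": 1},
    {"N_NATIONKEY": 99, "N_NAME": "NOWHERE", "N_REGIONKEY": 42},  # no matching region
]
regions = [
    {"R_REGIONKEY": 0, "R_NAME": "AFRICA"},
    {"R_REGIONKEY": 1, "R_NAME": "AMERICA"},
]


def inner_join(left, right, left_key, right_key):
    """Return merged rows for every left/right pair whose keys are equal."""
    # Index the right side by key so each left row is matched in one lookup.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


joined = inner_join(nations, regions, "N_REGIONKEY", "R_REGIONKEY")
```

The unmatched nation is dropped from `joined`. To keep such rows in Snowpark, pass a join type to `join()`, for example `how="left"`.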

## 6. Understanding the Schema

Knowing the structure of a DataFrame is crucial for data manipulation:

* `df_nation.columns`: Lists the column names.
* `df_nation.schema`: Returns the full schema of the DataFrame.
* `df_nation.printSchema()`: Prints a readable version of the schema.
* `df_nation.dtypes`: Returns column names along with their data types.

In [None]:
# Queries to understand the schema of a dataframe
df_nation.columns # Return the columns of dataframe
df_nation.schema # Return the schema of dataframe
df_nation.printSchema() # Print the schema of the dataframe
df_nation.dtypes # Return dataframe column names and data types
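
Because `dtypes` returns plain `(name, type)` pairs, it is easy to post-process in Python, for example to pick out all columns of one type. The sketch below is a minimal illustration: `columns_with_type` is a hypothetical helper, and the type strings in `sample_dtypes` are an assumption about the output format, which may differ between Snowpark versions.

```python
# Illustrative (name, type) pairs in the shape returned by DataFrame.dtypes;
# the exact type strings are an assumption and may vary by Snowpark version.
sample_dtypes = [
    ("N_NATIONKEY", "bigint"),
    ("N_NAME", "string(25)"),
    ("N_REGIONKEY", "bigint"),
    ("N_COMMENT", "string(152)"),
]


def columns_with_type(dtypes, type_prefix):
    """Return the names of columns whose type string starts with type_prefix."""
    return [name for name, dtype in dtypes if dtype.startswith(type_prefix)]


string_columns = columns_with_type(sample_dtypes, "string")
numeric_keys = columns_with_type(sample_dtypes, "bigint")
```

Against a live DataFrame, the same helper would be called as `columns_with_type(df_nation.dtypes, "string")`.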

### Query Execution Plan
To optimize queries, you can inspect the execution plan. Along with the schema attributes above, this helps you explore and validate your data before performing transformations or further analysis.


In [None]:
# Functions to view the query plan
df_nation.explain() # Print the list of queries that will be executed to evaluate this dataframe

## 7. Saving DataFrames
After processing data, you can write it back to a new or existing Snowflake table.

In [None]:
# Save dataframe to a new table, overwriting it if it exists
# To create a permanent table, remove the table_type parameter
# To append data to an existing table, change the mode to "append"
df_combined.write.mode("overwrite").save_as_table("saved_table", table_type="temporary")
df_saved_table = session.table("saved_table")
df_saved_table.show()


# Merge a dataframe with a table (upsert)
# Build the list of columns to update
cols_to_update = {c: df_combined[c] for c in df_combined.schema.names}
# Then do the actual merge
df_saved_table.merge(
    df_combined,
    df_saved_table["C_CUSTKEY"] == df_combined["C_CUSTKEY"],
    [F.when_matched().update(cols_to_update), F.when_not_matched().insert(cols_to_update)],
)
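
The merge above is an upsert: source rows whose key already exists in the target update the matching row, and all other source rows are inserted. The plain-Python sketch below illustrates these semantics on made-up rows. It is not Snowpark code, `upsert` is a hypothetical helper, and it assumes each key appears at most once in the source.

```python
def upsert(target, source, key):
    """Apply source rows to target in place; return (rows_inserted, rows_updated)."""
    by_key = {row[key]: row for row in target}
    inserted = updated = 0
    for row in source:
        if row[key] in by_key:
            # Corresponds to when_matched().update(...)
            by_key[row[key]].update(row)
            updated += 1
        else:
            # Corresponds to when_not_matched().insert(...)
            new_row = dict(row)
            target.append(new_row)
            by_key[row[key]] = new_row
            inserted += 1
    return inserted, updated


# Made-up rows: key 2 exists in both lists, key 3 only in the source.
target = [{"C_CUSTKEY": 1, "C_ACCTBAL": 100}, {"C_CUSTKEY": 2, "C_ACCTBAL": 200}]
source = [{"C_CUSTKEY": 2, "C_ACCTBAL": 250}, {"C_CUSTKEY": 3, "C_ACCTBAL": 300}]
result = upsert(target, source, "C_CUSTKEY")
```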

## Conclusion
This quick reference provides a concise overview of commonly used Snowpark Python functions. With them, you can create and query DataFrames, aggregate and join data, inspect schemas and query plans, and write results back to Snowflake tables.


Please also check out our developer guides covering [Working with DataFrames in Snowpark Python](https://docs.snowflake.com/en/developer-guide/snowpark/python/working-with-dataframes) and [Snowpark APIs](https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/snowpark/index).