## Snowflake Query as PySpark Dataframe and PySpark Dataframe as Snowflake Table

**NOTE:** If you run PySpark locally on Ubuntu under WSL, install the extra dependencies below so that WSL can launch the browser on your Windows host. Snowflake's browser-based authentication (`externalbrowser`) requires this.

sudo apt-get update<br>
sudo apt-get install xdg-utils

Also install wslu: https://wslutiliti.es/wslu/install.html#ubuntu. Then set the `BROWSER` environment variable:

`export BROWSER=wslview`

**SOURCE:** https://superuser.com/questions/1262977/open-browser-in-host-system-from-windows-subsystem-for-linuxs
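
Before starting the Spark session, it can help to confirm that the `BROWSER` variable is visible to Python and that the command it names is on `PATH`. The helper below is a minimal sketch; `browser_launcher_status` is a hypothetical name, not part of wslu or Snowflake.

```python
import os
import shutil


def browser_launcher_status(env=None):
    """Describe whether BROWSER is set and whether its command is on PATH.

    `env` defaults to the current process environment; pass a dict to check
    a different set of variables.
    """
    env = os.environ if env is None else env
    browser = env.get("BROWSER", "")
    if not browser:
        return "BROWSER is not set"
    # shutil.which returns None when the command cannot be found on the path.
    if shutil.which(browser, path=env.get("PATH")) is None:
        return f"BROWSER={browser}, but that command was not found on PATH"
    return f"BROWSER={browser} is available"


print(browser_launcher_status())
```

If this reports that `BROWSER` is not set, check that the `export` line ran in the same shell that launched Jupyter or PySpark.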

In [1]:
from pathlib import Path
import configparser
import os
import pyspark
from pyspark.sql import SparkSession

In [2]:
config_file = os.getenv("CONFIG_PATH")

In [3]:
config = configparser.ConfigParser()
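# ConfigParser.read() skips missing files instead of raising, and returns the
# list of files it parsed, so an empty result means nothing was loaded.
if not config_file or not config.read(config_file):
    raise FileNotFoundError("config.ini file not found; set CONFIG_PATH to its location")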
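
For reference, the cells in this notebook read the keys below from a `[snowflake]` section. The following sketch shows one possible layout for the file and a small check for missing keys. All values are placeholders, and `missing_keys` is a hypothetical helper, not part of `configparser`.

```python
import configparser

# Placeholder values only; replace them with your own paths and account details.
SAMPLE_CONFIG = """
[snowflake]
jdbc_driver_path = /path/to/snowflake-jdbc.jar
spark_driver_path = /path/to/spark-snowflake.jar
account = my_account
username = my_user
database = MY_DB
schema = MY_SCHEMA
role = MY_ROLE
warehouse = MY_WH
authenticator = externalbrowser
"""

# Keys that the notebook cells look up in the [snowflake] section.
REQUIRED_KEYS = (
    "jdbc_driver_path", "spark_driver_path", "account", "username",
    "database", "schema", "role", "warehouse", "authenticator",
)


def missing_keys(parser, section="snowflake"):
    """Return the required keys that are absent from the given section."""
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.has_option(section, key)]


sample = configparser.ConfigParser()
sample.read_string(SAMPLE_CONFIG)
print(missing_keys(sample))  # prints []
```

Running `missing_keys(config)` right after loading the real file surfaces a typo in a key name before it turns into a `KeyError` several cells later.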

The Snowflake JDBC driver and the Snowflake Spark Connector can be downloaded [here](https://search.maven.org/search?q=g:net.snowflake).

In [4]:
sf_jdbc_driver = config['snowflake']['jdbc_driver_path']
sf_connector = config['snowflake']['spark_driver_path']

In [5]:
sf_jdbc_driver

'/home/i33859/jdbc_drivers/snowflake/snowflake-jdbc-3.13.30.jar'

In [6]:
sf_connector

'/home/i33859/jdbc_drivers/snowflake/spark-snowflake_2.12-2.12.0-spark_3.4.jar'
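
The connector jar name above encodes the Scala version (`2.12`), the connector version (`2.12.0`), and the Spark release it targets (`3.4`). A mismatch with the installed PySpark is a common cause of class-loading errors, so a quick check can save time. This is a rough sketch that assumes the file keeps the `spark-snowflake_<scala>-<connector>-spark_<spark>.jar` naming seen here; `connector_targets` and `matches_spark` are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumed naming pattern: spark-snowflake_<scala>-<connector>-spark_<spark>.jar
_CONNECTOR_JAR = re.compile(
    r"spark-snowflake_(?P<scala>[\d.]+)-(?P<connector>[\d.]+)"
    r"-spark_(?P<spark>[\d.]+)\.jar$"
)


def connector_targets(jar_path):
    """Return the versions encoded in the jar file name, or None if it does not match."""
    match = _CONNECTOR_JAR.search(Path(jar_path).name)
    return match.groupdict() if match else None


def matches_spark(jar_path, spark_version):
    """Compare the jar's target Spark major.minor with the given Spark version string."""
    info = connector_targets(jar_path)
    if info is None:
        return False
    return spark_version.split(".")[:2] == info["spark"].split(".")[:2]


jar = "/home/i33859/jdbc_drivers/snowflake/spark-snowflake_2.12-2.12.0-spark_3.4.jar"
print(connector_targets(jar))
print(matches_spark(jar, "3.4.1"))
```

In the notebook, the second argument would typically be `pyspark.__version__`, for example `matches_spark(sf_connector, pyspark.__version__)`.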

In [7]:
sf_account = config['snowflake']['account']
sf_user = config['snowflake']['username']
sf_database = config['snowflake']['database']
sf_schema = config['snowflake']['schema']
sf_role = config['snowflake']['role']
sf_warehouse = config['snowflake']['warehouse']
sf_authenticator = config['snowflake']['authenticator']

In [8]:
spark = (
    SparkSession.builder.master("local[*]")
    .appName("Snowflake_JDBC")
    .config("spark.jars", f"{sf_jdbc_driver},{sf_connector}")
    .getOrCreate()
)

23/06/22 19:55:12 WARN Utils: Your hostname, VA-rveOJ44nPxI1 resolves to a loopback address: 127.0.1.1; using 192.168.56.1 instead (on interface eth1)
23/06/22 19:55:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
23/06/22 19:55:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [9]:
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

In [10]:
# Snowflake connection parameters
sfparams = {
    "sfURL" : f"{sf_account}.snowflakecomputing.com",
    "sfUser" : sf_user,
    "sfPassword" : "your_password",  # Not applicable when using externalbrowser authenticator
    "sfDatabase" : sf_database,
    "sfSchema" : sf_schema,
    "sfRole" : sf_role,
    "sfWarehouse" : sf_warehouse,
    "sfAuthenticator" : sf_authenticator
}
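
Since `sfPassword` does not apply with the `externalbrowser` authenticator, one option is to build the dictionary conditionally so the placeholder password never ends up in the options. The function below is a sketch under that assumption; `build_sf_options` is a hypothetical helper, and `settings` can be a plain dict or the `config['snowflake']` section.

```python
def build_sf_options(settings, password=None):
    """Map config values to connector options, omitting the password for browser login."""
    options = {
        "sfURL": f"{settings['account']}.snowflakecomputing.com",
        "sfUser": settings["username"],
        "sfDatabase": settings["database"],
        "sfSchema": settings["schema"],
        "sfRole": settings["role"],
        "sfWarehouse": settings["warehouse"],
    }
    authenticator = settings.get("authenticator", "")
    if authenticator:
        options["sfAuthenticator"] = authenticator
    # Only pass a password when one was supplied and browser login is not in use.
    if password is not None and authenticator.lower() != "externalbrowser":
        options["sfPassword"] = password
    return options


# Placeholder settings for illustration only.
example = {
    "account": "my_account", "username": "my_user", "database": "MY_DB",
    "schema": "MY_SCHEMA", "role": "MY_ROLE", "warehouse": "MY_WH",
    "authenticator": "externalbrowser",
}
print(sorted(build_sf_options(example)))
```

With the notebook's configuration, `sfparams = build_sf_options(config['snowflake'])` would produce the same dictionary as above minus the `sfPassword` entry.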

In [11]:
query = "SELECT CURRENT_DATE as my_date"

In [12]:
# Run a custom query
df = (
    spark.read.format(SNOWFLAKE_SOURCE_NAME)
    .options(**sfparams)
    .option("query", query)
    .load()
)

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
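
The read above uses the `query` option. The connector also accepts `dbtable` to load a whole table, and the two are alternatives to each other. The wrapper below is a minimal sketch that enforces passing exactly one of them; `read_snowflake` is a hypothetical helper, and `spark` is expected to be a live `SparkSession` with the Snowflake jars loaded.

```python
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def read_snowflake(spark, options, *, query=None, table=None):
    """Load a Snowflake query result or a whole table as a DataFrame."""
    # Exactly one of the two sources must be given.
    if (query is None) == (table is None):
        raise ValueError("pass exactly one of query or table")
    key, value = ("query", query) if query is not None else ("dbtable", table)
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option(key, value)
        .load()
    )
```

For example, `read_snowflake(spark, sfparams, query="SELECT CURRENT_DATE as my_date")` mirrors the cell above, while `read_snowflake(spark, sfparams, table="my_table")` would read the full table.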


In [13]:
df.show()

+----------+
|   MY_DATE|
+----------+
|2023-06-22|
+----------+



                                                                                

#### Dataframe to Snowflake

In [None]:
(df
 .select("my_date").write.format(SNOWFLAKE_SOURCE_NAME)
 .options(**sfparams)
 .option("dbtable", "my_table")
 # https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameWriter.mode.html#pyspark.sql.DataFrameWriter.mode
 .mode("overwrite")
 .save()
)
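
The `mode("overwrite")` call replaces existing data. PySpark's `DataFrameWriter.mode` also documents `append`, `ignore`, and `error`/`errorifexists` (the default). The sketch below validates the mode before writing, so a typo fails early rather than after the session has authenticated. `normalize_save_mode` and `write_snowflake` are hypothetical helpers, and the descriptions paraphrase Spark's general semantics rather than anything specific to Snowflake.

```python
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Save modes documented for pyspark.sql.DataFrameWriter.mode.
SAVE_MODES = {
    "append": "append the DataFrame's rows to the existing data",
    "overwrite": "overwrite the existing data",
    "ignore": "silently skip the write if data already exists",
    "errorifexists": "raise an error if data already exists (default; alias: error)",
}


def normalize_save_mode(mode):
    """Return the canonical save mode name, or raise ValueError for an unknown one."""
    key = mode.strip().lower()
    if key == "error":
        key = "errorifexists"
    if key not in SAVE_MODES:
        raise ValueError(f"unknown save mode {mode!r}; expected one of {sorted(SAVE_MODES)}")
    return key


def write_snowflake(df, options, table, mode="errorifexists"):
    """Write a DataFrame to a Snowflake table using a validated save mode."""
    (
        df.write.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("dbtable", table)
        .mode(normalize_save_mode(mode))
        .save()
    )
```

The cell above would then read `write_snowflake(df.select("my_date"), sfparams, "my_table", mode="overwrite")`.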

In [None]:
spark.stop()

#### Using Context Manager (with)

In [None]:
with (
    SparkSession.builder.master("local[*]")
    .appName("Snowflake_JDBC")
    .config("spark.jars", f"{sf_jdbc_driver},{sf_connector}")
    .getOrCreate()
) as spark:
    query = "SELECT CURRENT_DATE as my_date"
    jdbcDF = (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**sfparams)
        .option("query", query)
        .load()
    )
    jdbcDF.show()
    
    (jdbcDF
     .select("my_date").write.format(SNOWFLAKE_SOURCE_NAME)
     .options(**sfparams)
     .option("dbtable", "my_table")
     # https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameWriter.mode.html#pyspark.sql.DataFrameWriter.mode
     .mode("overwrite")
     .save()
    )
    print("Completed saving dataframe as Snowflake table")