## Installation 

If you have not installed `snowflake-snowpark-python` yet, you can install it here. Make sure you are in the correct conda environment first.

If you are not sure which environment is active, check which Python interpreter is on your path:

```bash
! which python
```

In [119]:
! which python

/opt/anaconda3/envs/dev2/bin/python


Install the library

In [117]:
# !conda install snowflake-snowpark-python -y
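
If you prefer to check from inside Python rather than the shell, the sketch below reports the active interpreter and whether the Snowpark package can be found. The helper name `is_installed` is made up for this example and is not part of any library.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `snowflake`) is missing.
        return False


print("Interpreter:", sys.executable)
print("Snowpark installed:", is_installed("snowflake.snowpark"))
```

If the second line prints `False`, uncomment the install command above and run it in the same environment.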

# THE **E**
## Snowpark Usage - Example

In [2]:
from snowflake.snowpark import Session

Example of what the `snow.cfg` file looks like:

```ini
[SNOW]
ACCOUNT=WPA36811
USER=tarek
PASSWORD=yourpassword
WAREHOUSE=COMPUTE_WH
DATABASE=TECHCATALYST_DE
SCHEMA=PUBLIC
ROLE=DE
```

In [3]:
import configparser

config = configparser.ConfigParser()
config.read('snow.cfg')
config.sections()

['SNOW']

Create a dictionary (key-value) of the parameters

In [4]:
params1 = dict(config['SNOW'])
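
One detail worth knowing about the dictionary: `configparser` lowercases option names by default, so the keys come out as `account`, `user`, and so on, even though the file uses upper case. The sketch below shows this with an in-memory config string; the values are placeholders, not real credentials.

```python
import configparser

# Placeholder config, parsed from a string instead of a file.
sample_cfg = """
[SNOW]
ACCOUNT=my_account
USER=my_user
WAREHOUSE=COMPUTE_WH
"""

config = configparser.ConfigParser()
config.read_string(sample_cfg)

params = dict(config["SNOW"])
print(params)
# {'account': 'my_account', 'user': 'my_user', 'warehouse': 'COMPUTE_WH'}
```

Also note that `config.read()` silently skips files it cannot find, which is why checking `config.sections()` right after reading is a useful sanity check.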

Passing the parameters as a dictionary to Session

In [5]:
session1 = Session.builder.configs(params1).create()
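
Before opening a session, it can help to confirm that the dictionary actually contains the values you expect. The following is a minimal sketch; `missing_keys` and `REQUIRED_KEYS` are hypothetical names, and the list of required keys assumes password-based login.

```python
# Assumption: password-based login, so these three keys must be non-empty.
REQUIRED_KEYS = ("account", "user", "password")


def missing_keys(params: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty in `params`."""
    return [key for key in required if not params.get(key)]


example = {"account": "my_account", "user": "my_user", "password": ""}
print(missing_keys(example))  # ['password']
```

An empty list means the dictionary is ready to pass to `Session.builder.configs(...)`.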

In [6]:
print(session1.get_current_user())
print(session1.get_current_database())
print(session1.get_current_schema())

"PETER_ALONZO"
"TECHCATALYST_DE"
"PUBLIC"


Example using the `table` method to load a specific table. Here I load the `INS_ACCIDENTS` table.

In [7]:
accidents = session1.table("INS_ACCIDENTS")
accidents.show(5)

-------------------------------------------------------------------------------------------------------------------------------------------------------
|"ACCIDENT_ID"  |"POLICYHOLDER_ID"  |"VEHICLE_ID"  |"ACCIDENT_TYPE"  |"ACCIDENT_DATE"  |"ESTIMATED_COST"  |"ACTUAL_REPAIR_COST"  |"AT_FAULT"  |"DUI"  |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|1              |4333               |176           |3                |2022-03-30       |6694              |7480                  |True        |True   |
|2              |4547               |6391          |4                |2023-09-27       |2190              |3355                  |True        |False  |
|3              |6686               |6974          |7                |2022-06-13       |4995              |7123                  |False       |False  |
|4              |1300               |4037          |7                |2020-01-21       |

Note that `accidents` is a Snowpark object (a `Table`), not a pandas DataFrame, and the two behave differently. For now, convert it to a pandas DataFrame to do your transformations. Once done, convert the result back to a Snowpark DataFrame and write it to the database.

In [8]:
type(accidents)

snowflake.snowpark.table.Table

In [9]:
accidents_df = accidents.to_pandas()

In [10]:
accidents_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   ACCIDENT_ID         10000 non-null  int16 
 1   POLICYHOLDER_ID     10000 non-null  int16 
 2   VEHICLE_ID          10000 non-null  int16 
 3   ACCIDENT_TYPE       10000 non-null  int8  
 4   ACCIDENT_DATE       10000 non-null  object
 5   ESTIMATED_COST      10000 non-null  int16 
 6   ACTUAL_REPAIR_COST  10000 non-null  int16 
 7   AT_FAULT            10000 non-null  bool  
 8   DUI                 10000 non-null  bool  
dtypes: bool(2), int16(5), int8(1), object(1)
memory usage: 205.2+ KB
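
The `info()` output lists `ACCIDENT_DATE` as `object`, which means pandas is not treating it as a date column. If you need date arithmetic in your transformations, one option is `pd.to_datetime`. The sketch below uses a small made-up frame; it assumes the column holds `datetime.date` values, which may or may not match what your connection returns.

```python
from datetime import date

import pandas as pd

# Made-up sample rows that mimic an object-typed date column.
sample = pd.DataFrame({
    "ACCIDENT_ID": [1, 2, 3],
    "ACCIDENT_DATE": [date(2022, 3, 30), date(2023, 9, 27), date(2022, 6, 13)],
})

# Convert to a real datetime column, then derive the year from it.
sample["ACCIDENT_DATE"] = pd.to_datetime(sample["ACCIDENT_DATE"])
sample["ACCIDENT_YEAR"] = sample["ACCIDENT_DATE"].dt.year

print(sample.dtypes)
```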


# THE **T**
## Your Work Here (Transformations)

In [11]:
accident_type = session1.table("INS_ACCIDENT_TYPE")
accidents_type_df = accident_type.to_pandas()

In [12]:
gender_marital = session1.table("INS_GENDER_MARITAL_STATUS")
gender_marital_df = gender_marital.to_pandas()

In [14]:
policy_holder = session1.table("INS_POLICYHOLDER")
policy_holder_df = policy_holder.to_pandas()

In [15]:
states = session1.table("INS_STATES")
states_df = states.to_pandas()

In [16]:
insurance = session1.table("INS_INSURANCE_COVERAGE")
insurance_df = insurance.to_pandas()

In [17]:
body_style = session1.table("INS_BODY_STYLE")
body_style_df = body_style.to_pandas()

In [18]:
vehicles = session1.table("INS_VEHICLES")
vehicles_df = vehicles.to_pandas()

In [19]:
vehicle_use = session1.table("INS_VEHICLE_USE")
vehicle_use_df = vehicle_use.to_pandas()
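
The cells above repeat the same two steps for every source table. As an alternative, the loads could be collected into a dictionary with a small helper. This is only a sketch: `load_tables` is a made-up name, and it relies on the `session.table(...)` and `.to_pandas()` calls already used in this notebook.

```python
def load_tables(session, table_names):
    """Return {table_name: pandas DataFrame} for each name in `table_names`."""
    return {name: session.table(name).to_pandas() for name in table_names}


SOURCE_TABLES = [
    "INS_ACCIDENT_TYPE",
    "INS_GENDER_MARITAL_STATUS",
    "INS_POLICYHOLDER",
    "INS_STATES",
    "INS_INSURANCE_COVERAGE",
    "INS_BODY_STYLE",
    "INS_VEHICLES",
    "INS_VEHICLE_USE",
]

# Usage with the session created earlier (needs a live connection):
# frames = load_tables(session1, SOURCE_TABLES)
# states_df = frames["INS_STATES"]
```

Separate variables, as in the cells above, are easier to inspect one at a time; the dictionary is handier when you want to loop over all tables.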

In [20]:
import pandas as pd
dim_accident_type_df = pd.DataFrame()
dim_accident_type_df['ACCIDENT_TYPE_ID'] = accidents_type_df['ACCIDENT_TYPE_CODE']
dim_accident_type_df['ACCIDENT_TYPE'] = accidents_type_df['ACCIDENT_TYPE']

In [21]:
dim_body_style_df = pd.DataFrame()
dim_body_style_df['BODY_STYLE_ID'] = body_style_df['BODY_STYLE_CODE']
dim_body_style_df['BODY_STYLE'] = body_style_df['BODY_STYLE']

In [22]:
dim_gender_martial_df = pd.DataFrame()
dim_gender_martial_df['GENDER_MARITALSTATUS_ID'] = gender_marital_df['GENDER_MARITAL_STATUS_CODE']
dim_gender_martial_df['GENDER_MARITAL_STATUS'] = gender_marital_df['GENDER_MARITAL_STATUS']

In [23]:
dim_policy_holder_df = pd.DataFrame()
dim_policy_holder_df['POLICYHOLDER_ID'] = policy_holder_df['POLICYHOLDER_ID']
dim_policy_holder_df['FIRST_NAME'] = policy_holder_df['FIRST_NAME']
dim_policy_holder_df['LAST_NAME'] = policy_holder_df['LAST_NAME']
dim_policy_holder_df['ADDRESS'] = policy_holder_df['ADDRESS']

In [24]:
dim_states_df = pd.DataFrame()
dim_states_df['STATE_ID'] = states_df['STATE_CODE']
dim_states_df['STATE'] = states_df['STATE']

In [25]:
dim_vehicle_use_df = pd.DataFrame()
dim_vehicle_use_df['VEHICLE_USECODE_ID'] = vehicle_use_df['USE_CODE']
dim_vehicle_use_df['VEHICLE_USE'] = vehicle_use_df['VEHICLE_ID']
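
Each dimension above is built by copying columns one at a time into an empty DataFrame. The same result can be sketched as a select followed by a rename. `build_dim` is a hypothetical helper, and the sample rows are made up for illustration.

```python
import pandas as pd


def build_dim(source_df, column_map):
    """Keep only the keys of `column_map` and rename them to its values."""
    return source_df[list(column_map)].rename(columns=column_map)


# Made-up sample with one extra column that the dimension does not need.
states_sample = pd.DataFrame({
    "STATE_CODE": [1, 2],
    "STATE": ["AL", "AK"],
    "EXTRA": ["x", "y"],
})

dim_states_sample = build_dim(states_sample, {"STATE_CODE": "STATE_ID", "STATE": "STATE"})
print(dim_states_sample.columns.tolist())  # ['STATE_ID', 'STATE']
```

The mapping makes the source-to-target column names visible in one place, which helps when checking them against the target schema.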

In [29]:
# Start from accidents and join policyholders, then states.
fact_accidents_df = pd.merge(accidents_df, policy_holder_df, how='inner', on='POLICYHOLDER_ID')
fact_accidents_df = pd.merge(fact_accidents_df, states_df, how='inner', on='STATE_CODE')

# POLICYHOLDER_ID is already in the fact frame, so drop it from vehicles before merging.
vehicles_temp = vehicles_df.drop(columns='POLICYHOLDER_ID')
# Rename the lookup code columns to match the join keys in the fact frame;
# the descriptive columns get a _2 suffix to avoid a name clash.
gender_temp = gender_marital_df.rename(columns={'GENDER_MARITAL_STATUS' : 'GENDER_MARITAL_STATUS_2', 'GENDER_MARITAL_STATUS_CODE' : 'GENDER_MARITAL_STATUS'})
accident_temp = accidents_type_df.rename(columns={'ACCIDENT_TYPE' : 'ACCIDENT_TYPE_2', 'ACCIDENT_TYPE_CODE' : 'ACCIDENT_TYPE'})

# Join the remaining lookup tables.
fact_accidents_df = pd.merge(fact_accidents_df, insurance_df, how='inner', on='POLICYHOLDER_ID')
fact_accidents_df = pd.merge(fact_accidents_df, vehicles_temp, how='inner', on='VEHICLE_ID')
fact_accidents_df = pd.merge(fact_accidents_df, vehicle_use_df, how='inner', on='VEHICLE_ID')
fact_accidents_df = pd.merge(fact_accidents_df, body_style_df, how='inner', on='BODY_STYLE_CODE')
fact_accidents_df = pd.merge(fact_accidents_df, gender_temp, how='inner', on='GENDER_MARITAL_STATUS')
fact_accidents_df = pd.merge(fact_accidents_df, accident_temp, how='inner', on='ACCIDENT_TYPE')

# Pick the fact-table columns and rename them to the target names.
final_df = pd.DataFrame()
final_df['ACCIDENT_ID'] = fact_accidents_df['ACCIDENT_ID']
final_df['POLICYHOLDER_ID'] = fact_accidents_df['POLICYHOLDER_ID']
final_df['VEHICLE_ID'] = fact_accidents_df['VEHICLE_ID']
final_df['STATE_ID'] = fact_accidents_df['STATE_CODE']
final_df['BODY_STYLE_ID'] = fact_accidents_df['BODY_STYLE_CODE']
final_df['ACCIDENT_TYPE_ID'] = fact_accidents_df['ACCIDENT_TYPE']
final_df['GENDER_MARITALSTATUS_ID'] = fact_accidents_df['GENDER_MARITAL_STATUS']
final_df['VEHICLE_USECODE_ID'] = fact_accidents_df['USE_CODE']
final_df['ACCIDENT_DATE'] = fact_accidents_df['ACCIDENT_DATE']
final_df['VEHICLE_YEAR'] = fact_accidents_df['YEAR']
final_df['POLICYHOLDER_BIRTHDATE'] = fact_accidents_df['BIRTHDATE']
final_df['ESTIMATED_COST'] = fact_accidents_df['ESTIMATED_COST']
final_df['ACTUAL_REPAIR_COST'] = fact_accidents_df['ACTUAL_REPAIR_COST']
final_df['AT_FAULT'] = fact_accidents_df['AT_FAULT']
final_df['IS_DUI'] = fact_accidents_df['DUI']
final_df['COVERAGE_STATUS'] = fact_accidents_df['COVERAGE_STATUS']

# Remove duplicate rows produced by the joins and renumber the index.
final_df = final_df.drop_duplicates().reset_index(drop=True)
final_df.head()

| | ACCIDENT_ID | POLICYHOLDER_ID | VEHICLE_ID | STATE_ID | BODY_STYLE_ID | ACCIDENT_TYPE_ID | GENDER_MARITALSTATUS_ID | VEHICLE_USECODE_ID | ACCIDENT_DATE | VEHICLE_YEAR | POLICYHOLDER_BIRTHDATE | ESTIMATED_COST | ACTUAL_REPAIR_COST | AT_FAULT | IS_DUI | COVERAGE_STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 6686 | 6974 | 8 | 20 | 7 | 2 | 2 | 2022-06-13 | 2001 | 1996-07-10 | 4995 | 7123 | False | False | Active |
| 1 | 4 | 1300 | 4037 | 41 | 3 | 7 | 2 | 3 | 2020-01-21 | 1982 | 1991-09-20 | 694 | 850 | False | False | Active |
| 2 | 8 | 6147 | 4883 | 15 | 19 | 5 | 2 | 2 | 2022-12-19 | 2009 | 1986-04-18 | 2220 | 3017 | True | False | Active |
| 3 | 9 | 1367 | 4656 | 6 | 19 | 6 | 8 | 1 | 2022-01-11 | 1993 | 2001-08-09 | 8985 | 13817 | False | False | Active |
| 4 | 12 | 9162 | 6079 | 47 | 20 | 7 | 1 | 5 | 2020-11-09 | 1989 | 1968-04-21 | 8765 | 13942 | False | False | Active |
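
The merge cell renames lookup columns so that both sides share a key name. As an alternative, `pd.merge` accepts `left_on` and `right_on` for keys with different names, and `validate` to confirm that the lookup table has one row per key. The sketch below uses made-up rows; the `_NAME` suffix is an arbitrary choice for this example.

```python
import pandas as pd

# Made-up sample data: the fact side stores a type code, the lookup side names it.
accidents_sample = pd.DataFrame({"ACCIDENT_ID": [1, 2, 3], "ACCIDENT_TYPE": [3, 4, 7]})
types_sample = pd.DataFrame({
    "ACCIDENT_TYPE_CODE": [3, 4],
    "ACCIDENT_TYPE": ["Type A", "Type B"],
})

merged = pd.merge(
    accidents_sample,
    types_sample,
    how="inner",
    left_on="ACCIDENT_TYPE",        # code column on the fact side
    right_on="ACCIDENT_TYPE_CODE",  # code column on the lookup side
    suffixes=("", "_NAME"),         # keep the left name, suffix the lookup description
    validate="many_to_one",         # raise if a code appears twice in the lookup
)

# An inner join drops rows with no match, so compare row counts before and after.
print(len(accidents_sample), "->", len(merged))
```

Here the accident with type code 7 has no match in the lookup, so the inner join drops it. Comparing row counts after each merge is a quick way to notice this kind of loss in the real data.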


# THE **L**
## Write to Snowflake

To avoid mistakes, make sure you write to your own schema.

In [30]:
yourschema = 'palonzo'
location = f'techcatalyst_de.{yourschema}'
print(location)

techcatalyst_de.palonzo


Convert the pandas DataFrame into a Snowpark DataFrame:

In [31]:
accidents_sdf = session1.create_dataframe(final_df)

In [32]:
print(type(final_df))
print(type(accidents_sdf))

<class 'pandas.core.frame.DataFrame'>
<class 'snowflake.snowpark.table.Table'>


The `write.mode()` method accepts the following options:

* `"append"`: Append the data of this DataFrame to the existing table. Creates the table if it does not exist.

* `"overwrite"`: Overwrite the existing table by dropping the old table.

* `"truncate"`: Overwrite the existing table by truncating the old table.

* `"errorifexists"`: Throw an exception if the table already exists.

* `"ignore"`: Skip this operation if the table already exists.

* The default value is `"errorifexists"`.
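
To make the differences between these modes concrete, here is a toy simulation that uses a plain dictionary as the "database". It is not Snowpark code and does not touch Snowflake; `ToyTable` and `toy_save` are invented for this sketch, and the behavior when the table is missing is an assumption based on the descriptions above.

```python
from dataclasses import dataclass, field

VALID_MODES = {"append", "overwrite", "truncate", "errorifexists", "ignore"}


@dataclass
class ToyTable:
    columns: list
    rows: list = field(default_factory=list)


def toy_save(db, name, columns, rows, mode="errorifexists"):
    """Mimic the save modes against `db`, a dict of table name -> ToyTable."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if name not in db:
        # Assumption: every mode creates the table when it is missing.
        db[name] = ToyTable(list(columns), list(rows))
    elif mode == "append":
        db[name].rows.extend(rows)                      # keep old rows, add new ones
    elif mode == "overwrite":
        db[name] = ToyTable(list(columns), list(rows))  # drop and recreate
    elif mode == "truncate":
        db[name].rows = list(rows)                      # keep the definition, replace rows
    elif mode == "errorifexists":
        raise ValueError(f"table {name!r} already exists")
    # mode == "ignore": the table exists, so nothing happens
    return db[name]


db = {}
toy_save(db, "t", ["A"], [(1,)])                           # table created
toy_save(db, "t", ["A"], [(2,)], mode="append")            # two rows now
toy_save(db, "t", ["A"], [(3,)], mode="ignore")            # unchanged
toy_save(db, "t", ["A", "B"], [(4, 5)], mode="overwrite")  # new columns and rows
print(db["t"])
```

The practical difference between `"overwrite"` and `"truncate"` in this model is that overwriting replaces the table definition along with the rows, while truncating keeps the existing definition.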

In [33]:
# example using overwrite

accidents_sdf.write.mode("overwrite").save_as_table(f"{location}.fact_accidents_py")

In [34]:
at_sdf = session1.create_dataframe(dim_accident_type_df)
at_sdf.write.mode("overwrite").save_as_table(f"{location}.py_accident_type")

In [38]:
bs_sdf = session1.create_dataframe(dim_body_style_df)
bs_sdf.write.mode("overwrite").save_as_table(f"{location}.py_body_style")

In [39]:
gm_sdf = session1.create_dataframe(dim_gender_martial_df)
gm_sdf.write.mode("overwrite").save_as_table(f"{location}.py_gender_marital")

In [37]:
ph_sdf = session1.create_dataframe(dim_policy_holder_df)
ph_sdf.write.mode("overwrite").save_as_table(f"{location}.py_policyholder")

In [40]:
s_sdf = session1.create_dataframe(dim_states_df)
s_sdf.write.mode("overwrite").save_as_table(f"{location}.py_states")

In [42]:
vu_sdf = session1.create_dataframe(dim_vehicle_use_df)
vu_sdf.write.mode("overwrite").save_as_table(f"{location}.py_vehicle_use")
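
Since every dimension table is written the same way, the cells above could also be expressed as a loop. The following is a sketch under that assumption: `write_tables` is a hypothetical helper that wraps the same `create_dataframe(...).write.mode(...).save_as_table(...)` chain used in this notebook.

```python
def write_tables(session, frames, location, mode="overwrite"):
    """Write each pandas DataFrame in `frames` to `<location>.<table_name>`.

    `frames` maps a target table name to a pandas DataFrame.
    Returns the fully qualified names that were written.
    """
    written = []
    for table_name, df in frames.items():
        full_name = f"{location}.{table_name}"
        session.create_dataframe(df).write.mode(mode).save_as_table(full_name)
        written.append(full_name)
    return written


# Usage with the objects defined earlier (needs a live connection):
# write_tables(session1, {
#     "py_accident_type": dim_accident_type_df,
#     "py_body_style": dim_body_style_df,
#     "py_states": dim_states_df,
# }, location)
```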