# Credit Card Fraud Prediction - Loading Dataset using Snowpark Python

This demo is based on the *Machine Learning for Credit Card Fraud Detection - Practical Handbook*: https://fraud-detection-handbook.github.io/fraud-detection-handbook/

## Loading Credit Card Transactions into Snowflake

### Import the dependencies and connect to Snowflake

In [None]:
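from snowflake.snowpark import Session
# Import only the names used below; wildcard imports from
# snowflake.snowpark.functions shadow Python built-ins such as sum and max
from snowflake.snowpark.types import (
    StructType, StructField, IntegerType, FloatType, TimestampType
)
from snowflake.snowpark.functions import col

import json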

In [None]:
# Read the Snowflake connection parameters from a local JSON file
with open('creds.json') as f:
    connection_parameters = json.load(f)

In [None]:
session = Session.builder.configs(connection_parameters).create()
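The notebook does not show the layout of `creds.json`. As a minimal sketch, the helper below loads the file and fails early if an expected key is absent. The key names (`account`, `user`, `password`, plus optional `role`, `warehouse`, `database`, `schema`) are assumptions based on common Snowpark connection settings, and `load_connection_parameters` is a hypothetical helper, not part of Snowpark; adjust the required keys if you authenticate differently.

```python
import json

# Assumed key names; change these to match how your account authenticates.
REQUIRED_KEYS = ("account", "user", "password")
OPTIONAL_KEYS = ("role", "warehouse", "database", "schema")


def load_connection_parameters(path):
    """Load a JSON credentials file and fail early if a required key is missing."""
    with open(path) as f:
        params = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError(f"{path} is missing required keys: {missing}")
    return params
```

The returned dictionary can then be passed to `Session.builder.configs(...)` in place of the inline `json.load` call.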

### Define Staging Area and the Schema for the transaction table

Using SQL, we can create an internal stage and then use the **put** function to upload the **fraud_transactions.csv.gz** file to it.

In [None]:
# Create an internal stage for uploading the source file
session.sql("CREATE or replace STAGE fraud_data").collect()

# Upload the source file to the stage; the file is already gzip-compressed,
# so auto_compress is turned off
putResult = session.file.put("data/fraud_transactions.csv.gz", "@fraud_data", auto_compress=False)

putResult
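Rather than eyeballing `putResult`, the upload can be checked programmatically. The sketch below assumes each entry in the list returned by `session.file.put` exposes a `status` attribute with values such as `UPLOADED` or `SKIPPED`; `failed_uploads` is a hypothetical helper, and the set of acceptable statuses is an assumption to verify against your Snowpark version.

```python
# Statuses treated as success here; this set is an assumption.
OK_STATUSES = {"UPLOADED", "SKIPPED"}


def failed_uploads(put_results):
    """Return the entries whose status is not one of OK_STATUSES."""
    return [r for r in put_results if str(r.status).upper() not in OK_STATUSES]
```

For example, `assert not failed_uploads(putResult)` would stop the notebook if the file did not reach the stage.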

Define the schema for our **CUSTOMER_TRANSACTIONS_FRAUD** table

In [None]:
# Define the schema for the CUSTOMER_TRANSACTIONS_FRAUD table
dfCustTrxFraudSchema = StructType(
    [
        StructField("TRANSACTION_ID", IntegerType()),
        StructField("TX_DATETIME", TimestampType()),
        StructField("CUSTOMER_ID", IntegerType()),
        StructField("TERMINAL_ID", IntegerType()),
        StructField("TX_AMOUNT", FloatType()),
        StructField("TX_TIME_SECONDS", IntegerType()),
        StructField("TX_TIME_DAYS", IntegerType()),
        StructField("TX_FRAUD", IntegerType()),
        StructField("TX_FRAUD_SCENARIO", IntegerType())
    ]
)
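Before loading, it can help to confirm that the local file actually has one field per schema column. The following is a minimal sketch using only the standard library; `preview_gzip_csv` is a hypothetical helper, the column list is copied from the schema above, and whether the file starts with a header row is not stated in this notebook, so the function only checks field counts.

```python
import csv
import gzip
from itertools import islice

# Column names copied from the schema definition, in order.
EXPECTED_COLUMNS = [
    "TRANSACTION_ID", "TX_DATETIME", "CUSTOMER_ID", "TERMINAL_ID", "TX_AMOUNT",
    "TX_TIME_SECONDS", "TX_TIME_DAYS", "TX_FRAUD", "TX_FRAUD_SCENARIO",
]


def preview_gzip_csv(path, n=5):
    """Return the first n rows of a gzip-compressed CSV, checking the field count."""
    with gzip.open(path, mode="rt", newline="") as f:
        rows = list(islice(csv.reader(f), n))
    for i, row in enumerate(rows, start=1):
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(
                f"row {i} has {len(row)} fields, expected {len(EXPECTED_COLUMNS)}"
            )
    return rows
```

Calling `preview_gzip_csv("data/fraud_transactions.csv.gz")` returns the first rows as lists of strings, which also shows whether the first line is a header.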

Read **fraud_transactions.csv.gz** from the stage into a DataFrame and save it as a table

In [None]:
# Create a reader that applies the schema defined above
dfReader = session.read.schema(dfCustTrxFraudSchema)

# Define a DataFrame over the staged file (evaluated lazily, so nothing is loaded yet)
dfCustTrxFraudRd = dfReader.csv("@fraud_data/fraud_transactions.csv.gz")

In [None]:
# Write the DataFrame to a table, replacing the table if it already exists
dfCustTrxFraudRd.write.mode("overwrite").save_as_table("CUSTOMER_TRANSACTIONS_FRAUD")

### Create the CUSTOMERS and TERMINALS tables from CUSTOMER_TRANSACTIONS_FRAUD

In [None]:
# Now create the CUSTOMERS and TERMINALS tables from the distinct IDs

dfCustTrxFraudTb = session.table("CUSTOMER_TRANSACTIONS_FRAUD")

dfCustomers = dfCustTrxFraudTb.select(col("CUSTOMER_ID")).distinct().sort(col("CUSTOMER_ID"))

dfTerminals = dfCustTrxFraudTb.select(col("TERMINAL_ID")).distinct().sort(col("TERMINAL_ID"))

dfCustomers.write.mode("overwrite").save_as_table("CUSTOMERS")

dfTerminals.write.mode("overwrite").save_as_table("TERMINALS")
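A quick way to confirm that all three tables were written is to count their rows. The sketch below relies only on `session.table(name).count()`; `table_row_counts` is a hypothetical helper name, and no expected counts are given because they depend on the dataset.

```python
def table_row_counts(session, table_names):
    """Return a {table_name: row_count} dictionary for the given tables."""
    return {name: session.table(name).count() for name in table_names}
```

For example, `table_row_counts(session, ["CUSTOMER_TRANSACTIONS_FRAUD", "CUSTOMERS", "TERMINALS"])` runs one count query per table. Because `CUSTOMERS` and `TERMINALS` are built from distinct IDs, each should have no more rows than the transactions table.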