# Saving the Final Dataset

## Loading the transformed and enriched dataset

In [1]:
load("02_Transforming_and_Enriching_Data.RData")

## Save as a CSV file

In [2]:
write.csv(df, "../../data/output/customer_churn_abt.csv", row.names = FALSE)
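
A quick round-trip check can confirm that an exported CSV is readable and has the expected shape. The sketch below uses a small stand-in data frame (`toy_df`) and a temporary file rather than the real `df` and output path; the column names and values are illustrative assumptions, not taken from the actual dataset.

```r
# Stand-in for the real `df`; columns and values are illustrative only
toy_df <- data.frame(
  customer_id = c(101L, 102L, 103L),
  churned = c(TRUE, FALSE, TRUE),
  monthly_spend = c(29.5, 54.0, 12.25)
)

# Write to a temporary file so the check does not touch the real output
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_df, csv_path, row.names = FALSE)

# Read the file back and compare shape and column names
check_df <- read.csv(csv_path)
same_shape <- identical(dim(check_df), dim(toy_df))
same_names <- identical(names(check_df), names(toy_df))
```

The same comparison could be run against `df` and the real output path once the export has completed.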

## Export to Google Cloud Storage (if applicable)
If a Google Cloud Storage (GCS) bucket were available and we wanted to save the table there as a Parquet file, we could use the following code.

In [3]:
library(arrow)
library(googleCloudStorageR)

# --- CONFIGURATION ---

# Your Google service account JSON key
gcs_key <- "../../keys/gel-sas1-writer 1.json"


Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

    timestamp

In [4]:
# Your GCS bucket and destination path
bucket_name <- "sas1-learn"
output_filename <- "customer_churn_abt.parquet"
gcs_object_path <- paste0("data/", output_filename)

In [5]:
# Authenticate with GCS
Sys.setenv(GOOGLE_APPLICATION_CREDENTIALS = gcs_key)
gcs_auth(json_file = gcs_key)

ERROR: Error in init_oauth_service_account(self$secrets, scope = self$params$scope, : Bad Request (HTTP 400).
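
The error message above does not by itself identify the cause of the HTTP 400. One thing that can be ruled out locally, before calling `gcs_auth()`, is a missing or malformed key file. The following is a minimal sketch, assuming the jsonlite package is installed. `check_service_key()` is a hypothetical helper, not part of googleCloudStorageR, and the fields it looks for (`type`, `client_email`, `private_key`) are the ones found in a standard Google service account key.

```r
library(jsonlite)

# Hypothetical helper: returns a character vector of problems (empty if none)
check_service_key <- function(path) {
  if (!file.exists(path)) {
    return(sprintf("Key file not found: %s", path))
  }
  # Parse the file; treat any parse error as invalid JSON
  key <- tryCatch(fromJSON(path), error = function(e) NULL)
  if (is.null(key)) {
    return("Key file is not valid JSON")
  }
  if (!is.list(key)) {
    return("Key file does not contain a JSON object")
  }
  problems <- character(0)
  required <- c("type", "client_email", "private_key")
  missing_fields <- setdiff(required, names(key))
  if (length(missing_fields) > 0) {
    problems <- c(problems,
                  paste("Missing field(s):", paste(missing_fields, collapse = ", ")))
  }
  if (!identical(key$type, "service_account")) {
    problems <- c(problems, "Field 'type' is not 'service_account'")
  }
  problems
}
```

Calling `check_service_key(gcs_key)` and getting an empty result only shows that the file is present and well-formed. It does not verify that the key is still active or that the account has access to the bucket.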


In [None]:
# --- EXPORT ---

# Save DataFrame as local Parquet file
local_path <- tempfile(fileext = ".parquet")
write_parquet(df, local_path)

# Upload to GCS
gcs_upload(file = local_path,
           bucket = bucket_name,
           name = gcs_object_path,
           predefinedAcl = 'bucketLevel')
 
cat(sprintf("DataFrame written to gs://%s/%s\n", bucket_name, gcs_object_path))

## Export to Snowflake (if applicable)

If a Snowflake account were available and we wanted to save the table there, we could proceed as follows.

Begin by loading the **reticulate** package, which lets you run Python code, import Python modules, and pass data between R and Python within the same R notebook environment.

In [None]:
library(reticulate)

Before we can connect to Snowflake, we need to install the pandas-compatible version of the Snowflake Connector for Python into the Python installation that reticulate uses. First, copy the Python path from the first line of the output below:

In [None]:
py_config()

Open a terminal window and run the following command, replacing the placeholder with the path you copied:

<python path from py_config()> -m pip install "snowflake-connector-python[pandas]"

Export the data

In [None]:
# Import Python modules

json <- import("json")
sf_connector <- import("snowflake.connector")
sf_tools <- import("snowflake.connector.pandas_tools")

# Load Snowflake credentials from JSON
sf_credential <- "../../keys/snowflake_cred.json"
# Use Python's built-in open(); named py_open to avoid masking R's base::open
py_open <- import_builtins()$open
f <- py_open(sf_credential, "r")
sf_credentials_dict <- json$load(f)
f$close()

# Connect to Snowflake
conn <- do.call(sf_connector$connect, sf_credentials_dict)

# Convert R data frame to pandas DataFrame
# (importing pandas first fails early if it is not installed)
pandas <- import("pandas")
df_pandas <- r_to_py(df)

# Export to Snowflake
result <- sf_tools$write_pandas(
  conn = conn,
  df = df_pandas,
  table_name = "customer_churn_abt",
  auto_create_table = TRUE,
  overwrite = TRUE
)

# Close connection
conn$close()
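
If `write_pandas` raises an error, the `conn$close()` call above is never reached and the connection stays open. One way to guard against that is to do the work inside a function and register the close with `on.exit()`. The sketch below is plain R and runs without Snowflake: `with_connection()`, `fake_open()`, and the `state` environment are hypothetical names used only for illustration. In the notebook, the opener would presumably wrap the `sf_connector$connect` call and the action would wrap `write_pandas`.

```r
# Hypothetical helper: opens a connection, runs an action, always closes
with_connection <- function(open_fn, action) {
  conn <- open_fn()
  # on.exit() runs when the function exits, whether normally or via an error
  on.exit(conn$close(), add = TRUE)
  action(conn)
}

# Fake connection that records whether close() was called
state <- new.env()
state$closed <- FALSE
fake_open <- function() {
  list(close = function() state$closed <- TRUE)
}

# Simulate a failing upload and capture the error message
result <- tryCatch(
  with_connection(fake_open, function(conn) stop("upload failed")),
  error = function(e) conditionMessage(e)
)
```

After this runs, `state$closed` is `TRUE` even though the action failed, which is the behavior wanted for a real connection.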
