# 06 Load Excel Files

* Author: Jeremiah Hansen
* Last Updated: 10/25/2024

Load data into the `LOCATION` and `ORDER_DETAIL` tables from Excel files.

This notebook does not use Snowpark file access because it doesn't yet work in Notebooks, so for now we copy each file locally first.

In [None]:
-- This won't be needed when we can pass variables to Notebooks!
SELECT current_database() AS DATABASE_NAME, current_schema() AS SCHEMA_NAME

In [None]:
# Import python packages
import logging
import pandas as pd

logger = logging.getLogger("demo_logger")

# Get the target database and schema from the results of the SQL cell above. This won't be needed when we can pass variables to Notebooks!

current_context_df = cells.sql_get_context.to_pandas()
database_name      = current_context_df.iloc[0,0]     # DEMO_DB
schema_name        = current_context_df.iloc[0,1]     # DEV_SCHEMA


from snowflake.snowpark.context import get_active_session  # can also use Snowpark for analyses!
session = get_active_session()
#session.use_schema(f"{database_name}.{schema_name}")

logger.info("06_load_excel_files start")

In [None]:
-- Temporary solution for loading the metadata; should be replaced with a direct query to a directory table (or a metadata table)
       SELECT '@INTEGRATIONS.FROSTBYTE_RAW_STAGE/intro/order_detail.xlsx' AS STAGE_FILE_PATH, 'order_detail' AS WORKSHEET_NAME, 'ORDER_DETAIL' AS TARGET_TABLE
UNION  SELECT '@INTEGRATIONS.FROSTBYTE_RAW_STAGE/intro/location.xlsx'                       , 'location'                      , 'LOCATION';

## Create a function to load Excel worksheet to table

Create a reusable function to load an Excel worksheet to a table in Snowflake.

Note: Until we can use the `SnowflakeFile` class in Notebooks, we need to temporarily copy the file to a local temp folder and then process from there.
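
As a minimal sketch of the "copy locally first" step, the snippet below only works out where a staged file would land inside a temporary folder. The helper name `local_copy_path` is hypothetical and not part of this notebook; the actual copy would still be done by `session.file.get(stage_file_path, tmp_dir)`, which is left out here so the example runs without a Snowflake session.

```python
import posixpath
import tempfile
from pathlib import Path


def local_copy_path(stage_file_path, local_directory):
    """Return the local path a staged file would have after being copied.

    Hypothetical helper for illustration only.
    """
    # Stage paths always use forward slashes, so posixpath gives the
    # same file name regardless of the operating system.
    file_name = posixpath.basename(stage_file_path)
    return Path(local_directory) / file_name


# A temporary directory is removed automatically when the block exits,
# so copied files do not pile up in the working directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = local_copy_path(
        "@INTEGRATIONS.FROSTBYTE_RAW_STAGE/intro/location.xlsx", tmp_dir
    )
    print(target.name)  # location.xlsx
```

Using a temporary directory instead of `"./"` is a design choice, not a requirement: it keeps the downloaded workbook out of the notebook's working directory.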

In [None]:
import os
from openpyxl import load_workbook

def load_excel_worksheet_to_table_local(session,
                                        stage_file_path,
                                        worksheet_name,
                                        target_table):
    """Copy an Excel file from a stage to local disk and load one worksheet into a table."""
    local_directory = "./"
    file_name       = os.path.basename(stage_file_path)

    # Copy the file from the stage to the local directory
    session.file.get(stage_file_path, local_directory)

    with open(os.path.join(local_directory, file_name), 'rb') as f:
        workbook = load_workbook(f)
        sheet    = workbook[worksheet_name]
        data     = sheet.values
        columns  = next(data)                           # First row of the worksheet is the header
        df       = pd.DataFrame(data, columns=columns)  # Remaining rows become the DataFrame's data

    # Convert to a Snowpark DataFrame and overwrite the target table
    df2 = session.create_dataframe(df)
    df2.write.mode("overwrite").save_as_table(target_table)

    return True
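
The header handling in the function above relies on consuming the first row of an iterator and leaving the rest as data. The sketch below shows that pattern on plain tuples, without openpyxl, and adds two guards the notebook does not have: an explicit error for an empty worksheet and skipping of fully empty rows. The function name `split_header_and_rows`, the `COLUMN_n` fallback names, and the sample rows are all made up for illustration.

```python
def split_header_and_rows(rows):
    """Split an iterable of row tuples into (column names, data rows).

    Hypothetical helper; the notebook itself uses next(data) directly.
    """
    iterator = iter(rows)
    try:
        header = next(iterator)  # first row is treated as the header
    except StopIteration:
        raise ValueError("worksheet is empty; no header row found") from None

    # Give blank header cells a placeholder name (an assumption, not
    # something the original function does).
    columns = [
        str(name).strip() if name is not None else f"COLUMN_{position}"
        for position, name in enumerate(header, start=1)
    ]

    # Keep only rows that contain at least one non-empty cell.
    data_rows = [row for row in iterator if any(cell is not None for cell in row)]
    return columns, data_rows


# Made-up sample rows standing in for sheet.values
sample = [("LOCATION_ID", "CITY"), (1, "Paris"), (None, None), (2, "Tokyo")]
columns, data_rows = split_header_and_rows(sample)
print(columns)    # ['LOCATION_ID', 'CITY']
print(data_rows)  # [(1, 'Paris'), (2, 'Tokyo')]
```

The returned values could be passed to `pd.DataFrame(data_rows, columns=columns)` in place of the raw generator.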

## Process all Excel worksheets

Loop through each Excel worksheet to process and call our `load_excel_worksheet_to_table_local()` function.

In [None]:
# Process each file from sql_get_spreadsheets cell above

files_to_load = cells.sql_get_spreadsheets.to_pandas()

for index, excel_file in files_to_load.iterrows():
    
    logger.info(f"Processing Excel file {excel_file['STAGE_FILE_PATH']}")
    
    load_excel_worksheet_to_table_local(
                                        session, 
                                        excel_file['STAGE_FILE_PATH'], 
                                        excel_file['WORKSHEET_NAME' ], 
                                        excel_file['TARGET_TABLE'   ]
                                       )
    

logger.info("06_load_excel_files end")
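
As written, the loop stops at the first file that fails. One possible variation, sketched below under the assumption that it is acceptable to keep going and report failures at the end, wraps each call in a `try`/`except`. The `load_all` function and its `loader` parameter are hypothetical names, and the stub loader only simulates a failure so the example runs without a Snowflake session.

```python
import logging

logger = logging.getLogger("demo_logger")


def load_all(files, loader):
    """Call loader for every file record and collect the ones that fail.

    `files` is an iterable of dict-like records with the keys
    STAGE_FILE_PATH, WORKSHEET_NAME and TARGET_TABLE.
    Hypothetical helper for illustration only.
    """
    failures = []
    for record in files:
        path = record["STAGE_FILE_PATH"]
        try:
            logger.info("Processing Excel file %s", path)
            loader(path, record["WORKSHEET_NAME"], record["TARGET_TABLE"])
        except Exception as exc:  # keep going so one bad file does not block the rest
            logger.error("Failed to load %s: %s", path, exc)
            failures.append((path, str(exc)))
    return failures


def stub_loader(stage_file_path, worksheet_name, target_table):
    """Stand-in for the real loader; fails for one made-up file name."""
    if "missing" in stage_file_path:
        raise FileNotFoundError("file not found on stage")


failed = load_all(
    [
        {"STAGE_FILE_PATH": "@STAGE/location.xlsx", "WORKSHEET_NAME": "location", "TARGET_TABLE": "LOCATION"},
        {"STAGE_FILE_PATH": "@STAGE/missing.xlsx", "WORKSHEET_NAME": "x", "TARGET_TABLE": "X"},
    ],
    stub_loader,
)
print(failed)  # [('@STAGE/missing.xlsx', 'file not found on stage')]
```

In this notebook the records could come from `files_to_load.to_dict("records")`, with the loader being a small wrapper that passes `session` to `load_excel_worksheet_to_table_local()`.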

### Debugging

In [None]:
--DESCRIBE TABLE LOCATION;
--SELECT * FROM LOCATION;
SHOW TABLES;