Initial Snowflake integration #49
Draft
This pull request introduces a new Snowflake integration module for the Jaffle Shop dataset. It provides a robust, idempotent workflow for bootstrapping Snowflake infrastructure, ingesting CSV data from S3, and preparing weekly sales forecasting data for use with getML. The implementation features modular Python scripts, externalized SQL queries, and comprehensive logging and error handling. The workflow is automated via a new GitHub Actions CI pipeline.
**Infrastructure and Workflow Automation**
- Added a GitHub Actions workflow (`.github/workflows/snowflake-test.yml`) to automate Python linting, formatting, type checking, and testing for the Snowflake integration, including coverage reporting and support for multiple Python versions.

**Snowflake Infrastructure Bootstrapping**
- Added `bootstrap.py` to create Snowflake warehouses and databases if they do not exist, using idempotent SQL and externalized queries (`create_warehouse.sql`, `create_database.sql`). [1] [2] [3]

**Data Ingestion Pipeline**
- Added `data/ingestion.py` to ingest Jaffle Shop CSV data from S3 into Snowflake's RAW schema using external stages and native COPY INTO commands, with transaction management and error handling. [1] [2] [3]

**Data Preparation for Feature Store**
- Added `data/preparation.py` to create weekly sales population tables per store, calculate forecasting targets, perform schema validation, and run data quality checks, leveraging externalized SQL for maintainability. [1] [2]

**Modular Utilities and API**
- Added `data/__init__.py` and `_sql_loader.py` to provide a clean API and internal SQL file loading utilities for the integration package. [1] [2]
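To illustrate the idempotent bootstrapping described above, here is a minimal sketch. The function and parameter names (`render_bootstrap_sql`, `bootstrap`) and the warehouse settings are hypothetical, not the PR's actual code; the key idea is that `IF NOT EXISTS` makes each statement a safe no-op on re-runs:

```python
def render_bootstrap_sql(warehouse: str, database: str) -> list[str]:
    """Render idempotent DDL: IF NOT EXISTS keeps repeated runs harmless."""
    return [
        # Warehouse settings here are illustrative defaults, not the PR's.
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 INITIALLY_SUSPENDED = TRUE",
        f"CREATE DATABASE IF NOT EXISTS {database}",
    ]


def bootstrap(conn, warehouse: str, database: str) -> None:
    """Execute the bootstrap DDL; conn is assumed to expose cursor().execute(),
    e.g. a snowflake.connector connection."""
    cur = conn.cursor()
    try:
        for stmt in render_bootstrap_sql(warehouse, database):
            cur.execute(stmt)
    finally:
        cur.close()
```

Separating statement rendering from execution keeps the idempotency logic trivially unit-testable without a live Snowflake account.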
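The ingestion step (external stage + native COPY INTO, wrapped in a transaction) can be sketched as follows. All names (`render_copy_into`, `ingest`, the stage and table identifiers) and the file-format options are assumptions for illustration, not the module's real API:

```python
def render_copy_into(table: str, stage: str, file_name: str) -> str:
    """Render a native COPY INTO from an external stage into the RAW schema."""
    return (
        f"COPY INTO RAW.{table} FROM @{stage}/{file_name} "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1) "
        # ABORT_STATEMENT surfaces bad rows instead of silently skipping them.
        "ON_ERROR = 'ABORT_STATEMENT'"
    )


def ingest(conn, stage: str, tables: dict[str, str]) -> None:
    """Load each table -> CSV-file pair inside one explicit transaction,
    so a failure on any file leaves RAW untouched."""
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        for table, file_name in tables.items():
            cur.execute(render_copy_into(table, stage, file_name))
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")  # undo partial loads before re-raising
        raise
    finally:
        cur.close()
```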
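One way the per-store weekly population tables with forecasting targets might be rendered is sketched below. The table and column names, the `ANALYTICS` schema, and the LEAD-based target definition are all hypothetical assumptions, not the PR's actual queries:

```python
def render_weekly_population_sql(store_id: int, horizon_weeks: int = 1) -> str:
    """Render SQL for one store's weekly sales population table.

    DATE_TRUNC('WEEK', ...) buckets orders into weeks; the forecasting
    target is the revenue `horizon_weeks` ahead, taken with LEAD().
    """
    return f"""
CREATE OR REPLACE TABLE ANALYTICS.WEEKLY_SALES_STORE_{store_id} AS
SELECT
    DATE_TRUNC('WEEK', o.ordered_at) AS week_start,
    SUM(o.order_total)               AS weekly_revenue,
    LEAD(SUM(o.order_total), {horizon_weeks})
        OVER (ORDER BY DATE_TRUNC('WEEK', o.ordered_at)) AS target_revenue
FROM RAW.ORDERS o
WHERE o.store_id = {store_id}
GROUP BY 1
""".strip()
```

Rendering one table per store keeps each population self-contained for downstream feature learning with getML.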
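Finally, an internal SQL loader in the spirit of `_sql_loader.py` could look like the sketch below; the `load_sql` name, the `sql/` directory layout, and the error handling are assumptions for illustration:

```python
from pathlib import Path

# Assumed location of the externalized .sql files, relative to this module.
SQL_DIR = Path(__file__).parent / "sql"


def load_sql(name: str, sql_dir: Path = SQL_DIR) -> str:
    """Return the text of an externalized query, e.g. load_sql("create_warehouse").

    Keeping SQL in files rather than inline strings makes the queries
    reviewable and reusable across modules.
    """
    path = (sql_dir / name).with_suffix(".sql")
    if not path.is_file():
        raise FileNotFoundError(f"Unknown SQL query: {path}")
    return path.read_text(encoding="utf-8")
```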