Ingestion of Jaffle Shop Data to Snowflake #58
This pull request introduces a new, modular data integration layer for loading and preparing the Jaffle Shop dataset in Snowflake for getML Feature Store integration. It provides robust infrastructure bootstrapping, typed configuration, session management, and SQL utilities, all with clear documentation and usage examples. The codebase is organized for maintainability and extensibility, and includes scripts, configuration, and test scaffolding.
The most important changes are:
Infrastructure Bootstrapping and Session Management
- Added `ensure_infrastructure` and `BootstrapError` in `data/_bootstrap.py` to automatically create Snowflake warehouses and databases if missing, with idempotent operations and clear error handling.
- Added `create_session` in `data/_snowflake_session.py` for robust, context-managed Snowflake Snowpark sessions, including error handling for failed connections.
Configuration and Environment Management
- Added `SnowflakeSettings` in `data/_settings.py` for typed configuration loaded from `SNOWFLAKE_*` environment variables, making authentication and connection setup consistent and secure.
- Added a `mise.toml` file with templated environment variable setup for seamless local development and CI configuration.
Data Ingestion and SQL Utilities
- Added a `data/_sql_loader.py` utility for loading and formatting SQL files, and added a suite of parameterized SQL templates for schema, stage, and table creation, as well as data ingestion from Parquet files and cloud storage.
- Added an `ingest_jaffle_shop_data.py` script to orchestrate end-to-end data ingestion from a public GCS bucket into Snowflake, with automatic infrastructure setup.
Project Structure and Documentation
- Added `__init__.py` files, public API exports, and comprehensive docstrings for all modules.
- Added `pyproject.toml` with dependencies, development tools, and code style/linting configuration for consistent development and testing.
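For reviewers who want a feel for the configuration and SQL-templating pattern described above, here is a minimal, standard-library-only sketch. It is not the code from this pull request: the class `SketchSettings`, the functions `settings_from_env` and `render_sql`, the field names, the `COMPUTE_WH` default, and the `$name` placeholder syntax are all assumptions made for illustration. The actual `SnowflakeSettings` and `data/_sql_loader.py` may look quite different.

```python
import os
import re
from dataclasses import dataclass
from string import Template
from typing import Mapping, Optional

# Conservative pattern for unquoted SQL identifiers (an assumption of this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical typed settings object; field names are assumptions."""

    account: str
    user: str
    password: str
    warehouse: str = "COMPUTE_WH"  # assumed default, not taken from the PR
    database: Optional[str] = None


def settings_from_env(
    env: Optional[Mapping[str, str]] = None, prefix: str = "SNOWFLAKE_"
) -> SketchSettings:
    """Build settings from prefixed environment variables, failing fast on gaps."""
    env = os.environ if env is None else env
    required = ("ACCOUNT", "USER", "PASSWORD")
    missing = [prefix + name for name in required if not env.get(prefix + name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return SketchSettings(
        account=env[prefix + "ACCOUNT"],
        user=env[prefix + "USER"],
        password=env[prefix + "PASSWORD"],
        warehouse=env.get(prefix + "WAREHOUSE", "COMPUTE_WH"),
        database=env.get(prefix + "DATABASE"),
    )


def render_sql(template_text: str, **identifiers: str) -> str:
    """Fill $name placeholders with validated identifiers.

    Raises ValueError for values that are not plain identifiers and
    KeyError if the template uses a placeholder that was not supplied.
    """
    for name, value in identifiers.items():
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"unsafe identifier for {name!r}: {value!r}")
    return Template(template_text).substitute(**identifiers)


if __name__ == "__main__":
    demo_env = {
        "SNOWFLAKE_ACCOUNT": "my_account",
        "SNOWFLAKE_USER": "my_user",
        "SNOWFLAKE_PASSWORD": "not-a-real-secret",
    }
    print(settings_from_env(demo_env))
    print(render_sql("CREATE SCHEMA IF NOT EXISTS $database.$schema",
                     database="RAW", schema="JAFFLE_SHOP"))
```

Passing the environment as a mapping keeps the loader easy to test without touching `os.environ`, and validating identifiers before substitution is one cautious way to keep templated DDL from accepting arbitrary strings.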