This repository contains example Python notebooks that demonstrate approaches for reading data from external sources and writing it to cloud storage (Azure Storage Accounts and Microsoft Fabric OneLake) using Spark notebooks. The examples are intended for learning and demonstration only and show common patterns used in data integration.
ANY CODE PROVIDED IN THIS REPO IS FOR DEMONSTRATION USAGE ONLY.
- `Snowflake/Snowflake_Load_Bronze.ipynb` — Reads tables from Snowflake and writes them to a Bronze layer. The notebook contains:
  - A `small_tables` loop that selects full tables and writes each to CSV (or Parquet) in a data lake path.
  - A `largeTables` chunking example that demonstrates how to read large tables in ID-range chunks and save chunked CSV files.
  - Example code that shows how to upload files to Azure Storage (`abfss` path) and helper functions (recent additions) that demonstrate uploading to Microsoft Fabric OneLake via the Microsoft Graph API using app-only credentials and chunked upload sessions.
- `Snowflake_Build_Lists.ipynb` — A helper notebook that builds the `small_tables` and `largeTables` lists and other control metadata used by the loader notebook. Run this first (or import/execute its variables) to populate the table lists used by `Snowflake_Load_Bronze.ipynb`.
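The ID-range chunking pattern used for `largeTables` can be sketched as below. This is illustrative only: the function names, the chunk size, and the output path are assumptions, not the notebook's actual code.

```python
# Sketch of the ID-range chunking pattern: split a table's ID space into
# fixed-size ranges, then read and save one range at a time. All names and
# paths here are illustrative assumptions, not the notebook's actual code.

def id_ranges(min_id: int, max_id: int, chunk_size: int):
    """Yield (start, end) pairs covering [min_id, max_id] inclusive."""
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield start, end
        start = end + 1

def load_table_in_chunks(read_range, table: str, min_id: int, max_id: int,
                         chunk_size: int = 1_000_000):
    """Read one ID range at a time and save each as its own CSV part file.

    read_range(table, start, end) is a caller-supplied function expected to
    run something like:
        SELECT * FROM {table} WHERE ID BETWEEN {start} AND {end}
    and return a pandas DataFrame.
    """
    for i, (start, end) in enumerate(id_ranges(min_id, max_id, chunk_size)):
        df = read_range(table, start, end)
        # Output path is a placeholder; the notebooks write to data lake paths.
        df.to_csv(f"/lakehouse/default/Files/bronze/{table}_part{i:04d}.csv",
                  index=False)
```

Reading by fixed ID ranges keeps each query and each output file bounded in size, so a single oversized table cannot exhaust driver memory.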
- Open the notebooks in VS Code, Jupyter, or a Fabric-compatible environment.
- Install required Python packages (see `requirements.txt`).
- Review and set environment variables or configuration values before running the notebooks:
  - Snowflake connection details (connection string/credentials) — typically provided in the code or a separate variables notebook.
  - Azure Storage / OneLake configuration (storage account, container, Data Lake paths).
  - If using the OneLake examples with Microsoft Graph, set the following environment variables (or replace placeholders in the notebook):
    - `AZURE_TENANT_ID`
    - `AZURE_CLIENT_ID`
    - `AZURE_CLIENT_SECRET`
    - `ONELAKE_DRIVE_ID`
- Run `Snowflake_Build_Lists.ipynb` first to generate the table lists, then run `Snowflake_Load_Bronze.ipynb`.
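The app-only Graph upload flow using the environment variables above can be sketched as follows. This is a minimal sketch, not the notebook's actual helpers: the drive/path layout and the 10 MiB chunk size are assumptions (the Graph upload-session API requires chunk sizes in multiples of 320 KiB), while the MSAL client-credentials flow and the `createUploadSession` endpoint are the standard Microsoft APIs.

```python
# Sketch: acquire an app-only Graph token with MSAL, then upload a large file
# to OneLake via a Graph driveItem upload session in Content-Range chunks.
# Paths and chunk size are illustrative assumptions.
import os

GRAPH = "https://graph.microsoft.com/v1.0"
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB; a multiple of 320 KiB as Graph requires

def get_token() -> str:
    """Client-credentials (app-only) token for Microsoft Graph."""
    import msal  # imported lazily so the pure helpers below run without it
    app = msal.ConfidentialClientApplication(
        client_id=os.environ["AZURE_CLIENT_ID"],
        client_credential=os.environ["AZURE_CLIENT_SECRET"],
        authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}",
    )
    result = app.acquire_token_for_client(
        scopes=["https://graph.microsoft.com/.default"])
    return result["access_token"]

def content_range(offset: int, chunk_len: int, total: int) -> str:
    """Build the Content-Range header value for one chunk of an upload session."""
    return f"bytes {offset}-{offset + chunk_len - 1}/{total}"

def upload_large_file(local_path: str, item_path: str) -> None:
    """Upload local_path to the OneLake drive at item_path via an upload session."""
    import requests  # lazy import, same reason as above
    headers = {"Authorization": f"Bearer {get_token()}"}
    drive_id = os.environ["ONELAKE_DRIVE_ID"]
    session = requests.post(
        f"{GRAPH}/drives/{drive_id}/root:/{item_path}:/createUploadSession",
        headers=headers, json={}).json()
    upload_url = session["uploadUrl"]  # pre-authenticated; no bearer header needed
    total = os.path.getsize(local_path)
    with open(local_path, "rb") as f:
        offset = 0
        while offset < total:
            chunk = f.read(CHUNK_SIZE)
            resp = requests.put(
                upload_url,
                headers={"Content-Range": content_range(offset, len(chunk), total)},
                data=chunk)
            resp.raise_for_status()
            offset += len(chunk)
```

Upload sessions let a multi-gigabyte file be sent in resumable pieces instead of a single PUT, which is why the notebooks use them for the chunked OneLake uploads.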
A minimal set of Python packages used by the notebooks:
- pandas
- requests
- msal
- pyarrow (optional, for parquet support)
- pytz
A requirements.txt is included for convenience.
- The OneLake examples use application-level (client credentials) Graph access; your Azure AD app must be granted the needed Graph permissions (for example, `Files.ReadWrite.All`) by an administrator.
- Do not commit secrets (client secrets, passwords) to source control. Use environment variables or secure secret stores.
- The notebooks are examples and intentionally show patterns rather than production-grade code. If you want, I can:
  - Replace local file writes in `Snowflake_Load_Bronze.ipynb` with direct OneLake upload calls (small files and chunked uploads) so the notebook writes exclusively to Fabric OneLake.
  - Add more robust retry/error handling and tests for the upload helpers.
If you want me to make the automatic replacements in the notebook (small_tables and/or largeTables), tell me which one(s) to change and whether to prefer CSV or Parquet.