Generate and Convert Jaffle Shop CSVs to Parquet Format for GCS #57
base: master
@@ -0,0 +1,37 @@

```markdown
# Generate Parquet files from Jaffle Shop CSV data

## Prerequisites

- pipx
- gcloud CLI
```
Suggested change:

```diff
- - gcloud CLI
+ - `gcloud` CLI
```
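The prerequisites listed in the README can be checked before anything runs. This is a minimal sketch. It assumes both tools are invoked by the executable names `pipx` and `gcloud` and must be on `PATH`. The helper name `missing_tools` is hypothetical and is not part of this PR.

```python
import shutil
import sys

# Executables the README lists as prerequisites (assumed names on PATH).
REQUIRED_TOOLS: list[str] = ["pipx", "gcloud"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that shutil.which() cannot resolve, in input order."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools(REQUIRED_TOOLS)` returns an empty list when both tools are installed. Otherwise it returns the names that still need to be set up.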
@@ -0,0 +1,39 @@

```python
from pathlib import Path

import pandas as pd

NAMES: list[str] = [
    "raw_customers",
    "raw_items",
    "raw_orders",
    "raw_products",
    "raw_stores",
    "raw_supplies",
    "raw_tweets",
]

JAFFLE_CSV_DATA_PATH = Path("jaffle-data")

if not JAFFLE_CSV_DATA_PATH.exists():
    raise FileNotFoundError(
        f"Jaffle CSV data path {JAFFLE_CSV_DATA_PATH} does not exist."
        " Please run `jafgen` to generate CSVs."
```
Suggested change:

```diff
-        " Please run `jafgen` to generate CSVs."
+        " Please run `pipx run jafgen 6` to generate CSVs."
```
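The script's core step converts each table in `NAMES` from CSV to Parquet. It can be sketched as follows, under several assumptions. `jafgen` is assumed to write one `<name>.csv` per table into `jaffle-data`. The output directory `jaffle-parquet` is a hypothetical name, because the PR's actual `JAFFLE_PARQUET_DATA_PATH` value is not shown. A Parquet engine such as `pyarrow` is assumed to be installed for `DataFrame.to_parquet`. The helpers `plan_conversions` and `convert_all` are also hypothetical.

```python
from pathlib import Path

NAMES: list[str] = [
    "raw_customers", "raw_items", "raw_orders", "raw_products",
    "raw_stores", "raw_supplies", "raw_tweets",
]


def plan_conversions(
    csv_dir: Path, parquet_dir: Path, names: list[str]
) -> list[tuple[Path, Path]]:
    """Map each table name to a (source CSV, target Parquet) path pair.

    Assumes one `<name>.csv` file per table in `csv_dir`.
    """
    return [
        (csv_dir / f"{name}.csv", parquet_dir / f"{name}.parquet") for name in names
    ]


def convert_all(csv_dir: Path, parquet_dir: Path, names: list[str]) -> list[Path]:
    """Convert every planned CSV to Parquet and return the written paths."""
    pairs = plan_conversions(csv_dir, parquet_dir, names)
    # Fail early and list every missing input, not just the first one.
    missing = [str(src) for src, _ in pairs if not src.exists()]
    if missing:
        raise FileNotFoundError(f"Missing CSV files: {', '.join(missing)}")

    import pandas as pd  # to_parquet needs pyarrow or fastparquet installed

    parquet_dir.mkdir(parents=True, exist_ok=True)
    for src, dst in pairs:
        pd.read_csv(src).to_parquet(dst, index=False)
    return [dst for _, dst in pairs]
```

The sketch separates path planning from I/O. This lets the file mapping be checked without pandas, and all missing inputs are reported together instead of one per run.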
Copilot (AI), Dec 5, 2025
The call `Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True)` invokes the method through the `Path` class rather than on the path instance. It works, because Python passes `JAFFLE_PARQUET_DATA_PATH` explicitly as `self`, but it is unidiomatic. Change it to `JAFFLE_PARQUET_DATA_PATH.mkdir(exist_ok=True)`, or to `JAFFLE_PARQUET_DATA_PATH.mkdir(parents=True, exist_ok=True)` so that any missing parent directories are created as well.
Suggested change:

```diff
-Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True)
+JAFFLE_PARQUET_DATA_PATH.mkdir(parents=True, exist_ok=True)
```
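Both spellings in this thread create a directory. They differ in idiom and in whether missing parents are created. The following small standard-library sketch shows this inside a temporary directory. The `out/parquet` nesting is hypothetical and stands in for any output path whose parent may not exist yet.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Class-level call, as in the original line: valid, because the path
    # instance is passed explicitly as `self`, but unidiomatic.
    flat = base / "flat"
    Path.mkdir(flat, exist_ok=True)

    # An instance call without parents=True fails when an ancestor is missing.
    nested = base / "out" / "parquet"  # hypothetical nested output path
    try:
        nested.mkdir(exist_ok=True)
        nested_without_parents_ok = True
    except FileNotFoundError:
        nested_without_parents_ok = False

    # parents=True also creates "out"; exist_ok=True makes a rerun a no-op.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)

    flat_created = flat.is_dir()
    nested_created = nested.is_dir()
```

The suggested `parents=True, exist_ok=True` form is therefore safe to run repeatedly, and it does not depend on the parent directory already existing.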
@@ -0,0 +1,137 @@

```toml
[project]
name = "getml-featurestore-integrations"
version = "0.1.0"
description = "Integrations and Data Preparation for getML Feature Stores"
authors = [
    { name = "Code17 GmbH", email = "hello@code17.io" },
    { name = "getML", email = "hello@getml.com" },
]
maintainers = [
    { name = "Code17 GmbH", email = "hello@code17.io" },
    { name = "getML", email = "hello@getml.com" },
]
license = { text = "Proprietary" }
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Operating System :: OS Independent",
    "Private :: Do Not Upload",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "Topic :: Scientific/Engineering",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "Topic :: Software Development :: Libraries",
    "Topic :: Software Development :: Libraries :: Python Modules",
]
readme = "README.md"
```
Copilot (AI), Dec 5, 2025
Trailing whitespace found at the end of this comment line. Remove the extra spaces.
Suggested change:

```diff
-# Allow TODO comments 
+# Allow TODO comments
```
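Trailing whitespace like this can be caught before review. The following minimal sketch reports the 1-based line numbers of lines that end in spaces or tabs. The function name `trailing_whitespace_lines` is hypothetical.

```python
def trailing_whitespace_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        # A line has trailing whitespace if stripping spaces/tabs changes it.
        if line != line.rstrip(" \t")
    ]
```

Passing a file's contents, for example from `Path.read_text()`, returns every offending line at once rather than relying on reviewers to spot each one.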
Missing backticks around `pipx` for consistency with other command-line tool references. It should be ``- `pipx` `` to match the formatting of `gcloud` CLI and the other tool references in the document.