Release v0.2.10 — Direct CSV ingestion via Polars streaming · kayhendriksen/foehn

Ingest pipeline rewrite

The Databricks ingest step no longer requires a Parquet intermediate layer.
Raw CSVs are now read directly from the Unity Catalog Volume using Polars
scan_csv with engine="streaming", then written to Delta tables via
Arrow → Spark.

- Large historical collections (SMN, SMN Precip, SMN Tower) use chunked
  writes to keep peak memory bounded (configurable via --chunk-size).
- The reader falls back to the eager parse_csv_bytes if the streaming
  collect fails because of mixed-type columns.
- Column comments are applied automatically from the English
  descriptions in _meta_parameters.csv.
- Local spark-submit is now supported: Unity Catalog DDL is skipped
  when DATABRICKS_RUNTIME_VERSION is not set.
- Databricks job tasks are updated: download now runs with --no-parquet,
  and ingest receives --historical for the historical job.
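
The control flow behind three of the items above (chunking, the eager fallback, and the local-run check) can be sketched with the standard library alone. All names here (on_databricks, chunked, read_with_fallback) are hypothetical and not taken from the foehn code base. The notes do not say what unit --chunk-size counts, so the sketch assumes a chunk is simply a batch of input items.

```python
import os
from typing import Callable, Iterator, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def on_databricks(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DATABRICKS_RUNTIME_VERSION is set in the environment."""
    env = os.environ if env is None else env
    return "DATABRICKS_RUNTIME_VERSION" in env


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most chunk_size items (assumed chunk unit)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def read_with_fallback(
    streaming_read: Callable[[str], R],
    eager_read: Callable[[str], R],
    source: str,
) -> R:
    """Try the streaming reader first; use the eager reader if it raises."""
    try:
        return streaming_read(source)
    except Exception:
        # Assumption: any failure triggers the fallback. The real code may
        # catch only the specific error raised for mixed-type columns.
        return eager_read(source)
```

A caller would guard Unity Catalog DDL with `if on_databricks(): ...`, loop over `chunked(inputs, chunk_size)` to write one batch at a time, and pass the streaming and eager readers to read_with_fallback.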
