v0.2.10 — Direct CSV ingestion via Polars streaming
Ingest pipeline rewrite
The Databricks ingest step no longer requires a Parquet intermediate layer.
Raw CSVs are now read directly from the Unity Catalog Volume using Polars
scan_csv with engine="streaming", then written to Delta tables via
Arrow → Spark.
- Large historical collections (SMN, SMN Precip, SMN Tower) use chunked writes to keep peak memory bounded (configurable via --chunk-size)
- Falls back to eager parse_csv_bytes if streaming collect fails due to mixed-type columns
- Column comments are automatically applied from the English descriptions in _meta_parameters.csv
- Local spark-submit is now supported: Unity Catalog DDL is skipped when DATABRICKS_RUNTIME_VERSION is not set
- Databricks job tasks updated: download now runs with --no-parquet, ingest receives --historical for the historical job
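
The control flow behind three of the items above (chunked writes, the eager fallback, and skipping Unity Catalog DDL outside Databricks) can be sketched roughly as follows. This is a minimal illustration and not the pipeline's actual code: the helper names on_databricks, iter_chunks and read_with_fallback are hypothetical, the readers are passed in as plain callables rather than the real Polars and parse_csv_bytes calls, and chunking by list items (for example file paths) is an assumption about what --chunk-size counts.

```python
import os
from typing import Callable, Iterator, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def on_databricks(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DATABRICKS_RUNTIME_VERSION is set to a non-empty value.

    Assumption: an empty value is treated the same as "not set".
    """
    env = os.environ if env is None else env
    return bool(env.get("DATABRICKS_RUNTIME_VERSION"))


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most chunk_size items.

    Writing one slice at a time keeps peak memory tied to the chunk size
    instead of the size of the whole collection.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def read_with_fallback(
    path: str,
    streaming_read: Callable[[str], R],
    eager_read: Callable[[str], R],
) -> R:
    """Try the streaming reader first and fall back to the eager one.

    Hypothetical: a real implementation would likely catch a narrower
    exception type than Exception (e.g. the error raised for
    mixed-type columns).
    """
    try:
        return streaming_read(path)
    except Exception:
        return eager_read(path)


if __name__ == "__main__":
    files = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"]
    for chunk in iter_chunks(files, chunk_size=2):
        print("would write chunk:", list(chunk))
    if not on_databricks():
        print("DATABRICKS_RUNTIME_VERSION not set: skipping Unity Catalog DDL")
```

Passing the readers in as callables keeps the fallback logic independent of Polars and Spark, so it can be exercised locally without a cluster.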