v0.2.20

@kayhendriksen kayhendriksen released this 18 Apr 14:01
· 76 commits to main since this release

Fix: to_parquet no longer OOMs on large groups (historical slices, climate_scenarios).

How: Switched the per-group concat from in-memory (read_csv + pl.concat + write_parquet) to streaming (scan_csv + pl.concat(how="diagonal_relaxed") + sink_parquet). Dtype drift is handled in place: when polars raises a dtype error, we force the offending column to Float64 and retry the sink.

Result on full SMN (~12 GB CSV, 10 groups): peak RSS 487 MB (previously ran out of memory at multiple GB on the Databricks driver).

No API change.