What's new in 0.3.2
A reliability and security release — no new datasets, but downloads, parsing,
and the release pipeline are all hardened.
Bug fixes
- Fixed an infinite loop in CSV parsing. The dtype-recovery fallback in
  `parse_csv_bytes()` retried forever when a column failed to parse even as
  Float64 (e.g. a stray non-numeric value past the schema-inference window).
  Reachable from `foehn.load()` and the MCP server's `load_data`; it now
  raises a clear error instead of hanging.
- Climate normals (C6) recover from interrupted runs. The skip check now
  looks at the extracted TXT files rather than the ZIP, so a run that died
  between download and extraction is retried instead of silently skipped
  forever.
- State files can no longer brick the pipeline. `_etags.json` and
  `_last_run.json` are written atomically, and a corrupt state file is treated
  as empty (with a warning) instead of crashing every subsequent run.
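
The state-file behaviour described above (atomic writes, corrupt file treated as empty) could be sketched roughly as follows. This is an illustration only, not foehn's actual code: `save_state` and `load_state` are hypothetical helper names, and the temp-file-plus-rename approach is an assumption about how "written atomically" is achieved.

```python
import json
import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def save_state(path: Path, state: dict) -> None:
    """Hypothetical helper: write JSON to a temp file, then rename over the target."""
    # The temp file lives in the target's directory so the rename stays on one volume.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace swaps the file in one step, so readers see old or new, never half.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_state(path: Path) -> dict:
    """Hypothetical helper: return stored state, or {} if missing or corrupt."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        return {}
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        logger.warning("Ignoring corrupt state file %s: %s", path, exc)
        return {}
    if not isinstance(data, dict):
        logger.warning("Ignoring state file %s: expected a JSON object", path)
        return {}
    return data
```

With this shape, a crash during `save_state` leaves the previous file untouched, and a damaged file costs one warning and a fresh start rather than a failure on every run.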
Reliability
- All downloads are now atomic: CSVs and ZIPs join the binary assets in
  streaming to a `.part` file and renaming on completion (#21 + this release),
  so an interrupted transfer never leaves a truncated file behind.
- STAC listing and pagination use retrying HTTP sessions, and retries now also
  cover 429 rate limits (honouring `Retry-After`).
- The ETag store is pruned of stale entries on clean full runs, so it no longer
  grows forever as forecast assets cycle.
- CSV assets with query strings (e.g. `?token=...`) are detected correctly, and
  time slices are parsed from the trailing filename segment so a coincidental
  "now" elsewhere in a URL can't be misread (#21).
- The library logs through standard `logging` (`foehn.*`, silent when imported);
  the CLI attaches its own stdout handler (#21).
- CSV decoding is total: the Windows-1252 fallback replaces unmappable bytes
  instead of raising.
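
To illustrate the URL handling item above, here is a minimal sketch of detecting CSV assets and reading the time slice from the filename rather than from the whole URL. The function names, the example URL, and the assumption that the time slice is the text after the last underscore are all hypothetical; foehn's real parsing may differ.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit


def url_filename(url: str) -> str:
    """Return the last path segment, ignoring any query string or fragment."""
    return PurePosixPath(urlsplit(url).path).name


def is_csv_asset(url: str) -> bool:
    """True if the filename (not the raw URL) ends in .csv."""
    return url_filename(url).lower().endswith(".csv")


def time_slice(url: str) -> Optional[str]:
    """Assumed convention: the time slice is the stem's text after the last underscore."""
    stem = PurePosixPath(url_filename(url)).stem
    _, sep, tail = stem.rpartition("_")
    return tail if sep else None
```

For a hypothetical URL such as `https://example.org/now/station_abc_recent.csv?token=abc`, `is_csv_asset` looks only at `station_abc_recent.csv`, and `time_slice` reads `recent` from the filename, so the `now` directory segment is never consulted.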
API
- `foehn.download()` gained a `force=` flag to re-download ZIP-shipped datasets
  (e.g. `climate_scenarios_indoor`) that would otherwise skip when already
  extracted:

      import foehn
      foehn.download("climate_scenarios_indoor", force=True)

- `list_datasets()` no longer advertises frequencies for datasets where the
  `frequency` filter isn't supported (`forecast_local`, `climate_scenarios`,
  `climate_scenarios_indoor`); the granularity is named in the description
  instead.
- All download functions return a `DownloadResult` summary (counts + new
  filenames) so callers can gate downstream work (#21).
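
As a rough picture of what gating downstream work on a download summary might look like, the sketch below uses a stand-in dataclass. `DownloadSummary` and its field names (`downloaded`, `skipped`, `new_files`) are hypothetical and are not foehn's `DownloadResult`; check the library's own documentation for the real attribute names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DownloadSummary:
    """Hypothetical stand-in for a download summary: counts plus new filenames."""
    downloaded: int = 0
    skipped: int = 0
    new_files: List[str] = field(default_factory=list)


def should_rebuild(summary: DownloadSummary) -> bool:
    """Run downstream work only when the download produced new files."""
    return bool(summary.new_files)
```

A caller could then skip an expensive ingest step whenever `should_rebuild(...)` is false, for example on a run where every asset was already up to date.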
Security
- ZIP extraction now guards against decompression bombs (10 GiB declared-size
  cap) on top of the existing path-traversal checks, including for the in-memory
  indoor-scenarios archive.
- The Databricks ingest escapes backslashes as well as quotes when setting
  column comments via Spark SQL.
- Release pipeline hardening: `mcp-publisher` is pinned by version and SHA-256,
  PyPI uploads explicitly enable PEP 740 attestations, build tooling is pinned,
  CodeQL runs the `security-extended` suite, and all workflow checkouts use
  `persist-credentials: false`.
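
The ZIP guards in the first item above could look something like the following sketch, which combines a declared-size cap with a path-traversal check using only the standard library. `safe_extract` is a hypothetical name, and summing the declared sizes across all members is an assumption; the release note does not say whether the cap applies per file or per archive.

```python
import zipfile
from pathlib import Path

MAX_DECLARED_BYTES = 10 * 1024**3  # 10 GiB, the cap named in the note above


def safe_extract(
    archive: zipfile.ZipFile, dest: Path, max_bytes: int = MAX_DECLARED_BYTES
) -> None:
    """Hypothetical helper: validate every member before extracting anything."""
    dest = dest.resolve()
    members = archive.infolist()

    # Declared (uncompressed) sizes come from the archive's own metadata.
    declared = sum(info.file_size for info in members)
    if declared > max_bytes:
        raise ValueError(f"archive declares {declared} bytes, cap is {max_bytes}")

    # Reject any member whose resolved path would land outside dest.
    for info in members:
        target = (dest / info.filename).resolve()
        if not target.is_relative_to(dest):  # Path.is_relative_to needs Python 3.9+
            raise ValueError(f"unsafe path in archive: {info.filename!r}")

    archive.extractall(dest)
```

The same function works for an in-memory archive by opening it with `zipfile.ZipFile(io.BytesIO(data))`. Because the check relies on sizes the archive declares about itself, it is a first line of defence rather than a guarantee about the bytes actually written.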
Full changelog: v0.3.1...v0.3.2