v0.3.2

@kayhendriksen released this 10 Jun 14:31
0e7b5d0

What's new in 0.3.2

A reliability and security release — no new datasets, but downloads, parsing,
and the release pipeline are all hardened.

Bug fixes

  • Fixed an infinite loop in CSV parsing. The dtype-recovery fallback in
    parse_csv_bytes() retried forever when a column failed to parse even as
    Float64 (e.g. a stray non-numeric value past the schema-inference window).
    The loop was reachable from foehn.load() and the MCP server's load_data;
    parsing now raises a clear error instead of hanging.
  • Climate normals (C6) recover from interrupted runs. The skip check now
    looks at the extracted TXT files rather than the ZIP, so a run that died
    between download and extraction is retried instead of silently skipped
    forever.
  • State files can no longer brick the pipeline. _etags.json and
    _last_run.json are written atomically, and a corrupt state file is treated
    as empty (with a warning) instead of crashing every subsequent run.
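
The state-file behaviour described above (atomic writes, corrupt files treated as empty) can be illustrated with a minimal sketch. This is not foehn's actual code: the function names load_state and save_state and the logger name are hypothetical, and the sketch assumes the state files hold a JSON object.

```python
import json
import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger("state_sketch")  # hypothetical logger name


def load_state(path: Path) -> dict:
    """Return the stored mapping, or {} if the file is missing or corrupt."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        logger.warning("Ignoring corrupt state file %s: %s", path, exc)
        return {}
    # Valid JSON that is not an object is also treated as empty.
    return data if isinstance(data, dict) else {}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # rename within one directory replaces the target in one step
    except BaseException:
        # Remove the temp file if the rename never happened.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Because the temp file lives next to the target, a crash mid-write leaves the previous state file untouched rather than a half-written one.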

Reliability

  • All downloads are now atomic: like the binary assets, CSVs and ZIPs stream
    to a .part file that is renamed on completion (#21 + this release), so an
    interrupted transfer never leaves a truncated file behind.
  • STAC listing and pagination use retrying HTTP sessions, and retries now also
    cover 429 rate limits (honouring Retry-After).
  • The ETag store is pruned of stale entries on clean full runs — it no longer
    grows forever as forecast assets cycle.
  • CSV assets with query strings (e.g. ?token=...) are detected correctly, and
    time slices are parsed from the trailing filename segment so a coincidental
    "now" elsewhere in a URL can't be misread (#21).
  • The library logs through standard logging (foehn.*, silent when imported);
    the CLI attaches its own stdout handler (#21).
  • CSV decoding is total: the Windows-1252 fallback replaces unmappable bytes
    instead of raising.
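
Two of the items above, query-string-safe CSV detection and total decoding, can be sketched with the standard library alone. The helper names is_csv_url and decode_csv_bytes are hypothetical, and trying UTF-8 before Windows-1252 is an assumption; the notes only say that the Windows-1252 step is a fallback.

```python
from urllib.parse import urlsplit


def is_csv_url(url: str) -> bool:
    """Check the URL path only, so a query string cannot hide the extension."""
    return urlsplit(url).path.lower().endswith(".csv")


def decode_csv_bytes(raw: bytes) -> str:
    """Decode as UTF-8 (assumed first choice), falling back to Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined; errors="replace"
        # turns those into U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")
```

With errors="replace" the fallback cannot raise, which is what makes the decoding total: every byte string yields some text.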

API

  • foehn.download() gained a force= flag to re-download ZIP-shipped datasets
    (e.g. climate_scenarios_indoor) that would otherwise skip when already
    extracted:

    import foehn
    
    foehn.download("climate_scenarios_indoor", force=True)

  • list_datasets() no longer advertises frequencies for datasets where the
    frequency filter isn't supported (forecast_local, climate_scenarios,
    climate_scenarios_indoor) — the granularity is named in the description
    instead.

  • All download functions return a DownloadResult summary (counts + new
    filenames) so callers can gate downstream work (#21).
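
As a rough illustration of gating downstream work on a download summary, the sketch below uses a hypothetical stand-in class. The field names (downloaded, skipped, new_files) are assumptions, not the documented attributes of DownloadResult; check the library for the real ones.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Hypothetical stand-in for a download summary; field names are assumed."""
    downloaded: int = 0
    skipped: int = 0
    new_files: list[str] = field(default_factory=list)


def files_to_process(result: ResultSketch, suffix: str = ".csv") -> list[str]:
    """Return only the newly downloaded files that downstream work cares about."""
    return [name for name in result.new_files if name.endswith(suffix)]


result = ResultSketch(downloaded=2, skipped=5, new_files=["a.csv", "b.zip"])
pending = files_to_process(result)
if pending:
    print(f"Rebuilding from {len(pending)} new file(s): {pending}")
else:
    print("Nothing new; skipping downstream work")
```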

Security

  • ZIP extraction now guards against decompression bombs (10 GiB declared-size
    cap) on top of the existing path-traversal checks — including the in-memory
    indoor-scenarios archive.
  • The Databricks ingest escapes backslashes as well as quotes when setting
    column comments via Spark SQL.
  • Release pipeline hardening: mcp-publisher is pinned by version and SHA-256,
    PyPI uploads explicitly enable PEP 740 attestations, build tooling is pinned,
    CodeQL runs the security-extended suite, and all workflow checkouts use
    persist-credentials: false.
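
A declared-size cap combined with a path-traversal check might look roughly like the following. This is a sketch using the standard zipfile module, not the project's implementation; check_archive and MAX_DECLARED_BYTES are hypothetical names, and only the 10 GiB figure comes from the notes above.

```python
import io
import zipfile
from pathlib import Path

MAX_DECLARED_BYTES = 10 * 1024**3  # 10 GiB, the cap named in the notes


def check_archive(zf: zipfile.ZipFile, dest: Path, limit: int = MAX_DECLARED_BYTES) -> None:
    """Raise ValueError if the archive declares too much data or escapes dest."""
    # Sum the uncompressed sizes declared in the archive's headers.
    total = sum(info.file_size for info in zf.infolist())
    if total > limit:
        raise ValueError(f"declared size {total} exceeds limit {limit}")
    root = dest.resolve()
    for info in zf.infolist():
        # Resolve each member against dest and reject anything outside it.
        target = (root / info.filename).resolve()
        if not target.is_relative_to(root):
            raise ValueError(f"unsafe path in archive: {info.filename}")
```

Running the check before extracting anything means a rejected archive never touches the disk. It works the same for an in-memory archive opened from io.BytesIO.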

Full changelog: v0.3.1...v0.3.2