0.12.0 — storage CLI Parquet output

@marc-chiesa marc-chiesa released this 10 Jun 13:10
51b4c7d

0.12.0 — typed Parquet output from the storage CLI

protokit storage scan now writes typed Parquet directly. With --format parquet -o out.parquet, proto records are converted straight to columnar form through the optional protokit[parquet] extra, with no JSON intermediate (#24, #26).

Highlights

  • All-or-nothing atomic publish: the file appears at -o only after a complete, fault-free scan; a pre-existing output survives any fault and is overwritten only by a complete result.
  • Misuse fails before any record is read, with exit status 2 in every case: missing -o, --on-error skip|warn, --fields, --explicit-defaults, env-sourced PROTOKIT_FORMAT=parquet, or -o colliding with an input file.
  • Fault reports now name the first fault's location, not just a count.
  • Parquet values are Arrow-native by design (bytes → binary, enums → int32, timestamps at microsecond resolution) — deliberately divergent from the JSON view.
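
The all-or-nothing publish behaviour described above can be illustrated with the common write-to-temp-then-rename pattern. This is a minimal sketch of that general technique, not protokit's actual implementation: the function name publish_atomically and the write_body callback are hypothetical, and the sketch assumes the temporary file and the destination live on the same filesystem so that the rename is atomic.

```python
import os
import tempfile
from typing import BinaryIO, Callable


def publish_atomically(dest: str, write_body: Callable[[BinaryIO], object]) -> None:
    """Write to a temp file beside dest; rename onto dest only on success.

    Hypothetical helper for illustration; not part of protokit.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_body(tmp)          # may raise on the first fault
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the bytes reach disk first
        os.replace(tmp_path, dest)   # atomic rename on the same filesystem
    except BaseException:
        # Any fault: discard the partial file and leave dest untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If write_body raises, the destination keeps whatever content it had before (or stays absent), which mirrors the "pre-existing output survives any fault" guarantee. The destination is replaced only after the whole body has been written and flushed.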

BREAKING (pre-1.0 policy): IncompleteScanError(fault_count: int) → IncompleteScanError(faults: tuple[FrameError, ...]). Consumers constructing the exception directly must update; read-only consumers are unaffected (fault_count is preserved as len(faults)).
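
To make the constructor change concrete, here is a minimal sketch of how an exception with this shape could be written. It is not protokit's source: the FrameError fields (offset, message) and the error message text are assumptions for illustration, since the release notes only give the constructor signature and state that fault_count equals len(faults).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FrameError:
    # Hypothetical fields; the release notes do not describe FrameError's shape.
    offset: int
    message: str


class IncompleteScanError(Exception):
    """Sketch of the 0.12.0 shape: carries the faults, not just a count."""

    def __init__(self, faults: tuple[FrameError, ...]) -> None:
        self.faults = tuple(faults)
        super().__init__(f"scan incomplete: {len(self.faults)} fault(s)")

    @property
    def fault_count(self) -> int:
        # Preserved for read-only consumers, derived from the faults tuple.
        return len(self.faults)


# Before 0.12.0 a caller would have passed a count, e.g. IncompleteScanError(1).
# From 0.12.0 the caller passes the faults themselves.
err = IncompleteScanError((FrameError(offset=128, message="truncated frame"),))
print(err.fault_count)       # 1
print(err.faults[0].offset)  # 128
```

Code that only catches the exception and reads fault_count keeps working unchanged under this shape; only call sites that construct the exception need to switch from an integer to a tuple of faults.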

Full details in CHANGELOG.md.