Releases: marc-chiesa/protokit

0.12.0 — storage CLI Parquet output

10 Jun 13:10
51b4c7d

0.12.0 — typed Parquet output from the storage CLI

protokit storage scan now writes typed Parquet directly: --format parquet -o out.parquet converts proto records straight to columnar through the optional protokit[parquet] extra, with no JSON intermediate (#24, #26).

Highlights

  • All-or-nothing atomic publish: the file appears at -o only after a complete, fault-free scan; a pre-existing output survives any fault and is overwritten only by a complete result.
  • Misuse fails before any record is read (missing -o, --on-error skip|warn, --fields, --explicit-defaults, env-sourced PROTOKIT_FORMAT=parquet, -o colliding with an input file — all exit 2).
  • Fault reports now name the first fault's location, not just a count.
  • Parquet values are Arrow-native by design (bytes → binary, enums → int32, timestamps at microsecond resolution) — deliberately divergent from the JSON view.
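
The all-or-nothing publish in the first highlight can be pictured as a write-to-temp-then-rename pattern. This is a minimal stdlib sketch of that general technique, not protokit's actual implementation; publish_atomically and write_rows are hypothetical names introduced for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable


def publish_atomically(dest: Path, write_rows: Callable[[BinaryIO], object]) -> None:
    """Write to a temp file beside dest, then rename over dest on success only.

    Hypothetical helper: it illustrates the pattern, not protokit's code.
    """
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_rows(tmp)  # may raise part-way through
        # Same-directory rename, so the swap is atomic on POSIX filesystems.
        os.replace(tmp_name, dest)
    except BaseException:
        # Any fault: drop the partial temp file and leave dest untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Creating the temp file in the destination's own directory keeps the final rename on one filesystem, which is what lets a pre-existing output survive a failed run unchanged.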

BREAKING (pre-1.0 policy): IncompleteScanError(fault_count: int) → IncompleteScanError(faults: tuple[FrameError, ...]). Consumers constructing the exception directly must update; read-only consumers are unaffected (fault_count is preserved as len(faults)).
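
To make the constructor change concrete, here is a small sketch using stand-in classes. Both classes below are illustrative re-creations, not imports from protokit, and the FrameError fields (offset, reason) are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrameError:
    """Stand-in for protokit's FrameError; these fields are assumptions."""

    offset: int
    reason: str


class IncompleteScanError(Exception):
    """Stand-in mirroring the 0.12.0 constructor shape described above."""

    def __init__(self, faults: Tuple[FrameError, ...]) -> None:
        super().__init__(f"{len(faults)} fault(s) during scan")
        self.faults = faults

    @property
    def fault_count(self) -> int:
        # Kept for read-only consumers; derived from the tuple.
        return len(self.faults)


# Construction now passes the faults themselves rather than a count.
err = IncompleteScanError((FrameError(0, "truncated"), FrameError(128, "bad length")))
```

Code that only reads err.fault_count keeps working under this shape; code that built the exception from an integer is what needs updating.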

Full details in CHANGELOG.md.

v0.11.0

09 Jun 22:16
c76b34d

Expressive proto test-assertion layer over the protokit.message differ.

Added

  • Expressive message matchers + comparison parity (protokit.message). A framework-agnostic test-assertion layer over the differ, with two entry points that raise AssertionError with the existing per-field diff on mismatch:
    • single-call: proto_match(actual, expected, *, partial=, as_set=, ignore=, presence=, approx=)
    • fluent: expect_proto(expected).partially().as_set("items").ignoring(pred).approximately(...).matches(actual)
    The same policy is exposed as a hamcrest.BaseMatcher via equals_proto(...) behind a new optional protokit[hamcrest] extra (used as assert_that(actual, equals_proto(...))), and as a proto_matcher pytest fixture. The bare assert msg1 == msg2 rich-diff rendering gains presentation-only config.
    Backing the matcher, the MessageDifferencer engine gains five opt-in, additive capabilities, all targeted by one unified field selector (a dotted path or a (FieldDescriptor, path) predicate):
    • partial / sub-shape scope (set_partial())
    • keyless set comparison for repeated fields (treat_as_set(selector))
    • predicate-based field ignore (ignore_fields(...) now also accepts a predicate)
    • EQUAL vs EQUIVALENT presence (set_message_field_comparison(...))
    • selective per-field float tolerance (set_float_comparison(..., selector=))
    New public surface: proto_match, expect_proto, MatchPolicy, Approx, MatcherError, equals_proto, HamcrestExtraNotInstalledError, MessageFieldComparison. The CLI matcher surface is a separate later effort.

Behavior note: the default EQUIVALENT presence mode now treats a presence-bearing field set to its default value as equal to an unset field (previously a presence difference); a non-default value vs unset is unchanged. Opt into MessageFieldComparison.EQUAL for strict set-vs-unset presence.
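
The difference between the two presence modes can be shown with a toy model that does not use protokit at all. In this sketch None stands for an unset presence-bearing field; PresenceMode and fields_match are hypothetical names, and the real library exposes the choice through MessageFieldComparison instead.

```python
from enum import Enum
from typing import Optional


class PresenceMode(Enum):
    """Toy stand-in for the two presence modes named above."""

    EQUAL = "equal"
    EQUIVALENT = "equivalent"


def fields_match(
    left: Optional[int],
    right: Optional[int],
    *,
    default: int = 0,
    mode: PresenceMode = PresenceMode.EQUIVALENT,
) -> bool:
    """Compare two optional scalars; None models an unset field."""
    if mode is PresenceMode.EQUIVALENT:
        # Unset is folded into the default value before comparing.
        left = default if left is None else left
        right = default if right is None else right
    # Under EQUAL, None only equals None, so set-vs-unset is a difference.
    return left == right


same_by_default = fields_match(None, 0)
strict = fields_match(None, 0, mode=PresenceMode.EQUAL)
non_default = fields_match(None, 5)
```

Here same_by_default is True, strict is False, and non_default is False, matching the three cases in the behavior note.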

Install: pip install protokit==0.11.0 · with the PyHamcrest adapter: pip install 'protokit[hamcrest]==0.11.0'

v0.10.0

08 Jun 02:22
626a5e9

0.10.0

Two additive protokit storage deliveries:

  • Field selection + dense JSON (--fields / --explicit-defaults)
    protokit storage scan/head gain --fields a,b.c (presence-faithful
    nested view, snake_case) and --explicit-defaults (dense full-record JSON,
    camelCase). New public protokit.storage.project() + FieldSelectionError.
  • Columnar / Parquet output — optional protokit[parquet] extra
    (Rust-backed ptars + pyarrow) adds a library-first proto→Arrow→Parquet path
    (to_arrow_batches / to_parquet), skipping the proto→JSON→Parquet
    double-encode. New typed exceptions: ParquetExtraNotInstalledError,
    SchemaMismatchError, UnknownStreamError, HandlerBuildError,
    IncompleteScanError.

See CHANGELOG.md for full details. Install: pip install protokit==0.10.0

v0.9.0 — data-at-rest

31 May 02:05
037c2ac

protokit 0.9.0 — data-at-rest

BREAKING — the message differ's Difference.old_value/new_value are renamed to left_value/right_value (the two compared messages aren't a before/after pair). Read-only deprecation aliases are retained and removed at 1.0; constructing with the old kwargs raises. Pre-1.0, pin protokit~=0.9.0.
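
One way to picture the rename is a class whose new fields are ordinary attributes and whose old names survive only as read-only properties. The class below is a toy stand-in, not protokit's Difference. Whether the real aliases emit a DeprecationWarning is an assumption of this sketch.

```python
import warnings


class Difference:
    """Toy stand-in for the differ's Difference; only the rename is modeled."""

    def __init__(self, *, left_value: object = None, right_value: object = None,
                 **legacy: object) -> None:
        if legacy:
            # Old keyword names (old_value / new_value) are rejected outright.
            raise TypeError(f"unexpected keyword argument(s): {sorted(legacy)}")
        self.left_value = left_value
        self.right_value = right_value

    @property
    def old_value(self) -> object:
        # Assumption: the alias warns. No setter is defined, so it is read-only.
        warnings.warn("old_value is deprecated; use left_value",
                      DeprecationWarning, stacklevel=2)
        return self.left_value

    @property
    def new_value(self) -> object:
        warnings.warn("new_value is deprecated; use right_value",
                      DeprecationWarning, stacklevel=2)
        return self.right_value
```

Reading diff.old_value still works in this model, while Difference(old_value=...) raises TypeError and assigning to diff.old_value raises AttributeError.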

Added — data-at-rest

  • protokit.storage — schema-aware scan/filter engine for stored protobuf. scan(source, registry, *, predicate, on_error) routes each (stream_id, record_bytes) record to its stream's isolated descriptor pool and yields a tagged ScanRecord. Source accepts any iterable of (stream_id, bytes | memoryview); fail-loud by default with opt-in skip/collect/route tolerant modes.
  • protokit storage CLI: scan / head / count over length-delimited files, with the minimal --where grammar (path == scalar / != / has:path), --desc/--proto (+--proto-path) schema sources with --type, --format human|json, and --on-error raise|skip|warn. Exit codes are 0/2, plus a grep-like 1 for count --quiet.
  • ProtoFileSchema (.proto→compile schema source, never SystemExit), on_error='route' + an error_sink callback on scan(), and the typed SchemaCompileError / WhereError exceptions.
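
The fail-loud default and the tolerant modes can be sketched generically. The function below is not protokit.storage.scan: tolerant_scan, decode, and the exact callback signature are hypothetical, and only the raise, skip, and route behaviors are modeled (collect is left out because its semantics are not described here).

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, bytes]  # (stream_id, record_bytes)


def tolerant_scan(
    source: Iterable[Record],
    decode: Callable[[str, bytes], object],
    *,
    on_error: str = "raise",
    error_sink: Optional[Callable[[Record, Exception], None]] = None,
) -> Iterator[object]:
    """Yield decoded records, handling per-record failures per on_error."""
    if on_error not in {"raise", "skip", "route"}:
        raise ValueError(f"unsupported on_error: {on_error!r}")
    if on_error == "route" and error_sink is None:
        raise ValueError("on_error='route' needs an error_sink")
    for record in source:
        stream_id, payload = record
        try:
            message = decode(stream_id, payload)
        except Exception as exc:
            if on_error == "raise":
                raise  # fail-loud default
            if on_error == "route":
                error_sink(record, exc)  # hand the bad record to the caller
            continue  # skip and route both move on to the next record
        yield message
```

Decoding happens before the yield so that an exception raised by the consumer is never mistaken for a decode fault.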

Full details in CHANGELOG.md under the 0.9.0 heading.

v0.8.0 — D7: protokit compat --rule-pack rename

29 May 00:11
c4613fc

protokit compat's --rule-pack flag collided in name with the
protokit lint --rule-pack flag added in D3 — they shared a name
across subcommands but loaded different rule systems (compat's are
FieldPlugin-shaped via SchemaChecker.load_rule_pack; lint's are
LintRuleSpec-shaped). D7 renames compat's flag to
--compat-rule-pack and keeps the old name as a deprecation alias.
Both flags work today; the legacy name is removed in protokit 1.0.
No behavior change for users who migrate to the new name; no break
for users who don't.

Added

  • --compat-rule-pack MODULE on every sub-subcommand within
    protokit compat (check, history, bisect, ci). Same
    semantics as the legacy --rule-pack (Python module exposing a
    RULES = [(rule_id, plugin_fn), ...] list; repeatable; loaded via
    SchemaChecker.load_rule_pack). The new name is the canonical
    name going forward and resolves the cross-CLI naming collision
    with protokit lint --rule-pack (which retains the unqualified
    name).

Deprecated

  • --rule-pack on protokit compat {check,history,bisect,ci} is now
    a deprecation alias for --compat-rule-pack. The flag still loads
    packs identically and remains accepted in 0.8.x, but each
    invocation emits a UserWarning to stderr:
    "--rule-pack is deprecated and will be removed in protokit 1.0; use --compat-rule-pack instead."
    The flag is hidden=True in --help output to nudge new code
    toward the canonical name.
    • UserWarning (not DeprecationWarning) is the deliberate class
      choice. DeprecationWarning is hidden from CLI users by
      Python's default warning filter, and it gets promoted to an
      exception under -W error::DeprecationWarning strict-warning CI
      (which Click traps in its arg-parse pipeline). UserWarning is
      visible by default and matches the in-repo precedent at
      src/protokit/formatters/_registry.py.
    • The warning fires exactly once per invocation regardless of how
      many times --rule-pack is repeated on the command line (Click
      invokes per-option callbacks once per option-collection cycle).
      Mixing the old and new flag names in a single invocation
      accumulates both sets of packs (per Click multiple=True
      semantics, not last-wins) and emits exactly one warning.
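
The alias behavior described above (hidden legacy flag, accumulation of both flags, one UserWarning per invocation) can be sketched with the standard library. protokit uses Click; this example deliberately swaps in argparse so it runs without third-party packages, and parse_rule_packs is a hypothetical name.

```python
import argparse
import warnings
from typing import List, Optional, Sequence

LEGACY_MESSAGE = (
    "--rule-pack is deprecated and will be removed in protokit 1.0; "
    "use --compat-rule-pack instead."
)


def parse_rule_packs(argv: Optional[Sequence[str]] = None) -> List[str]:
    """Collect packs from both flag spellings, warning once for the old one."""
    parser = argparse.ArgumentParser(prog="compat-sketch", allow_abbrev=False)
    parser.add_argument("--compat-rule-pack", action="append", default=[],
                        dest="packs", metavar="MODULE")
    # Legacy spelling: still accepted, but hidden from --help.
    parser.add_argument("--rule-pack", action="append", default=[],
                        dest="legacy_packs", help=argparse.SUPPRESS)
    args = parser.parse_args(argv)
    if args.legacy_packs:
        # Checked once after parsing, so repeats do not multiply the warning.
        warnings.warn(LEGACY_MESSAGE, UserWarning, stacklevel=2)
    # Both sets are kept (not last-wins). Ordering new-then-legacy is a
    # simplification of this sketch, not a documented protokit guarantee.
    return args.packs + args.legacy_packs
```

Emitting the warning after parsing, rather than inside a per-value hook, is what keeps it to one per invocation in this sketch.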

Migration note

Old:

protokit compat check OLD NEW --type acme.User --rule-pack myorg.proto_rules

New:

protokit compat check OLD NEW --type acme.User --compat-rule-pack myorg.proto_rules

Mechanical search-and-replace of the flag name. No changes to the
rule-pack module structure (RULES = [(rule_id, plugin_fn), ...]),
no changes to plugin context APIs, no changes to other compat flags
(--formatter-module, --ignore, --dedupe-by-type, --quiet,
etc.). protokit lint --rule-pack is unchanged — only compat's
flag is renamed.

If you'd rather keep --rule-pack working until 1.0, you can — the
deprecation warning is informational, not a CI break. Strict-warning
test environments running -W error::UserWarning should wrap legacy
--rule-pack invocations in warnings.catch_warnings() until the
migration is complete.
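
For strict-warning environments, the wrapping might look like the sketch below. It assumes the CLI is invoked in-process (for example through a test runner), so the warning travels through Python's warnings machinery. run_legacy_invocation is a hypothetical stand-in for whatever call still passes --rule-pack.

```python
import warnings


def run_legacy_invocation() -> str:
    """Hypothetical stand-in for an in-process call that passes --rule-pack."""
    warnings.warn(
        "--rule-pack is deprecated and will be removed in protokit 1.0; "
        "use --compat-rule-pack instead.",
        UserWarning,
    )
    return "ok"


def run_quietly() -> str:
    """Tolerate only the legacy-flag warning, and only inside this call."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the text.
        warnings.filterwarnings(
            "ignore",
            message=r"--rule-pack is deprecated",
            category=UserWarning,
        )
        return run_legacy_invocation()
```

Filtering on the message prefix keeps every other UserWarning fatal under -W error::UserWarning, so the exemption stays narrow and easy to delete once the migration is done.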

Test coverage

Three AE-driven tests in tests/schema/test_cli.py::TestRulePack
extend the existing class with coverage for the new flag's load path
(test_compat_rule_pack_loads_pack_no_warning), the deprecation
warning's exact-token presence
(test_rule_pack_legacy_emits_user_warning — asserts --rule-pack,
deprecated, 1.0, and --compat-rule-pack are all present in the
warning message), and the "exactly one warning when both flags
supplied" semantics (test_both_flags_accumulate_warn_once). Three
smoke binding tests in
tests/schema/test_cli.py::TestCompatRulePackBinding assert the new
flag is registered on history, bisect, and ci (one test each).
All six tests use warnings.catch_warnings(record=True) + warnings.simplefilter("always") to bypass Python's per-message
warning-dedupe registry, which would otherwise suppress the warning
on the second and later invocations in the same test session.
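
The recording pattern used by these tests can be sketched in isolation. The helper names below are hypothetical and the emitter is a stand-in for a real CLI invocation; only the warnings calls are the ones named above.

```python
import warnings
from typing import Callable, List


def emit_legacy_warning() -> None:
    """Hypothetical stand-in for an invocation that passes --rule-pack."""
    warnings.warn(
        "--rule-pack is deprecated and will be removed in protokit 1.0; "
        "use --compat-rule-pack instead.",
        UserWarning,
    )


def recorded_messages(action: Callable[[], None], repeats: int = 2) -> List[str]:
    """Run action several times and return every warning message it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" bypasses the once-per-location registry, so repeats are kept.
        warnings.simplefilter("always")
        for _ in range(repeats):
            action()
    return [str(w.message) for w in caught]


messages = recorded_messages(emit_legacy_warning)
```

A token check in the style described above would then assert that "--rule-pack", "deprecated", "1.0", and "--compat-rule-pack" each appear in messages[0].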

Tag: v0.8.0. PyPI: pip install protokit==0.8.0.