Pipeline encode stage #408

Merged: jeromekelleher merged 2 commits into sgkit-dev:main from jeromekelleher:pipeline-encode-stage on May 19, 2026

@jeromekelleher
Member

No description provided.

Introduce the _EncodedChunk dataclass and the _encoded_chunks generator
method on FormatEncoder. _restart now wires self._iterator to that
generator instead of to variant_chunks directly; _advance pops
_EncodedChunk items and publishes them. No behaviour change: _encode_chunk
still runs synchronously on the consumer thread. This only sets up the
seam for a later PrefetchIterator wrap that pushes the encode work one
chunk ahead.
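
The seam described above can be sketched roughly as follows. This is a minimal illustration, not the vcztools code: the `EncodedChunk` fields, the toy newline encoder, and the `SketchEncoder` class are assumptions made for the example; only the method names `_encoded_chunks`, `_restart`, `_advance`, and `_encode_chunk` come from the commit message.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class EncodedChunk:
    # Hypothetical fields; the real _EncodedChunk layout is not shown in the PR.
    index: int
    payload: bytes


class SketchEncoder:
    """Toy stand-in for FormatEncoder, showing only the iterator seam."""

    def __init__(self, chunks: List[List[str]]) -> None:
        self._chunks = chunks
        self._iterator: Optional[Iterator[EncodedChunk]] = None
        self.published: List[bytes] = []
        self._restart()

    def _variant_chunks(self) -> Iterator[List[str]]:
        yield from self._chunks

    def _encode_chunk(self, chunk: List[str]) -> bytes:
        # Assumed encoding: one record per line.
        return "".join(record + "\n" for record in chunk).encode()

    def _encoded_chunks(self) -> Iterator[EncodedChunk]:
        # Encoding happens inside the generator, so whoever drives the
        # generator (consumer thread now, a prefetch worker later) pays for it.
        for index, chunk in enumerate(self._variant_chunks()):
            yield EncodedChunk(index, self._encode_chunk(chunk))

    def _restart(self) -> None:
        # The seam: point at the encoded stream, not the raw chunk stream.
        self._iterator = self._encoded_chunks()

    def _advance(self) -> bool:
        item = next(self._iterator, None)
        if item is None:
            return False
        self.published.append(item.payload)
        return True


encoder = SketchEncoder([["a", "b"], ["c"]])
while encoder._advance():
    pass
```

Because `_advance` only ever sees already-encoded items, wrapping `_encoded_chunks()` in a prefetching iterator later requires no change to the consumer side.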

Promote _PrefetchIterator from retrieval.py to utils.PrefetchIterator
(now public; same single-worker / one-deep / drain-on-close contract).
Wrap FormatEncoder._encoded_chunks in utils.PrefetchIterator so that
_encode_chunk runs on a background prefetch worker while the consumer
drains the previous chunk's bytes. The encode pipeline now stacks
on top of the existing StreamReader block-prefetch and the inner
variant_chunks PrefetchIterator.
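
As a rough sketch of what a single-worker, one-deep, drain-on-close contract can look like, the class below keeps exactly one `next()` call in flight on a one-thread executor. It is an assumption-laden illustration built on `concurrent.futures`, not the actual `utils.PrefetchIterator`; the class name `PrefetchSketch` and the thread-name prefix are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking exhaustion of the wrapped iterator


class PrefetchSketch:
    """Runs the wrapped iterator on one worker thread, one item ahead."""

    def __init__(self, source):
        self._source = iter(source)
        self._pool = ThreadPoolExecutor(
            max_workers=1, thread_name_prefix="prefetch-sketch"
        )
        # One-deep: exactly one fetch is ever outstanding.
        self._pending = self._pool.submit(self._fetch)

    def _fetch(self):
        return next(self._source, _DONE)

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending is None:
            raise StopIteration
        pending, self._pending = self._pending, None
        item = pending.result()  # re-raises any exception from the worker
        if item is _DONE:
            raise StopIteration
        self._pending = self._pool.submit(self._fetch)
        return item

    def close(self):
        # Drain-on-close: wait for the in-flight fetch, then drop its result.
        self._pool.shutdown(wait=True)
        self._pending = None


seen_threads = []


def numbers():
    for value in range(3):
        seen_threads.append(threading.current_thread().name)
        yield value


prefetcher = PrefetchSketch(numbers())
values = list(prefetcher)
prefetcher.close()
```

Wrapping a generator such as `_encoded_chunks()` this way moves the generator body, and therefore the encode call, onto the worker thread while the consumer handles the previous item.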

Tests:
- Move TestPrefetchIterator from test_retrieval.py to test_utils.py
  and rename to drop the private prefix.
- Add TestReadahead to test_format_encoder.py covering: iterator-type
  assertion, encode runs on the vcztools-prefetch thread, worker
  exception surfaces on the consumer's read(), close joins the worker,
  no thread leaks across 100 construct/close cycles.
- Replace _FakeEncoder's one-shot _encode_raise_once trigger in
  test_restart_after_failed_advance_emits_restart_not_init with a
  chunk-content-based _fail_if_first_variant trigger so the failure
  hits the chunk the consumer is actually waiting for (the prior flag
  could be consumed silently by a drained in-flight prefetch).
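
The last point can be illustrated with a small single-threaded simulation. It is a sketch only: the two encoder classes, the `discarded_prefetch` helper, and the variant names are hypothetical stand-ins, not the project's `_FakeEncoder`. A discarded encode call plays the role of an in-flight prefetch whose result is drained and dropped.

```python
class OneShotEncoder:
    """Fails on whichever encode call happens next after the flag is armed."""

    def __init__(self):
        self.raise_once = False

    def encode(self, chunk):
        if self.raise_once:
            self.raise_once = False
            raise RuntimeError("injected failure")
        return ",".join(chunk).encode()


class ContentTriggerEncoder:
    """Fails whenever the chunk starts with a chosen variant."""

    def __init__(self, fail_if_first_variant):
        self.fail_if_first_variant = fail_if_first_variant

    def encode(self, chunk):
        if chunk and chunk[0] == self.fail_if_first_variant:
            raise RuntimeError("injected failure")
        return ",".join(chunk).encode()


def discarded_prefetch(encoder, chunk):
    # Stand-in for a prefetched encode whose outcome nobody ever reads.
    try:
        encoder.encode(chunk)
    except RuntimeError:
        pass


def consumer_sees_failure(encoder, chunk):
    try:
        encoder.encode(chunk)
    except RuntimeError:
        return True
    return False


ahead = ["v5", "v6"]    # chunk encoded ahead of time, then thrown away
awaited = ["v1", "v2"]  # chunk the consumer is actually waiting for

one_shot = OneShotEncoder()
one_shot.raise_once = True
discarded_prefetch(one_shot, ahead)  # silently uses up the flag
one_shot_hit = consumer_sees_failure(one_shot, awaited)

by_content = ContentTriggerEncoder("v1")
discarded_prefetch(by_content, ahead)  # unaffected: wrong first variant
content_hit = consumer_sees_failure(by_content, awaited)
```

In this simulation the one-shot flag is spent on the discarded call and the consumer never observes the failure, whereas the content-based trigger fires on the awaited chunk regardless of what was encoded ahead of it.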

Benchmark (wide_bench.vcz, _DiscardSink drain) is within noise: encode
is the bottleneck and the synthetic drain is trivial, so producer/consumer
overlap has nothing to recover. Real workloads with a non-trivial
drain (file/network) should benefit.
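
The benchmark reasoning can be made concrete with an idealized timing model. This is a back-of-the-envelope sketch that assumes constant per-chunk encode and drain times and a one-deep pipeline; the function names and the numbers used are illustrative, not measurements from the PR.

```python
def serial_time(n_chunks, encode_s, drain_s):
    """Encode and drain strictly alternate on one thread."""
    return n_chunks * (encode_s + drain_s)


def pipelined_time(n_chunks, encode_s, drain_s):
    """One-deep overlap: encoding chunk i+1 proceeds while chunk i drains."""
    if n_chunks == 0:
        return 0.0
    # First encode and last drain cannot overlap with anything; in between,
    # each step costs whichever stage is slower.
    return encode_s + (n_chunks - 1) * max(encode_s, drain_s) + drain_s


# Trivial drain (like a discarding sink): overlap recovers nothing.
trivial_serial = serial_time(10, 1.0, 0.0)
trivial_pipelined = pipelined_time(10, 1.0, 0.0)

# Drain as costly as encode (for example, a slow file or network sink).
heavy_serial = serial_time(10, 1.0, 1.0)
heavy_pipelined = pipelined_time(10, 1.0, 1.0)
```

Under these assumptions the two schedules take the same time when the drain cost is zero, and the pipelined schedule approaches half the serial time as the drain cost approaches the encode cost.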
@jeromekelleher merged commit d5aa1b0 into sgkit-dev:main on May 19, 2026
13 checks passed