Conversation
…lix#755, ADR-0154) Several feature extractors (integer_motion's motion2/motion3, the five-frame-window variant) write frame N's score retroactively when frame N+1 is extracted — and on flush for the tail. Previously `vmaf_score_pooled(ctx, ..., i, i)` called immediately after `vmaf_read_pictures(ctx, ref, dist, i)` returned -EINVAL, indistinguishable from programmer error. Split the error: - -EAGAIN: feature index is valid but not yet written (transient — retry after one more read or after flush). - -EINVAL: genuine misuse (bad pointer, out-of-range, unknown feature name). Changes: - feature_collector.c:vmaf_feature_collector_get_score — "not written" branch returns -EAGAIN. - feature_collector.h:vmaf_feature_vector_get_score — inline fast-path splits the previous combined `return -1` into -EINVAL (null/out-of-range) and -EAGAIN (not written). Added #include <errno.h>. - Drive-by (ADR-0141 touched-file rule): rename reserved header guard `__VMAF_FEATURE_COLLECTOR_H__` -> `VMAF_FEATURE_COLLECTOR_INCLUDED`. Visible behaviour change: - Callers that want to distinguish transient from fatal can now branch on -EAGAIN and retry. - Callers that treat any non-zero as fatal are unchanged. - Inline get_score previously returned -1 for both cases; now splits the same way. Test: libvmaf/test/test_score_pooled_eagain.c with 4 subtests: 1. `returns_eagain_on_pending` — submit frames 0 + 1, then score_pooled(1,1) must return -EAGAIN (motion2 for frame 1 not written yet). 2. `streaming_pattern` — after read(3), score_pooled(0,0) and score_pooled(1,1) must both succeed (retroactive writes have filled in). 3. `after_flush_complete` — after flush, every in-range index is poolable. 4. `still_rejects_bad_range` — inverted range and NULL score pointer still return -EINVAL (not -EAGAIN). Verified as reducer: git stash on feature_collector.{c,h} makes the new test fail; restoring passes. 35/35 total tests pass. Closes backlog item T1-1. Deliverables (ADR-0108): 1. research digest: no digest needed - error-code semantics fix 2. decision matrix: ADR-0154 §Alternatives considered 3. AGENTS.md invariant: fork-ahead -EAGAIN/-EINVAL split documented in rebase-notes 0047 4. reproducer: standalone C program matching Netflix#755 pattern 5. CHANGELOG: fork entry under Changed 6. rebase-notes: entry 0047 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
Apr 24, 2026
…nner guide, doc-drift enforcement (#103) Bundles the four open Tier-7 long-tail items from `.workingdir2/BACKLOG.md` plus the audit-flagged docs gaps that surfaced during scope-checking. T7-1 — Tracked docs/state.md + bug-status hygiene rule (ADR-0165) Closes Issue #20. New tracked file `docs/state.md` (Open / Recently closed / Confirmed not-affected / Deferred) is the canonical in-tree bug-status surface. New CLAUDE.md §12 rule 13 mandates a same-PR update on every bug close / open / rule-out. PR template carries a checkbox; opt-out `no state delta: REASON` for PRs without bug-status impact. ADRs cover decisions, this file covers bug status. T7-2 — MCP server release artifact channel (ADR-0166) Both PyPI (Trusted Publishing via OIDC, no token) and GitHub release attachment with Sigstore keyless signing + PEP 740 attestations + SLSA L3 provenance. Wired as new `mcp-build` / `mcp-sign` / `mcp-publish-pypi` jobs in the existing supply-chain.yml. After this lands, `pip install vmaf-mcp` works. One-time PyPI Trusted Publisher binding required (operational note in the ADR). T7-3 — Self-hosted GPU runner enrollment guide New docs/development/self-hosted-runner.md pins the registration steps so an operator can stand a runner up in ~10 minutes. Per popup 2026-04-25 the user's local dev box (CUDA + Intel) will be the first runner. Fine-grained label scheme (`gpu-cuda`, `gpu-intel`, `avx512`) reserved for future job targeting. ADR-0167 — Path-mapped doc-drift enforcement Closes the gap surfaced by the 2026-04-25 docs audit (16 PRs landed in 2 days; 2 HIGH + 4 MEDIUM doc gaps slipped past the existing checks because the workflow was advisory + accepted ADR additions as "docs were touched"). Two layers: Layer 1 (in-session): new project hook `.claude/hooks/docs-drift-warn.sh` (PostToolUse:Edit|Write) emits an informational `NOTICE` when a user-discoverable surface is touched but no matching `docs/<topic>/` file is touched. Mirrors the `auto-snapshot-warn.sh` pattern — informational stderr, no block. Layer 2 (pre-merge): rule-enforcement.yml `doc-substance-check` promoted from advisory (`continue-on-error: true`) to blocking + rewritten with a path-mapped surface→docs check. ADR additions no longer satisfy. Per-PR opt-out `no docs needed: REASON` for genuine internal-refactor / bug-fix / test PRs. Audit fixes (2 HIGH + 4 MEDIUM): - docs/api/gpu.md — vmaf_cuda_state_free() public API documented (was: missing entirely, despite the symbol shipping in PR #94). - docs/api/index.md — -EAGAIN error code added to the error semantics list (PR #91 / ADR-0154). - docs/api/index.md — vmaf_read_pictures monotonic-index requirement documented (PR #88 / ADR-0152). - docs/metrics/features.md — SSIMULACRA 2 backends matrix updated (was: "scalar only. SIMD / GPU paths are follow-up workstreams" 36 hours after PRs #98/#99/#100 landed all three SIMD ports). - docs/metrics/features.md — PSNR-HVS backends updated for AVX2 (PR #96) + NEON (PR #97). - docs/metrics/features.md — float_ms_ssim <176×176 minimum documented (PR #90 / ADR-0153). ADRs: 0165, 0166, 0167. Closes BACKLOG T7-1, T7-2, T7-3 + Issue #20. Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses Netflix upstream issue #755 ("VMAF functions exit with error when vmaf_read_pictures and vmaf_score_pooled calls are interleaved", OPEN).
Root cause:
integer_motion's motion2/motion3 (and the 5-frame-window variant) write frame N's score retroactively when frame N+1 is extracted — and the tail on flush. So at the instantvmaf_read_pictures(ctx, ref, dist, i)returns, framei's motion2 is not yet in the feature collector.vmaf_feature_collector_get_scorereturned-EINVALin that case — indistinguishable from genuine misuse (bad pointer, out-of-range, unknown feature).Fix: split the error code.
-EAGAIN→ feature index is valid but not yet written (transient; retry after one morevmaf_read_picturesor after flush).-EINVAL→ genuine misuse (null pointer,i >= capacity, unknown feature name).Downstream integrations that want per-frame streaming VMAF output can now distinguish transient-retry from abort-now. Upstream maintainer closed the door on the streaming use case in 2020 ("you cannot call vmaf_score_pooled() in a loop"); the fork reopens it with one changed error code and one inline-helper signature tweak — no change to the retroactive-write design.
Scope
libvmaf/src/feature/feature_collector.c:vmaf_feature_collector_get_score— the "not written" branch returns-EAGAIN.libvmaf/src/feature/feature_collector.h:vmaf_feature_vector_get_score(inline fast-path) — splits the previous combinedreturn -1into-EINVAL(null / out-of-range) and-EAGAIN(not written). Added#include <errno.h>.__VMAF_FEATURE_COLLECTOR_H__header guard toVMAF_FEATURE_COLLECTOR_INCLUDED.Supported streaming pattern (documented via the new test):
Test
New
libvmaf/test/test_score_pooled_eagain.cwith 4 subtests:returns_eagain_on_pending— submit frames 0 + 1 thenscore_pooled(1,1)must return-EAGAIN(motion2 for frame 1 not written yet).streaming_pattern— after read(3),score_pooled(0,0)andscore_pooled(1,1)both succeed with finite scores.after_flush_complete— after flush, every in-range index is poolable.still_rejects_bad_range— inverted range + NULL score pointer still return-EINVAL(not-EAGAIN).Reducer confirmed:
git stash push libvmaf/src/feature/feature_collector.{c,h} && ninja -C build && meson test -C build test_score_pooled_eagainreportsFail: 1.Type
fix— bug fixChecklist
meson test -C build→ 35/35.clang-tidy -p build libvmaf/src/feature/feature_collector.{c,h}→ zero warnings.### Changedin CHANGELOG.Netflix golden-data gate (ADR-0024)
assertAlmostEqual(...)score in the Netflix golden Python tests.Cross-backend numerical results
Deep-dive deliverables (ADR-0108)
AGENTS.mdinvariant note — fork-ahead -EAGAIN/-EINVAL split in rebase-notes 0047.CHANGELOG.md"lusoris fork" entry — under### Changed.Reproducer
Issue-matching standalone reproducer (loop of read + per-frame pool):
Known follow-ups
vmaf_feature_available_up_to(ctx, unsigned *i_max)polling API could build on this. Out of scope for this PR; the-EAGAINsignal is the minimal contract.🤖 Generated with Claude Code