chore(backlog): T7-32 — 3 micro-investigations bundled (motion_v2 srlv64 + tiny-vmaf-v2 identity + routine.py FIXME) by lusoris · Pull Request #198 · lusoris/vmaf

lusoris · 2026-04-29T10:22:30Z

Summary

Three S-effort follow-ups identified by the 2026-04-28 BACKLOG audit, bundled in one PR per the audit's hygiene rule.

(a) motion_v2 AVX2 srlv_epi64 audit. New fork-local libvmaf C unit test libvmaf/test/test_motion_v2_simd.c exercises four adversarial 16-bit fixtures (uniform-negative diffs at bpc 10 and 12; alternating-mixed-sign at bpc 10 and 12) against motion_score_pipeline_16_avx2 in libvmaf/src/feature/x86/motion_v2_avx2.c. The Phase-1 SIMD body uses _mm256_srlv_epi64 (logical) where scalar uses arithmetic >>; the test compares the AVX2 SAD against a line-for-line scalar reference duplicated from integer_motion_v2.c. On the bench host the post-abs() Phase-2 aggregation absorbs the per-lane shift difference and SAD totals match scalar — the test stays as a permanent regression guard. Closes the docs/rebase-notes.md §0038 follow-up placeholder.
(b) tiny-vmaf-v2 model identity. Research-0006 §4 referenced a non-existent tiny-vmaf-v2 prototype under ai/prototypes/. The actual largest shipped tiny-AI MLP is vmaf_tiny_v1_medium.onnx (mlp_medium, landed by PR #158, "feat(ai): tiny-AI training prep (loader + eval + Lightning harness for Netflix corpus)"). The §4 narrative is updated to reference the real checkpoint name; QAT cost/budget framing unchanged.
(c) python/vmaf/routine.py:937,1109 FIXME verify. Both cv_on_dataset and explain_model_on_dataset hard-coded feature_option_dict=None with a FIXME comment about inconsistent behaviour with VmafQualityRunner. The FIXME describes a real defect — VmafQualityRunner reads feature_opts_dicts from the model dict at predict time; explain_model_on_dataset did not, so a model carrying per-extractor options would explain itself with mismatched feature configurations. Now: cv_on_dataset reads feature_param.feature_optional_dict when present (mirroring train_test_vmaf_on_dataset at the same file); explain_model_on_dataset reads model.model_dict["feature_opts_dicts"] (mirroring VmafQualityRunner). New regression test python/test/routine_feature_option_dict_test.py covers both None and populated-dict cases for both routines via a FeatureAssembler mock.
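The lookup behaviour described in (c) can be sketched in isolation. This is a minimal illustration, not the code in routine.py: the helper names `cv_feature_option_dict` and `explain_feature_opts` are hypothetical, the stand-in objects are built with `SimpleNamespace` rather than real vmaf classes, and the extractor and option names are placeholders.

```python
from types import SimpleNamespace


def cv_feature_option_dict(feature_param):
    """Return feature_param.feature_optional_dict when the attribute exists, else None."""
    return getattr(feature_param, "feature_optional_dict", None)


def explain_feature_opts(model):
    """Return the "feature_opts_dicts" entry of model.model_dict, or None when absent."""
    model_dict = getattr(model, "model_dict", None) or {}
    return model_dict.get("feature_opts_dicts")


# Hypothetical stand-ins; "some_extractor" / "some_option" are placeholders.
opts = {"some_extractor": {"some_option": 1}}
param_without = SimpleNamespace(feature_dict={})
param_with = SimpleNamespace(feature_dict={}, feature_optional_dict=opts)
model_without = SimpleNamespace(model_dict={})
model_with = SimpleNamespace(model_dict={"feature_opts_dicts": opts})

print(cv_feature_option_dict(param_without))  # None: the old hard-coded behaviour
print(cv_feature_option_dict(param_with))     # the populated dict is passed through
print(explain_feature_opts(model_without))    # None
print(explain_feature_opts(model_with))       # the populated dict is passed through
```

Falling back to None keeps the previous behaviour for params and models that carry no options, which is why the regression test covers both the None and the populated-dict case for each routine.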

Pre-CLAUDE.md §12 r12: no touched-file lint cleanup needed — verify-only sub-tasks.

Deep-dive deliverables (ADR-0108)

no research digest needed: verify-only sub-tasks, no investigation.
no alternatives: verify-only fixes (each sub-task has a unique correct answer).
no rebase-sensitive AGENTS invariants in this PR.
Reproducer / smoke-test command — see Test plan.
CHANGELOG.md entry — Unreleased § Changed.
Rebase note — docs/rebase-notes.md § 0038 closed.

Test plan

meson test -C build-cpu --no-rebuild — 38/38 OK including new test_motion_v2_simd
python -m pytest python/test/routine_feature_option_dict_test.py -v --rootdir=python — 4/4 PASS
pre-commit run --files CHANGELOG.md docs/rebase-notes.md docs/research/0006-tinyai-ptq-accuracy-targets.md libvmaf/test/meson.build python/vmaf/routine.py libvmaf/test/test_motion_v2_simd.c python/test/routine_feature_option_dict_test.py — every hook PASS (clang-format / black / isort / ruff / copyright)
bash scripts/ci/check-copyright.sh — exit 0
bash scripts/ci/assertion-density.sh — PASS (every fork-added function ≥20 lines has ≥1 assert)
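For readers unfamiliar with why test_motion_v2_simd targets negative diffs, the following sketch shows how a logical and an arithmetic right shift disagree on a negative 64-bit lane. It emulates the two shift flavours with plain Python integers; the helper names are hypothetical, and it does not model the real kernel or its Phase-2 aggregation.

```python
MASK64 = (1 << 64) - 1


def to_signed64(u):
    """Reinterpret an unsigned 64-bit pattern as a signed two's-complement value."""
    return u - (1 << 64) if u & (1 << 63) else u


def sra64(x, n):
    """Arithmetic right shift: the sign bit is replicated (what scalar >> does on signed values)."""
    return x >> n  # Python's >> on ints is already arithmetic


def srl64(x, n):
    """Logical right shift of the same 64-bit pattern: zeros are shifted in from the top."""
    return to_signed64((x & MASK64) >> n)


for lane in (40, -8):
    print(lane, sra64(lane, 2), srl64(lane, 2))
# Non-negative lanes agree; the negative lane becomes a large positive value
# under the logical shift, which is the divergence the fixtures try to provoke.
```

The two shifts agree for every non-negative input, so only fixtures that produce negative intermediate values can expose a difference.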

The test_motion_v2_simd unit test used C11 `aligned_alloc`, which is not exposed by MinGW's libc and was never shipped by MSVC. CI Windows jobs (MinGW64 CPU, MSVC + CUDA, MSVC + oneAPI SYCL) all failed with `implicit declaration of function 'aligned_alloc'`. Replace the four call sites with a small static `test_aligned_malloc` / `test_aligned_free` pair that mirrors the wrapper in `libvmaf/src/mem.c`: `_aligned_malloc` / `_aligned_free` on MSVC + MinGW, `posix_memalign` / `free` elsewhere. Test logic is unchanged. Linux CPU build + test pass locally (meson test passes).

…-1a Netflix Public dataset row)

Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the "Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger to TRIGGERED: the Netflix Public training corpus that gated C1 is now locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB, gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.

Verified the rest of state.md against the 2026-04-29-session merged PR set (#193–#205, #209). Every merged PR was feature / chore / docs / perf with no bug-status delta to record per CLAUDE §12 rule 13:

- #193 chore(dnn) T7-12 env override removal: chore.
- #194 docs(research) T7-9 NPU digest: research.
- #195 feat(mcp) T5-2 embedded scaffold: feature.
- #196 feat(vulkan) T7-36 cambi integration: feature.
- #197 feat(motion) Netflix b949ceb port: upstream port.
- #198 chore(backlog) T7-32 micro-investigations: verify-only.
- #199 feat(ai) T6-9 model registry: feature.
- #200 feat(hip) T7-10 HIP scaffold: feature.
- #201 feat(simd) T7-38 SVE2 ports: feature.
- #202 feat(ci) T6-8 parity matrix: feature.
- #203 feat(ai) T6-7 FastDVDnet: feature.
- #205 docs(audit) T7-4 quarterly audit: explicitly notes "no state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device: perf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…er new test)

New `libvmaf/test/simd_bitexact_test.h` centralises the per-test SIMD parity scaffolding that was repeated across `test_psnr_hvs_avx2.c`, `test_psnr_hvs_neon.c`, `test_moment_simd.c`, `test_motion_v2_simd.c`, and `test_ssimulacra2_simd.c`:

- a `xorshift32` PRNG (six file-local copies pre-PR),
- a portable POSIX/MinGW/MSVC aligned allocator (added in PR #198 and copy-pasted into each new test),
- an x86 AVX2 CPUID gate, and
- `SIMD_BITEXACT_ASSERT_MEMCMP` / `SIMD_BITEXACT_ASSERT_RELATIVE` assertion macros that print the first diverging byte / scalar-vs-simd values on failure.

Four representative tests migrate to the harness as proof, for a net `-106` LOC across the four files. New SIMD parity tests now cost ~20 LOC of test-body code instead of ~50–100 LOC of scaffolding plus body.

`test_ssimulacra2_simd.c` is intentionally not migrated in this PR; its `fill_random` FP rounding order is load-bearing for input bit patterns and migrating it risks shifting an existing bit-exact test's inputs. A separate dedup PR with a snapshot rerun under `/cross-backend-diff` can migrate it.

Include-order invariant: callers must `#include "test.h"` BEFORE `#include "simd_bitexact_test.h"` because `test.h` lacks a header guard and would redefine the `mu_report` static inline if pulled in twice. Inline comments in each migrated test call this out; `libvmaf/test/AGENTS.md` carries the rebase-sensitive invariant row.

All 41 `meson test -C build-cpu` cases pass post-refactor; clang-format + clang-tidy clean on every touched file. See ADR-0221.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
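The deterministic-input and first-divergence ideas behind the harness can be illustrated outside C. This is a minimal sketch, assuming the common 13/17/5 xorshift32 shift triple (the commit message does not say which constants the header uses); `fill_bytes` and `first_diverging_byte` are hypothetical helpers that mimic what a memcmp-style assertion could report.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """One step of a 32-bit xorshift generator (13/17/5 triple, an assumption)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def fill_bytes(seed, n):
    """Deterministically produce n bytes by taking the low byte of each PRNG state."""
    out = bytearray()
    state = seed
    for _ in range(n):
        state = xorshift32(state)
        out.append(state & 0xFF)
    return bytes(out)


def first_diverging_byte(a, b):
    """Return the index of the first differing byte, or None when the buffers match."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


scalar_out = fill_bytes(seed=7, n=16)
simd_out = fill_bytes(seed=7, n=16)  # same seed, so the buffers are identical
print(first_diverging_byte(scalar_out, simd_out))
```

Seeding both sides identically makes any reported index attributable to the code under test rather than to the inputs, which is the property a shared PRNG in the header is meant to preserve.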

Lusoris and others added 2 commits April 29, 2026 15:58

lusoris force-pushed the chore/t7-32-backlog-hygiene-bundle branch from e3bd584 to 1eb7a50 Compare April 29, 2026 13:58

lusoris merged commit 8e0eb8f into master Apr 29, 2026
50 checks passed

lusoris deleted the chore/t7-32-backlog-hygiene-bundle branch April 29, 2026 14:18

github-actions Bot mentioned this pull request Apr 29, 2026

chore: release master #1

Open

lusoris mentioned this pull request Apr 29, 2026

docs(state): post-session refresh + unblock T6-1a Netflix Public dataset deferral #220

Closed

4 tasks

lusoris mentioned this pull request May 1, 2026

docs(state): post-session refresh + unblock T6-1a Netflix Public dataset deferral #245

Merged

4 tasks

lusoris merged 2 commits into master from chore/t7-32-backlog-hygiene-bundle
