
chore(backlog): T7-32 — 3 micro-investigations bundled (motion_v2 srlv64 + tiny-vmaf-v2 identity + routine.py FIXME)#198

Merged
lusoris merged 2 commits into master from
chore/t7-32-backlog-hygiene-bundle
Apr 29, 2026

lusoris commented Apr 29, 2026

Summary

Three S-effort follow-ups identified by the 2026-04-28 BACKLOG audit, bundled in one PR per the audit's hygiene rule.

  • (a) motion_v2 AVX2 srlv_epi64 audit. New fork-local libvmaf C unit test libvmaf/test/test_motion_v2_simd.c exercises four adversarial 16-bit fixtures (uniform-negative diffs at bpc 10 and 12; alternating-mixed-sign at bpc 10 and 12) against motion_score_pipeline_16_avx2 in libvmaf/src/feature/x86/motion_v2_avx2.c. The Phase-1 SIMD body uses _mm256_srlv_epi64 (logical) where scalar uses arithmetic >>; the test compares the AVX2 SAD against a line-for-line scalar reference duplicated from integer_motion_v2.c. On the bench host the post-abs() Phase-2 aggregation absorbs the per-lane shift difference and SAD totals match scalar — the test stays as a permanent regression guard. Closes the docs/rebase-notes.md §0038 follow-up placeholder.
  • (b) tiny-vmaf-v2 model identity. Research-0006 §4 referenced a non-existent tiny-vmaf-v2 prototype under ai/prototypes/. The actual largest shipped tiny-AI MLP is vmaf_tiny_v1_medium.onnx (mlp_medium, landed by PR #158, "feat(ai): tiny-AI training prep (loader + eval + Lightning harness for Netflix corpus)"). The §4 narrative is updated to reference the real checkpoint name; the QAT cost/budget framing is unchanged.
  • (c) python/vmaf/routine.py:937,1109 FIXME verify. Both cv_on_dataset and explain_model_on_dataset hard-coded feature_option_dict=None, with a FIXME comment about behaviour inconsistent with VmafQualityRunner. The FIXME describes a real defect: VmafQualityRunner reads feature_opts_dicts from the model dict at predict time, but explain_model_on_dataset did not, so a model carrying per-extractor options would be explained with mismatched feature configurations. Now cv_on_dataset reads feature_param.feature_optional_dict when present (mirroring train_test_vmaf_on_dataset in the same file), and explain_model_on_dataset reads model.model_dict["feature_opts_dicts"] (mirroring VmafQualityRunner). A new regression test, python/test/routine_feature_option_dict_test.py, covers both the None and populated-dict cases for both routines via a FeatureAssembler mock.
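To make the scalar/SIMD discrepancy in (a) concrete, here is a minimal C sketch of the two shift flavours on a single 64-bit lane. The helper names asr64 and lsr64 are hypothetical; they model the semantics of the scalar arithmetic `>>` and of one lane of `_mm256_srlv_epi64`, and are not code from motion_v2_avx2.c or integer_motion_v2.c.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar-style arithmetic right shift (floor division by 2^s).
 * Plain `>>` on a negative signed value is implementation-defined in C,
 * so the negative case shifts the non-negative complement instead. */
static int64_t asr64(int64_t x, unsigned s)
{
    assert(s < 64);
    return x < 0 ? ~(~x >> s) : x >> s;
}

/* Logical right shift, modelling one 64-bit lane of _mm256_srlv_epi64:
 * the bit pattern is shifted as unsigned and zeros enter from the top. */
static uint64_t lsr64(int64_t x, unsigned s)
{
    assert(s < 64);
    return (uint64_t)x >> s;
}
```

For non-negative inputs the two agree; for x = -4 and s = 1, asr64 returns -2 while lsr64 returns 0x7FFFFFFFFFFFFFFE. Negative intermediate values are therefore presumably what the adversarial negative-diff fixtures are meant to provoke.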

Per CLAUDE.md §12 r12: no touched-file lint cleanup needed (verify-only sub-tasks).

Deep-dive deliverables (ADR-0108)

  • no research digest needed: verify-only sub-tasks, no investigation.
  • no alternatives: verify-only fixes (each sub-task has a unique correct answer).
  • no rebase-sensitive AGENTS invariants in this PR.
  • Reproducer / smoke-test command — see Test plan.
  • CHANGELOG.md entry — Unreleased § Changed.
  • Rebase note: docs/rebase-notes.md §0038 closed.

Test plan

  • meson test -C build-cpu --no-rebuild — 38/38 OK including new test_motion_v2_simd
  • python -m pytest python/test/routine_feature_option_dict_test.py -v --rootdir=python — 4/4 PASS
  • pre-commit run --files CHANGELOG.md docs/rebase-notes.md docs/research/0006-tinyai-ptq-accuracy-targets.md libvmaf/test/meson.build python/vmaf/routine.py libvmaf/test/test_motion_v2_simd.c python/test/routine_feature_option_dict_test.py — every hook PASS (clang-format / black / isort / ruff / copyright)
  • bash scripts/ci/check-copyright.sh — exit 0
  • bash scripts/ci/assertion-density.sh — PASS (every fork-added function ≥20 lines has ≥1 assert)

Lusoris and others added 2 commits April 29, 2026 15:58
…v64 + tiny-vmaf-v2 identity + routine.py FIXME)

Three S-effort follow-ups identified by the 2026-04-28 BACKLOG audit,
bundled in one PR per the audit's hygiene rule.

(a) motion_v2 AVX2 srlv_epi64 audit. New fork-local libvmaf C unit
test libvmaf/test/test_motion_v2_simd.c exercises four adversarial
16-bit fixtures (uniform-negative diffs at bpc 10 and 12;
alternating-mixed-sign at bpc 10 and 12) against
motion_score_pipeline_16_avx2 in
libvmaf/src/feature/x86/motion_v2_avx2.c. The Phase-1 SIMD body uses
_mm256_srlv_epi64 (logical) where scalar uses arithmetic >>; the
test compares the AVX2 SAD against a line-for-line scalar reference
duplicated from integer_motion_v2.c. On the bench host the
post-abs() Phase-2 aggregation absorbs the per-lane shift difference
and the SAD totals match scalar — the test stays as a permanent
regression guard. Closes the docs/rebase-notes.md §0038 follow-up
placeholder.
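The fixture shapes described above can be sketched in C as follows. Everything here (fill_fixture, the mode enum, the mid-range base and quarter-range delta) is a hypothetical illustration of "uniform-negative" and "alternating-mixed-sign" 16-bit diffs at a given bpc, not the contents of test_motion_v2_simd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixture shapes: every diff negative, or the sign
 * alternating pixel by pixel (even index negative, odd positive). */
enum fixture_mode { FIXTURE_UNIFORM_NEGATIVE, FIXTURE_ALTERNATING_SIGN };

/* Fill ref/dis so that dis[i] - ref[i] follows the requested sign
 * pattern while both samples stay inside [0, 2^bpc - 1]. */
static void fill_fixture(uint16_t *ref, uint16_t *dis, size_t n,
                         unsigned bpc, enum fixture_mode mode)
{
    assert(bpc >= 3 && bpc <= 16);
    const uint32_t max = (1u << bpc) - 1u;
    const uint32_t base = max / 2u;  /* mid-range leaves headroom both ways */
    const uint32_t delta = max / 4u; /* large, but never leaves the range */
    for (size_t i = 0; i < n; i++) {
        int negative = mode == FIXTURE_UNIFORM_NEGATIVE || i % 2 == 0;
        ref[i] = (uint16_t)base;
        dis[i] = (uint16_t)(negative ? base - delta : base + delta);
    }
}
```

At bpc 10 this yields ref = 511 with dis = 256 (negative diff) or 766 (positive diff); at bpc 12, ref = 2047 with dis = 1024 or 3070.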

(b) tiny-vmaf-v2 model identity. The Research-0006 digest §4
referenced a non-existent tiny-vmaf-v2 prototype under
ai/prototypes/. The actual largest shipped tiny-AI MLP is
vmaf_tiny_v1_medium.onnx (mlp_medium, landed by PR #158).
docs/research/0006-tinyai-ptq-accuracy-targets.md §4 is updated to
reference the real checkpoint name; the QAT cost/budget framing is
unchanged.

(c) python/vmaf/routine.py FIXME verify. Both cv_on_dataset and
explain_model_on_dataset hard-coded feature_option_dict=None with a
FIXME comment about inconsistent behaviour with VmafQualityRunner.
The FIXME describes a real defect: VmafQualityRunner reads
feature_opts_dicts from the model dict at predict time;
explain_model_on_dataset does not, so a model carrying per-extractor
options would explain itself with mismatched feature configurations.
Fixes:
  - cv_on_dataset now reads feature_param.feature_optional_dict
    when the param object exposes it (mirroring
    train_test_vmaf_on_dataset in the same file).
  - explain_model_on_dataset now reads
    model.model_dict["feature_opts_dicts"] (mirroring
    VmafQualityRunner).
New regression test python/test/routine_feature_option_dict_test.py
verifies both paths via a FeatureAssembler mock — covers None and
populated-dict cases for both routines.

Per CLAUDE.md §12 r12: no touched-file lint cleanup needed
(verify-only sub-tasks).

Test plan:
  - meson test -C build-cpu --no-rebuild
    -> 38/38 OK including new test_motion_v2_simd
  - python -m pytest python/test/routine_feature_option_dict_test.py -v
    -> 4/4 PASS
  - pre-commit run --files <touched>
    -> all hooks PASS
  - bash scripts/ci/check-copyright.sh -> exit 0
  - bash scripts/ci/assertion-density.sh -> PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The test_motion_v2_simd unit test used C11 `aligned_alloc`, which is
not exposed by MinGW's libc and was never shipped by MSVC. CI Windows
jobs (MinGW64 CPU, MSVC + CUDA, MSVC + oneAPI SYCL) all failed with
`implicit declaration of function 'aligned_alloc'`.

Replace the four call sites with a small static `test_aligned_malloc`
/ `test_aligned_free` pair that mirrors the wrapper in
`libvmaf/src/mem.c`: `_aligned_malloc` / `_aligned_free` on
MSVC + MinGW, `posix_memalign` / `free` elsewhere. Test logic is
unchanged.
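A minimal sketch of such a wrapper pair follows. The signatures (size first, then alignment) and the assert-based precondition checks are assumptions; the real helpers in the test file and in `libvmaf/src/mem.c` may differ. The key design point is that each allocator is paired with its own deallocator: memory from `_aligned_malloc` must not be passed to `free()`.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign on strict libcs */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h> /* _aligned_malloc / _aligned_free (MSVC and MinGW) */
#endif

/* Returns NULL on failure. `align` must be a power of two and a
 * multiple of sizeof(void *), as posix_memalign requires. */
static void *test_aligned_malloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align % sizeof(void *) == 0);
#if defined(_WIN32)
    return _aligned_malloc(size, align);
#else
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
#endif
}

/* Release memory obtained from test_aligned_malloc with the matching
 * deallocator for the platform. */
static void test_aligned_free(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```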

Linux CPU build and meson tests pass locally.
lusoris force-pushed the chore/t7-32-backlog-hygiene-bundle branch from e3bd584 to 1eb7a50 April 29, 2026 13:58
lusoris merged commit 8e0eb8f into master Apr 29, 2026
50 checks passed
lusoris deleted the chore/t7-32-backlog-hygiene-bundle branch April 29, 2026 14:18
github-actions bot mentioned this pull request Apr 29, 2026
lusoris pushed a commit that referenced this pull request May 1, 2026
…-1a Netflix Public dataset row)

Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the
"Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger
to TRIGGERED — the Netflix Public training corpus that gated C1 is now
locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB,
gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.

Verified the rest of state.md against the 2026-04-29-session merged PR
set (#193–#205, #209). Every merged PR was feature / chore / docs / perf
with no bug-status delta to record per CLAUDE §12 rule 13:
- #193 chore(dnn) T7-12 env override removal — chore.
- #194 docs(research) T7-9 NPU digest — research.
- #195 feat(mcp) T5-2 embedded scaffold — feature.
- #196 feat(vulkan) T7-36 cambi integration — feature.
- #197 feat(motion) Netflix b949ceb port — upstream port.
- #198 chore(backlog) T7-32 micro-investigations — verify-only.
- #199 feat(ai) T6-9 model registry — feature.
- #200 feat(hip) T7-10 HIP scaffold — feature.
- #201 feat(simd) T7-38 SVE2 ports — feature.
- #202 feat(ci) T6-8 parity matrix — feature.
- #203 feat(ai) T6-7 FastDVDnet — feature.
- #205 docs(audit) T7-4 quarterly audit — explicitly notes "no
  state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device — perf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 1, 2026
…er new test)

New `libvmaf/test/simd_bitexact_test.h` centralises the per-test SIMD
parity scaffolding that was repeated across `test_psnr_hvs_avx2.c`,
`test_psnr_hvs_neon.c`, `test_moment_simd.c`, `test_motion_v2_simd.c`,
and `test_ssimulacra2_simd.c`: a `xorshift32` PRNG (six file-local
copies pre-PR), a portable POSIX/MinGW/MSVC aligned allocator (added
in PR #198 and copy-pasted into each new test), an x86 AVX2 CPUID
gate, and `SIMD_BITEXACT_ASSERT_MEMCMP` /
`SIMD_BITEXACT_ASSERT_RELATIVE` assertion macros that print the first
diverging byte / scalar-vs-simd values on failure.
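A rough sketch of the kind of helpers the header centralises is below. The function names, the first-diff reporting and the relative-tolerance formula are hypothetical stand-ins, not the contents of `simd_bitexact_test.h`; only the xorshift32 recurrence (Marsaglia's 13/17/5 variant) is a standard algorithm, and whether the fork uses those exact shift constants is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Marsaglia xorshift32 with the (13, 17, 5) shift triple: deterministic,
 * seedable and cheap. A zero state would stay zero forever. */
static uint32_t xorshift32(uint32_t *state)
{
    assert(*state != 0);
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Index of the first differing byte, or -1 when the buffers are identical.
 * A memcmp-style assertion can report this instead of a bare "mismatch". */
static ptrdiff_t first_diff_byte(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return (ptrdiff_t)i;
    }
    return -1;
}

/* Relative-tolerance check for float outputs that are not expected to be
 * bit-exact. Prints both values on failure; returns 1 on pass, 0 on fail. */
static int check_relative(const char *what, double scalar, double simd,
                          double rel_tol)
{
    double diff = scalar > simd ? scalar - simd : simd - scalar;
    double as = scalar < 0 ? -scalar : scalar;
    double ad = simd < 0 ? -simd : simd;
    double scale = as > ad ? as : ad;
    if (diff <= rel_tol * scale)
        return 1;
    fprintf(stderr, "%s: scalar=%.17g simd=%.17g\n", what, scalar, simd);
    return 0;
}
```

Seeded with 1, the first xorshift32 output is 270369, so a fixed seed gives every run (and every SIMD backend) identical inputs.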

Four representative tests migrate to the harness as proof — net `-106`
LOC across the four files. New SIMD parity tests now cost ~20 LOC of
test-body code instead of ~50–100 LOC of scaffolding plus body.
`test_ssimulacra2_simd.c` is intentionally not migrated in this PR;
its `fill_random` FP rounding order is load-bearing for input bit
patterns and migrating it risks shifting an existing bit-exact test's
inputs. A separate dedup PR with a snapshot rerun under
`/cross-backend-diff` can migrate it.
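Why rounding order can be load-bearing for a test's inputs is shown by the generic C illustration below (it is not ssimulacra2's `fill_random`): the same three terms summed in two orders give different doubles, so a refactor that reorders floating-point arithmetic in an input generator can silently change the bit patterns fed to a bit-exact test. This assumes strict IEEE-754 double evaluation (SSE2, no `-ffast-math`).

```c
#include <assert.h>

/* Same three terms, two evaluation orders. IEEE-754 addition is not
 * associative, so the rounded results can differ in the last bit. */
static double sum_left(double a, double b, double c) { return (a + b) + c; }
static double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 0.1, b = 0.2, c = 0.3, sum_left yields 0.6000000000000001 while sum_right yields 0.6.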

Include-order invariant: callers must `#include "test.h"` BEFORE
`#include "simd_bitexact_test.h"` because `test.h` lacks a header
guard and would redefine the `mu_report` static inline if pulled in
twice. Inline comments in each migrated test call this out;
`libvmaf/test/AGENTS.md` carries the rebase-sensitive invariant row.

All 41 `meson test -C build-cpu` cases pass post-refactor;
clang-format + clang-tidy clean on every touched file. See
ADR-0221.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 1, 2026
…-1a Netflix Public dataset row) (#245)

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 2, 2026
…er new test) (#252)

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>