…v64 + tiny-vmaf-v2 identity + routine.py FIXME)

Three S-effort follow-ups identified by the 2026-04-28 BACKLOG audit, bundled in one PR per the audit's hygiene rule.

(a) motion_v2 AVX2 srlv_epi64 audit. New fork-local libvmaf C unit test libvmaf/test/test_motion_v2_simd.c exercises four adversarial 16-bit fixtures (uniform-negative diffs at bpc 10 and 12; alternating-mixed-sign at bpc 10 and 12) against motion_score_pipeline_16_avx2 in libvmaf/src/feature/x86/motion_v2_avx2.c. The Phase-1 SIMD body uses _mm256_srlv_epi64 (a logical shift) where the scalar path uses an arithmetic >>; the test compares the AVX2 SAD against a line-for-line scalar reference duplicated from integer_motion_v2.c. On the bench host the post-abs() Phase-2 aggregation absorbs the per-lane shift difference and the SAD totals match scalar, so the test stays as a permanent regression guard. Closes the docs/rebase-notes.md §0038 follow-up placeholder.

(b) tiny-vmaf-v2 model identity. The Research-0006 digest §4 referenced a non-existent tiny-vmaf-v2 prototype under ai/prototypes/. The actual largest shipped tiny-AI MLP is vmaf_tiny_v1_medium.onnx (mlp_medium, landed by PR #158). docs/research/0006-tinyai-ptq-accuracy-targets.md §4 is updated to reference the real checkpoint name; the QAT cost/budget framing is unchanged.

(c) python/vmaf/routine.py FIXME verify. Both cv_on_dataset and explain_model_on_dataset hard-coded feature_option_dict=None with a FIXME comment about inconsistent behaviour with VmafQualityRunner. The FIXME describes a real defect: VmafQualityRunner reads feature_opts_dicts from the model dict at predict time; explain_model_on_dataset did not, so a model carrying per-extractor options would explain itself with mismatched feature configurations.

Fixes:
- cv_on_dataset now reads feature_param.feature_optional_dict when the param object exposes it (mirroring train_test_vmaf_on_dataset in the same file).
- explain_model_on_dataset now reads model.model_dict["feature_opts_dicts"] (mirroring VmafQualityRunner).

New regression test python/test/routine_feature_option_dict_test.py verifies both paths via a FeatureAssembler mock; it covers the None and populated-dict cases for both routines.

Per CLAUDE.md §12 r12: no touched-file lint cleanup needed (verify-only sub-tasks).

Test plan:
- meson test -C build-cpu --no-rebuild -> 38/38 OK including new test_motion_v2_simd
- python -m pytest python/test/routine_feature_option_dict_test.py -v -> 4/4 PASS
- pre-commit run --files <touched> -> all hooks PASS
- bash scripts/ci/check-copyright.sh -> exit 0
- bash scripts/ci/assertion-density.sh -> PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
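The shift mismatch audited in item (a) can be illustrated in plain scalar C, without intrinsics. The following is a minimal sketch, assuming 64-bit two's-complement lanes; `shift_logical` and `shift_arithmetic` are hypothetical helper names for illustration and do not appear in libvmaf.

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift. Zero-fills from the top,
 * which is the per-lane behaviour of _mm256_srlv_epi64. Requires n < 64. */
static uint64_t shift_logical(int64_t v, unsigned n)
{
    return (uint64_t)v >> n;
}

/* Hypothetical helper: arithmetic right shift. Sign-fills from the top,
 * matching what a scalar >> on a signed value does on common compilers.
 * Written via unsigned arithmetic so it does not rely on the
 * implementation-defined result of >> on a negative value. Requires n < 64. */
static int64_t shift_arithmetic(int64_t v, unsigned n)
{
    if (v >= 0)
        return (int64_t)((uint64_t)v >> n);
    /* For negative v: invert, shift in zeros, invert back (floor division). */
    return (int64_t)~(~(uint64_t)v >> n);
}
```

For a negative difference such as -1024 shifted by 4, the arithmetic helper yields -64, while the logical helper yields the large positive value 2^60 - 64. For non-negative inputs the two agree, which is why the test fixtures above concentrate on negative and mixed-sign differences.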
The test_motion_v2_simd unit test used C11 `aligned_alloc`, which is not exposed by MinGW's libc and was never shipped by MSVC. CI Windows jobs (MinGW64 CPU, MSVC + CUDA, MSVC + oneAPI SYCL) all failed with `implicit declaration of function 'aligned_alloc'`. Replace the four call sites with a small static `test_aligned_malloc` / `test_aligned_free` pair that mirrors the wrapper in `libvmaf/src/mem.c`: `_aligned_malloc` / `_aligned_free` on MSVC + MinGW, `posix_memalign` / `free` elsewhere. Test logic is unchanged. Linux CPU build + test pass locally (meson test passes).
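A portable allocator pair of the kind described above might look like the sketch below. Only the names `test_aligned_malloc` and `test_aligned_free` come from the commit message; the parameter order, the `_WIN32` check used to cover both MSVC and MinGW, and the error handling are assumptions, not a copy of the wrapper in `libvmaf/src/mem.c`.

```c
/* Ask for the POSIX declarations on strict-C toolchains (non-Windows only). */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif

#include <stddef.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h> /* _aligned_malloc / _aligned_free on MSVC and MinGW */
#endif

/* Assumed signature: returns NULL on failure. `alignment` must be a power
 * of two and, for posix_memalign, a multiple of sizeof(void *). */
static void *test_aligned_malloc(size_t size, size_t alignment)
{
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
#endif
}

/* Memory from _aligned_malloc must be released with _aligned_free, never
 * plain free(); posix_memalign memory is released with free(). */
static void test_aligned_free(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

A test would call `test_aligned_malloc(bytes, 32)` for each AVX2 buffer and pair every allocation with `test_aligned_free`. Unlike C11 `aligned_alloc`, neither branch requires the size to be a multiple of the alignment.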
e3bd584 to 1eb7a50
lusoris pushed a commit that referenced this pull request on May 1, 2026
…-1a Netflix Public dataset row)

Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the "Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger to TRIGGERED: the Netflix Public training corpus that gated C1 is now locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB, gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.

Verified the rest of state.md against the 2026-04-29-session merged PR set (#193–#205, #209). Every merged PR was feature / chore / docs / perf with no bug-status delta to record per CLAUDE §12 rule 13:
- #193 chore(dnn) T7-12 env override removal: chore.
- #194 docs(research) T7-9 NPU digest: research.
- #195 feat(mcp) T5-2 embedded scaffold: feature.
- #196 feat(vulkan) T7-36 cambi integration: feature.
- #197 feat(motion) Netflix b949ceb port: upstream port.
- #198 chore(backlog) T7-32 micro-investigations: verify-only.
- #199 feat(ai) T6-9 model registry: feature.
- #200 feat(hip) T7-10 HIP scaffold: feature.
- #201 feat(simd) T7-38 SVE2 ports: feature.
- #202 feat(ci) T6-8 parity matrix: feature.
- #203 feat(ai) T6-7 FastDVDnet: feature.
- #205 docs(audit) T7-4 quarterly audit: explicitly notes "no state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device: perf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request on May 1, 2026
…er new test)

New `libvmaf/test/simd_bitexact_test.h` centralises the per-test SIMD parity scaffolding that was repeated across `test_psnr_hvs_avx2.c`, `test_psnr_hvs_neon.c`, `test_moment_simd.c`, `test_motion_v2_simd.c`, and `test_ssimulacra2_simd.c`:
- a `xorshift32` PRNG (six file-local copies pre-PR),
- a portable POSIX/MinGW/MSVC aligned allocator (added in PR #198 and copy-pasted into each new test),
- an x86 AVX2 CPUID gate, and
- `SIMD_BITEXACT_ASSERT_MEMCMP` / `SIMD_BITEXACT_ASSERT_RELATIVE` assertion macros that print the first diverging byte / scalar-vs-simd values on failure.

Four representative tests migrate to the harness as proof, for a net `-106` LOC across the four files. New SIMD parity tests now cost ~20 LOC of test-body code instead of ~50–100 LOC of scaffolding plus body.

`test_ssimulacra2_simd.c` is intentionally not migrated in this PR; its `fill_random` FP rounding order is load-bearing for input bit patterns and migrating it risks shifting an existing bit-exact test's inputs. A separate dedup PR with a snapshot rerun under `/cross-backend-diff` can migrate it.

Include-order invariant: callers must `#include "test.h"` BEFORE `#include "simd_bitexact_test.h"` because `test.h` lacks a header guard and would redefine the `mu_report` static inline if pulled in twice. Inline comments in each migrated test call this out; `libvmaf/test/AGENTS.md` carries the rebase-sensitive invariant row.

All 41 `meson test -C build-cpu` cases pass post-refactor; clang-format + clang-tidy clean on every touched file. See ADR-0221.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
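The kind of scaffolding this commit describes can be sketched as follows. This is an illustration under stated assumptions, not the contents of `simd_bitexact_test.h`: every `sketch_` name is hypothetical, the PRNG uses the standard Marsaglia xorshift32 shift triple (13, 17, 5), and the byte comparison only mimics the "print the first diverging byte" behaviour attributed to `SIMD_BITEXACT_ASSERT_MEMCMP`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One step of Marsaglia's xorshift32. The state must never be zero,
 * otherwise the generator stays at zero forever. */
static uint32_t sketch_xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill a 16-bit buffer with deterministic samples limited to `bpc` bits
 * (1 <= bpc <= 16), so scalar and SIMD paths see identical inputs. */
static void sketch_fill_u16(uint16_t *buf, size_t n, unsigned bpc, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u; /* avoid the all-zero state */
    const uint32_t mask = (1u << bpc) - 1u;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint16_t)(sketch_xorshift32(&state) & mask);
}

/* Compare two byte ranges. Returns n when they are identical; otherwise
 * reports and returns the index of the first diverging byte. */
static size_t sketch_first_diff(const void *scalar, const void *simd, size_t n)
{
    const unsigned char *a = scalar;
    const unsigned char *b = simd;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            fprintf(stderr, "first diff at byte %zu: scalar=0x%02x simd=0x%02x\n",
                    i, a[i], b[i]);
            return i;
        }
    }
    return n;
}
```

A parity test built on these pieces would fill one input buffer, run the scalar and SIMD kernels into separate output buffers, and fail if `sketch_first_diff` returns anything other than the buffer size. Seeding explicitly keeps the fixtures reproducible across hosts.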
lusoris added a commit that referenced this pull request on May 1, 2026

…-1a Netflix Public dataset row) (#245)

Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the "Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger to TRIGGERED: the Netflix Public training corpus that gated C1 is now locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB, gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.

Verified the rest of state.md against the 2026-04-29-session merged PR set (#193–#205, #209). Every merged PR was feature / chore / docs / perf with no bug-status delta to record per CLAUDE §12 rule 13:
- #193 chore(dnn) T7-12 env override removal: chore.
- #194 docs(research) T7-9 NPU digest: research.
- #195 feat(mcp) T5-2 embedded scaffold: feature.
- #196 feat(vulkan) T7-36 cambi integration: feature.
- #197 feat(motion) Netflix b949ceb port: upstream port.
- #198 chore(backlog) T7-32 micro-investigations: verify-only.
- #199 feat(ai) T6-9 model registry: feature.
- #200 feat(hip) T7-10 HIP scaffold: feature.
- #201 feat(simd) T7-38 SVE2 ports: feature.
- #202 feat(ci) T6-8 parity matrix: feature.
- #203 feat(ai) T6-7 FastDVDnet: feature.
- #205 docs(audit) T7-4 quarterly audit: explicitly notes "no state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device: perf.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request on May 2, 2026

…er new test) (#252)

New `libvmaf/test/simd_bitexact_test.h` centralises the per-test SIMD parity scaffolding that was repeated across `test_psnr_hvs_avx2.c`, `test_psnr_hvs_neon.c`, `test_moment_simd.c`, `test_motion_v2_simd.c`, and `test_ssimulacra2_simd.c`:
- a `xorshift32` PRNG (six file-local copies pre-PR),
- a portable POSIX/MinGW/MSVC aligned allocator (added in PR #198 and copy-pasted into each new test),
- an x86 AVX2 CPUID gate, and
- `SIMD_BITEXACT_ASSERT_MEMCMP` / `SIMD_BITEXACT_ASSERT_RELATIVE` assertion macros that print the first diverging byte / scalar-vs-simd values on failure.

Four representative tests migrate to the harness as proof, for a net `-106` LOC across the four files. New SIMD parity tests now cost ~20 LOC of test-body code instead of ~50–100 LOC of scaffolding plus body.

`test_ssimulacra2_simd.c` is intentionally not migrated in this PR; its `fill_random` FP rounding order is load-bearing for input bit patterns and migrating it risks shifting an existing bit-exact test's inputs. A separate dedup PR with a snapshot rerun under `/cross-backend-diff` can migrate it.

Include-order invariant: callers must `#include "test.h"` BEFORE `#include "simd_bitexact_test.h"` because `test.h` lacks a header guard and would redefine the `mu_report` static inline if pulled in twice. Inline comments in each migrated test call this out; `libvmaf/test/AGENTS.md` carries the rebase-sensitive invariant row.

All 41 `meson test -C build-cpu` cases pass post-refactor; clang-format + clang-tidy clean on every touched file. See ADR-0221.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Three S-effort follow-ups identified by the 2026-04-28 BACKLOG audit, bundled in one PR per the audit's hygiene rule.

- `motion_v2` AVX2 `srlv_epi64` audit. New fork-local libvmaf C unit test `libvmaf/test/test_motion_v2_simd.c` exercises four adversarial 16-bit fixtures (uniform-negative diffs at bpc 10 and 12; alternating-mixed-sign at bpc 10 and 12) against `motion_score_pipeline_16_avx2` in `libvmaf/src/feature/x86/motion_v2_avx2.c`. The Phase-1 SIMD body uses `_mm256_srlv_epi64` (a logical shift) where the scalar path uses an arithmetic `>>`; the test compares the AVX2 SAD against a line-for-line scalar reference duplicated from `integer_motion_v2.c`. On the bench host the post-`abs()` Phase-2 aggregation absorbs the per-lane shift difference and SAD totals match scalar, so the test stays as a permanent regression guard. Closes the `docs/rebase-notes.md` §0038 follow-up placeholder.
- `tiny-vmaf-v2` model identity. Research-0006 §4 referenced a non-existent `tiny-vmaf-v2` prototype under `ai/prototypes/`. The actual largest shipped tiny-AI MLP is `vmaf_tiny_v1_medium.onnx` (`mlp_medium`, landed by PR #158, "feat(ai): tiny-AI training prep (loader + eval + Lightning harness for Netflix corpus)"). The §4 narrative is updated to reference the real checkpoint name; the QAT cost/budget framing is unchanged.
- `python/vmaf/routine.py:937,1109` FIXME verify. Both `cv_on_dataset` and `explain_model_on_dataset` hard-coded `feature_option_dict=None` with a `FIXME` comment about inconsistent behaviour with `VmafQualityRunner`. The FIXME describes a real defect: `VmafQualityRunner` reads `feature_opts_dicts` from the model dict at predict time; `explain_model_on_dataset` did not, so a model carrying per-extractor options would explain itself with mismatched feature configurations. Now `cv_on_dataset` reads `feature_param.feature_optional_dict` when present (mirroring `train_test_vmaf_on_dataset` in the same file), and `explain_model_on_dataset` reads `model.model_dict["feature_opts_dicts"]` (mirroring `VmafQualityRunner`).

New regression test `python/test/routine_feature_option_dict_test.py` covers both `None` and populated-dict cases for both routines via a `FeatureAssembler` mock.

Per CLAUDE.md §12 r12: no touched-file lint cleanup needed (verify-only sub-tasks).
Deep-dive deliverables (ADR-0108)

- `docs/rebase-notes.md` §0038 closed.

Test plan

- `meson test -C build-cpu --no-rebuild`: 38/38 OK, including new `test_motion_v2_simd`
- `python -m pytest python/test/routine_feature_option_dict_test.py -v --rootdir=python`: 4/4 PASS
- `pre-commit run --files CHANGELOG.md docs/rebase-notes.md docs/research/0006-tinyai-ptq-accuracy-targets.md libvmaf/test/meson.build python/vmaf/routine.py libvmaf/test/test_motion_v2_simd.c python/test/routine_feature_option_dict_test.py`: every hook PASS (clang-format / black / isort / ruff / copyright)
- `bash scripts/ci/check-copyright.sh`: exit 0
- `bash scripts/ci/assertion-density.sh`: PASS (every fork-added function ≥20 lines has ≥1 assert)