docs(research): T7-9 — Intel AI-PC NPU/EP applicability digest#194
Merged
Research-0031 evaluates whether the tiny-AI surface should add first-class NPU support for Intel Meteor / Lunar / Arrow Lake AI-PC platforms. Verdict: defer the NPU path — no maintainer hardware available to validate int8 + fp16 accuracy gates against Research-0006's PTQ pipeline. The integrated Xe / Xe2 GPU portion of an AI-PC platform is already reachable today through the existing --tiny-device openvino path (same code path the Arc A380 uses), so the iGPU surface costs zero new code; only the NPU device type is genuinely new surface and is the part that's deferred. Closes backlog T7-9. Doc-only — no C/Python source changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
pushed a commit
that referenced
this pull request
Apr 29, 2026
…h T7-9 T7-9 (#194, just merged) shipped Research-0031 (Intel AI-PC NPU applicability digest). This PR's cambi-vulkan-integration digest was independently numbered 0031 by the agent that drafted it. Renumber to 0032 to keep the one-number-per-digest invariant. References updated: filename, in-body title, ADR-0210 cross-link, ADR-0210 README index row, CHANGELOG.md, docs/rebase-notes.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
Apr 29, 2026
* feat(vulkan): T7-36 — cambi Vulkan integration (Strategy II)
Closes the GPU long-tail matrix terminus (per ADR-0192 + ADR-0205).
Replaces the spike scaffold's `init_stub`/`extract_stub`/`close_stub`
triple in `libvmaf/src/feature/vulkan/cambi_vulkan.c` with the full
Vulkan-aware lifecycle. After this PR every registered feature
extractor in the fork has at least one GPU twin (lpips remains via
ORT EPs per ADR-0022).
Strategy II hybrid (per ADR-0205 §Decision):
- GPU runs the integer phases — preprocess (forward-compatible
scaffold; v1 wires the CPU bilinear-resize for bit-exactness on
resolution mismatches), per-pixel derivative, the 7×7 spatial
mask SAT, 2× decimate, and the separable 3-tap mode filter.
- Host runs the precision-sensitive sliding-histogram
`calculate_c_values` + top-K spatial pooling + scale-weighted
final score on byte-identical readback buffers.
- Bit-exact w.r.t. CPU by construction (every GPU phase is
integer arithmetic; host residual runs the unmodified CPU code
on byte-identical buffers); cross-backend gate runs at
`places=4` from day one with no per-metric tolerance carve-out.
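The bit-exactness argument above rests on every GPU phase being integer-only. As a minimal host-side sketch of one such phase, the separable 3-tap mode filter can be written as a row pass followed by a column pass. The names `mode3`, `filter_rows`, `filter_cols` and `filter_mode` are hypothetical; the tie-break (minimum when all three taps differ) is an assumption suggested by the `min3` helper named in this commit, and the edge clamping is likewise an assumption rather than the fork's confirmed border handling.

```python
def mode3(a: int, b: int, c: int) -> int:
    """Most frequent of three integers; assumed to fall back to the minimum on a 3-way tie."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return min(a, b, c)


def filter_rows(img: list[list[int]]) -> list[list[int]]:
    """Horizontal pass: mode of (left, centre, right), clamping at the edges (assumption)."""
    out = []
    for row in img:
        w = len(row)
        out.append([mode3(row[max(x - 1, 0)], row[x], row[min(x + 1, w - 1)])
                    for x in range(w)])
    return out


def filter_cols(img: list[list[int]]) -> list[list[int]]:
    """Vertical pass: mode of (up, centre, down), clamping at the edges (assumption)."""
    h, w = len(img), len(img[0])
    return [[mode3(img[max(y - 1, 0)][x], img[y][x], img[min(y + 1, h - 1)][x])
             for x in range(w)]
            for y in range(h)]


def filter_mode(img: list[list[int]]) -> list[list[int]]:
    """Separable filter: rows first, then columns."""
    return filter_cols(filter_rows(img))
```

Because the filter only compares and selects integers, any two backends that implement the same comparisons produce identical buffers, which is the property the cross-backend gate relies on.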
New shaders + 1 unified TU for the 3 SAT phases:
- `cambi_preprocess.comp` (new) — per-pixel decimate + bit-shift
+ optional anti-dither, exact-resolution fast path.
- `cambi_mask_dp.comp` (new) — single TU with `PASS=0/1/2` spec
const for row-SAT / col-SAT / threshold-compare.
- Existing `cambi_derivative.comp`, `cambi_filter_mode.comp`,
`cambi_decimate.comp` shaders wired into the dispatch chain
unchanged (renamed `min3` → `cambi_min3` / `mode3` →
`cambi_mode3` to avoid the GLSL precision-overload conflict).
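The `PASS=0/1/2` split named above (row-SAT, col-SAT, threshold-compare) follows the usual summed-area-table pattern. The following is a minimal host-side sketch of that pattern, not the shader itself: the function names are hypothetical, and the window clipping at image borders and the threshold value are illustrative assumptions. `radius=3` corresponds to the 7×7 window.

```python
def row_sat(img: list[list[int]]) -> list[list[int]]:
    """PASS 0 analogue: running sum along each row."""
    out = []
    for row in img:
        acc, line = 0, []
        for v in row:
            acc += v
            line.append(acc)
        out.append(line)
    return out


def col_sat(rows: list[list[int]]) -> list[list[int]]:
    """PASS 1 analogue: running sum down each column of the row sums."""
    h, w = len(rows), len(rows[0])
    out = [list(rows[0])]
    for y in range(1, h):
        out.append([out[y - 1][x] + rows[y][x] for x in range(w)])
    return out


def window_sum(sat: list[list[int]], y0: int, x0: int, y1: int, x1: int) -> int:
    """Sum over the inclusive rectangle [y0..y1] x [x0..x1] using four SAT lookups."""
    total = sat[y1][x1]
    if y0 > 0:
        total -= sat[y0 - 1][x1]
    if x0 > 0:
        total -= sat[y1][x0 - 1]
    if y0 > 0 and x0 > 0:
        total += sat[y0 - 1][x0 - 1]
    return total


def threshold_mask(sat: list[list[int]], radius: int, threshold: int) -> list[list[int]]:
    """PASS 2 analogue: 1 where the clipped window sum exceeds the threshold, else 0."""
    h, w = len(sat), len(sat[0])
    mask = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h - 1, y + radius)
        mask.append([
            1 if window_sum(sat, y0, max(0, x - radius), y1, min(w - 1, x + radius)) > threshold
            else 0
            for x in range(w)
        ])
    return mask
```

A caller would chain the passes as `threshold_mask(col_sat(row_sat(img)), 3, t)` for some threshold `t`; all arithmetic is integer addition and comparison, so the result does not depend on evaluation order.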
`cambi_internal.h` (new) exposes cambi.c's file-static helpers
(`vmaf_cambi_calculate_c_values`, `vmaf_cambi_get_spatial_mask`,
`vmaf_cambi_decimate`, `vmaf_cambi_filter_mode`,
`vmaf_cambi_spatial_pooling`, `vmaf_cambi_weight_scores_per_scale`,
`vmaf_cambi_get_pixels_in_window`, `vmaf_cambi_preprocessing`,
`vmaf_cambi_default_callbacks`) to the GPU twin via a thin
trampoline block at the bottom of `cambi.c` — no upstream-mirror
function-static code is renamed or moved, keeping Netflix sync
clean. Picked over the buffer-pair refactor ADR-0205 sketched
because the latter would ripple through CPU AVX2 / AVX-512 / NEON
callsites for ~200 LOC of churn (vs the trampoline's <70).
Wires:
- Registers 5 cambi shaders in `vulkan_shader_sources[]` and
`cambi_vulkan.c` in `vulkan_sources` in
`libvmaf/src/vulkan/meson.build`.
- Registers `vmaf_fex_cambi_vulkan` in
`feature_extractor_list[]` under `#if HAVE_VULKAN`.
- Adds a `cambi` row to `scripts/ci/cross_backend_vif_diff.py`'s
`FEATURE_METRICS` so the cross-backend gate at `places=4` runs
against the CPU baseline.
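For readers unfamiliar with a `places=4` gate, here is a minimal sketch of what such a comparison could look like. It assumes the gate follows the same rule as Python's `unittest.assertAlmostEqual(places=...)`, i.e. the absolute difference rounded to four decimal places must be zero; the helper names are hypothetical and this is not the code in `cross_backend_vif_diff.py`.

```python
def scores_match(cpu: float, gpu: float, places: int = 4) -> bool:
    """True when the two scores agree after rounding their difference to `places` decimals."""
    return round(abs(cpu - gpu), places) == 0


def mismatched_frames(cpu_scores: list[float], gpu_scores: list[float],
                      places: int = 4) -> list[int]:
    """Indices of frames whose CPU and GPU scores disagree at the given precision."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("backends produced a different number of frames")
    return [i for i, (c, g) in enumerate(zip(cpu_scores, gpu_scores))
            if not scores_match(c, g, places)]
```

An empty list from `mismatched_frames` would mean the backend passes at that precision; with a bit-exact integer pipeline the differences are expected to be exactly zero, so the tolerance is never exercised.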
Documentation (six deep-dive deliverables per ADR-0108):
- ADR-0210 (`docs/adr/0210-cambi-vulkan-integration.md`)
- Research-0031 (`docs/research/0031-cambi-vulkan-integration.md`)
- `docs/rebase-notes.md` entry 0090
- `docs/backends/vulkan/overview.md` extractor row
- `libvmaf/src/feature/AGENTS.md` rebase-sensitive invariant note
(lock-step CPU residual + cambi_internal.h signature contract)
- `CHANGELOG.md` Unreleased / lusoris fork entry
Smoke verified: 38/38 meson tests pass on the Vulkan-enabled build
including `test_cambi`, `test_vulkan_smoke`, `test_feature_extractor`.
Pre-commit (clang-format + ruff + ADR-0105 copyright header gate)
clean on every touched file. Closes backlog item T7-36.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(docs): renumber Research-0031 cambi → 0032 to avoid collision with T7-9
T7-9 (#194, just merged) shipped Research-0031 (Intel AI-PC NPU
applicability digest). This PR's cambi-vulkan-integration digest
was independently numbered 0031 by the agent that drafted it.
Renumber to 0032 to keep the one-number-per-digest invariant.
References updated: filename, in-body title, ADR-0210 cross-link,
ADR-0210 README index row, CHANGELOG.md, docs/rebase-notes.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(metrics): add cambi Vulkan-backend section
Resolves PR #196 Doc-Substance Gate (ADR-0167) failure.
The cambi feature extractor gained a Vulkan backend in this PR
(T7-36 / ADR-0210), making `feature_extractor.c` a touched
"feature extractor" surface per ADR-0100/0167 — which requires
a matching `docs/metrics/` edit.
Adds a "## GPU support" section to docs/metrics/cambi.md with
the integer-phase / host-residual split summary, the meson flag
recipe, and pointers to ADR-0210 + Research-0032.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
pushed a commit
that referenced
this pull request
May 1, 2026
…-1a Netflix Public dataset row)
Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the "Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger to TRIGGERED: the Netflix Public training corpus that gated C1 is now locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB, gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.
Verified the rest of state.md against the 2026-04-29-session merged PR set (#193–#205, #209). Every merged PR was feature / chore / docs / perf with no bug-status delta to record per CLAUDE §12 rule 13:
- #193 chore(dnn) T7-12 env override removal: chore.
- #194 docs(research) T7-9 NPU digest: research.
- #195 feat(mcp) T5-2 embedded scaffold: feature.
- #196 feat(vulkan) T7-36 cambi integration: feature.
- #197 feat(motion) Netflix b949ceb port: upstream port.
- #198 chore(backlog) T7-32 micro-investigations: verify-only.
- #199 feat(ai) T6-9 model registry: feature.
- #200 feat(hip) T7-10 HIP scaffold: feature.
- #201 feat(simd) T7-38 SVE2 ports: feature.
- #202 feat(ci) T6-8 parity matrix: feature.
- #203 feat(ai) T6-7 FastDVDnet: feature.
- #205 docs(audit) T7-4 quarterly audit: explicitly notes "no state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device: perf.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
May 1, 2026
…-1a Netflix Public dataset row) (#245)
Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the "Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger to TRIGGERED: the Netflix Public training corpus that gated C1 is now locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB, gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.
Verified the rest of state.md against the 2026-04-29-session merged PR set (#193–#205, #209). Every merged PR was feature / chore / docs / perf with no bug-status delta to record per CLAUDE §12 rule 13:
- #193 chore(dnn) T7-12 env override removal: chore.
- #194 docs(research) T7-9 NPU digest: research.
- #195 feat(mcp) T5-2 embedded scaffold: feature.
- #196 feat(vulkan) T7-36 cambi integration: feature.
- #197 feat(motion) Netflix b949ceb port: upstream port.
- #198 chore(backlog) T7-32 micro-investigations: verify-only.
- #199 feat(ai) T6-9 model registry: feature.
- #200 feat(hip) T7-10 HIP scaffold: feature.
- #201 feat(simd) T7-38 SVE2 ports: feature.
- #202 feat(ci) T6-8 parity matrix: feature.
- #203 feat(ai) T6-7 FastDVDnet: feature.
- #205 docs(audit) T7-4 quarterly audit: explicitly notes "no state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device: perf.
Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Backlog item T7-9: research digest evaluating whether the tiny-AI surface should add first-class support for Intel AI-PC platforms (Meteor / Lunar / Arrow Lake — NPU + integrated Xe/Xe2 GPU).
Verdict: defer the NPU path until a maintainer has hardware to validate int8 + fp16 accuracy gates against Research-0006's PTQ pipeline. The integrated Xe GPU portion of an AI-PC platform is already reachable today through the existing `--tiny-device openvino` path (same code path the Arc A380 uses), so the iGPU surface costs the fork zero additional code; only the NPU device type is genuinely new surface and is the part deferred.
Re-evaluation triggers documented in the digest §5: hardware acquisition, explicit user request, or a dedicated ORT NPU EP shipping.
Note: the Intel developer overview URL was unreachable from this session's WebFetch sandbox; the digest is therefore explicitly flagged as a training-context summary plus in-tree fork-doc anchors, not freshly fetched citations. All vendor claims should be re-verified before any code lands on the back of the digest.
Files
- `docs/research/0031-intel-ai-pc-applicability.md` (new, 5-section digest per the BACKLOG row spec)
- `docs/research/README.md` (index row)
- `docs/ai/inference.md` (one-line forward-pointer in the EP matrix so readers find the digest)
- `CHANGELOG.md` (lusoris fork entry)
ADR-0108 deep-dive 6-key checklist
- `docs/research/0031-intel-ai-pc-applicability.md`
- `## Alternatives considered`: no ADR needed (doc-only, verdict is defer). The digest's own `## Alternatives explored` section carries the equivalent decision matrix.
- `make format-check && pre-commit run --files docs/research/0031-intel-ai-pc-applicability.md docs/research/README.md docs/ai/inference.md CHANGELOG.md` (already passes locally).
- `## [Unreleased] — lusoris fork`.
- `docs/rebase-notes.md` entry: no rebase impact (doc-only changes under `docs/`, no upstream-mirrored code touched).
Test plan
defer) and confirms the re-evaluation triggers in §5 are reasonable.