docs(research): T7-9 — Intel AI-PC NPU/EP applicability digest #194

Merged
lusoris merged 1 commit into master from docs/t7-9-intel-ai-pc-digest on Apr 29, 2026

Conversation

lusoris (Owner) commented Apr 29, 2026

Summary

Backlog item T7-9: research digest evaluating whether the tiny-AI surface should add first-class support for Intel AI-PC platforms (Meteor / Lunar / Arrow Lake — NPU + integrated Xe/Xe2 GPU).

Verdict: defer the NPU path until a maintainer has hardware to validate int8 + fp16 accuracy gates against Research-0006's PTQ pipeline. The integrated Xe GPU portion of an AI-PC platform is already reachable today through the existing --tiny-device openvino path (same code path the Arc A380 uses), so the iGPU surface costs the fork zero additional code; only the NPU device type is genuinely new surface and is the part deferred.
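The split between the already-reachable iGPU and the deferred NPU can be illustrated with a minimal sketch. It assumes OpenVINO's Python API, where `openvino.Core().available_devices` returns names such as `CPU`, `GPU`, `GPU.0`, or `NPU`. The helper names `classify_devices` and `probe_openvino` are hypothetical and not part of the fork, and how `--tiny-device openvino` selects a device internally is not shown in this PR.

```python
from typing import Dict, Iterable, List


def classify_devices(devices: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket OpenVINO-style device names by the verdict in this PR.

    Assumption: "GPU" / "GPU.<n>" is the surface the existing openvino
    path already reaches, and "NPU" is the deferred device type.
    """
    buckets: Dict[str, List[str]] = {"reachable": [], "deferred": [], "other": []}
    for name in devices:
        kind = name.split(".", 1)[0].upper()  # "GPU.0" -> "GPU"
        if kind == "GPU":
            buckets["reachable"].append(name)
        elif kind == "NPU":
            buckets["deferred"].append(name)
        else:
            buckets["other"].append(name)
    return buckets


def probe_openvino() -> List[str]:
    """Return OpenVINO's device list, or [] when OpenVINO is not installed."""
    try:
        import openvino as ov
    except ImportError:
        return []
    return list(ov.Core().available_devices)
```

Calling `classify_devices(probe_openvino())` on an AI-PC machine would be one way to see whether an NPU is even visible before spending time on accuracy gates; on a machine without OpenVINO it simply returns empty buckets.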

Re-evaluation triggers documented in the digest §5: hardware acquisition, explicit user request, or a dedicated ORT NPU EP shipping.

Note: the Intel developer overview URL was unreachable from this session's WebFetch sandbox; the digest is therefore explicitly flagged as a training-context summary plus in-tree fork-doc anchors, not freshly fetched citations. All vendor claims should be re-verified before any code lands on the back of the digest.

Files

  • docs/research/0031-intel-ai-pc-applicability.md (new, 5-section digest per the BACKLOG row spec)
  • docs/research/README.md (index row)
  • docs/ai/inference.md (one-line forward-pointer in the EP matrix so readers find the digest)
  • CHANGELOG.md (lusoris fork entry)

ADR-0108 deep-dive 6-key checklist

  • research digest: docs/research/0031-intel-ai-pc-applicability.md
  • decision matrix in ADR ## Alternatives considered — no ADR needed: doc-only, verdict is defer. The digest's own ## Alternatives explored section carries the equivalent decision matrix.
  • AGENTS.md invariant note — no rebase-sensitive invariants: doc-only, no source code touched.
  • reproducer / smoke-test command: make format-check && pre-commit run --files docs/research/0031-intel-ai-pc-applicability.md docs/research/README.md docs/ai/inference.md CHANGELOG.md (already passes locally).
  • CHANGELOG.md lusoris-fork entry — under ## [Unreleased] — lusoris fork.
  • docs/rebase-notes.md entry — no rebase impact: doc-only changes under docs/, no upstream-mirrored code touched.

Test plan

  • Pre-commit hooks pass on touched files (trim whitespace, EOF, merge-conflict, mixed line ending, secrets, conventional-commit).
  • No source files modified — Netflix golden gate not exercised, but cannot regress.
  • Reviewer eyeballs the verdict (defer) and confirms the re-evaluation triggers in §5 are reasonable.

Research-0031 evaluates whether the tiny-AI surface should add
first-class NPU support for Intel Meteor / Lunar / Arrow Lake
AI-PC platforms. Verdict: defer the NPU path — no maintainer
hardware available to validate int8 + fp16 accuracy gates
against Research-0006's PTQ pipeline. The integrated Xe / Xe2
GPU portion of an AI-PC platform is already reachable today
through the existing --tiny-device openvino path (same code
path the Arc A380 uses), so the iGPU surface costs zero new
code; only the NPU device type is genuinely new surface and
is the part that's deferred.

Closes backlog T7-9. Doc-only — no C/Python source changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris merged commit e1244aa into master Apr 29, 2026
49 checks passed
@lusoris lusoris deleted the docs/t7-9-intel-ai-pc-digest branch April 29, 2026 09:46
@github-actions github-actions Bot mentioned this pull request Apr 29, 2026
lusoris pushed a commit that referenced this pull request Apr 29, 2026
…h T7-9

T7-9 (#194, just merged) shipped Research-0031 (Intel AI-PC NPU
applicability digest). This PR's cambi-vulkan-integration digest
was independently numbered 0031 by the agent that drafted it.
Renumber to 0032 to keep the one-number-per-digest invariant.

References updated: filename, in-body title, ADR-0210 cross-link,
ADR-0210 README index row, CHANGELOG.md, docs/rebase-notes.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
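A collision like the one described above could be caught mechanically. The following is a minimal sketch, assuming digests live under `docs/research/` and are named `NNNN-slug.md` as the two filenames in this thread suggest. The function name `find_collisions` is hypothetical, and nothing in this PR indicates such a check exists in the fork's CI.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming convention: four digits, a dash, a slug, ".md".
DIGEST_RE = re.compile(r"^(\d{4})-.+\.md$")


def find_collisions(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each digest number claimed by more than one file to those files."""
    by_number: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        match = DIGEST_RE.match(name)
        if match:  # skips README.md and other unnumbered files
            by_number[match.group(1)].append(name)
    return {num: sorted(names) for num, names in by_number.items() if len(names) > 1}
```

Feeding it `[p.name for p in pathlib.Path("docs/research").glob("*.md")]` and failing the hook when the result is non-empty would enforce the one-number-per-digest invariant before merge rather than after.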
lusoris added a commit that referenced this pull request Apr 29, 2026
* feat(vulkan): T7-36 — cambi Vulkan integration (Strategy II)

Closes the GPU long-tail matrix terminus (per ADR-0192 + ADR-0205).
Replaces the spike scaffold's `init_stub`/`extract_stub`/`close_stub`
triple in `libvmaf/src/feature/vulkan/cambi_vulkan.c` with the full
Vulkan-aware lifecycle. After this PR every registered feature
extractor in the fork has at least one GPU twin (lpips remains via
ORT EPs per ADR-0022).

Strategy II hybrid (per ADR-0205 §Decision):

  - GPU runs the integer phases — preprocess (forward-compatible
    scaffold; v1 wires the CPU bilinear-resize for bit-exactness on
    resolution mismatches), per-pixel derivative, the 7×7 spatial
    mask SAT, 2× decimate, and the separable 3-tap mode filter.
  - Host runs the precision-sensitive sliding-histogram
    `calculate_c_values` + top-K spatial pooling + scale-weighted
    final score on byte-identical readback buffers.
  - Bit-exact w.r.t. CPU by construction (every GPU phase is
    integer arithmetic; host residual runs the unmodified CPU code
    on byte-identical buffers); cross-backend gate runs at
    `places=4` from day one with no per-metric tolerance carve-out.
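What a `places=4` gate means in practice can be sketched as follows. The exact comparison inside `scripts/ci/cross_backend_vif_diff.py` is not shown in this PR, so this assumes the same semantics as Python's `unittest.assertAlmostEqual`, which rounds the absolute difference to `places` decimals and compares it with zero. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def agrees_at_places(cpu: float, gpu: float, places: int = 4) -> bool:
    """True when |cpu - gpu| rounds to zero at the given decimal places."""
    return round(abs(cpu - gpu), places) == 0


def failing_frames(
    cpu_scores: Sequence[float],
    gpu_scores: Sequence[float],
    places: int = 4,
) -> List[Tuple[int, float, float]]:
    """Return (frame index, cpu score, gpu score) for every mismatching frame."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("score lists must cover the same frames")
    return [
        (i, c, g)
        for i, (c, g) in enumerate(zip(cpu_scores, gpu_scores))
        if not agrees_at_places(c, g, places)
    ]
```

Under this reading, a backend that is bit-exact by construction passes trivially, which is why the commit can claim the gate needs no per-metric tolerance carve-out.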

New shaders + 1 unified TU for the 3 SAT phases:

  - `cambi_preprocess.comp` (new) — per-pixel decimate + bit-shift
    + optional anti-dither, exact-resolution fast path.
  - `cambi_mask_dp.comp` (new) — single TU with `PASS=0/1/2` spec
    const for row-SAT / col-SAT / threshold-compare.
  - Existing `cambi_derivative.comp`, `cambi_filter_mode.comp`,
    `cambi_decimate.comp` shaders wired into the dispatch chain
    unchanged (renamed `min3` → `cambi_min3` / `mode3` →
    `cambi_mode3` to avoid the GLSL precision-overload conflict).
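The three `PASS` values map onto a standard summed-area-table (SAT) pipeline: a running sum along rows, a running sum down columns, then a per-pixel window lookup compared against a threshold. The sketch below shows that generic technique in pure integer arithmetic. It is not the shader's logic: clamping the window at image borders, the strict `>` comparison, and the `threshold` parameter are all assumptions made for illustration, and the input is assumed to be a non-empty rectangular grid.

```python
from typing import List

Grid = List[List[int]]


def row_pass(img: Grid) -> Grid:
    """Pass 0: running sum along each row."""
    out = []
    for row in img:
        acc, sums = 0, []
        for value in row:
            acc += value
            sums.append(acc)
        out.append(sums)
    return out


def col_pass(rows: Grid) -> Grid:
    """Pass 1: running sum down each column, completing the SAT."""
    sat = [list(r) for r in rows]
    for y in range(1, len(sat)):
        for x in range(len(sat[y])):
            sat[y][x] += sat[y - 1][x]
    return sat


def box_sum(sat: Grid, y0: int, x0: int, y1: int, x1: int) -> int:
    """Sum over the inclusive rectangle [y0..y1] x [x0..x1] via four lookups."""
    total = sat[y1][x1]
    if y0 > 0:
        total -= sat[y0 - 1][x1]
    if x0 > 0:
        total -= sat[y1][x0 - 1]
    if y0 > 0 and x0 > 0:
        total += sat[y0 - 1][x0 - 1]
    return total


def threshold_pass(sat: Grid, radius: int, threshold: int) -> Grid:
    """Pass 2: 1 where the clamped (2*radius+1)^2 window sum exceeds threshold."""
    height, width = len(sat), len(sat[0])
    mask = []
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height - 1, y + radius)
        row = []
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width - 1, x + radius)
            row.append(1 if box_sum(sat, y0, x0, y1, x1) > threshold else 0)
        mask.append(row)
    return mask
```

With `radius=3` the window is 7×7. Because every step is an integer add, subtract, or compare, a GPU and a CPU implementation of the same passes produce identical results, which is the property the bit-exactness claim relies on.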

`cambi_internal.h` (new) exposes cambi.c's file-static helpers
(`vmaf_cambi_calculate_c_values`, `vmaf_cambi_get_spatial_mask`,
`vmaf_cambi_decimate`, `vmaf_cambi_filter_mode`,
`vmaf_cambi_spatial_pooling`, `vmaf_cambi_weight_scores_per_scale`,
`vmaf_cambi_get_pixels_in_window`, `vmaf_cambi_preprocessing`,
`vmaf_cambi_default_callbacks`) to the GPU twin via a thin
trampoline block at the bottom of `cambi.c` — no upstream-mirror
function-static code is renamed or moved, keeping Netflix sync
clean. Picked over the buffer-pair refactor ADR-0205 sketched
because the latter would ripple through CPU AVX2 / AVX-512 / NEON
callsites for ~200 LOC of churn (vs the trampoline's <70).

Wires:

  - Registers 5 cambi shaders in `vulkan_shader_sources[]` and
    `cambi_vulkan.c` in `vulkan_sources` in
    `libvmaf/src/vulkan/meson.build`.
  - Registers `vmaf_fex_cambi_vulkan` in
    `feature_extractor_list[]` under `#if HAVE_VULKAN`.
  - Adds a `cambi` row to `scripts/ci/cross_backend_vif_diff.py`'s
    `FEATURE_METRICS` so the cross-backend gate at `places=4` runs
    against the CPU baseline.

Documentation (six deep-dive deliverables per ADR-0108):

  - ADR-0210 (`docs/adr/0210-cambi-vulkan-integration.md`)
  - Research-0031 (`docs/research/0031-cambi-vulkan-integration.md`)
  - `docs/rebase-notes.md` entry 0090
  - `docs/backends/vulkan/overview.md` extractor row
  - `libvmaf/src/feature/AGENTS.md` rebase-sensitive invariant note
    (lock-step CPU residual + cambi_internal.h signature contract)
  - `CHANGELOG.md` Unreleased / lusoris fork entry

Smoke verified: 38/38 meson tests pass on the Vulkan-enabled build
including `test_cambi`, `test_vulkan_smoke`, `test_feature_extractor`.
Pre-commit (clang-format + ruff + ADR-0105 copyright header gate)
clean on every touched file. Closes backlog item T7-36.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docs): renumber Research-0031 cambi → 0032 to avoid collision with T7-9

T7-9 (#194, just merged) shipped Research-0031 (Intel AI-PC NPU
applicability digest). This PR's cambi-vulkan-integration digest
was independently numbered 0031 by the agent that drafted it.
Renumber to 0032 to keep the one-number-per-digest invariant.

References updated: filename, in-body title, ADR-0210 cross-link,
ADR-0210 README index row, CHANGELOG.md, docs/rebase-notes.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(metrics): add cambi Vulkan-backend section

Resolves PR #196 Doc-Substance Gate (ADR-0167) failure.

The cambi feature extractor gained a Vulkan backend in this PR
(T7-36 / ADR-0210), making `feature_extractor.c` a touched
"feature extractor" surface per ADR-0100/0167 — which requires
a matching `docs/metrics/` edit.

Adds a "## GPU support" section to docs/metrics/cambi.md with
the integer-phase / host-residual split summary, the meson flag
recipe, and pointers to ADR-0210 + Research-0032.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 1, 2026
…-1a Netflix Public dataset row)

Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the
"Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger
to TRIGGERED — the Netflix Public training corpus that gated C1 is now
locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB,
gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.

Verified the rest of state.md against the 2026-04-29-session merged PR
set (#193 to #205, #209). Every merged PR was feature / chore / docs / perf
with no bug-status delta to record per CLAUDE §12 rule 13:
- #193 chore(dnn) T7-12 env override removal — chore.
- #194 docs(research) T7-9 NPU digest — research.
- #195 feat(mcp) T5-2 embedded scaffold — feature.
- #196 feat(vulkan) T7-36 cambi integration — feature.
- #197 feat(motion) Netflix b949ceb port — upstream port.
- #198 chore(backlog) T7-32 micro-investigations — verify-only.
- #199 feat(ai) T6-9 model registry — feature.
- #200 feat(hip) T7-10 HIP scaffold — feature.
- #201 feat(simd) T7-38 SVE2 ports — feature.
- #202 feat(ci) T6-8 parity matrix — feature.
- #203 feat(ai) T6-7 FastDVDnet — feature.
- #205 docs(audit) T7-4 quarterly audit — explicitly notes "no
  state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device — perf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 1, 2026
…-1a Netflix Public dataset row) (#245)

Update docs/state.md `_Updated:` stamp to 2026-04-29 and rewrite the
"Tiny-AI C1 baseline `fr_regressor_v1.onnx`" deferral row's reopen-trigger
to TRIGGERED — the Netflix Public training corpus that gated C1 is now
locally available at `.workingdir2/netflix/` (9 ref + 70 dis YUVs, ~37 GB,
gitignored; provided by lawrence 2026-04-27), unblocking BACKLOG T6-1a.

Verified the rest of state.md against the 2026-04-29-session merged PR
set (#193 to #205, #209). Every merged PR was feature / chore / docs / perf
with no bug-status delta to record per CLAUDE §12 rule 13:
- #193 chore(dnn) T7-12 env override removal — chore.
- #194 docs(research) T7-9 NPU digest — research.
- #195 feat(mcp) T5-2 embedded scaffold — feature.
- #196 feat(vulkan) T7-36 cambi integration — feature.
- #197 feat(motion) Netflix b949ceb port — upstream port.
- #198 chore(backlog) T7-32 micro-investigations — verify-only.
- #199 feat(ai) T6-9 model registry — feature.
- #200 feat(hip) T7-10 HIP scaffold — feature.
- #201 feat(simd) T7-38 SVE2 ports — feature.
- #202 feat(ci) T6-8 parity matrix — feature.
- #203 feat(ai) T6-7 FastDVDnet — feature.
- #205 docs(audit) T7-4 quarterly audit — explicitly notes "no
  state.md changes (no upstream commit ruled in/out a fork bug)".
- #209 perf(sycl) T7-17 fp64-less device — perf.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>