b0f1a1d to 845d862
845d862 to 126d4b7
Pull request overview
Adds a fork-local analysis scaffold to compute per-(source, codec, rc_mode) Pareto frontiers over a large encoder knob sweep, plus the accompanying ADR/research documentation and a regression gate to prevent shipping recipe defaults that lose to bare encoder settings at matched bitrate.
Changes:
- Introduces `ai/scripts/analyze_knob_sweep.py` to load a JSONL sweep, stratify by `(source, codec, rc_mode)`, compute Pareto hulls, and report CSV + markdown summaries, including a regression-detection gate.
- Adds unit tests with a synthetic JSONL fixture to exercise stratification, hull selection (incl. encode-time tiebreak), and regression detection.
- Documents the methodology/decision via ADR-0305 + Research-0077, and records the change in `ai/AGENTS.md`, rebase notes, and a changelog fragment.
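The load, stratify, and hull steps listed above can be sketched as follows. This is a minimal illustration, not the code in `ai/scripts/analyze_knob_sweep.py`: the helper names `load_rows`, `stratify`, and `pareto_hull` are hypothetical, and the row field names are assumed to match the identifiers quoted in this PR (`source`, `codec`, `rc_mode`, `bitrate_kbps`, `vmaf_score`, `encode_time_ms`).

```python
import json
from collections import defaultdict


def load_rows(jsonl_text):
    """Parse one JSON object per non-empty line (assumed JSONL layout)."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def stratify(rows):
    """Group rows by the (source, codec, rc_mode) slice key."""
    slices = defaultdict(list)
    for row in rows:
        slices[(row["source"], row["codec"], row["rc_mode"])].append(row)
    return slices


def pareto_hull(rows):
    """Return the non-dominated rows of one slice.

    Lower bitrate and higher VMAF are better. Rows that tie on both axes
    collapse to the one with the shortest encode time.
    """
    # Bitrate ascending, then VMAF descending, then encode time ascending,
    # so the first row seen at any (bitrate, VMAF) point is the fastest one.
    ordered = sorted(
        rows,
        key=lambda r: (r["bitrate_kbps"], -r["vmaf_score"], r["encode_time_ms"]),
    )
    hull = []
    best_vmaf = float("-inf")
    for row in ordered:
        # Keep a row only if it beats every row at lower-or-equal bitrate.
        if row["vmaf_score"] > best_vmaf:
            hull.append(row)
            best_vmaf = row["vmaf_score"]
    return hull
```

Computing the hull inside each slice, rather than over all rows at once, is what keeps a CQ-mode row from dominating a VBR-mode row of the same source and codec.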
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| docs/research/0077-encoder-knob-space-pareto-frontiers.md | New research digest describing sweep axes, per-slice frontier methodology, and scaffolded findings. |
| docs/rebase-notes.md | Adds rebase ledger entry for ADR-0305 / sweep analysis scaffold. |
| docs/adr/README.md | Adds ADR-0305 row to ADR index table (but this file is generated from fragments). |
| docs/adr/0305-encoder-knob-space-pareto-analysis.md | New ADR documenting the stratified Pareto-frontier decision and consequences. |
| changelog.d/added/encoder-knob-space-pareto-analysis.md | Changelog fragment announcing the analysis scaffold and tests. |
| ai/tests/test_knob_sweep_analysis.py | New tests validating hull computation, stratification keys, and regression detection. |
| ai/scripts/analyze_knob_sweep.py | New analysis script: load JSONL, compute per-slice Pareto hulls, write CSV + summary, detect regressions vs bare defaults. |
| ai/AGENTS.md | Adds a “knob-sweep corpus invariant” tying adapter-default eligibility to per-slice hulls + non-regression gating. |
| | [ADR-0283](0283-vmaf-tune-videotoolbox-adapters.md) | Apple VideoToolbox codec adapters for `tools/vmaf-tune/`. Adds `H264VideoToolboxAdapter` + `HEVCVideoToolboxAdapter` (and a shared `_videotoolbox_common.py` for the `-q:v` 0..100 quality knob + nine-name preset → `-realtime` boolean mapping) along the same one-file-per-codec contract NVENC / AMF / QSV already use. AV1 hardware encoding intentionally omitted (unavailable on Apple Silicon as of 2026). Tests mock `subprocess.run` so Linux CI stays green; macOS end-to-end is left to contributors with VideoToolbox available locally. The originally-coupled 16-slot codec-vocab schema expansion is deferred to a follow-up PR awaiting a fresh `fr_regressor_v2` retrain (ship-gate per ADR-0235 + ADR-0291). Companion research digest [`docs/research/0074-vmaf-tune-videotoolbox-adapters.md`](../research/0074-vmaf-tune-videotoolbox-adapters.md). | Accepted | tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local | | ||
| | [ADR-0297](0297-vmaf-tune-sample-clip.md) | `vmaf-tune --sample-clip-seconds N` — opt-in sample-clip mode that encodes/scores only the centre N-second window of each source per grid cell instead of the full reference, scaling per-cell wall time roughly linearly with slice length (e.g. ~6x speedup at `N=10` against a 60-second source). FFmpeg input-side `-ss <start> -t <N>` cuts the rawvideo demuxer at the slice boundary; the libvmaf CLI's `--frame_skip_ref` / `--frame_cnt` mirror the same window on the score side so VMAF compares matching frames without slicing the reference YUV on disk. Centre-anchored placement (naive scaffold; TransNet V2-based smart placement is a follow-up). Each emitted row carries `clip_mode = "sample_<N>s"` or `"full"` so Phase B (target-VMAF bisect) and Phase C (per-title CRF predictor) can filter, weight, or epilogue-rescore. Corpus schema bumps additively to `SCHEMA_VERSION = 2`. Expected accuracy delta ~1–2 VMAF points on diverse content, ~0.3–0.5 on uniform content. Companion to [ADR-0237](0237-quality-aware-encode-automation.md) Phase A. | Accepted | tooling, ffmpeg, vmaf-tune, fork-local | | ||
| | [ADR-0301](0301-vmaf-tune-sample-clip.md) | `vmaf-tune --sample-clip-seconds N` — opt-in sample-clip mode that encodes/scores only the centre N-second window of each source per grid cell instead of the full reference, scaling per-cell wall time roughly linearly with slice length (e.g. ~6x speedup at `N=10` against a 60-second source). FFmpeg input-side `-ss <start> -t <N>` cuts the rawvideo demuxer at the slice boundary; the libvmaf CLI's `--frame_skip_ref` / `--frame_cnt` mirror the same window on the score side so VMAF compares matching frames without slicing the reference YUV on disk. Centre-anchored placement (naive scaffold; TransNet V2-based smart placement is a follow-up). Each emitted row carries `clip_mode = "sample_<N>s"` or `"full"` so Phase B (target-VMAF bisect) and Phase C (per-title CRF predictor) can filter, weight, or epilogue-rescore. Corpus schema bumps additively to `SCHEMA_VERSION = 2`. Expected accuracy delta ~1–2 VMAF points on diverse content, ~0.3–0.5 on uniform content. Companion to [ADR-0237](0237-quality-aware-encode-automation.md) Phase A. | Accepted | tooling, ffmpeg, vmaf-tune, fork-local | | ||
| | [ADR-0305](0305-encoder-knob-space-pareto-analysis.md) | Encoder knob-space Pareto-frontier analysis scaffold. Companion to [Research-0063](../research/0063-encoder-knob-space-cq-vs-vbr-stratification.md) (CQ vs VBR stratification finding). Decision: stratify the dominance hull on `(bitrate_kbps, vmaf_score)` **per `(source, codec, rc_mode)` slice** — never as a single global frontier — because Research-0063 showed a global hull collapses the rate-control flip and produces consensus recipes that regress NVENC h264/hevc by ~4 VMAF at cq=30 against bare encoder defaults. `encode_time_ms` is the tiebreaker on hull-boundary rows. The 12,636-cell sweep (9 sources × 6 codec families × 3 rc_modes × ~78 knob combinations per codec) is generated locally and lives at `runs/phase_a/full_grid/comprehensive.jsonl` (gitignored). New `ai/scripts/analyze_knob_sweep.py` computes per-slice frontiers + emits per-slice CSVs and a markdown summary, plus a regression-detection check (`detect_recipe_regressions`) that gates ship-candidate recipes — recipes that lose VMAF vs the bare encoder at matched bitrate within the same slice MUST NOT ship as `tools/vmaf-tune/codec_adapters/*` defaults. New `ai/AGENTS.md` knob-sweep corpus invariant pins this. Companion research digest: [`docs/research/0077-encoder-knob-space-pareto-frontiers.md`](../research/0077-encoder-knob-space-pareto-frontiers.md). Headline findings populate the Research-0077 §Headline-findings table via a follow-up commit when the sweep completes (~3h ETA). | Proposed | ai, vmaf-tune, research, encoder, pareto, fork-local | |
@@ -0,0 +1,129 @@
# Research-0077 — Encoder knob-space Pareto frontiers (per source × codec × rc_mode)

- **Status**: Adopted by [ADR-0305](../adr/0305-encoder-knob-space-pareto-analysis.md)

Comment on lines +58 to +59
listing the hull rows + the dominated rows directly above the
hull at each bitrate.

Comment on lines +50 to +52
# for hull-membership purposes. Calibrated against the per-frame VMAF
# noise floor (~0.1 points) and bitrate quantisation in libavformat
# muxers (~1 kbps).

"""One cell of the knob sweep.

The schema mirrors `tools/vmaf-tune/` Phase A SCHEMA_VERSION = 2
(ADR-0301 sample-clip mode + ADR-0297 multi-codec dispatcher), so a

- **Status**: Proposed
- **Date**: 2026-05-05
- **Deciders**: lusoris, Claude
This was referenced May 5, 2026
lusoris added a commit that referenced this pull request May 5, 2026
…(ADR-0313) (#410)

* ci(policy): Required Checks Aggregator — unblock doc/Python-only PRs (ADR-0313)

The 23-named-required-check posture (ADR-0037) deadlocks doc/Python-only PRs: the C-build matrix path-filter-skips on their diffs, but branch protection counts a path-filter-skip + a never-ran-at-all as not satisfying the required-check. PR #400 hit this concretely (10/23 succeeded; 13/23 either skipped or never reported; gh pr merge returned "the base branch policy prohibits the merge").

Aggregator is one workflow with no path filter. It polls up to 8 minutes for sibling workflows to register, then verifies each named check on the head SHA reported success/skipped/neutral (or didn't appear at all, which is the documented path-filter rejection semantics). Aggregator becomes the single branch-protection required check; the 23 individual workflows continue to run unchanged.

Manual operator step at adoption (after this PR merges):

    gh api -X PUT "repos/lusoris/vmaf/branches/master/protection/required_status_checks" \
      -F 'strict=true' -F 'contexts=["Required Checks Aggregator"]'

Unblocks #400, #403, #404, #405, #406, #407 currently stuck on the deadlock. Per user popup direction 2026-05-05.

Files: .github/workflows/required-aggregator.yml (new), docs/adr/0313-*.md (new), changelog.d/added/*.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment), docs/rebase-notes.md §0313.

* ci: retrigger after PR body cleanup

* ci: retrigger after deliverables opt-out polarity fix

---------

Co-authored-by: Lusoris <lusoris@pm.me>
126d4b7 to d49028d
9 tasks
lusoris added a commit that referenced this pull request May 5, 2026
…#413)

First post-#410 run on PR #400 hit the 8-minute poll deadline while the Ubuntu gcc/clang DNN + Windows MSVC builds were still IN_PROGRESS and were declared FAILURE by the aggregator (their actual conclusion would have been success a few minutes later). The fork's slowest required builds take 12+ minutes; 30-minute poll deadline gives ~2x headroom. Job-level timeout-minutes raised to 35 so the GitHub-side kill doesn't pre-empt the poll.

No semantic change to the aggregator logic; same accept-set (success/skipped/neutral, plus "never appeared" = path-filter rejection). Single-line numeric tweak.

Co-authored-by: Lusoris <lusoris@pm.me>
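The accept-set and poll-deadline behaviour described in the two aggregator commits can be sketched in Python. This is only an illustration under stated assumptions: the real aggregator is a GitHub Actions workflow whose source is not shown here, and `evaluate`, `poll`, and the `fetch` callback are hypothetical names. The sketch also assumes `fetch` is first consulted after sibling workflows have had time to register, since an absent check is treated as a path-filter skip.

```python
import time

# Conclusions the commit message lists as acceptable.
ACCEPTED = {"success", "skipped", "neutral"}


def evaluate(required, reported):
    """Classify the head SHA as 'pass', 'fail', or 'pending'.

    `reported` maps check name -> conclusion string, or None while the
    check is still running. Names missing from `reported` are treated as
    "never appeared" (path-filter rejection) and accepted.
    """
    pending = False
    for name in required:
        if name not in reported:
            continue  # never appeared: accepted
        conclusion = reported[name]
        if conclusion is None:
            pending = True
        elif conclusion not in ACCEPTED:
            return "fail"
    return "pending" if pending else "pass"


def poll(required, fetch, deadline_s=30 * 60, interval_s=30,
         clock=time.monotonic, sleep=time.sleep):
    """Re-evaluate until a final verdict or the deadline (hypothetical defaults)."""
    start = clock()
    while True:
        verdict = evaluate(required, fetch())
        if verdict != "pending":
            return verdict
        if clock() - start >= deadline_s:
            return "fail"  # still running at the deadline counts as failure
        sleep(interval_s)
```

With a deadline shorter than the slowest build, the last branch is the one that fires, which is the failure mode the second commit describes and the reason for raising the deadline.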
Companion to Research-0063 (CQ vs VBR stratification finding, already merged). Ships the methodology + analysis scripts for the 12,636-cell encoder knob sweep (9 sources x 6 codec families x 3 rate-control modes x ~78 knob combinations per codec) that drives tools/vmaf-tune/codec_adapters/* recipe defaults. Headline findings on the populated Pareto frontiers land via a follow-up commit when the sweep completes (~3h ETA).

Decision (ADR-0305): stratify the dominance hull on (bitrate_kbps, vmaf_score) per (source, codec, rc_mode) slice -- never as a single global frontier -- because Research-0063 showed a global hull collapses the rate-control flip and produces consensus recipes that regress NVENC h264/hevc by ~4 VMAF at cq=30 against the bare encoder defaults. encode_time_ms is the tiebreaker on hull-boundary rows.

Files:
- docs/adr/0305-encoder-knob-space-pareto-analysis.md (Proposed)
- docs/research/0077-encoder-knob-space-pareto-frontiers.md
- ai/scripts/analyze_knob_sweep.py (per-slice Pareto hull + regression-detection check; emits per-slice CSVs + summary.md)
- ai/tests/test_knob_sweep_analysis.py (synthetic 20-row JSONL fixture; 3 passing tests)
- ai/AGENTS.md -- append knob-sweep corpus invariant
- changelog.d/added/encoder-knob-space-pareto-analysis.md
- docs/rebase-notes.md section 0305

The comprehensive.jsonl sweep file is generated locally and lives at runs/phase_a/full_grid/ (gitignored -- not shipped in this PR).

Verification:
- ast.parse on analyze_knob_sweep.py: clean
- pytest ai/tests/test_knob_sweep_analysis.py -v: 3 passed
- black ai/: clean (120 files)
- ruff check ai/: clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
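A quick arithmetic check of the sweep size quoted in this commit message. It assumes every codec family contributes exactly 78 knob combinations, whereas the message only says "~78", so treat the per-codec figure as an approximation.

```python
# Axis sizes quoted in the commit message.
SOURCES = 9
CODEC_FAMILIES = 6
RC_MODES = 3
KNOB_COMBOS_PER_CODEC = 78  # assumption: the "~78" taken as exactly 78

# One slice per (source, codec, rc_mode) triple.
slices = SOURCES * CODEC_FAMILIES * RC_MODES
cells = slices * KNOB_COMBOS_PER_CODEC

print(f"{slices} slices, {cells} cells")  # 162 slices, 12636 cells
```

The product reproduces the 12,636-cell total, and the 162 slices agree with the "162 realised slices" figure reported in the Research-0080 commit referenced in this thread.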
d49028d to 400a4d6
lusoris pushed a commit that referenced this pull request May 6, 2026
…earch-0080)

Runs the Research-0077 / ADR-0305 analysis script (ai/scripts/analyze_knob_sweep.py, ships in PR #400) over the 12,636-cell Phase A sweep at runs/phase_a/full_grid/comprehensive.jsonl and records the populated Pareto-hull populations + recipe-regression count per codec in Research-0080. ADR-0308 commits the fork to a structural-vs-content-dependent threshold for revision policy.

Headline findings:
- 162 realised slices (every slice has a populated hull).
- 1,915 recipe-vs-bare regressions at default tolerances (bitrate_tol_pct=5, vmaf_tol=0.1).
- CQP regression rate 6.6 % vs CBR 20.2 % / VBR 18.7 % re-confirms Research-0063 with hard numbers.
- Top-15 aggregated bad-recipe cells all reproduce on all 9 corpus sources, clustered around h264_nvenc + bf3 / spatial_aq / full_hq under CBR/VBR plus a smaller hevc_nvenc + spatial_aq cluster.

Decision (ADR-0308): a recipe regression is structural iff it reproduces on >=7 of 9 corpus sources within one (codec, rc_mode, recipe, preset, q) cell. Structural regressions are forbidden as tools/vmaf-tune/codec_adapters/* defaults and forbidden as vmaf-tune recommend outputs without explicit override; content-dependent regressions (1-6 sources) are filtered at recommend-time only via the per-slice hull lookup. The detector remains an offline gate (3-hour sweep too expensive for CI); promotion to a CI gate is deferred until a smaller stratified sample reproduces the structural patterns. Per-codec adapter revisions land as separate follow-up PRs for clean bisect signals.

Six deep-dive deliverables (CLAUDE §11 / ADR-0108):
- Research digest: docs/research/0080-encoder-knob-sweep-findings.md.
- Decision matrix: ADR-0308 §Alternatives considered (4 options).
- AGENTS.md invariant note: ai/AGENTS.md §Knob-sweep recipe-regression policy (cites ADR-0305 + ADR-0308).
- Reproducer: pytest ai/tests/test_knob_sweep_analysis.py -v (script logic, ships in PR #400) + offline analyser run command in docs/rebase-notes.md §0308.
- CHANGELOG fragment: changelog.d/changed/encoder-knob-sweep-findings.md.
- Rebase note: docs/rebase-notes.md §0308.

Constraints honoured:
- Did not modify ai/scripts/analyze_knob_sweep.py (uses public API unchanged via a throw-away wrapper for the field-name rename).
- Did not modify tools/vmaf-tune/codec_adapters/* (recipe revisions land in follow-up PRs).
- Did not commit runs/ artefacts (.gitignore covers them).
- Documentation-only; ~452 LOC against the 600-LOC budget.

This PR queues behind PR #400 (ADR-0305 / Research-0077 / analysis script). Rebase target: master after PR #400 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 6, 2026
…(Research-0080) (#406)

* docs(ai): encoder knob-sweep — Pareto hulls + recipe regressions (Research-0080)

Runs the Research-0077 / ADR-0305 analysis script (ai/scripts/analyze_knob_sweep.py, ships in PR #400) over the 12,636-cell Phase A sweep at runs/phase_a/full_grid/comprehensive.jsonl and records the populated Pareto-hull populations + recipe-regression count per codec in Research-0080. ADR-0308 commits the fork to a structural-vs-content-dependent threshold for revision policy.

Headline findings:
- 162 realised slices (every slice has a populated hull).
- 1,915 recipe-vs-bare regressions at default tolerances (bitrate_tol_pct=5, vmaf_tol=0.1).
- CQP regression rate 6.6 % vs CBR 20.2 % / VBR 18.7 % re-confirms Research-0063 with hard numbers.
- Top-15 aggregated bad-recipe cells all reproduce on all 9 corpus sources, clustered around h264_nvenc + bf3 / spatial_aq / full_hq under CBR/VBR plus a smaller hevc_nvenc + spatial_aq cluster.

Decision (ADR-0308): a recipe regression is structural iff it reproduces on >=7 of 9 corpus sources within one (codec, rc_mode, recipe, preset, q) cell. Structural regressions are forbidden as tools/vmaf-tune/codec_adapters/* defaults and forbidden as vmaf-tune recommend outputs without explicit override; content-dependent regressions (1-6 sources) are filtered at recommend-time only via the per-slice hull lookup. The detector remains an offline gate (3-hour sweep too expensive for CI); promotion to a CI gate is deferred until a smaller stratified sample reproduces the structural patterns. Per-codec adapter revisions land as separate follow-up PRs for clean bisect signals.

Six deep-dive deliverables (CLAUDE §11 / ADR-0108):
- Research digest: docs/research/0080-encoder-knob-sweep-findings.md.
- Decision matrix: ADR-0308 §Alternatives considered (4 options).
- AGENTS.md invariant note: ai/AGENTS.md §Knob-sweep recipe-regression policy (cites ADR-0305 + ADR-0308).
- Reproducer: pytest ai/tests/test_knob_sweep_analysis.py -v (script logic, ships in PR #400) + offline analyser run command in docs/rebase-notes.md §0308.
- CHANGELOG fragment: changelog.d/changed/encoder-knob-sweep-findings.md.
- Rebase note: docs/rebase-notes.md §0308.

Constraints honoured:
- Did not modify ai/scripts/analyze_knob_sweep.py (uses public API unchanged via a throw-away wrapper for the field-name rename).
- Did not modify tools/vmaf-tune/codec_adapters/* (recipe revisions land in follow-up PRs).
- Did not commit runs/ artefacts (.gitignore covers them).
- Documentation-only; ~452 LOC against the 600-LOC budget.

This PR queues behind PR #400 (ADR-0305 / Research-0077 / analysis script). Rebase target: master after PR #400 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: retrigger after deliverables N-prefix fix

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
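The ADR-0308 threshold quoted in this commit message (structural iff a regression reproduces on >=7 of 9 sources within one cell) can be sketched as a small classifier. This is a hedged illustration only: `classify_regressions` is a hypothetical helper, and the regression records are assumed to be dicts carrying the cell fields named in the message (`codec`, `rc_mode`, `recipe`, `preset`, `q`) plus `source`.

```python
from collections import defaultdict

# Threshold from the ADR-0308 decision text: >= 7 of 9 corpus sources.
STRUCTURAL_MIN_SOURCES = 7


def classify_regressions(regressions):
    """Label each (codec, rc_mode, recipe, preset, q) cell.

    A cell is 'structural' when its regression reproduces on at least
    STRUCTURAL_MIN_SOURCES distinct sources, else 'content-dependent'.
    """
    sources_by_cell = defaultdict(set)
    for reg in regressions:
        cell = (reg["codec"], reg["rc_mode"], reg["recipe"], reg["preset"], reg["q"])
        sources_by_cell[cell].add(reg["source"])  # a set ignores repeated sources
    return {
        cell: "structural" if len(sources) >= STRUCTURAL_MIN_SOURCES else "content-dependent"
        for cell, sources in sources_by_cell.items()
    }
```

Counting distinct sources, rather than raw regression rows, keeps a cell that regresses many times on one clip from being mislabelled as structural.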
9 tasks
lusoris added a commit that referenced this pull request May 6, 2026
The 2026-05-06 merge train shipped 13 ADRs whose implementing PRs landed but Status was never bumped from Proposed to Accepted. Per docs/adr/README.md and ADR-0028, ADRs flip to Accepted once the deliverable lands. The train moved faster than the per-ADR Status edits could keep up; this PR catches up.

Flipped:
- ADR-0302 (#401, ENCODER_VOCAB v3 schema expansion)
- ADR-0303 (#399, fr_regressor_v2 ensemble prod-flip gate)
- ADR-0304 (#402, vmaf-tune fast-path Optuna TPE)
- ADR-0305 (#400, knob-sweep Pareto analysis scaffold)
- ADR-0307 (#404, vmaf-tune ladder default sampler)
- ADR-0308 (#406, knob-sweep recipe-regression policy)
- ADR-0309 (#405, ensemble retrain harness)
- ADR-0311 (#408, libfuzzer harness expansion)
- ADR-0313 (#410, CI Required Checks Aggregator) [table-format Status, sed-edited inline]
- ADR-0314 (#412, vmaf-tune --score-backend=vulkan)
- ADR-0316 (#414, cli_parse long-only-option assertion fix)
- ADR-0317 (#415, CI Docker + FFmpeg-SYCL flake fix)
- ADR-0319 (#422, ensemble LOSO trainer real impl)

Already-Accepted (no change): ADR-0310 (#407), ADR-0312 (#425), ADR-0315 (skeleton, intentionally Proposed), ADR-0321 (#424).
9 tasks
Companion to Research-0063 (CQ vs VBR stratification finding). This PR ships the analysis methodology + scripts; actual headline findings land via follow-up commit when the sweep completes (~3h ETA).
Summary
- Methodology + analysis scripts for the 12,636-cell encoder knob sweep (9 sources x 6 codec families x 3 rate-control modes x ~78 knob combinations per codec) that drives `tools/vmaf-tune/codec_adapters/*` recipe defaults.
- Decision (ADR-0305): stratify the dominance hull on `(bitrate_kbps, vmaf_score)` per `(source, codec, rc_mode)` slice, never as a single global frontier, because Research-0063 showed a global hull collapses the rate-control flip and produces consensus recipes that regress NVENC h264/hevc by ~4 VMAF at cq=30 against the bare encoder defaults. `encode_time_ms` is the tiebreaker on hull-boundary rows.
- `ai/scripts/analyze_knob_sweep.py` consumes `comprehensive.jsonl`, computes per-slice Pareto frontiers, emits per-slice CSV tables + a markdown summary, and carries a `detect_recipe_regressions(...)` check that gates ship-candidate adapter defaults.
- The `comprehensive.jsonl` sweep file is generated locally and lives under `runs/phase_a/full_grid/` (gitignored, not shipped in this PR). The headline-findings table in Research-0077 is currently scaffolded with TBD pending sweep completion; a follow-up commit on this branch flips it once the sweep finishes.

Six deep-dive deliverables (ADR-0108)
- Research digest: `docs/research/0077-encoder-knob-space-pareto-frontiers.md`
- Decision matrix: `(source, codec, rc_mode)` stratified Pareto [chosen] vs global Pareto vs codec-only stratified
- AGENTS.md invariant note: `ai/AGENTS.md`, knob-sweep corpus invariant ("recipes that regress vs the bare encoder at matched bitrate within the same slice MUST NOT ship as adapter defaults")
- Reproducer: `pytest ai/tests/test_knob_sweep_analysis.py -v`
- CHANGELOG fragment: `changelog.d/added/encoder-knob-space-pareto-analysis.md`
- Rebase note: `docs/rebase-notes.md` §0305

Test plan
- `python3 -c "import ast; ast.parse(open('ai/scripts/analyze_knob_sweep.py').read())"`: clean
- `pytest ai/tests/test_knob_sweep_analysis.py -v`: 3 passed (`test_pareto_frontier_smoke`, `test_stratification_keys`, `test_recipe_regression_detection`)
- `black --check ai/`: clean (120 files)
- `ruff check ai/`: clean
- `tools/vmaf-tune/codec_adapters/*` (gated on `detect_recipe_regressions` returning empty for every shipped recipe)

Notes
🤖 Generated with Claude Code
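The regression gate described in the summary (a recipe must not lose VMAF to the bare encoder at matched bitrate within the same slice) can be sketched as below. This is not the signature or body of the real `detect_recipe_regressions`, which is not shown in this thread. The function name `detect_regressions_sketch`, the `recipe` field, and the `"bare"` label are assumptions for illustration; the default tolerances mirror the `bitrate_tol_pct=5` and `vmaf_tol=0.1` values quoted in the Research-0080 commit.

```python
def detect_regressions_sketch(slice_rows, bitrate_tol_pct=5.0, vmaf_tol=0.1,
                              bare_name="bare"):
    """Return (recipe_row, bare_row) pairs where a recipe loses to bare.

    All rows are assumed to belong to one (source, codec, rc_mode) slice.
    A recipe row "matches" a bare row when their bitrates differ by at most
    bitrate_tol_pct percent of the bare bitrate; it regresses when its VMAF
    falls more than vmaf_tol below that bare row's VMAF.
    """
    bare_rows = [r for r in slice_rows if r["recipe"] == bare_name]
    regressions = []
    for row in slice_rows:
        if row["recipe"] == bare_name:
            continue
        for bare in bare_rows:
            allowed = bare["bitrate_kbps"] * bitrate_tol_pct / 100.0
            matched = abs(row["bitrate_kbps"] - bare["bitrate_kbps"]) <= allowed
            if matched and row["vmaf_score"] < bare["vmaf_score"] - vmaf_tol:
                regressions.append((row, bare))
                break  # one matched loss is enough to flag the recipe row
    return regressions
```

A ship gate in this style would accept a candidate default only when the function returns an empty list for every slice the recipe appears in.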