feat(ai): combined Netflix + KoNViD-1k tiny-AI trainer driver #180
Merged
Concatenates `NetflixFrameDataset` and `KoNViDPairDataset` into one training matrix, reusing `_build_model`, `_train_loop`, and `export_onnx` from `ai/train/train.py` so the model factory and ONNX layout stay identical to the canonical baselines.

Five validation modes via `--val-mode`:

* `netflix-source` (default) — mirrors ADR-0203 (Tennis hold-out).
* `konvid-holdout` — deterministic 10 % of KoNViD clip keys, whole-clip granularity (no frame leakage).
* `netflix-source-and-konvid-holdout` — union of both.
* `netflix-only` / `konvid-only` — single-corpus fallbacks.

Addresses Research-0023 §5: the FoxBird-class outlier needs a broader content distribution; KoNViD-1k adds 1 200 UGC clips on top of the existing 70 Netflix dis-pairs (~17× the clip count).

Stacks on PR #178 (KoNViD loader bridge); rebase order is 0073 → 0074.

The smoke test (5 cases, no libvmaf required) covers the key splitter, the `--epochs 0` export path, and missing-data fallbacks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
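The `konvid-holdout` mode above hinges on splitting at *clip-key* granularity so that no frame of a held-out clip ever appears in training. A minimal sketch of one common way to get a deterministic whole-clip split is to hash the clip key; the function name `split_konvid_keys` and the exact hashing scheme are illustrative assumptions, not the PR's actual API:

```python
import hashlib

def split_konvid_keys(clip_keys, holdout_frac=0.10):
    """Partition clip keys into (train, holdout) lists deterministically.

    Hashing the clip key (rather than shuffling frame rows) guarantees
    whole-clip granularity: every frame of a clip lands on the same side
    of the split, so no held-out frame leaks into training, and the split
    is identical across runs and machines with no RNG state to carry.
    """
    train, holdout = [], []
    for key in sorted(set(clip_keys)):  # sort for run-to-run determinism
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
        (holdout if bucket < holdout_frac else train).append(key)
    return train, holdout
```

With ~1 200 KoNViD clip keys this lands close to, but not exactly at, 10 %, since the fraction is realised per-key rather than by taking the first N of a shuffle.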
bd5785c to
7783fab
Compare
lusoris pushed a commit that referenced this pull request on Apr 28, 2026:
Empirical close of Research-0023 §5's open question on the FoxBird per-fold outlier (LOSO PLCC ≈ 0.93 vs ≥ 0.99 on the other 8 Netflix sources).

A canonical combined-trainer run (mlp_small, 30 epochs, val = Tennis + 10 % KoNViD holdout, seed = 0) on the union of the Netflix Public 9-source corpus and the 1 200-clip KoNViD-1k parquet produces an ONNX model whose FoxBird metrics dramatically improve over the Netflix-only baselines:

* FoxBird PLCC: 0.9936 (vs 0.9632 for the `vmaf_tiny_v1.onnx` baseline) — +3.04 percentage points absolute, moving FoxBird from a 0.93-class outlier to a 0.99+-class clip.
* FoxBird RMSE: 17.296 → 3.216 (5.4× lower).
* No regression on Netflix-native sources: PLCC ≥ 0.998 on 7/9 clips; Tennis (the formal validation source) at 0.9966.

Validates the PR #178 (KoNViD acquisition) and PR #180 (combined trainer driver) infrastructure end-to-end. Closes the Research-0023 §5 unblocker question — KoNViD-1k is sufficient for this failure mode; there is no need to acquire BVI-DVC or AOM-CTC.

Caveats: the per-clip numbers are training-fit, not held-out generalisation. Proper validation (LOSO on the combined corpus with each Netflix source held out) is the natural follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
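For reference, the two metrics quoted above are standard: PLCC is the Pearson linear correlation coefficient between predicted and reference VMAF scores (so the FoxBird delta 0.9936 − 0.9632 = 0.0304 is the "+3.04 percentage points"), and RMSE is the root-mean-square error in VMAF units. A plain-Python sketch, independent of the trainer's actual implementation:

```python
import math

def plcc(pred, ref):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)

def rmse(pred, ref):
    """Root-mean-square error, in the same units as the scores."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))
```

Note that PLCC is scale-invariant while RMSE is not, which is why the commit reports both: a clip can correlate well yet still be offset in absolute VMAF terms.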
lusoris added a commit that referenced this pull request on Apr 28, 2026:
…ombined training (#183)

* docs(research): Research-0025 — FoxBird outlier resolved via KoNViD

* docs(research): Research-0025 — add LOSO-on-combined sweep section (held-out validation)

  The §"Per-clip result" table was training-fit (FoxBird in the train set). Adds a new §"LOSO sweep on combined corpus" with the proper held-out 9-fold sweep on the combined corpus (each Netflix source held out for its fold, plus 90 % of KoNViD shared, seed = 0).

  Headline numbers (held-out, not training-fit):

  * Mean PLCC across 9 folds: **0.9966 ± 0.0038** (vs the Research-0023 Netflix-only LOSO: 0.9808 ± 0.0214 — std 5.6× tighter)
  * FoxBird held-out fold PLCC: **0.9932** (vs the Research-0023 mlp_small Netflix-only LOSO ≈ 0.93)
  * Mean SROCC: 0.9984 ± 0.0014 (vs 0.9848 ± 0.0176)

  The 5.6× drop in PLCC standard deviation across folds is the most significant finding — adding KoNViD-1k eliminates content-distribution variance, not just the FoxBird outlier specifically.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
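The fold structure behind the LOSO sweep is simple to state: each of the 9 folds holds out exactly one Netflix source, trains on the other 8 plus the shared 90 % KoNViD slice, and evaluates on the held-out source. A minimal sketch of the fold generator (the function name `make_loso_folds` and the list of source names are assumptions for illustration, not the repo's actual code):

```python
def make_loso_folds(netflix_sources):
    """Yield (train_sources, held_out_source) pairs, one fold per source.

    In the sweep described above, the training set of every fold would
    additionally include the same deterministic 90 % KoNViD-1k split;
    only the Netflix source rotates out, so the held-out clip is never
    seen during that fold's training.
    """
    for held_out in netflix_sources:
        train = [s for s in netflix_sources if s != held_out]
        yield train, held_out
```

Because KoNViD rows are identical across folds, the per-fold PLCC spread (the 0.0038 std above) isolates how much each Netflix source's absence hurts, which is exactly what the "5.6× tighter" comparison against the Netflix-only sweep measures.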