Wire bisect-model-quality into a nightly CI workflow

## Goal

The `bisect-model-quality` tool landed in commit `4a6b76eb` (`feat(ai):
bisect-model-quality — binary-search a checkpoint timeline`), but it's
only usable locally today. To make it load-bearing we should run it
nightly against our model registry and alert on regressions.

## What's needed

1. **Golden feature cache** — a committed or cached parquet of feature
   vectors + DMOS targets drawn from a frozen subset of NFLX-public /
   LIVE / KonIQ. Must be stable across CI runs; probably lives in
   `testdata/` or `ai/testdata/`.
2. **Nightly workflow** (`.github/workflows/nightly-bisect.yml`):
   - Checkout, install `ai/` package + onnxruntime
   - Run `vmaf-train bisect-model-quality --models model/*.onnx --features <cache> --min-plcc 0.85`
   - If `first_bad_index is not None`, post a comment to a single
     tracking issue. Do not file a new issue each time: locate the
     issue by a sticky label and edit the comment in place.
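3. **Registry ordering**: the tool's binary search assumes quality is
   monotonic along the model list (every model before the first bad one
   passes, every model from it onward fails). We need a canonical
   ordering (git log on `model/*.onnx`? release tags?) that maps
   models → timeline indices reproducibly.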
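
One way to make the ordering reproducible is to sort on a stable key with a deterministic tie-break. The sketch below assumes each model has already been paired with the Unix timestamp of the commit that added it (for example, parsed from `git log --diff-filter=A --format=%ct -- <path>`). `ModelEntry` and `timeline_order` are hypothetical names, not part of the existing tool.

```python
from typing import NamedTuple


class ModelEntry(NamedTuple):
    path: str          # e.g. "model/foo.onnx" (illustrative)
    committed_at: int  # Unix timestamp of the commit that added the file (assumed input)


def timeline_order(entries: list[ModelEntry]) -> list[str]:
    """Return model paths oldest-first, breaking timestamp ties by path."""
    ordered = sorted(entries, key=lambda e: (e.committed_at, e.path))
    return [e.path for e in ordered]


entries = [
    ModelEntry("model/c.onnx", 300),
    ModelEntry("model/a.onnx", 100),
    ModelEntry("model/b.onnx", 100),  # same timestamp as a.onnx: path decides
]
print(timeline_order(entries))
# → ['model/a.onnx', 'model/b.onnx', 'model/c.onnx']
```

Sorting on `(timestamp, path)` instead of relying on shell glob order means the same registry contents always map to the same timeline indices, whatever order the files are listed in.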

## Why this is deferred

- Requires designing the golden feature cache first (item 1 above is
  non-trivial: needs to be reproducible, not too big for git, aligned
  to the FR/NR regressor input shape).
- We don't have a stable model registry cadence yet — tiny-AI model
  releases are still in flux.
- Other CI items (see `.workingdir2/analysis/ci-security-triage.md`)
  are higher prio (P0/P1 vs this being P2).
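
The reproducibility requirement on the golden cache could be enforced with a content fingerprint that the nightly job checks before it runs the bisect. This is a minimal sketch using only the standard library. `file_sha256` and `cache_is_unchanged` are hypothetical helpers, and where the expected digest would be pinned (in the workflow file, say) is left open.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large caches need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_is_unchanged(path: Path, expected_sha256: str) -> bool:
    """True when the cache file still matches the pinned digest."""
    return file_sha256(path) == expected_sha256.lower()
```

If the digest does not match, the job should fail before evaluation, so that a changed cache is not mistaken for a model regression.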

## Acceptance criteria

- Nightly workflow runs and passes on a known-good model set
  (`first_bad_index is None`).
- Synthetic regression test: introducing a deliberately bad ONNX into
  the set trips the alert.
- Report is readable (`render_table()` output is posted to the tracking
  issue).
- No false positives from stochastic eval (fixed RNG seeds in feature
  cache).
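
The first two criteria can be exercised without any real models by running a stand-in binary search over per-model PLCC scores. The sketch below is illustrative only: `find_first_bad` is a hypothetical function, not the tool's implementation, and the scores are made-up values chosen to sit on either side of the `0.85` threshold.

```python
from typing import Optional, Sequence


def find_first_bad(plcc_scores: Sequence[float], min_plcc: float) -> Optional[int]:
    """Binary-search for the first score below min_plcc.

    Assumes a good-then-bad shape: once a score falls below the
    threshold, every later score does too.
    """
    lo, hi = 0, len(plcc_scores)
    while lo < hi:
        mid = (lo + hi) // 2
        if plcc_scores[mid] >= min_plcc:
            lo = mid + 1  # mid passes, so the first bad model is to the right
        else:
            hi = mid      # mid fails, so the first bad model is mid or earlier
    return lo if lo < len(plcc_scores) else None


known_good = [0.93, 0.92, 0.94, 0.91]       # made-up scores, all pass
with_regression = [0.93, 0.92, 0.40, 0.38]  # deliberately bad from index 2 on

print(find_first_bad(known_good, 0.85))       # → None
print(find_first_bad(with_regression, 0.85))  # → 2
```

A dip that later recovers, such as `[0.93, 0.40, 0.92, 0.91]`, returns `None` here because the search never probes index 1. That is why the registry ordering has to guarantee the good-then-bad shape before the nightly result can be trusted.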

## Out of scope

- Writing the cache-generation script itself (separate tiny-AI task).
- Per-commit bisection (would 10x CI cost — nightly is the right cadence).
