Releases: jonathaneoliver/infinite-streaming
v2.0.0
v2.0.0 — Release notes
Headline: the analytics surface, the dashboard, and the operator
tooling are all rebuilt around a coherent v2 model — three sibling
ClickHouse tables sharing one severity-tagged labels[] vocabulary,
a Vue 3 dashboard, a harness CLI binary covering the full v2 API,
and a set of Claude Code skills for driving the rig and analysing
incidents through prose prompts.
This is a major release: several v1 surfaces are removed. Read
the Breaking changes section before upgrading.
TL;DR
- Three coherent ClickHouse tables (session_events, network_requests, control_events) share one severity-tagged labels[] vocabulary that drives every chip, tint, and filter in the UI.
- A new Vue 3 dashboard at /dashboard/v3/... replaces the legacy static-HTML pages.
- A harness CLI binary under tools/harness-cli/ covers the full v2 API surface (24 endpoints + snapshot/undo discipline).
- Six project-level Claude Code skills under .claude/skills/ (triage, investigate, forensics, fault, shape, finding) let operators drive the rig, and run forensic analyses, through natural-language prompts.
- An in-dashboard AI chat panel (#497, #511–#515): ask the rig questions in prose, scoped to the session/play you're viewing, backed by a provider-agnostic forwarder chat backend.
- A player ABR characterization framework (#482, #483, #493) plus an Automated Testing dashboard page that groups runs and drills into per-step detail.
- A server-behavior control-surface test suite (#518–#524) that calibrates rate caps, delay, loss, patterns, fault injection, transport faults, and transfer timeouts against a live deployment.
- A baseline rate cap (#480) every new session inherits, with the kernel-truth effective_rate_limit_mbps surfaced in the UI.
⚠️ Breaking changes (read before upgrading)
ClickHouse schema
| Was | Now | Migration |
|---|---|---|
| Table session_snapshots | Renamed to session_events (#472) | Update any direct SQL / Grafana panels / harness scripts that referenced session_snapshots. |
| Table session_events (classifier output, pre-#472) → renamed to session_markers (#472) → dropped (#474) | Gone | The classification semantic moved onto labels Array(LowCardinality(String)) on the three live tables. Replace SELECT … FROM session_markers WHERE type=X with SELECT … FROM session_events WHERE has(labels, 'severity=X'). Pre-cutover rows age out via the 30-day TTL. |
| Table control_events (new) | Adds the third sibling | Additive; no migration required. |
| labels Array(LowCardinality(String)) column on all three tables | New | Additive. Format: <severity>=<event>; severities error / critical / warning / info; synthesized labels prefixed * (e.g. *stall_severe_midplay). |
| Case-sensitivity | Every forwarder ingest path now runs player_id / play_id through canonicalV2ID() (lowercases UUIDs). | Operator queries with hardcoded uppercase UUID filters silently match zero post-cutover rows. Lowercase your WHERE-clause UUIDs. |
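
For saved queries, the case-sensitivity change can be handled mechanically. The following is a minimal Python sketch, not part of the release: `lowercase_uuid_literals` is a hypothetical helper that lowercases only hyphenated UUID literals in a SQL string. It assumes IDs are stored in the hyphenated form and does not reproduce whatever else the forwarder's canonicalV2ID() may do.

```python
import re

# Matches hyphenated UUID literals in either case (assumed ID format).
UUID_RE = re.compile(
    r"\b[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-"
    r"[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\b"
)


def lowercase_uuid_literals(sql: str) -> str:
    """Lowercase every UUID-shaped literal, leaving the rest of the SQL alone."""
    return UUID_RE.sub(lambda match: match.group(0).lower(), sql)


# Made-up query with an uppercase UUID filter.
old_query = (
    "SELECT count() FROM session_events "
    "WHERE play_id = 'ABCDEF12-3456-7890-ABCD-EF1234567890'"
)
new_query = lowercase_uuid_literals(old_query)
```

Only the UUID literal changes; SQL keywords and identifiers keep their original case.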
HTTP / SSE surface
| Was | Now | Migration |
|---|---|---|
| /api/session_markers, /api/v2/session_markers, /api/v2/session_events (the markers alias) | Removed | Read labels[] off session_events / network_requests rows for the bucket-A signals; read control_events rows for proxy/operator actions. |
| streams=markers on /api/v2/timeseries | Removed → use streams=control | Update SSE subscriptions. The SSE event name marker is replaced by control. |
| GET /api/v2/control_events (new) | Reads the new table | Additive. |
| GET /api/control/stream (SSE, new) | Live proxy-side action stream | Additive. The forwarder subscribes to it for ingest; clients can subscribe directly too. |
| play_id synthesis by proxy | Player-driven only | Pre-bug-#4 clients that relied on the proxy minting a play_id from control_revision now see an empty play_id column. iOS 1.x+ already mints client-side. Other clients should mint UUID per play boundary and pass on every request URL + metrics POST. |
| Legacy v1/v2 dashboard pages (non-v3) | Removed in #459 | Replace bookmarks with /dashboard/v3/... equivalents (testing, testing-session, session-viewer, sessions, dashboard, grid). |
| v1 archive read API (the pre-v2 forwarder archive endpoints) | Removed (#478 / #496) | The dashboard's plays surface now reads exclusively from the v2 archive (/api/v2/...) via TanStack Query. Any external consumer of the old v1 archive endpoints must move to the v2 equivalents listed above. |
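
Scripts that build /api/v2/timeseries URLs can apply the streams=markers → streams=control swap in one place. The following is a hedged Python sketch: `migrate_timeseries_url` and the rig.example host are hypothetical, and only the `streams` parameter and its `markers` / `control` values come from the table above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def migrate_timeseries_url(url: str) -> str:
    """Rewrite streams=markers to streams=control, keeping other params intact."""
    parts = urlsplit(url)
    query = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "streams":
            names = ["control" if name == "markers" else name
                     for name in value.split(",")]
            # dict.fromkeys drops duplicates while preserving order.
            value = ",".join(dict.fromkeys(names))
        query.append((key, value))
    # safe="," keeps the comma-separated stream list readable in the URL.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


migrated = migrate_timeseries_url(
    "http://rig.example/api/v2/timeseries?streams=events,markers"
)
```

Clients also need to listen for the control SSE event name instead of marker; the URL rewrite alone does not cover that.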
Tooling
| Was | Now | Migration |
|---|---|---|
| harness archive markers subcommand (the archive family is renamed query in v2.0.0) | Removed → harness query control | The control_events table is the closest analog (proxy/operator actions). Player-emitted signals now live as labels[] on session_events. |
| Forwarder Go package analytics/go-forwarder/eventclass/ | Removed | Internal to the forwarder; only matters if you forked. Classification logic moved into labels.go as ingest-time label computation. |
| priority numeric field on the retired session_markers table | Gone with the table | Use the severity prefix on the row's labels[] instead (error / critical / warning / info). |
| restart_id (UUID, pre-cutover) on session_events | Renamed to attempt_id (UInt32 counter) | Player-driven sticky counter, +1 per restart event. Reset to 1 at each play boundary. |
Migration checklist
# 1. Apply the schema changes. Fresh deploys pick these up from
# analytics/clickhouse/init.d/01-schema.sql automatically; for an
# existing cluster, run each migration explicitly:
make analytics-migrate SQL='ALTER TABLE infinite_streaming.session_events ADD COLUMN labels Array(LowCardinality(String)) DEFAULT [] CODEC(ZSTD(1))'
make analytics-migrate SQL='ALTER TABLE infinite_streaming.network_requests ADD COLUMN labels Array(LowCardinality(String)) DEFAULT [] CODEC(ZSTD(1))'
make analytics-migrate SQL='ALTER TABLE infinite_streaming.session_events ADD COLUMN attempt_id UInt32 DEFAULT 0 CODEC(ZSTD(1))'
make analytics-migrate SQL='ALTER TABLE infinite_streaming.session_events ADD COLUMN last_buffering_time_s Float32 CODEC(ZSTD(1))'
# New tables — paste each CREATE TABLE IF NOT EXISTS block from the
# init.d schema files (re-running one that already exists is a no-op):
# - control_events, characterization_runs → analytics/clickhouse/init.d/01-schema.sql
# - llm_calls (AI chat panel) → analytics/clickhouse/init.d/03-llm-calls.sql
make analytics-migrate SQL='DROP TABLE IF EXISTS infinite_streaming.session_markers'
# 2. Rebuild the forwarder + proxy.
make analytics-rebuild-forwarder
make test-deploy-dev # or your env's full-deploy target
# 3. Rebuild and re-install the harness CLI.
make harness-cli
# 4. Update bookmarks / dashboards to /dashboard/v3/.

For self-hosted operators with custom Grafana dashboards: search
your dashboard JSON for session_snapshots and session_markers and
swap to session_events + has(labels, 'severity=…') predicates.
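
The Grafana search can be scripted. The following is a rough Python sketch under stated assumptions: `find_retired_tables` is a hypothetical helper that walks any exported dashboard JSON and reports each string value mentioning a retired table. The sample dashboard and its `rawSql` key are made up for illustration; the walker does not depend on any particular Grafana schema.

```python
import json

RETIRED_TABLES = ("session_snapshots", "session_markers")


def find_retired_tables(node, path="$"):
    """Yield (json_path, table_name) for every string mentioning a retired table."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_retired_tables(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_retired_tables(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for table in RETIRED_TABLES:
            if table in node:
                yield path, table


# Made-up dashboard export: one stale panel, one already migrated.
dashboard = json.loads("""
{"panels": [
  {"title": "Stalls", "targets": [{"rawSql": "SELECT count() FROM session_markers"}]},
  {"title": "Bitrate", "targets": [{"rawSql": "SELECT * FROM session_events"}]}
]}
""")
hits = list(find_retired_tables(dashboard))
```

Each hit gives a JSON path to the query that still needs rewriting; the rewrite itself stays a manual step because the replacement predicate depends on which label the panel cares about.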
What's new
1. API v2 + new Vue 3 dashboard
A from-scratch, OpenAPI-typed v2 API now sits alongside v1, modelled
around plays (one row per playback episode) rather than v1's
(session_id, play_id) tuples.
- Server: go-proxy/internal/v2/server/ (typed handlers, fault rule resource, content / labels / shape PATCH, snapshot+restore).
- Forwarder archive: /api/v2/snapshots, /api/v2/network_requests, /api/v2/control_events, /api/v2/plays, /api/v2/plays/aggregate, /api/v2/session_heatmap, /api/v2/session_bundle, plus the unified /api/v2/timeseries SSE that multiplexes streams=events,network,control over one connection.
- Spec: api/openapi/v2/{proxy,forwarder}.yaml + Scalar UI mirror at /dashboard/api-docs/.
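
Because one connection carries several streams, a client has to split events by name. The following is a minimal Python sketch of that demultiplexing using only the standard text/event-stream framing. `demux_sse` is a hypothetical helper, the payloads are placeholders, and the `network` event name is an assumption (only `control` is named in these notes).

```python
from collections import defaultdict


def demux_sse(text: str) -> dict:
    """Group SSE data payloads by event name.

    Standard framing: "field: value" lines, and a blank line
    dispatches the event that has been accumulated so far.
    """
    grouped = defaultdict(list)
    event_name, data_lines = "message", []
    for line in text.splitlines() + [""]:
        if line == "":
            if data_lines:
                grouped[event_name].append("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event_name = value
            elif field == "data":
                data_lines.append(value)
    return dict(grouped)


# Placeholder payloads; the real row shapes are defined by the OpenAPI spec.
sample = (
    "event: control\n"
    'data: {"example": 1}\n'
    "\n"
    ": keep-alive\n"
    "event: network\n"
    'data: {"example": 2}\n'
    "\n"
)
streams = demux_sse(sample)
```

A real client would read the stream incrementally instead of from a complete string; the grouping logic stays the same.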
The dashboard at /dashboard/v3/ is the canonical UI. Vue 3 SPA,
TanStack Query, brush-as-source-of-truth chart-coordination model,
Sessions picker for archived plays. Pages: dashboard, testing,
testing-session, session-viewer, sessions, grid.
2. Player identity model (bug #4)
Both play_id and attempt_id are now player-driven:
- iOS mints both at app launch; rotates play_id on real boundaries (content selection / fresh page-load) and attempt_id on every restart event.
- The proxy never synthesises them; the field stays empty when the player hasn't sent one yet.
- attempt_id is a UInt32 sticky counter on every row of all three tables.
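
For clients other than iOS, the identity rules above can be sketched as a small state holder. This is an illustrative Python sketch, not the iOS implementation: `PlayIdentity` and its method names are hypothetical, and the reset-to-1 behaviour follows the attempt_id description in the Tooling table.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PlayIdentity:
    """Client-side play_id / attempt_id, minted once at launch."""

    play_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    attempt_id: int = 1

    def on_play_boundary(self) -> None:
        """Content selection or fresh page-load: new play, counter back to 1."""
        self.play_id = str(uuid.uuid4())
        self.attempt_id = 1

    def on_restart(self) -> None:
        """Restart within the same play: play_id is sticky, counter advances."""
        self.attempt_id += 1


identity = PlayIdentity()
first_play = identity.play_id
identity.on_restart()
identity.on_restart()
attempts_before_boundary = identity.attempt_id
identity.on_play_boundary()
```

`str(uuid.uuid4())` already yields a lowercase UUID, which matches the lowercasing the forwarder applies at ingest.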
3. Labels-driven classification
Every row in the three CH tables carries a severity-tagged labels[]
column. Same vocab drives row tint, chip rendering, severity filters,
and the Sessions multi-select.
- session_events labels: stalls with duration buckets + startup/scrub/midplay context, errors, restarts, ABR shifts.
- network_requests labels: HTTP outcomes (error=http_5xx, warning=http_4xx), fault categories (*transport_socket, *transport_disconnect, *transfer_*_timeout), per-kind failures, slow segments, request_retry (only on real retries, not normal manifest polling).
- control_events labels: operator and proxy actions (*fault_rule_enabled, *pattern_enabled_rampUp (per pattern name), *fault_on, *pattern_step, *session_start, etc.).
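
For external tooling that reads labels[] directly, a small parser is usually enough. The following is a hedged Python sketch: `parse_label` and `severity_counts` are hypothetical helpers, and the sketch assumes the * marker sits at the start of the event part after the = sign, which these notes do not spell out. The severity attached to stall_severe_midplay in the sample is also made up.

```python
from collections import Counter
from typing import NamedTuple

SEVERITIES = ("error", "critical", "warning", "info")


class Label(NamedTuple):
    severity: str
    event: str
    synthesized: bool


def parse_label(raw: str) -> Label:
    """Split a <severity>=<event> label; a leading * marks a synthesized event."""
    severity, sep, event = raw.partition("=")
    if not sep or severity not in SEVERITIES:
        raise ValueError(f"not a <severity>=<event> label: {raw!r}")
    synthesized = event.startswith("*")
    return Label(severity, event[1:] if synthesized else event, synthesized)


def severity_counts(labels):
    """Count labels per severity, e.g. to drive a severity filter."""
    return Counter(parse_label(raw).severity for raw in labels)


# Sample row; the severity on the synthesized label is an assumption.
row_labels = ["error=http_5xx", "warning=http_4xx", "warning=*stall_severe_midplay"]
counts = severity_counts(row_labels)
```

Rejecting unknown severities early keeps a typo in a filter from silently matching nothing.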
4. control_events table
Brand-new sibling capturing every server-side or operator-driven
action: fault toggles, pattern step advances, shaper edits, harness
PATCHes (label edits, content swap, timeouts), session lifecycle.
Distinguished by source ∈ {harness, proxy, auto}. Replaces the
retired session_markers table.
5. Dashboard UX
Sessions page
- New Labels column rendering severity-tinted chips
(count× event_name), so...
v1.1.0
v1.1.0 — first feature release after the open-source GA
The v1.0.0 GA proved the core loop works. v1.1.0 leans hard into diagnostics: every fault is now legible at a glance, every player has a 911 button that captures forensic-quality HAR files, and the device apps were rebuilt around a cinematic UI that doubles as release-quality demo material.
🎯 All-tab fault injection override
A new All tab on the Fault Injection panel applies one rule to every HTTP request kind in a single click — segments, media manifests, master. The Segment / Manifest / Master tabs disable with an "All override active" banner while it's on. Use it for blunt-instrument blast tests; drop back to per-kind tabs when you need different rules per request kind.
🌊 Streaming-aware HAR with !✂ / !⏱ / !↩ glyphs
Every request the proxy serves is captured as a HAR-format network log entry and rendered as a Chrome DevTools-style waterfall in the dashboard. The glyph column tells the truth even when the status code can't:
- !✂: proxy deliberately tore down the connection (fault injection: request_body_*, request_first_byte_*, etc.)
- !⏱: server transfer timeout fired (active or idle)
- !↩: player gave up mid-transfer
- !: plain HTTP fault (4xx / 5xx)
Status codes were also fixed to reflect what the client actually saw on the wire, not internal sentinels: 200 for body/header mid-stream cuts, 0 for connect-time aborts, 503 only on hijack fallback. A row that looks like a successful 200 download but shows !↩ next to its method tells you the player abandoned it — exactly the signal that used to disappear into the void.
🚨 One-tap 911 incident capture
Every player platform (iOS, iPadOS, tvOS, Android TV) now has a "911" button right of the Reload action. One tap on a problem fires a user_marked event the server captures as a HAR snapshot — last 10 minutes, every play within the window, written to the new Incidents browser. Cross-layer "911" log lines are grep-friendly across Apple device console, adb logcat, and docker logs, so tracing one user complaint across all three layers is one search. Auto-snapshots also fire on detected stalls / segment-stalls.
📂 Incidents page
New dashboard page that browses every saved HAR snapshot — manual saves, 911 captures, and auto-captured stalls — with reason filters, bulk delete, and click-to-render-waterfall inline. Replays the same rendering machinery as the live testing-session view; no separate viewer to learn.
📊 Playback state chart
The per-session bitrate chart now stacks an events timeline above the bandwidth / buffer / FPS panels, all sharing the same x-axis. Swim-lanes for PLAYER variants (one row per ladder rung), DISPLAY RES, PLAYERSTATE, PLAYBACK / IMPAIRMENT markers, and SERVER LOOP boundaries make a single screenshot tell the whole "what the player was doing while the network was being shaped" story.
🎬 Apple + AndroidTV cinematic UI
The iOS, iPadOS, tvOS, and AndroidTV apps were rebuilt around a shared cinematic dark UI: a "Now Playing" hero, a LIVE row of live preview tiles (real LL-HLS preview frames within a per-device decode budget, not stale stills), and a slide-from-right Settings drawer with sticky picker focus. The Apple side reaches parity with the AndroidTV redesign that landed earlier in the same line of work.
⏱ Server transfer timeouts
Per-session active and idle transfer timeouts the proxy enforces against the player. ATS-style — total wall-clock budget vs gap-since-last-write timer — with per-kind opt-in (Apply To: Segments / Media manifests / Master manifest) so you can timeout segments without falsely tripping tiny manifest fetches. Fires render as !⏱ rows on the network log waterfall.
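
The difference between the two timers is easiest to see on a concrete write timeline. The following is an illustrative Python sketch, not the proxy's code: `first_timeout` is a hypothetical function, and it assumes the idle timer starts counting at transfer start and that both timeouts are expressed in seconds.

```python
from typing import Optional, Sequence


def first_timeout(
    write_times_s: Sequence[float],
    end_s: float,
    active_timeout_s: Optional[float],
    idle_timeout_s: Optional[float],
) -> Optional[tuple]:
    """Return (kind, fire_time_s) for whichever timer fires first, else None.

    write_times_s: ascending seconds since transfer start for each write.
    end_s: when the transfer finished or was last observed.
    """
    candidates = []
    # Active timeout: total wall-clock budget for the whole transfer.
    if active_timeout_s is not None and end_s >= active_timeout_s:
        candidates.append((active_timeout_s, "active"))
    # Idle timeout: largest tolerated gap since the previous write.
    if idle_timeout_s is not None:
        last = 0.0
        for t in list(write_times_s) + [end_s]:
            if t - last >= idle_timeout_s:
                candidates.append((last + idle_timeout_s, "idle"))
                break
            last = t
    if not candidates:
        return None
    fire_time, kind = min(candidates)
    return kind, fire_time


# Writes stop after 4 s: the 5 s idle timer fires before the 10 s budget.
stalled = first_timeout([1.0, 2.0, 3.0, 4.0], 12.0, 10.0, 5.0)
# Steady trickle: never idle, but the total budget still runs out.
slow = first_timeout([float(i) for i in range(1, 12)], 12.0, 10.0, 5.0)
```

The second case is why both timers exist: a transfer that keeps writing slowly never trips the idle timer, so only the wall-clock budget can end it.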
🕒 Wall-clock offset metric
A new player_metrics_true_offset_s metric: how far behind live the player is, computed server-side from the encoder's PDT at the playhead and the server's own receive clock. Independent of any clock skew between the client device and your laptop, immune to phone vs laptop NTP drift, and immune to whatever offset the player engine is internally applying. Surfaces on the buffer-depth chart's right Y-axis and as the basis for cross-client comparison.
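
As a rough illustration of the idea, the offset is the distance between two timestamps taken on the server side of the system. The following Python sketch assumes the metric is simply the server receive time minus the PDT at the playhead; the exact computation in the proxy is not given in these notes, and `true_offset_s` is a hypothetical name.

```python
from datetime import datetime


def true_offset_s(pdt_at_playhead: datetime, server_received_at: datetime) -> float:
    """Seconds the playhead trails the server's receive clock.

    Both inputs must be timezone-aware; neither comes from the
    client's clock, which is why device clock skew drops out.
    """
    return (server_received_at - pdt_at_playhead).total_seconds()


# Made-up timestamps: playhead content was stamped 7.5 s before receipt.
pdt = datetime.fromisoformat("2025-01-01T12:00:00+00:00")
received = datetime.fromisoformat("2025-01-01T12:00:07.500+00:00")
offset = true_offset_s(pdt, received)
```

Mixing a timezone-aware and a naive datetime raises TypeError in Python, which is a useful guard against accidentally feeding in a device-local timestamp.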
🛠 Encoding & content
- Source audio is normalized to AAC during transcode regardless of input format, so every variant on the ladder has a uniform audio layer (no more "this segment plays on iOS but not Android" audio-codec surprises).
- New make test-pattern Makefile target generates a 4K synthetic test pattern as a controlled source: deterministic visuals, no copyrighted material, useful for "is this the player or is this the content?" debugging.
🔁 Reset All Settings
Every device app got a destructive Reset All Settings action at the bottom of Settings → Advanced. Wipes the saved server list, playback history, and Advanced flags, then routes back to the ServerPicker — equivalent to a fresh install but without losing the app itself.
🐛 Fault-decision correctness fixes
Fault injection got a series of correctness fixes around concurrent requests and timing semantics. Frequency now means full cycle length (fault start → next fault start) instead of gap-after-recovery, matching what the slider label implies. The video-and-audio-arrived-in-the-same-millisecond double-fire bug is gone (single decision mutex around the read-modify-write). Default Mode aligns server-side init with the dashboard's visible default so first-use rate-limits behave as expected.
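
The Frequency change is easier to reason about with numbers. The following is a small Python sketch comparing the two interpretations; the function names, the `duration_s` parameter, and the assumption that the first fault starts at t = 0 are all hypothetical and only serve the illustration.

```python
def fault_start_times(frequency_s, duration_s, count, full_cycle=True):
    """Start times (seconds) of the first `count` fault windows.

    full_cycle=True  -> v1.1.0 meaning: frequency is start-to-start.
    full_cycle=False -> old meaning: frequency is the gap after recovery.
    """
    period = frequency_s if full_cycle else duration_s + frequency_s
    return [index * period for index in range(count)]


def fault_active(t_s, frequency_s, duration_s):
    """True when t_s falls inside a fault window (full-cycle semantics)."""
    return (t_s % frequency_s) < duration_s


# Frequency 10 s, fault lasting 3 s.
new_starts = fault_start_times(10, 3, 3)
old_starts = fault_start_times(10, 3, 3, full_cycle=False)
```

With the old meaning the schedule drifts by the fault duration on every cycle (0, 13, 26 s); with the new meaning faults land on a fixed 10 s grid (0, 10, 20 s).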
🧰 Other improvements
- Network-log waterfall: Follow Latest sticky checkbox, alt-wheel inner scroll restored, brush minimum drops to 30 s, 40-row default height, page-aware auto-snap.
- Apple TV: launcher icon + top shelf brand assets (the app was invisible on the Home Screen before).
- Default Bitrate Y-Max now offers a 100 Mbps option for high-bitrate experiments.
- Buffer-depth, bandwidth, and FPS charts' plot areas now share the same right edge so vertical x-axis ticks align across every chart.
Full commit-level changelog: v1.0.0…v1.1.0
Container images: ghcr.io/jonathaneoliver/infinite-streaming:v1.1.0 (also tagged :v1.1 and :v1).
v1.0.0 — Open-source release
What's Changed
- fix: shaping reliability — metrics endpoint, pattern start, apply-on-click by @jonathaneoliver in #73
- docs: add docker run quickstart and run-image Makefile target by @jonathaneoliver in #74
- Bundle go-proxy into main container image by @jonathaneoliver in #76
- feat: cloud-based encoding via AWS EC2 spot instances by @jonathaneoliver in #78
- docs: audit and fill gaps (closes #79) by @jonathaneoliver in #80
- feat(dashboard): visualization improvements — PiP, events swim lane, compact UI by @jonathaneoliver in #82
- fix(go-proxy): remove memcache, fix concurrent map crashes, add per-session SSE by @jonathaneoliver in #88
- perf(go-proxy): regex hoisting, atomic reads, SSE pre-marshal, active_sessions for grouping by @jonathaneoliver in #93
- fix: grouped session shaping propagation by @jonathaneoliver in #100
- feat(apple): metrics POST endpoint, SSE player_id filter, ubuntu server by @jonathaneoliver in #102
- perf(go-live): singleton generator, single-lock DASH cache by @jonathaneoliver in #103
- feat(android): ExoPlayer test app for ABR characterization by @jonathaneoliver in #105
- feat(apple): scrollable content picker sheet by @jonathaneoliver in #106
- infra: add go.mod to go-proxy and go-live, enable local go test by @jonathaneoliver in #107
- fix(go-live): cap live window at loop wrap (closes #109) by @jonathaneoliver in #114
- feat(android): TV-friendly UI, real bandwidth metric, gradle wrapper (closes #115) by @jonathaneoliver in #116
- feat(go-proxy): broadcast significant player events + log abandoned transfers (closes #117) by @jonathaneoliver in #118
- refactor(go-proxy): per-session save for group link/unlink + auto-ungroup singles (closes #119) by @jonathaneoliver in #120
- fix(go-proxy): shouldApplyFailure returns false on empty entries (closes #121) by @jonathaneoliver in #122
- feat(apple): local HTTP forward proxy for wire-level metrics (closes #123) by @jonathaneoliver in #124
- chore(apple): drop redundant ISO timestamp from TestingSession stdout log (closes #125) by @jonathaneoliver in #126
- feat(dashboard): compare mode, chart pause, loop swim-lane, wire bitrate, throughput sync (closes #127) by @jonathaneoliver in #128
- chore: gitignore .claude/ (closes #129) by @jonathaneoliver in #130
- fix(apple): restore ContentView compact layout, Go Live toggle, IPv4 host (closes #131) by @jonathaneoliver in #132
- chore(dashboard): rename 'Player Wire Bitrate' to 'Player Network Rate' (closes #133) by @jonathaneoliver in #134
- fix(apple): silence LocalHTTPProxy ECANCELED log flood (closes #135) by @jonathaneoliver in #136
- fix(apple): pass through Content-Length to silence URLAsset err=-12174 (closes #137) by @jonathaneoliver in #138
- feat(dashboard): show Player Network Rate per session in compare-mode charts (closes #139) by @jonathaneoliver in #140
- fix(apple): 'Allow 4K' OFF actually caps at 1080p (closes #141) by @jonathaneoliver in #142
- revert(apple): restore 'Allow 4K' OFF = .zero (closes #143) by @jonathaneoliver in #144
- fix(go-live): emit EXT-X-DISCONTINUITY-SEQUENCE, enlarge live window by @jonathaneoliver in #150
- feat(go-live): declare SERVER-CONTROL:HOLD-BACK on range HLS playlists by @jonathaneoliver in #152
- feat(go-live): inject EXT-X-START on master; go-proxy replaces on liveOffset by @jonathaneoliver in #154
- chore(go-live): remove dead legacy spawn/continuous/once/LoadByteranges by @jonathaneoliver in #156
- feat(apple): track per-segment identity via LocalHTTPProxy by @jonathaneoliver in #158
- feat(apple): enrich PlaybackDiagnostics with unified snapshot by @jonathaneoliver in #160
- feat(apple): preserve live offset across stall recovery by @jonathaneoliver in #162
- feat(dashboard): buffer depth chart y-axis rounds to 5-second steps by @jonathaneoliver in #164
- fix(dashboard): include all grouped sessions when sizing buffer chart by @jonathaneoliver in #166
- feat(metrics): rework player bitrates as avgNetworkBitrate + networkBitrate by @jonathaneoliver in #167
- chore: bump version to v0.9.0 by @jonathaneoliver in #169
- feat(androidtv): iOS parity — UI, content fetch, player metrics by @jonathaneoliver in #171
- feat(dashboard): cross-session legend hover highlight in compare mode by @jonathaneoliver in #173
- feat(apple): tvOS options row, D-pad nav, local proxy, sleep prevention, deploy targets by @jonathaneoliver in #179
- feat(android): fullscreen toggle, layout improvements, app icon banner by @jonathaneoliver in #180
- feat(dashboard): tag all sessions with (SX) in compare view, update nav logo by @jonathaneoliver in #181
- chore: open-source readiness cleanup by @jonathaneoliver in #184
- chore: tag Quartet and Live Offset as Alpha by @jonathaneoliver in #186
- chore: dashboard cleanup batch by @jonathaneoliver in #188
- chore: bump VERSION to v1.0.0 by @jonathaneoliver in #190
- docs(readme): rate shaping as top-level differentiator by @jonathaneoliver in #192
- docs(readme): broaden audience line by @jonathaneoliver in #194
- docs(readme): tvOS in bundled clients + role clarification by @jonathaneoliver in #196
- feat(encoder): prefer 2-channel audio when source has multiple tracks by @jonathaneoliver in #198
- feat(upload): clean filenames + show on-disk name in Source Library by @jonathaneoliver in #200
- chore: scrub personal IPs/hostnames; add QR pairing in Server Info by @jonathaneoliver in #203
- docs(rendezvous): note that fork builds should change the baked-in Worker URL by @jonathaneoliver in #205
- fix(k3s): distinct server_id per deployment so dev + release both appear in discovery by @jonathaneoliver in #206
- fix(ios,tvos): ATS exception for the public HTTP deployment + cleartext docs by @jonathaneoliver in #207
- ci: release automation — semver GHCR tags, Release Drafter, version-tagged k3s image by @jonathaneoliver in #208
Full Changelog: v0.6.0...v1.0.0
v0.2.0
What's Changed
- Add transport fault packet-mode controls and counters by @jonathaneoliver in #3
- Issue 2 socket fault lifecycle variants by @jonathaneoliver in #4
- Add dev k3s deployment and testing ports by @jonathaneoliver in #8
- Codex/grouping sync fixes by @jonathaneoliver in #9
- Add session grouping for synchronized player testing by @Copilot in #7
- Fix network throttle reporting by @jonathaneoliver in #13
- Mirror testing-session UI behavior by @jonathaneoliver in #15
New Contributors
- @jonathaneoliver made their first contribution in #3
- @Copilot made their first contribution in #7
Full Changelog: https://github.com/jonathaneoliver/infinite-streaming/commits/v0.2.0