Releases: jonathaneoliver/infinite-streaming

v2.0.0

27 May 22:51
38c6c9e

v2.0.0 — Release notes

Headline: the analytics surface, the dashboard, and the operator
tooling are all rebuilt around a coherent v2 model — three sibling
ClickHouse tables sharing one severity-tagged labels[] vocabulary,
a Vue 3 dashboard, a harness CLI binary covering the full v2 API,
and a set of Claude Code skills for driving the rig and analysing
incidents through prose prompts.

This is a major release: several v1 surfaces are removed. Read
the Breaking changes section before upgrading.


TL;DR

  • Three coherent ClickHouse tables (session_events,
    network_requests, control_events) share one severity-tagged
    labels[] vocabulary that drives every chip, tint, and filter in
    the UI.
  • A new Vue 3 dashboard at /dashboard/v3/... replaces the
    legacy static-HTML pages.
  • A harness CLI binary under tools/harness-cli/ covers the
    full v2 API surface (24 endpoints + snapshot/undo discipline).
  • Six project-level Claude Code skills under .claude/skills/
    (triage, investigate, forensics, fault, shape, finding)
    let operators drive the rig — and run forensic analyses — through
    natural-language prompts.
  • An in-dashboard AI chat panel (#497, #511–#515): ask the
    rig questions in prose, scoped to the session/play you're viewing,
    backed by a provider-agnostic forwarder chat backend.
  • A player ABR characterization framework (#482, #483, #493)
    plus an Automated Testing dashboard page that groups runs and
    drills into per-step detail.
  • A server-behavior control-surface test suite (#518–#524)
    that calibrates rate caps, delay, loss, patterns, fault injection,
    transport faults, and transfer timeouts against a live deployment.
  • A baseline rate cap (#480) every new session inherits, with
    the kernel-truth effective_rate_limit_mbps surfaced in the UI.

⚠️ Breaking changes (read before upgrading)

ClickHouse schema

| Was | Now | Migration |
|---|---|---|
| Table session_snapshots | Renamed to session_events (#472) | Update any direct SQL / Grafana panels / harness scripts that referenced session_snapshots. |
| Table session_events (classifier output, pre-#472) → renamed to session_markers (#472) → dropped (#474) | Gone | The classification semantic moved onto labels Array(LowCardinality(String)) on the three live tables. Replace SELECT … FROM session_markers WHERE type=X with SELECT … FROM session_events WHERE has(labels, 'severity=X'). Pre-cutover rows age out via the 30-day TTL. |
| Table control_events (new) | Adds the third sibling | Additive; no migration required. |
| labels Array(LowCardinality(String)) column on all three tables | New | Additive. Format: <severity>=<event>; severities are error / critical / warning / info; synthesized labels are prefixed with * (e.g. *stall_severe_midplay). |
| Case-sensitivity | Every forwarder ingest path now runs player_id / play_id through canonicalV2ID() (lowercases UUIDs). | Operator queries with hardcoded uppercase UUID filters silently match zero rows post-cutover. Lowercase your WHERE-clause UUIDs. |
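
A minimal sketch of the two query-side changes in the table above, in Python. The helper names are hypothetical, and the only behaviour assumed for canonicalV2ID() is the lowercasing described in these notes; the real function may normalise more than case. The String parameter types are also an assumption about the column types.

```python
import uuid


def canonical_id(raw: str) -> str:
    """Lowercase a UUID string (hypothetical stand-in for canonicalV2ID())."""
    # uuid.UUID validates the input; str() of a UUID is always lowercase.
    return str(uuid.UUID(raw.strip()))


def labelled_rows_query(play_id: str, label: str) -> tuple[str, dict]:
    """Build a parameterised ClickHouse query for rows carrying one label.

    `label` is a full label string of the form <severity>=<event>.
    """
    sql = (
        "SELECT * FROM infinite_streaming.session_events "
        "WHERE play_id = {play_id:String} AND has(labels, {label:String})"
    )
    return sql, {"play_id": canonical_id(play_id), "label": label}
```

Passing the UUID through a helper like this before it reaches the WHERE clause keeps hand-copied uppercase IDs from silently matching nothing.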

HTTP / SSE surface

| Was | Now | Migration |
|---|---|---|
| /api/session_markers, /api/v2/session_markers, /api/v2/session_events (the markers alias) | Removed | Read labels[] off session_events / network_requests rows for the bucket-A signals; read control_events rows for proxy/operator actions. |
| streams=markers on /api/v2/timeseries | Removed → use streams=control | Update SSE subscriptions. The SSE event name marker is replaced by control. |
| GET /api/v2/control_events (new) | Reads the new table | Additive. |
| GET /api/control/stream (SSE, new) | Live proxy-side action stream | Additive. The forwarder subscribes to it for ingest; clients can subscribe directly too. |
| play_id synthesis by proxy | Player-driven only | Pre-bug-#4 clients that relied on the proxy minting a play_id from control_revision now see an empty play_id column. iOS 1.x+ already mints client-side. Other clients should mint a UUID per play boundary and pass it on every request URL and metrics POST. |
| Legacy v1/v2 dashboard pages (non-v3) | Removed in #459 | Replace bookmarks with /dashboard/v3/... equivalents (testing, testing-session, session-viewer, sessions, dashboard, grid). |
| v1 archive read API (the pre-v2 forwarder archive endpoints) | Removed (#478 / #496) | The dashboard's plays surface now reads exclusively from the v2 archive (/api/v2/...) via TanStack Query. Any external consumer of the old v1 archive endpoints must move to the v2 equivalents listed above. |
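
For clients that build their /api/v2/timeseries subscription URL in code, the streams=markers change can be handled with a small rewrite. This is a sketch only: the function name is hypothetical, and it assumes streams is a comma-separated query parameter as shown in these notes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def migrate_streams_param(url: str) -> str:
    """Replace the removed `markers` stream with `control` in a timeseries URL."""
    parts = urlsplit(url)
    query = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "streams":
            names = ["control" if n == "markers" else n for n in value.split(",")]
            # Drop duplicates while keeping order, in case `control` was already listed.
            value = ",".join(dict.fromkeys(names))
        query.append((key, value))
    # safe="," keeps the comma-separated list readable instead of %2C-encoded.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))
```

Handlers registered for the SSE event name marker need the matching rename to control; the URL rewrite alone does not cover that.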

Tooling

| Was | Now | Migration |
|---|---|---|
| harness archive markers subcommand (the archive family is renamed query in v2.0.0) | Removed → harness query control | The control_events table is the closest analog (proxy/operator actions). Player-emitted signals now live as labels[] on session_events. |
| Forwarder Go package analytics/go-forwarder/eventclass/ | Removed | Internal to the forwarder; only matters if you forked. Classification logic moved into labels.go as ingest-time label computation. |
| priority numeric field on the retired session_markers table | Gone with the table | Use the severity prefix on the row's labels[] instead (error / critical / warning / info). |
| restart_id (UUID, pre-cutover) on session_events | Renamed to attempt_id (UInt32 counter) | Player-driven sticky counter, +1 per restart event. Reset to 1 at each play boundary. |

Migration checklist

# 1. Apply the schema changes. Fresh deploys pick these up from
#    analytics/clickhouse/init.d/01-schema.sql automatically; for an
#    existing cluster, run each migration explicitly:
make analytics-migrate SQL='ALTER TABLE infinite_streaming.session_events ADD COLUMN labels Array(LowCardinality(String)) DEFAULT [] CODEC(ZSTD(1))'
make analytics-migrate SQL='ALTER TABLE infinite_streaming.network_requests ADD COLUMN labels Array(LowCardinality(String)) DEFAULT [] CODEC(ZSTD(1))'
make analytics-migrate SQL='ALTER TABLE infinite_streaming.session_events ADD COLUMN attempt_id UInt32 DEFAULT 0 CODEC(ZSTD(1))'
make analytics-migrate SQL='ALTER TABLE infinite_streaming.session_events ADD COLUMN last_buffering_time_s Float32 CODEC(ZSTD(1))'
# New tables — paste each CREATE TABLE IF NOT EXISTS block from the
# init.d schema files (re-running one that already exists is a no-op):
#   - control_events, characterization_runs → analytics/clickhouse/init.d/01-schema.sql
#   - llm_calls (AI chat panel)             → analytics/clickhouse/init.d/03-llm-calls.sql
make analytics-migrate SQL='DROP TABLE IF EXISTS infinite_streaming.session_markers'

# 2. Rebuild the forwarder + proxy.
make analytics-rebuild-forwarder
make test-deploy-dev          # or your env's full-deploy target

# 3. Rebuild and re-install the harness CLI.
make harness-cli

# 4. Update bookmarks / dashboards to /dashboard/v3/.

For self-hosted operators with custom Grafana dashboards: search
your dashboard JSON for session_snapshots and session_markers and
swap to session_events + has(labels, 'severity=…') predicates.
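
One way to run that search across exported dashboard JSON is sketched below. The function names are hypothetical, and nothing here is specific to Grafana's schema: it simply walks every string in the document and reports where a retired table name appears.

```python
import json
import re

# The two table names these notes say to search for.
LEGACY_TABLES = re.compile(r"\b(session_snapshots|session_markers)\b")


def find_legacy_refs(node, path="$"):
    """Yield (json_path, table_name) for every string mentioning a retired table."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_legacy_refs(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_legacy_refs(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for match in LEGACY_TABLES.finditer(node):
            yield path, match.group(1)


def scan_dashboard(text: str):
    """Parse one dashboard JSON document and list its legacy references."""
    return list(find_legacy_refs(json.loads(text)))
```

The reported paths point at the queries that need the session_events + has(labels, …) rewrite; the rewrite itself still has to be done by hand, since the old type values do not map mechanically onto labels.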


What's new

1. API v2 + new Vue 3 dashboard

A from-scratch, OpenAPI-typed v2 API now sits alongside v1, modelled
around plays (one row per playback episode) rather than v1's
(session_id, play_id) tuples.

  • Server: go-proxy/internal/v2/server/ — typed handlers, fault rule
    resource, content / labels / shape PATCH, snapshot+restore.
  • Forwarder archive: /api/v2/snapshots, /api/v2/network_requests,
    /api/v2/control_events, /api/v2/plays, /api/v2/plays/aggregate,
    /api/v2/session_heatmap, /api/v2/session_bundle, plus the
    unified /api/v2/timeseries SSE that multiplexes
    streams=events,network,control over one connection.
  • Spec: api/openapi/v2/{proxy,forwarder}.yaml + Scalar UI mirror at
    /dashboard/api-docs/.
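
Because /api/v2/timeseries multiplexes several streams over one connection, a consumer has to split frames by SSE event name. The sketch below parses a raw text/event-stream body following the standard SSE framing rules. Only the control event name is confirmed by these notes; treating events and network as event names is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def demux_sse(text: str) -> dict[str, list[str]]:
    """Group SSE payloads by event name from a raw text/event-stream body."""
    by_event = defaultdict(list)
    event, data = "message", []          # "message" is the SSE default event name
    lines = text.replace("\r\n", "\n").split("\n") + [""]
    for line in lines:
        if line == "":                   # a blank line dispatches the current frame
            if data:
                by_event[event].append("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):       # comment line, often used as keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):    # one leading space after the colon is dropped
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    return dict(by_event)
```

A live client would feed this incrementally from the response stream rather than from a complete string; the framing logic is the same.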

The dashboard at /dashboard/v3/ is the canonical UI: a Vue 3 SPA
using TanStack Query, with a brush-as-source-of-truth
chart-coordination model and a Sessions picker for archived plays.
Pages: dashboard, testing, testing-session, session-viewer,
sessions, grid.

2. Player identity model (bug #4)

Both play_id and attempt_id are now player-driven:

  • iOS mints both at app launch; rotates play_id on real boundaries
    (content selection / fresh page-load) and attempt_id on every
    restart event.
  • The proxy never synthesises them; the field stays empty when the
    player hasn't sent one yet.
  • attempt_id is a UInt32 sticky counter on every row of all three
    tables.
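
For non-iOS clients adopting this model, the rules above can be sketched as a small state holder. This is an illustration under assumptions, not the iOS implementation: the class, the method names, and the query parameter names are hypothetical, and starting the counter at 1 is inferred from the "reset to 1 at each play boundary" rule in the Tooling table.

```python
import uuid
from dataclasses import dataclass, field


def _new_play_id() -> str:
    # str() of a UUID is lowercase, matching the forwarder's canonical form.
    return str(uuid.uuid4())


@dataclass
class PlayIdentity:
    play_id: str = field(default_factory=_new_play_id)
    attempt_id: int = 1

    def on_play_boundary(self) -> None:
        """Content selection or fresh page-load: rotate play_id, reset the counter."""
        self.play_id = _new_play_id()
        self.attempt_id = 1

    def on_restart(self) -> None:
        """Restart within the same play: keep play_id, bump the sticky counter."""
        self.attempt_id += 1

    def query_params(self) -> dict[str, str]:
        """Values to attach to each request (parameter names are assumptions)."""
        return {"play_id": self.play_id, "attempt_id": str(self.attempt_id)}
```

Keeping both IDs in one object makes it harder to rotate play_id without also resetting attempt_id.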

3. Labels-driven classification

Every row in the three ClickHouse tables carries a severity-tagged
labels[] column. The same vocabulary drives row tint, chip rendering,
severity filters, and the Sessions multi-select.

  • session_events labels: stalls with duration buckets +
    startup/scrub/midplay context, errors, restarts, ABR shifts.
  • network_requests labels: HTTP outcomes (error=http_5xx,
    warning=http_4xx), fault categories (*transport_socket,
    *transport_disconnect, *transfer_*_timeout), per-kind failures,
    slow segments, request_retry (only on real retries, not normal
    manifest polling).
  • control_events labels: operator and proxy actions
    (*fault_rule_enabled, *pattern_enabled_rampUp (per pattern
    name), *fault_on, *pattern_step, *session_start, etc.).
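
Consumers reading labels[] directly can split each entry into its parts. The sketch below is a hedged reading of the <severity>=<event> format. These notes do not say where the * of a synthesized label sits relative to the severity prefix, so the parser accepts a leading * in either position, and it also accepts bare labels with no severity, as in the examples above. All names are hypothetical.

```python
from typing import NamedTuple, Optional

SEVERITIES = ("error", "critical", "warning", "info")


class Label(NamedTuple):
    severity: Optional[str]   # None when the label carries no severity prefix
    event: str                # event name with any leading "*" removed
    synthesized: bool         # True when the label was marked with "*"


def parse_label(raw: str) -> Label:
    head, sep, tail = raw.partition("=")
    if sep and head.lstrip("*") in SEVERITIES:
        severity, event = head.lstrip("*"), tail
    else:
        severity, event = None, raw
    synthesized = raw.startswith("*") or event.startswith("*")
    return Label(severity, event.lstrip("*"), synthesized)


def with_severity(labels, severity: str):
    """Return the parsed labels that carry the given severity."""
    return [p for p in map(parse_label, labels) if p.severity == severity]
```

In ClickHouse itself the equivalent filter is a has(labels, …) predicate on the full label string, as shown in the Breaking changes table.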

4. control_events table

Brand-new sibling capturing every server-side or operator-driven
action: fault toggles, pattern step advances, shaper edits, harness
PATCHes (label edits, content swap, timeouts), session lifecycle.
Distinguished by source ∈ {harness, proxy, auto}. Replaces the
retired session_markers table.

5. Dashboard UX

Sessions page

  • New Labels column rendering severity-tinted chips
    (count× event_name), so...

v1.1.0

30 Apr 16:54
d3370cc

v1.1.0 — first feature release after the open-source GA

The v1.0.0 GA proved the core loop works. v1.1.0 leans hard into diagnostics: every fault is now legible at a glance, every player has a 911 button that captures forensic-quality HAR files, and the device apps were rebuilt around a cinematic UI that doubles as release-quality demo material.

🎯 All-tab fault injection override

A new All tab on the Fault Injection panel applies one rule to every HTTP request kind in a single click — segments, media manifests, master. The Segment / Manifest / Master tabs disable with an "All override active" banner while it's on. Use it for blunt-instrument blast tests; drop back to per-kind tabs when you need different rules per request kind.

🌊 Streaming-aware HAR with !✂ / !⏱ / !↩ glyphs

Every request the proxy serves is captured as a HAR-format network log entry and rendered as a Chrome DevTools-style waterfall in the dashboard. The glyph column tells the truth even when the status code can't:

  • !✂ — proxy deliberately tore down the connection (fault injection: request_body_*, request_first_byte_*, etc.)
  • !⏱ — server transfer timeout fired (active or idle)
  • !↩ — player gave up mid-transfer
  • ! — plain HTTP fault (4xx / 5xx)

Status codes were also fixed to reflect what the client actually saw on the wire, not internal sentinels: 200 for body/header mid-stream cuts, 0 for connect-time aborts, 503 only on hijack fallback. A row that looks like a successful 200 download but shows !↩ next to its method tells you the player abandoned it — exactly the signal that used to disappear into the void.
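
The glyph rules above amount to a small decision function. The sketch below is illustrative only: the Termination enum and its member names are hypothetical stand-ins for whatever the proxy records, and the precedence (termination cause first, status code second) is an assumption drawn from the "successful 200 with !↩" example.

```python
from enum import Enum


class Termination(Enum):
    """How a transfer ended (hypothetical names, not the proxy's)."""
    COMPLETED = "completed"
    PROXY_FAULT_CUT = "proxy_fault_cut"      # fault injection tore the connection down
    TRANSFER_TIMEOUT = "transfer_timeout"    # active or idle server timeout fired
    CLIENT_ABORT = "client_abort"            # player gave up mid-transfer


def glyph(status: int, termination: Termination) -> str:
    """Pick the waterfall glyph for one network-log row ("" means no glyph)."""
    if termination is Termination.PROXY_FAULT_CUT:
        return "!✂"
    if termination is Termination.TRANSFER_TIMEOUT:
        return "!⏱"
    if termination is Termination.CLIENT_ABORT:
        return "!↩"
    if status >= 400:
        return "!"                           # plain HTTP fault (4xx / 5xx)
    return ""
```

Checking the termination cause before the status code is what lets a 200 row still surface as abandoned.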

🚨 One-tap 911 incident capture

Every player platform (iOS, iPadOS, tvOS, Android TV) now has a "911" button right of the Reload action. One tap on a problem fires a user_marked event the server captures as a HAR snapshot — last 10 minutes, every play within the window, written to the new Incidents browser. Cross-layer "911" log lines are grep-friendly across Apple device console, adb logcat, and docker logs, so tracing one user complaint across all three layers is one search. Auto-snapshots also fire on detected stalls / segment-stalls.

📂 Incidents page

New dashboard page that browses every saved HAR snapshot (manual saves, 911 captures, and auto-captured stalls) with reason filters, bulk delete, and click-to-render-waterfall inline. It reuses the same rendering machinery as the live testing-session view; no separate viewer to learn.

📊 Playback state chart

The per-session bitrate chart now stacks an events timeline above the bandwidth / buffer / FPS panels, all sharing the same x-axis. Swim-lanes for PLAYER variants (one row per ladder rung), DISPLAY RES, PLAYERSTATE, PLAYBACK / IMPAIRMENT markers, and SERVER LOOP boundaries make a single screenshot tell the whole "what the player was doing while the network was being shaped" story.

🎬 Apple + AndroidTV cinematic UI

The iOS, iPadOS, tvOS, and AndroidTV apps were rebuilt around a shared cinematic dark UI: a "Now Playing" hero, a LIVE row of live preview tiles (real LL-HLS preview frames within a per-device decode budget, not stale stills), and a slide-from-right Settings drawer with sticky picker focus. The Apple side reaches parity with the AndroidTV redesign that landed earlier in the same line of work.

⏱ Server transfer timeouts

Per-session active and idle transfer timeouts the proxy enforces against the player. ATS-style (total wall-clock budget vs gap-since-last-write timer), with per-kind opt-in (Apply To: Segments / Media manifests / Master manifest) so you can time out segments without falsely tripping tiny manifest fetches. Fires render as !⏱ rows on the network log waterfall.
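
The difference between the two timers can be sketched as follows. This models the semantics described above, not the proxy's code: the function name and signature are hypothetical, and starting the idle timer at transfer start (before the first write) is an assumption.

```python
def first_timeout(start, write_times, end, active_s, idle_s):
    """Return (kind, fire_time) for the first timer to expire, else None.

    start:       when the transfer began
    write_times: sorted times of each write to the client
    end:         when the transfer finished
    active_s:    total wall-clock budget for the whole transfer
    idle_s:      maximum allowed gap since the last write
    All times are seconds on one clock.
    """
    active_deadline = start + active_s
    last = start
    for t in list(write_times) + [end]:
        idle_deadline = last + idle_s
        deadline = min(active_deadline, idle_deadline)
        if t > deadline:
            kind = "active" if active_deadline <= idle_deadline else "idle"
            return kind, deadline
        last = t
    return None
```

A slow but steady segment download trips only the active budget, while a stalled one trips the idle timer long before the budget runs out.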

🕒 Wall-clock offset metric

A new player_metrics_true_offset_s metric: how far behind live the player is, computed server-side from the encoder's PDT at the playhead and the server's own receive clock. Independent of any clock skew between the client device and your laptop, immune to phone vs laptop NTP drift, and immune to whatever offset the player engine is internally applying. Surfaces on the buffer-depth chart's right Y-axis and as the basis for cross-client comparison.
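
These notes do not give the exact formula, so the following is only the simplest reading of "encoder's PDT at the playhead" against "the server's own receive clock": the offset is the difference between the two timestamps. The function and parameter names are hypothetical.

```python
from datetime import datetime, timezone


def true_offset_s(pdt_at_playhead: datetime, server_receive_time: datetime) -> float:
    """Seconds the playhead is behind live, using only server-side clocks.

    pdt_at_playhead:     encoder timestamp (PDT) of the content at the playhead
    server_receive_time: when the server received the player's report
    Both must be timezone-aware so the subtraction is unambiguous.
    """
    return (server_receive_time - pdt_at_playhead).total_seconds()


# Example: the playhead shows content the encoder stamped 6.5 s before receipt.
pdt = datetime(2025, 5, 27, 22, 51, 0, tzinfo=timezone.utc)
received = datetime(2025, 5, 27, 22, 51, 6, 500000, tzinfo=timezone.utc)
offset = true_offset_s(pdt, received)
```

Neither input comes from the device's wall clock, which is why client clock skew drops out of the result.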

🛠 Encoding & content

  • Source audio is normalized to AAC during transcode regardless of input format, so every variant on the ladder has a uniform audio layer (no more "this segment plays on iOS but not Android" audio-codec surprises).
  • New make test-pattern Makefile target generates a 4K synthetic test pattern as a controlled source — deterministic visuals, no copyrighted material, useful for "is this the player or is this the content?" debugging.

🔁 Reset All Settings

Every device app got a destructive Reset All Settings action at the bottom of Settings → Advanced. Wipes the saved server list, playback history, and Advanced flags, then routes back to the ServerPicker — equivalent to a fresh install but without losing the app itself.

🐛 Fault-decision correctness fixes

Fault injection got a series of correctness fixes around concurrent requests and timing semantics. Frequency now means full cycle length (fault start → next fault start) instead of gap-after-recovery, matching what the slider label implies. The video-and-audio-arrived-in-the-same-millisecond double-fire bug is gone (single decision mutex around the read-modify-write). Default Mode aligns server-side init with the dashboard's visible default so first-use rate-limits behave as expected.
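
The two fixes can be illustrated together: a start-to-start cycle, and a lock around the read-modify-write so that simultaneous requests cannot both fire. This is a sketch of the idea, not the proxy's Go code; the class and method names are hypothetical.

```python
import threading


class FaultCycle:
    """Fire at most one fault per cycle, where a cycle runs start to start."""

    def __init__(self, frequency_s: float, first_start: float = 0.0):
        self.frequency_s = frequency_s      # full cycle length, fault start to next fault start
        self.next_start = first_start
        self._lock = threading.Lock()

    def try_fire(self, now: float) -> bool:
        """Return True if this request should start a fault."""
        with self._lock:                    # read and update happen as one atomic step
            if now < self.next_start:
                return False
            # Schedule from this fault's start, not from when it recovers.
            self.next_start = now + self.frequency_s
            return True
```

Without the lock, a video and an audio request arriving in the same millisecond could both read the old next_start and both fire.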

🧰 Other improvements

  • Network-log waterfall: Follow Latest sticky checkbox, alt-wheel inner scroll restored, brush minimum drops to 30 s, 40-row default height, page-aware auto-snap.
  • Apple TV: launcher icon + top shelf brand assets (the app was invisible on the Home Screen before).
  • Default Bitrate Y-Max now offers a 100 Mbps option for high-bitrate experiments.
  • Buffer-depth, bandwidth, and FPS charts' plot areas now share the same right edge so vertical x-axis ticks align across every chart.

Full commit-level changelog: v1.0.0…v1.1.0

Container images: ghcr.io/jonathaneoliver/infinite-streaming:v1.1.0 (also tagged :v1.1 and :v1).

v1.0.0 — Open-source release

24 Apr 15:18
a52551e

What's Changed

Full Changelog: v0.6.0...v1.0.0

v0.2.0

09 Feb 01:10
efe39fd

What's Changed

Full Changelog: https://github.com/jonathaneoliver/infinite-streaming/commits/v0.2.0