Implement web-based kernel observatory #2

Merged
jserv merged 4 commits into main from telemetry on Mar 19, 2026

@jserv commented Mar 19, 2026

A browser-based dashboard that exposes LKL's internal kernel state in real time while the guest executes. This is not a CLI-to-GUI wrapper -- it reveals runtime details that are invisible in normal Linux operation because they occur inside kernel code paths that userspace cannot observe without instrumentation. The web UI makes kbox's unique architectural property (the kernel runs in the same address space as the supervisor, with all data structures directly readable) accessible to anyone with a browser, not just GDB experts.

Motivation and Positioning

kbox is not "another Linux emulator" or "another sandbox." Emulators (QEMU, Bochs) and sandboxes (gVisor, Firecracker) optimize for isolation and performance. kbox optimizes for transparency: it boots a real Linux kernel as a library specifically so that the kernel's internal machinery -- scheduler decisions, page cache behavior, interrupt routing, VFS traversals, memory allocator state -- can be observed, measured, and understood by students, developers, and anyone curious about how Linux actually works.

The GDB helpers already expose this data to expert users. The web observatory democratizes it. A student who has never used GDB can watch context switches accumulate in real time, see which syscall paths trigger page faults, observe the EEVDF scheduler picking the next task by deadline, and trace a write() from the seccomp notification through VFS down to the block layer -- all rendered as live charts and event streams in a browser tab.

Traditional approaches to kernel observation are impractical for this use case:

  • ftrace/perf require root or CAP_SYS_ADMIN and produce text logs that must be post-processed.
  • /proc and /sys expose counters but not event-level detail (you see the count of context switches, not when each one happened or why).
  • KGDB requires a serial connection and halts the kernel at breakpoints.
  • printk-based tracing requires kernel recompilation and produces unstructured text.

LKL eliminates all of these barriers. The kernel runs in-process, so its /proc and /sys filesystems are accessible to the supervisor through the kbox_lkl_openat/kbox_lkl_read wrappers: no privilege escalation, no ptrace, no external tools. By parsing /proc, the supervisor can sample kernel state at whatever frequency it chooses, correlate it with seccomp dispatch events, and stream it to a browser via SSE (Server-Sent Events), all from an unprivileged process.
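Once the bytes of a guest /proc file are in supervisor memory, sampling is ordinary text parsing. A minimal sketch, assuming the supervisor has already read the guest's /proc/stat into a NUL-terminated buffer (the exact kbox_lkl_openat/kbox_lkl_read signatures are not shown here, so the helper only parses); proc_stat_field is a hypothetical name. The "ctxt" line of /proc/stat holds the cumulative context-switch count:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find a "<key> <value>" line in a /proc/stat-style buffer and parse the
 * first number after the key. The key must start a line and be followed by
 * a space, so "ctxt" never matches a hypothetical "ctxtx" line. */
static bool proc_stat_field(const char *buf, const char *key, uint64_t *out)
{
    size_t klen = strlen(key);
    const char *line = buf;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
            unsigned long long v;
            if (sscanf(line + klen, "%llu", &v) == 1) {
                *out = (uint64_t) v;
                return true;
            }
            return false;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return false;
}
```

Two successive samples of "ctxt" divided by the elapsed time give the context switches per second that the dashboard plots.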

Design decision: self-contained telemetry, not infrastructure integration. gVisor exposes Prometheus endpoints and could integrate with OpenTelemetry for distributed tracing. kbox deliberately rejects this approach. Prometheus and OTel add external dependencies (scraping infrastructure, collector daemons, time-series databases), contradict kbox's zero-dependency philosophy, and solve a problem kbox does not have: multi-service distributed tracing across network boundaries. kbox is a single-process, local-only tool.

The embedded HTTP server, SSE stream, and browser dashboard form the entire observability stack. No Prometheus, no Grafana, no Jaeger, no npm; a standard browser is the only consumer. Users who need to export data for offline analysis can use --trace-format=json (written to stdout) or the dashboard's CSV/JSON export. The --trace-format=json schema is intentionally simple, but it does not preclude downstream ingestion by OTel collectors or jq pipelines: format compatibility is free, infrastructure dependency is not. This is a conscious trade-off: less enterprise integration surface in exchange for zero operational complexity.

@jserv changed the title from Telemetry to Web-Based Kernel Observatory on Mar 19, 2026
@jserv changed the title from Web-Based Kernel Observatory to Implement web-based kernel observatory on Mar 19, 2026
cubic-dev-ai[bot]

This comment was marked as resolved.

jserv added 4 commits March 19, 2026 19:01
This implements the telemetry infrastructure and an embedded HTTP server
for the web observatory. It is activated by --web[=PORT] and built only
when KBOX_HAS_WEB is set.

Telemetry sampler (web-telemetry.c):
- Two-tier timer reads LKL-internal /proc files
- Fast tick (100ms): /proc/stat, meminfo, vmstat, loadavg
- Per-tick time budget (5ms) prevents starving dispatch
- ENOSYS JSON serializer clamps snprintf to buffer bounds
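A per-tick time budget can be enforced with a monotonic deadline checked before each /proc source is read. A minimal sketch, assuming the sampler runs its sources round-robin and resumes where it stopped on the next tick; tick_budget, sample_fn, and sample_tick are hypothetical names, and the 5 ms figure above would be passed as budget_ns:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; immune to wall-clock jumps. */
static uint64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

struct tick_budget {
    uint64_t deadline_ns;
};

static void budget_start(struct tick_budget *b, uint64_t budget_ns)
{
    b->deadline_ns = mono_ns() + budget_ns;
}

static bool budget_left(const struct tick_budget *b)
{
    return mono_ns() < b->deadline_ns;
}

/* One sampler source, e.g. "read and parse /proc/meminfo". */
typedef void (*sample_fn)(void *ctx);

/* Run sources round-robin from 'start' until each ran once or the budget is
 * spent. Returns the index the next tick should start from, so sources
 * skipped this tick run first next time. Requires n > 0. */
static size_t sample_tick(const sample_fn *src, size_t n, size_t start,
                          void *ctx, uint64_t budget_ns)
{
    struct tick_budget b;
    size_t i = start % n;

    budget_start(&b, budget_ns);
    for (size_t done = 0; done < n && budget_left(&b); done++) {
        src[i](ctx);
        i = (i + 1) % n;
    }
    return i;
}
```

The budget limits how many sources start in a tick, not the tick's worst case: a source that begins just before the deadline can still overrun by its own runtime.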

Event ring buffer (web-events.c):
- Sequence-numbered to prevent SSE duplicate delivery
- JSON escaping for guest-controlled strings
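Both properties fit in one small structure: the sequence number serves as the slot index and as the reader's resume cursor, and escaping happens once at push time. A minimal sketch, assuming a fixed-capacity ring with a single writer (locking omitted); event_ring, ring_push, ring_next, json_escape, and both capacities are hypothetical names and values:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EV_RING_CAP 256
#define EV_MSG_MAX  128

struct event {
    uint64_t seq;               /* 1, 2, 3, ...; never reused */
    char msg[EV_MSG_MAX];       /* escaped, ready to embed between JSON quotes */
};

struct event_ring {
    struct event slot[EV_RING_CAP];
    uint64_t head_seq;          /* last assigned seq; 0 means empty */
};

/* Escape quotes, backslashes, and control characters. Truncates instead of
 * overflowing, never splits an escape, always NUL-terminates. */
static size_t json_escape(char *dst, size_t cap, const char *src)
{
    size_t o = 0;
    if (cap == 0)
        return 0;
    for (; *src; src++) {
        unsigned char c = (unsigned char) *src;
        char tmp[7];
        size_t n;
        if (c == '"' || c == '\\') {
            tmp[0] = '\\';
            tmp[1] = (char) c;
            n = 2;
        } else if (c < 0x20) {
            n = (size_t) snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned) c);
        } else {
            tmp[0] = (char) c;
            n = 1;
        }
        if (o + n >= cap)       /* keep one byte for the NUL */
            break;
        memcpy(dst + o, tmp, n);
        o += n;
    }
    dst[o] = '\0';
    return o;
}

static uint64_t ring_push(struct event_ring *r, const char *guest_str)
{
    uint64_t seq = ++r->head_seq;
    struct event *e = &r->slot[seq % EV_RING_CAP];
    e->seq = seq;
    json_escape(e->msg, sizeof e->msg, guest_str);
    return seq;
}

/* Next event after 'last_seen', or NULL if the reader is caught up. A reader
 * more than EV_RING_CAP behind skips to the oldest retained event. */
static const struct event *ring_next(const struct event_ring *r, uint64_t last_seen)
{
    if (last_seen >= r->head_seq)
        return NULL;
    uint64_t oldest = r->head_seq > EV_RING_CAP ? r->head_seq - EV_RING_CAP + 1 : 1;
    uint64_t want = last_seen + 1 < oldest ? oldest : last_seen + 1;
    return &r->slot[want % EV_RING_CAP];
}
```

A client that remembers the last seq it received never sees an event twice. Note that this escaper does not validate UTF-8, so invalid byte sequences from the guest would still pass through.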

Embedded HTTP server (web-server.c):
- Minimal HTTP/1.1 via epoll in a dedicated pthread
- SSE on GET /api/events (text/event-stream)
- GET /api/snapshot, /stats, /api/enosys, POST /api/control
- atomic_int for cross-thread state flags
- goto-fail cleanup: all error paths destroy mutex and fds
- epoll_ctl registration checked
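On the wire, SSE is a long-lived HTTP response with Content-Type text/event-stream, where each event is a block of "field: value" lines ending in a blank line. A minimal sketch of one way the sequence numbers can be used, assuming each frame carries its seq in the "id:" field: a reconnecting browser EventSource then sends the last id it saw as the Last-Event-ID request header. sse_headers, sse_frame, and sse_last_event_id are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sent once when a client opens GET /api/events. */
static const char sse_headers[] =
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/event-stream\r\n"
    "Cache-Control: no-cache\r\n"
    "Connection: keep-alive\r\n"
    "\r\n";

/* Format one frame. 'json' must not contain newlines. Returns the frame
 * length, or 0 if it does not fit in 'cap'. */
static size_t sse_frame(char *buf, size_t cap, uint64_t seq, const char *json)
{
    int n = snprintf(buf, cap, "id: %llu\ndata: %s\n\n",
                     (unsigned long long) seq, json);
    if (n < 0 || (size_t) n >= cap)
        return 0;
    return (size_t) n;
}

/* Extract Last-Event-ID from raw request headers; 0 if absent. Simplified:
 * header names are case-insensitive in HTTP, but this match is exact. */
static uint64_t sse_last_event_id(const char *req)
{
    const char *p = strstr(req, "\r\nLast-Event-ID:");
    unsigned long long v = 0;
    if (p && sscanf(p + 16, " %llu", &v) == 1)
        return (uint64_t) v;
    return 0;
}
```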

Dispatch instrumentation (seccomp-supervisor.c):
- Per-syscall latency, disposition, ENOSYS tracking
- RECV/SEND ENOENT counters (EBADF excluded)
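Per-syscall tracking reduces to a fixed table indexed by syscall number and updated once per seccomp notification. A minimal sketch, assuming a single dispatch thread is the only writer; the table size, the disposition names, and record_dispatch/mean_ns are hypothetical, and the HTTP thread would need atomics or a lock before reading these counters:

```c
#include <stdint.h>

#define MAX_SYSCALL 512          /* hypothetical upper bound on syscall numbers */

/* Hypothetical dispositions: served by LKL, passed to the host kernel, or
 * rejected with ENOSYS. */
enum disposition { DISP_LKL, DISP_HOST, DISP_ENOSYS, DISP_COUNT };

struct syscall_stats {
    uint64_t calls;
    uint64_t total_ns;           /* sum of dispatch latencies */
    uint64_t max_ns;             /* worst single dispatch */
    uint64_t disp[DISP_COUNT];   /* how often each disposition was chosen */
};

static struct syscall_stats stats[MAX_SYSCALL];

/* Called once per notification with the latency measured around dispatch
 * (e.g. two CLOCK_MONOTONIC reads). Out-of-range input is ignored. */
static void record_dispatch(long nr, enum disposition d, uint64_t elapsed_ns)
{
    if (nr < 0 || nr >= MAX_SYSCALL || d >= DISP_COUNT)
        return;
    struct syscall_stats *s = &stats[nr];
    s->calls++;
    s->total_ns += elapsed_ns;
    if (elapsed_ns > s->max_ns)
        s->max_ns = elapsed_ns;
    s->disp[d]++;
}

static uint64_t mean_ns(const struct syscall_stats *s)
{
    return s->calls ? s->total_ns / s->calls : 0;
}
```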

Build: KBOX_HAS_WEB=1 conditional, zero impact when off
Usage: --web[=PORT], --web-bind, --trace-format=json
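A common way to get zero impact when a feature is off is to compile call sites against empty inline stubs. A minimal sketch, assuming the KBOX_HAS_WEB=1 make setting ends up as a -DKBOX_HAS_WEB compiler flag; the header layout and the names web_record_dispatch, web_compiled_in, and after_dispatch are hypothetical:

```c
#include <stdint.h>

/* With KBOX_HAS_WEB defined, the real function lives in the web sources.
 * Without it, each hook is an empty static inline that the compiler drops,
 * so call sites in the dispatch path need no #ifdef of their own. */
#ifdef KBOX_HAS_WEB
void web_record_dispatch(long nr, uint64_t elapsed_ns);
static inline int web_compiled_in(void) { return 1; }
#else
static inline void web_record_dispatch(long nr, uint64_t elapsed_ns)
{
    (void) nr;
    (void) elapsed_ns;
}
static inline int web_compiled_in(void) { return 0; }
#endif

/* Example call site: identical source in both builds. */
static void after_dispatch(long nr, uint64_t elapsed_ns)
{
    web_record_dispatch(nr, elapsed_ns);
}
```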

Change-Id: Ie660bb39fa604e5d301578ed24d3d290946be65d

This replaces the placeholder HTML with a Chart.js dashboard served as
compiled-in static assets via 'xxd -i'.

Web frontend:
- Chart.js 4.4.7 vendored, compiled into binary at build time
- Syscall family stacked chart, memory area, scheduler line, softirq bar
- SVG arc gauges for syscalls/s, ctx switches/s, memory, FDs
- SSE event feed: pure DOM construction (no innerHTML XSS)
- Dark/light theme with localStorage persistence
- 3s polling, rate computation clamped to zero on resets
- Chart label/data alignment guarded by dt > 0 check
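The rate rules above fit in one small pure function. The dashboard does this in JavaScript; the sketch below restates the same logic in C for consistency with the other examples, assuming cumulative counters sampled with timestamps in seconds. counter_rate is a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-second rate between two cumulative counter samples. Returns false when
 * dt <= 0 (or NaN) so the caller skips the point entirely, pushing neither a
 * label nor a value and keeping both chart arrays the same length. */
static bool counter_rate(uint64_t prev, uint64_t cur,
                         double prev_t, double cur_t, double *rate)
{
    double dt = cur_t - prev_t;
    if (!(dt > 0.0))
        return false;
    /* A counter that went backwards indicates a reset; report zero instead of
     * the huge value unsigned subtraction would produce. */
    *rate = cur >= prev ? (double) (cur - prev) / dt : 0.0;
    return true;
}
```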

Build system:
- scripts/gen-web-assets.sh generates src/web-assets.c via xxd
- sed filter strips xxd declarations for cross-platform compat
- Makefile web-assets target with proper file dependencies
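`xxd -i file` emits each file as an `unsigned char` array plus an `unsigned int` length variable, and the arrays are not NUL-terminated, so a server needs an explicit length per asset. A minimal sketch of a lookup table over such arrays, assuming the generated src/web-assets.c exposes one array and length per file; the string-literal stand-ins, web_asset, and asset_lookup are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for generated arrays. sizeof - 1 below drops the NUL that a
 * string literal adds but xxd output would not have. */
static const unsigned char index_html[] = "<!doctype html><title>kbox</title>";
static const unsigned char app_js[] = "console.log('kbox');";

struct web_asset {
    const char *path;
    const unsigned char *data;
    size_t len;
};

static const struct web_asset assets[] = {
    { "/index.html", index_html, sizeof index_html - 1 },
    { "/app.js",     app_js,     sizeof app_js - 1 },
};

/* Linear scan is fine for a handful of compiled-in files. */
static const struct web_asset *asset_lookup(const char *path)
{
    for (size_t i = 0; i < sizeof assets / sizeof assets[0]; i++)
        if (strcmp(assets[i].path, path) == 0)
            return &assets[i];
    return NULL;
}
```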

Web backend:
- web-telemetry.c: add softirqs[10] array to JSON snapshot
- web-server.c: compiled-in asset serving with content-type detection,
  write_all() for reliable delivery, /index.html alias
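Content-type detection can key on the file extension, and a write_all() helper exists because write() on a socket may send fewer bytes than requested or be interrupted by a signal. A minimal sketch, assuming blocking writes; the extension table and both function bodies are illustrative, not the project's code. On a non-blocking socket, EAGAIN would also need handling, which this version reports as an error:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a path's extension to a Content-Type; default to octet-stream. */
static const char *content_type(const char *path)
{
    static const struct { const char *ext, *type; } map[] = {
        { ".html", "text/html; charset=utf-8" },
        { ".js",   "text/javascript; charset=utf-8" },
        { ".css",  "text/css; charset=utf-8" },
        { ".svg",  "image/svg+xml" },
        { ".json", "application/json" },
    };
    const char *dot = strrchr(path, '.');
    if (dot)
        for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
            if (strcmp(dot, map[i].ext) == 0)
                return map[i].type;
    return "application/octet-stream";
}

/* Loop until every byte is written; retry on EINTR, fail on other errors. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```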

Change-Id: I51139ce900f95031c7c798ca6c6498eba3c9d278

This adds historical data and export to the web observatory.

Web backend:
- GET /api/history returns snap_ring[] as JSON array
- Oldest-first ordering for chart backfill on page load
- Bounded by WEB_RESP_BUF_SIZE to prevent oversized response
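Oldest-first serialization from a ring takes one index computation, and bounding the output means checking every snprintf return value. A minimal sketch, assuming a fixed-capacity snapshot ring with a write head and a fill count; the struct layout, SNAP_RING_CAP, the two JSON fields, and snap_push/history_json are hypothetical. When the buffer fills, this version drops the remaining (newest) entries but still closes the array, so the response is always valid JSON:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SNAP_RING_CAP 4   /* kept tiny for illustration */

struct snap {             /* subset of a telemetry snapshot */
    uint64_t t_ms;
    uint64_t ctxt;
};

struct snap_ring {
    struct snap slot[SNAP_RING_CAP];
    size_t head;          /* slot the next snapshot is written to */
    size_t count;         /* valid slots, at most SNAP_RING_CAP */
};

static void snap_push(struct snap_ring *r, struct snap s)
{
    r->slot[r->head] = s;
    r->head = (r->head + 1) % SNAP_RING_CAP;
    if (r->count < SNAP_RING_CAP)
        r->count++;
}

/* Serialize oldest-first into buf and return the string length. */
static size_t history_json(const struct snap_ring *r, char *buf, size_t cap)
{
    if (cap < 3)                      /* need room for "[]" and the NUL */
        return 0;
    size_t limit = cap - 1;           /* reserve one byte for the final ']' */
    size_t off = 0;
    size_t oldest = (r->head + SNAP_RING_CAP - r->count) % SNAP_RING_CAP;

    buf[off++] = '[';
    for (size_t i = 0; i < r->count; i++) {
        const struct snap *s = &r->slot[(oldest + i) % SNAP_RING_CAP];
        int n = snprintf(buf + off, limit - off, "%s{\"t\":%llu,\"ctxt\":%llu}",
                         i ? "," : "", (unsigned long long) s->t_ms,
                         (unsigned long long) s->ctxt);
        if (n < 0 || (size_t) n >= limit - off)
            break;                    /* did not fit: drop this and newer entries */
        off += (size_t) n;
    }
    buf[off++] = ']';
    buf[off] = '\0';
    return off;
}
```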

Web frontend:
- loadHistory() fetches on page load, polls start after completion to
  prevent prevSnap overwrite race
- CSV export: snapshot telemetry as timestamped rows
- JSON export: event feed for offline analysis
- Pause handler: state flips only after res.ok confirmation

Change-Id: I48b7e8a387fd01db565e55c7e6bfea5494100b6c

This adds a "Why kbox" section comparing kbox against chroot, proot, UML,
and gVisor. It also expands the architecture section with syscall routing
details, ABI translation, and subsystem internals; rewrites the web
observatory section to explain in-process kernel observability; and
documents all API endpoints and implementation details.

Change-Id: I19cd90f3bd4d4f6cd16c3f7acb0c4e1c721f5474

@jserv merged commit 9ed995f into main Mar 19, 2026
3 checks passed
@jserv deleted the telemetry branch March 19, 2026 11:16