GPT Realtime Interview Drill Coach

Browser-based voice drill app that trains fast staff/system-design interview reflexes. Implementation of the spec in LOCAL.md (gitignored).

The voice agent runs on OpenAI GPT Realtime over WebRTC. The backend owns the curriculum, rotation, attempts, grading, and weakness state — the model is the voice/interview surface, not the brain (LOCAL.md §18).

For onboarding (architecture, recipes, the test pyramid in detail), see CONTRIBUTING.md. For the build history grouped by LOCAL.md milestone, see CHANGELOG.md.

LOCAL.md spec coverage

LOCAL.md section	Status
§1 Goal · §2 Architecture	✓ (SQLite stands in for Postgres in the MVP; schema is portable)
§3 Realtime WebRTC (Option A ephemeral token)	✓
§4 Session config (gpt-realtime-2, reasoning, voice, tools, prompt id)	✓
§5 Core product flow	✓
§6 Tool/function interface (`get_next_drill`, `submit_answer_transcript`, `grade_attempt`, `save_generated_cards`, `get_user_skill_summary`, `end_session_summary`)	✓ all 6 wired + dispatched + smoke-verified
§7 Data model	✓
§8 Rotation engine	✓ with separate `mock_interview` formula
§9 Question generation (Layer 1 YAML, Layer 2 templates, Layer 3 LLM drafts + activation flow)	✓
§10 Grading (rubric-first JSON, LLM + offline)	✓
§11 Voice session behavior	✓
§12 Backend endpoints	✓ (see API table)
§13 Frontend screens (MVP + admin: drill browse, rubric editor, test-grade)	✓
§14 Prompt skeleton	✓ (`seeds/realtime-prompt.md`)
§15 MVP 1 (50–100 drills, rotation, transcripts, grading, history)	✓ 55 active drills, 17 topics
§15 MVP 2 (cards, weak dashboard, templates, Anki CSV, rubric editor)	✓
§15 MVP 3 (spaced repetition, skill graph, mock interview, pressure mode, compare attempts)	✓
§16 Seed format · §17 Engineering decisions · §18 Non-negotiable	✓ (autonomy verified by `smoke:realtime:loop`)

Explicitly out of MVP scope per LOCAL.md §15: payments, mobile app, Anki sync, calendar scheduling, multi-user admin. Documented choice: SQLite instead of Postgres for MVP — schema is portable, migration file is the swap point. See docs/POSTGRES_MIGRATION.md for the exact swap path (docker-compose.yml + migrations/postgres.sql).

Layout

apps/
  backend/          Express + TypeScript + better-sqlite3
    src/
      server.ts             entry; createApp() factory used by route tests
      config.ts             env + paths
      drill-seed-schema.ts  Zod schema (LOCAL.md §16); shared by seed/import/verify
      verify-drills.ts      CLI lint: schema + duplicate ids + quality warnings
      gen-drills.ts         LLM Layer-3 draft generator
      seed.ts               YAML → DB loader + importDrillsFromYaml
      seed-templates.ts     Layer-2 template expander
      db/
        migrations.ts       SQLite schema (mirrors LOCAL.md §7)
        repo.ts             drills, sessions, attempts, events, cards,
                            skillState, usageEvents
      engines/
        rotation.ts         LOCAL.md §8 scoring + mock_interview variant
        grading.ts          LOCAL.md §10 rubric grading (LLM + offline)
      services/
        realtime.ts         OpenAI Realtime client_secret + tool defs
        llm.ts              OpenAI SDK wrapper
      resources/            optional resource → drill draft pipeline (CLIs)
      routes/index.ts       REST API (LOCAL.md §12)
    migrations/postgres.sql Postgres-flavored schema (doc; not the runtime)
    seeds/drills/*.yaml     canonical drill bank (LOCAL.md §16 format)
    seeds/templates/*.yaml  Layer-2 templates
    seeds/realtime-prompt.md drill-coach agent prompt (paste into Playground)
  frontend/         React + Vite + TypeScript
    src/
      App.tsx               drill UI, history, events timeline, admin panels
      useRealtime.ts        WebRTC + tool dispatch (LOCAL.md §3 / §6)
      api.ts                fetch wrapper for /api/* (typed clients)

scripts/
  smoke-all.mjs             every-layer composite + pass/fail table
  realtime-webrtc-smoke.mjs Playwright + fake-mic realtime smoke harness
  drill-loop-smoke.mjs      offline REST loop smoke
  drill-loop-browser-smoke.mjs offline Playwright UI smoke
  dev-reset.mjs             wipe + reseed local SQLite
  doctor.mjs                pnpm dev:doctor — environment diagnostic

Setup

Requires Node 22+, pnpm 10+, and an OpenAI API key with Realtime access.

pnpm install
cp apps/backend/.env.example .env   # or place .env at the repo root
# edit .env: set OPENAI_API_KEY
pnpm dev                            # starts backend (4000) + frontend (5173)

Open http://localhost:5173.

Single commands:

pnpm dev:backend                          # tsx watch on src/server.ts
pnpm dev:frontend                         # vite
pnpm --filter @drill/backend seed          # re-seed drills from YAML
pnpm --filter @drill/backend seed:templates # expand Layer-2 templates
pnpm --filter @drill/backend gen:drills -- --topic X --count N  # LLM Layer-3 drafts
pnpm dev:doctor      # environment diagnostic (Node, pnpm, env, sqlite, ports…)
pnpm dev:reset       # wipe local SQLite + reseed (refuses while dev is running)
pnpm verify:drills   # lint seeds/drills/*.yaml against the schema
pnpm check           # build + tests + offline smokes (legacy alias)
pnpm smoke:all       # CI gate (with --offline-only) and pre-release every-layer check (~5 min with OPENAI_API_KEY)

Environment variables

Variable	Default	Notes
`PORT`	`4000`	backend port
`DATABASE_PATH`	`apps/backend/data/drill.db`	SQLite file
`OPENAI_API_KEY`	—	required for realtime + LLM grading
`OPENAI_REALTIME_MODEL`	`gpt-realtime-2`	LOCAL.md §17 default
`OPENAI_REALTIME_TRANSCRIPTION_MODEL`	`gpt-4o-mini-transcribe`	ASR model for user audio transcript events
`OPENAI_REALTIME_TRANSCRIPTION_LANGUAGE`	—	optional language hint, e.g. `en`
`OPENAI_GRADING_MODEL`	`gpt-4.1-mini`	text grading after attempt
`OPENROUTER_API_KEY`	—	optional; enables shadow grader benchmarking only
`OPENROUTER_BASE_URL`	`https://openrouter.ai/api/v1`	OpenAI-compatible OpenRouter endpoint
`OPENROUTER_MODEL_TTL_MS`	`600000`	cache TTL for OpenRouter model list
`OPENROUTER_COOLDOWN_MS`	`600000`	temporary skip window for unavailable/rate-limited free models
`OPENROUTER_TIMEOUT_MS`	`20000`	per-model shadow grading timeout
`OPENAI_REALTIME_PROMPT_ID`	—	optional Playground prompt id; when set, the backend sends `prompt: { id }` instead of inlining `DRILL_COACH_INSTRUCTIONS`
`OPENAI_REALTIME_PROMPT_VERSION`	—	optional version pin for the Playground prompt above; bump after iterating on `seeds/realtime-prompt.md`
`OPENAI_REALTIME_VOICE_SPEED`	`1.25`	clamped to `[0.25, 1.5]`; higher = faster speech
`OPENAI_REALTIME_TOKEN_ATTEMPTS`	`3`	retry budget for the `client_secrets` mint on retryable upstream errors
`REALTIME_VOICE`	`marin`	voice id
`FRONTEND_ORIGIN`	`http://localhost:5173`	CORS allowlist
`USE_OFFLINE_GRADER`	`0`	`1` → deterministic keyword grader (no API call)
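
The defaults and the voice-speed clamp in the table can be sketched as a small parser. This is an illustration only, not the contents of config.ts: `clampVoiceSpeed`, `readRealtimeConfig`, and the `RealtimeConfig` shape are hypothetical names, and falling back to the default on unparsable input is an assumption.

```typescript
// Hypothetical sketch of reading the realtime-related variables above.
// The real logic lives in apps/backend/src/config.ts and may differ.
type Env = Record<string, string | undefined>;

interface RealtimeConfig {
  model: string;
  voice: string;
  voiceSpeed: number;
  tokenAttempts: number;
  useOfflineGrader: boolean;
}

// Clamp to the documented [0.25, 1.5] range.
// Assumption: missing or non-numeric input falls back to the 1.25 default.
function clampVoiceSpeed(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return 1.25;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return 1.25;
  return Math.min(1.5, Math.max(0.25, parsed));
}

function readRealtimeConfig(env: Env): RealtimeConfig {
  const attempts = Number.parseInt(env["OPENAI_REALTIME_TOKEN_ATTEMPTS"] ?? "3", 10);
  return {
    model: env["OPENAI_REALTIME_MODEL"] ?? "gpt-realtime-2",
    voice: env["REALTIME_VOICE"] ?? "marin",
    voiceSpeed: clampVoiceSpeed(env["OPENAI_REALTIME_VOICE_SPEED"]),
    tokenAttempts: Number.isInteger(attempts) && attempts > 0 ? attempts : 3,
    // Per the Grading section, a missing API key also selects the offline grader.
    useOfflineGrader: env["USE_OFFLINE_GRADER"] === "1" || !env["OPENAI_API_KEY"],
  };
}
```

Calling `readRealtimeConfig(process.env)` at startup would yield one typed object instead of scattered string lookups.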

How it works

Realtime connection (LOCAL.md §3 Option A)

The browser asks the backend for an ephemeral token.
The backend calls POST https://api.openai.com/v1/realtime/client_secrets with the drill-coach system instructions, voice, and reasoning effort.
The browser builds an RTCPeerConnection, adds microphone audio, opens a data channel, and posts its SDP offer directly to POST /v1/realtime/calls?model=... with the ephemeral token. The OpenAI API key never leaves the backend.
Audio streams over WebRTC; the model's instructions force the strict interview-drill style (LOCAL.md §11, §14).
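
The SDP exchange described above can be sketched as a pure request builder. `buildSdpOfferRequest` is a hypothetical helper, not code from useRealtime.ts; the `application/sdp` content type and Bearer header follow OpenAI's published WebRTC flow as far as I know, so treat them as assumptions to check against the current API reference.

```typescript
// Sketch of the request the browser sends with its SDP offer.
// Hypothetical helper; the real hook may structure this differently.
interface SdpOfferRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSdpOfferRequest(
  model: string,
  ephemeralToken: string,
  offerSdp: string,
): SdpOfferRequest {
  return {
    url: `https://api.openai.com/v1/realtime/calls?model=${encodeURIComponent(model)}`,
    init: {
      method: "POST",
      headers: {
        // Ephemeral token minted by the backend; the long-lived API key is never used here.
        Authorization: `Bearer ${ephemeralToken}`,
        "Content-Type": "application/sdp",
      },
      body: offerSdp,
    },
  };
}
```

In the browser, the result would be passed to `fetch(url, init)` after `pc.createOffer()` and `pc.setLocalDescription(offer)`, and the response text applied as the remote answer via `pc.setRemoteDescription`.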

Drill loop

POST /api/drill-sessions → session id.
POST /api/drill-sessions/:id/next runs the rotation engine and returns a drill plus a pre-created attempt_id.
While voice is live the frontend pushes the drill text into the agent's conversation, and the agent asks it aloud.
The user speaks; transcription comes back over the data channel.
On Submit, the frontend POSTs the transcript + duration to POST /api/drill-attempts/:id/grade. The grader runs rubric-first scoring, persists the attempt, updates user_skill_state, and inserts generated_cards.

Rotation engine (LOCAL.md §8)

apps/backend/src/engines/rotation.ts implements the full weighted score:

0.35 * due
 + 0.25 * weakness
 + 0.15 * novelty
 + 0.10 * difficultyFit
 + 0.10 * topicBalance
 + 0.05 * trapDiversity
 − 0.50 * recentRepeatPenalty
 − 0.30 * exactRepeatPenalty

The next drill is then drawn by weighted random choice from the top 5 candidates, so the app is not predictable.
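
A minimal sketch of the score and the top-5 draw, assuming every signal is normalized to [0, 1]. The interface names, the injectable `rand` parameter, and the way scores are shifted into positive weights are assumptions for illustration; rotation.ts may weight the draw differently.

```typescript
// Field names mirror the formula above; values are assumed to lie in [0, 1].
interface RotationSignals {
  due: number;
  weakness: number;
  novelty: number;
  difficultyFit: number;
  topicBalance: number;
  trapDiversity: number;
  recentRepeatPenalty: number;
  exactRepeatPenalty: number;
}

function rotationScore(s: RotationSignals): number {
  return (
    0.35 * s.due +
    0.25 * s.weakness +
    0.15 * s.novelty +
    0.1 * s.difficultyFit +
    0.1 * s.topicBalance +
    0.05 * s.trapDiversity -
    0.5 * s.recentRepeatPenalty -
    0.3 * s.exactRepeatPenalty
  );
}

interface Candidate {
  id: string;
  score: number;
}

// Weighted random pick among the k best candidates.
// `rand` is injectable so tests can make the draw deterministic.
function pickFromTop(
  candidates: Candidate[],
  k = 5,
  rand: () => number = Math.random,
): Candidate | undefined {
  const top = [...candidates].sort((a, b) => b.score - a.score).slice(0, k);
  if (top.length === 0) return undefined;
  // Assumption: shift scores so even the lowest of the top k keeps a small positive weight.
  const min = Math.min(...top.map((c) => c.score));
  const weighted = top.map((c) => ({ c, w: c.score - min + 0.01 }));
  const total = weighted.reduce((sum, x) => sum + x.w, 0);
  let r = rand() * total;
  for (const { c, w } of weighted) {
    r -= w;
    if (r <= 0) return c;
  }
  return weighted[weighted.length - 1]?.c;
}
```

Drawing from the top few instead of always taking the maximum keeps high-priority drills likely while avoiding a fixed order.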

mock_interview mode swaps the formula to prefer variety and high difficulty over due/weakness:

0.40 * novelty + 0.20 * topicBalance + 0.20 * difficulty
+ 0.10 * weakness + 0.05 * due + 0.05 * trapDiversity
- 0.60 * recentRepeatPenalty - 0.40 * exactRepeatPenalty

and also pre-filters the pool to difficulty ≥ 3 plus drills the user hasn't attempted recently, so a "mock interview" session feels different from a study session.
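
The mock_interview variant can be sketched the same way. This reads the pre-filter as requiring both conditions (difficulty ≥ 3 and not attempted recently), which is an interpretation of the sentence above; `PoolDrill`, `attemptedRecently`, and the function names are hypothetical.

```typescript
// Signals are assumed to be normalized to [0, 1].
interface MockSignals {
  novelty: number;
  topicBalance: number;
  difficulty: number;
  weakness: number;
  due: number;
  trapDiversity: number;
  recentRepeatPenalty: number;
  exactRepeatPenalty: number;
}

function mockInterviewScore(s: MockSignals): number {
  return (
    0.4 * s.novelty +
    0.2 * s.topicBalance +
    0.2 * s.difficulty +
    0.1 * s.weakness +
    0.05 * s.due +
    0.05 * s.trapDiversity -
    0.6 * s.recentRepeatPenalty -
    0.4 * s.exactRepeatPenalty
  );
}

// Hypothetical pool row; how "recently" is defined is left to the caller.
interface PoolDrill {
  id: string;
  difficulty: number;
  attemptedRecently: boolean;
}

// Assumption: a drill must satisfy both conditions to stay in the pool.
function mockInterviewPool(drills: PoolDrill[]): PoolDrill[] {
  return drills.filter((d) => d.difficulty >= 3 && !d.attemptedRecently);
}
```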

Grading (LOCAL.md §10)

apps/backend/src/engines/grading.ts has two graders:

LLM grader (default) — calls OPENAI_GRADING_MODEL with the rubric and transcript and parses JSON back into the score breakdown.
Offline grader — deterministic keyword matching for tests / no-API environments. Triggered by USE_OFFLINE_GRADER=1 or absence of OPENAI_API_KEY.
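
To make the offline path concrete, here is one way deterministic keyword matching could compute must-have coverage. The rubric shape (`RubricItem` with a keyword list) and the "any keyword counts" rule are assumptions for illustration; grading.ts may match differently.

```typescript
// Hypothetical rubric item: a label plus keywords that count as covering it.
interface RubricItem {
  label: string;
  keywords: string[];
}

interface CoverageResult {
  coverage: number; // fraction of items covered, 0..1
  missed: string[]; // labels of items with no keyword hit
}

// Case-insensitive substring matching; an item is covered if any keyword appears.
function keywordCoverage(transcript: string, items: RubricItem[]): CoverageResult {
  const text = transcript.toLowerCase();
  const missed = items
    .filter((item) => !item.keywords.some((k) => text.indexOf(k.toLowerCase()) !== -1))
    .map((item) => item.label);
  const coverage = items.length === 0 ? 1 : (items.length - missed.length) / items.length;
  return { coverage, missed };
}
```

Because the result depends only on the transcript and rubric, the same input always produces the same score, which is what makes this grader usable in tests.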

Final score formula:

0.65 * must_have_coverage
+ 0.20 * answer_clarity
+ 0.10 * tradeoff_coverage
+ 0.05 * speed_score
− red_flag_penalty

Verdict: >= 0.80 pass, 0.60–0.79 borderline, < 0.60 fail.
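
The formula and thresholds translate directly into code. In this sketch the clamp to [0, 1] is an assumption (the text above does not say whether the score is clamped), and the type and function names are illustrative.

```typescript
interface ScoreParts {
  mustHaveCoverage: number;
  answerClarity: number;
  tradeoffCoverage: number;
  speedScore: number;
  redFlagPenalty: number;
}

type Verdict = "pass" | "borderline" | "fail";

function finalScore(p: ScoreParts): number {
  const raw =
    0.65 * p.mustHaveCoverage +
    0.2 * p.answerClarity +
    0.1 * p.tradeoffCoverage +
    0.05 * p.speedScore -
    p.redFlagPenalty;
  // Assumption: keep the result in [0, 1] so penalties cannot push it negative.
  return Math.min(1, Math.max(0, raw));
}

function verdictFor(score: number): Verdict {
  if (score >= 0.8) return "pass";
  if (score >= 0.6) return "borderline";
  return "fail";
}
```

For example, full must-have coverage with half clarity, no tradeoffs, full speed, and a 0.1 red-flag penalty gives 0.65 + 0.10 + 0 + 0.05 − 0.10 = 0.70, which is borderline.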

API

Method	Path	Purpose
`GET`	`/api/health`	drill count + OpenAI configured flag
`POST`	`/api/realtime/token`	mint ephemeral Realtime client secret
`POST`	`/api/drill-sessions`	start a drill session
`POST`	`/api/drill-sessions/:id/next`	pick next drill via rotation
`POST`	`/api/drill-sessions/:id/retry`	force a fresh attempt on a specific drill (bypasses rotation)
`POST`	`/api/drill-attempts/:id/transcript`	save transcript + duration
`POST`	`/api/drill-attempts/:id/grade`	grade an attempt (LLM or offline)
`GET`	`/api/drill-attempts/:id`	full attempt detail (transcript, missed points, ideal answer, cards) — owner-scoped
`POST`	`/api/drill-attempts/:id/evaluate`	run OpenRouter shadow grading; does not mutate live score/cards
`GET`	`/api/drill-attempts/:id/evaluations`	list stored shadow grader evaluations for an attempt
`GET`	`/api/cards/due`	due review cards + total/due stats
`POST`	`/api/cards/:id/review`	record SM-2-lite review (`quality` 0/1)
`GET`	`/api/cards/export.csv`	Anki-importable CSV (`front,back,tags`)
`GET`	`/api/progress`	per-topic weakness state
`GET`	`/api/progress/drills`	per-drill performance (attempts, avg/best/worst score, last verdict) sorted by avg ascending
`GET`	`/api/drills`	drill bank browse (active only)
`GET`	`/api/drills/drafts`	Layer-3 LLM drafts (is_active=false)
`GET`	`/api/drills/export.yaml`	dump active drills as YAML (seed format); `?include_drafts=1` to include drafts
`POST`	`/api/drills/import`	upsert drills from YAML body (or `{ yaml: "…" }`); 207 with per-item errors when partial
`GET`	`/api/stats`	drill bank distribution: active vs drafts, by topic / difficulty / trap_type
`GET`	`/api/sessions`	recent sessions for the user, newest first, with rollup stats (`?limit=N`, default 25)
`POST`	`/api/drills/:id/activate`	promote a draft into the rotation pool
`POST`	`/api/drills/:id/deactivate`	pull a drill back out of the rotation pool (mirror of activate)
`PATCH`	`/api/drills/:id`	edit rubric / canonical answer / difficulty / question text
`POST`	`/api/drills/:id/test-grade`	dry-run grader against a sample answer (no persist)
`DELETE`	`/api/drills/:id`	delete a draft (active drills are protected)
`POST`	`/api/realtime/tool-call`	dispatch for the voice agent's tool calls
`POST`	`/api/realtime/usage`	record token usage from a realtime response (dedupes by `response_id`)
`GET`	`/api/usage/summary`	aggregated token usage for the user (current session + lifetime totals)
`GET`	`/api/drill-sessions/:id/summary`	per-session stats (attempts, scores, topics)
`GET`	`/api/drill-sessions/:id/events`	audit log (LOCAL.md §7 `session_events`)
`GET`	`/api/admin/events`	admin audit trail — drill imports, draft state changes, rubric edits. Optional `?type=` (CSV of `drill_imported,draft_activated,draft_deactivated,draft_discarded,rubric_edited`), `?since=` (ISO 8601), `?actor=` (`x-user-id` value), `?limit=N` (default 100, max 500). Every payload includes `actor` and (where relevant) `drill_id` / `fields_changed`.
`POST`	`/api/drill-sessions/:id/end`	mark ended + return summary

Seed drills

YAML files in apps/backend/seeds/drills/ are loaded on every server start and upserted (drill_items.upsert), so edits are picked up. The schema follows LOCAL.md §16.

To add a drill, create or edit a YAML file, then either run pnpm --filter @drill/backend seed or restart the backend.

Layer-2 templates (LOCAL.md §9)

Templates live in apps/backend/seeds/templates/*.yaml. Each declares a template_text, a rubric_template, a canonical_answer_template, and a list of variants with named vars. The expander interpolates the variables into every field and upserts concrete drill_items rows.

pnpm --filter @drill/backend seed:templates

One composite-index template currently expands to 4 variants (orders / events / messages / invoices), all tagged with tmpl:<id> so template-derived drills are filterable.
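
The expansion step can be sketched as plain string interpolation. The `{{name}}` placeholder syntax, the derived id scheme, and the simplified rubric shape (a list of strings) are assumptions; only the field names template_text, rubric_template, canonical_answer_template, variants, and the tmpl:<id> tag come from the description above.

```typescript
type Vars = Record<string, string>;

interface DrillTemplate {
  id: string;
  template_text: string;
  rubric_template: string[]; // simplified: the real rubric is likely richer
  canonical_answer_template: string;
  variants: { name: string; vars: Vars }[];
}

interface ExpandedDrill {
  id: string;
  question: string;
  rubric: string[];
  canonical_answer: string;
  tags: string[];
}

// Assumed placeholder syntax: {{name}}. Unknown names are left untouched.
function interpolate(text: string, vars: Vars): string {
  return text.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => vars[key] ?? match);
}

function expandTemplate(t: DrillTemplate): ExpandedDrill[] {
  return t.variants.map((v) => ({
    id: `${t.id}--${v.name}`, // hypothetical id scheme
    question: interpolate(t.template_text, v.vars),
    rubric: t.rubric_template.map((line) => interpolate(line, v.vars)),
    canonical_answer: interpolate(t.canonical_answer_template, v.vars),
    tags: [`tmpl:${t.id}`],
  }));
}
```

Each expanded object would then be upserted as a concrete drill row, so re-running the expander after editing a template updates the existing variants instead of duplicating them.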

Layer-3 LLM-generated drafts (LOCAL.md §9 Layer 3)

pnpm --filter @drill/backend gen:drills -- \
  --topic caching --subtopic eviction --count 3 --difficulty 3

Inserts drills as is_active=false drafts so the rotation engine never serves them until a human flips the bit. Tagged with gen:llm for filtering. Uses OPENAI_GRADING_MODEL (default gpt-4.1-mini).

Review and activate drafts from the UI: click Show drafts in the header to see every is_active=false drill with rubric preview, then Activate to promote into the rotation pool or Discard to delete.

Shadow grader benchmark

pnpm --filter @drill/backend bench:grader --source historical --models free-pinned --limit 25
pnpm --filter @drill/backend bench:grader --source attempt --attempt-id <id>

Requires OPENROUTER_API_KEY. Results are stored in grading_evaluations and never overwrite the live attempt score, cards, or skill state. The default model policy only uses OpenRouter models currently reported as zero-cost.

Layer-4 resource extraction (beyond LOCAL.md §9 — optional)

Pulls Markdown from GitHub repos listed in the resource manifest (see path below), splits sections, and emits draft drills (is_active=false). Useful for bootstrapping a topic area from a canonical reference doc. Skips the LLM round-trip if you want deterministic drafts to review by hand. Same activation flow as Layer 3.

Set GITHUB_TOKEN in .env to raise the GitHub API rate limit from 60/hr (anonymous) to 5000/hr (authenticated). Any repo-scope or public_repo-scope token is enough — the pipeline only reads public files.

# Phases: assess → extract → generate-drills → all
pnpm extract:resources -- --phase all --resource system-design-primer
pnpm extract:resources -- --phase all --limit 5 --dry-run    # preview, no writes
pnpm import:resource-drafts                                   # latest run, all resources
pnpm import:resource-drafts -- --resource system-design-primer --run 20260520T123456Z

Resource manifest lives at .agents/skills/resource-extraction/resources.json. Artifacts land under data/resources/<slug>/<run-id>/ (gitignored): an assessment.json, documents.jsonl, and draft_drills.yaml round-trippable through the same Zod schema as Layer-1 seeds. import:resource-drafts defaults to --run latest, so the common case is just running it bare.

Admin: rubric editor + dry-run grader (LOCAL.md §13)

In the drill browse panel, expand any drill to see its rubric. Two admin surfaces are wired:

Edit rubric — opens textareas for must-have / nice-to-have / red flags / canonical short answer plus a difficulty selector. Saving issues PATCH /api/drills/:id, validates with the same Zod schema as YAML seeds, and refreshes the browse list.
Test grade — paste a sample answer, run the grader against the current rubric, see score + verdict + missed-points count, without writing an attempt or touching skill state. Great for tuning rubrics on newly activated Layer-3 drafts.

Pressure mode (LOCAL.md §15 MVP 3)

The header has a Pressure ON/off toggle. When on, every drill push appends an explicit "interrupt rambling after ~10s; snap 'Default answer now.'; force at least one pressure follow-up" clause to the agent's per-drill instruction. This lets the user dial the intensity from study-buddy to drill-instructor without re-minting the realtime session.
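
A sketch of how the per-drill instruction could be assembled. The clause text is quoted from the description above; the base instruction wording and the function name are hypothetical.

```typescript
// Clause text taken from the description above.
const PRESSURE_CLAUSE =
  "interrupt rambling after ~10s; snap 'Default answer now.'; force at least one pressure follow-up";

// Hypothetical builder: the real per-drill instruction wording is not shown in this README.
function buildDrillInstruction(question: string, pressure: boolean): string {
  const base = `Ask this drill aloud: ${question}`;
  return pressure ? `${base}\nPressure mode: ${PRESSURE_CLAUSE}.` : base;
}
```

Because the clause travels with each drill push, toggling pressure takes effect on the next drill and the session itself stays connected.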

What is not in the MVP

Mapped to LOCAL.md §15:

Postgres — MVP uses SQLite. Schema is portable; the migration file is the obvious place to swap when you need multi-writer or hosted infra.
Card-review UI for the spaced-repetition slots already in the schema.
Layer-2 template generator (drill_templates) — schema exists, no generator yet.
Admin/content editor, payments, Anki sync, per-user auth — not in MVP.

Voice-agent tool protocol (LOCAL.md §6) — wired

The Realtime agent has six backend tools attached to the session config: get_next_drill, submit_answer_transcript, grade_attempt, save_generated_cards, get_user_skill_summary, end_session_summary.

Tool calls flow over the data channel and are dispatched via a single backend endpoint POST /api/realtime/tool-call. The frontend hook (useRealtime) tracks (item_id → name) pairs across response.output_item.added and response.function_call_arguments.done, runs the registered handler, and sends back conversation.item.create with function_call_output plus response.create.

App.tsx mirrors agent-driven get_next_drill and grade_attempt results into local state so the UI follows the agent.
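
The tracking and reply steps can be sketched without any WebRTC objects. The event and item shapes below follow the Realtime API event names mentioned above, reduced to the fields this sketch needs; `ToolCallTracker` and `toolResultEvents` are hypothetical names, not the actual useRealtime implementation.

```typescript
// Only the two server events named above, with the fields this sketch uses.
type ServerEvent =
  | {
      type: "response.output_item.added";
      item: { id: string; type: string; name?: string };
    }
  | {
      type: "response.function_call_arguments.done";
      item_id: string;
      call_id: string;
      arguments: string;
    };

interface PendingToolCall {
  name: string;
  callId: string;
  args: unknown;
}

class ToolCallTracker {
  private names = new Map<string, string>();

  // Returns a call ready for dispatch once its arguments are complete, else null.
  handle(ev: ServerEvent): PendingToolCall | null {
    if (ev.type === "response.output_item.added") {
      if (ev.item.type === "function_call" && ev.item.name) {
        this.names.set(ev.item.id, ev.item.name);
      }
      return null;
    }
    const name = this.names.get(ev.item_id);
    if (!name) return null;
    this.names.delete(ev.item_id);
    // JSON.parse throws on malformed arguments; a real handler should catch that.
    return { name, callId: ev.call_id, args: JSON.parse(ev.arguments) };
  }
}

interface ClientEvent {
  type: string;
  item?: { type: string; call_id: string; output: string };
}

// The two client events sent back over the data channel after the handler runs.
function toolResultEvents(callId: string, result: unknown): ClientEvent[] {
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: callId, output: JSON.stringify(result) },
    },
    { type: "response.create" },
  ];
}
```

In the hook, each returned event would be serialized with `JSON.stringify` and passed to the data channel's `send`, with `result` being the body returned by POST /api/realtime/tool-call.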

Smoke test (CLI)

# health
curl -s localhost:4000/api/health | jq

# start session, pick a drill, grade an answer
SID=$(curl -s -X POST localhost:4000/api/drill-sessions \
  -H 'content-type: application/json' \
  -d '{"mode":"db_indexes"}' | jq -r .session.id)
ATT=$(curl -s -X POST localhost:4000/api/drill-sessions/$SID/next \
  -H 'content-type: application/json' -d '{}' | jq -r .drill.attempt_id)
curl -s -X POST localhost:4000/api/drill-attempts/$ATT/grade \
  -H 'content-type: application/json' \
  -d '{"transcript":"composite B-tree on (category_id, price), equality then order, verify with EXPLAIN ANALYZE","duration_seconds":45}' | jq

Voice UX affordances

LOCAL.md §5 / §11 call for the agent to "ask the question aloud" and for the session to feel like a fast back-and-forth. The frontend provides five affordances so you can tell at a glance whether the voice loop is healthy, even with audio output muted:

Voice-first start — clicking Start session / Next drill chains startSession → nextDrill → realtime.start in one click. No separate "Start voice" step.
Coach (audio) transcript (data-testid="agent-transcript") — the agent's spoken transcript appears above the answer textarea as it speaks. If your speakers fail, you still see exactly what the coach said.
Voice state badge (data-testid="voice-state") — 🔊 Coach speaking (highlighted) vs 🎤 Listening (dim), derived from response.output_audio_buffer.started / response.done events on the data channel.
Mic meter (data-testid="mic-meter") — 5-bar VU pulled from a Web Audio AnalyserNode on the local mic track, sampled at ~10 Hz. If the bars never light up, your mic is dead.
Voice error banner (data-testid="voice-error-banner") — when the WebRTC handshake or the ephemeral token mint fails, a red banner appears with the message and a short troubleshooting hint (mic permission · OPENAI_API_KEY on backend · HTTPS/localhost).
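
The mic meter's level computation can be sketched as a pure function over one frame of samples. The RMS approach and the `fullScaleRms` threshold of 0.25 are assumptions; the README only states that a 5-bar meter is driven by an AnalyserNode sampled at about 10 Hz.

```typescript
// Map one frame of time-domain samples (range [-1, 1]) to a number of lit bars.
// Assumption: level is RMS, and an RMS of `fullScaleRms` or more lights every bar.
function micBars(samples: Float32Array, bars = 5, fullScaleRms = 0.25): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i] ?? 0;
    sumSquares += s * s;
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return Math.min(bars, Math.ceil((rms / fullScaleRms) * bars));
}
```

In the browser, a frame could be read with `analyser.getFloatTimeDomainData(buffer)` inside a roughly 100 ms timer and passed to `micBars(buffer)`.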

Keyboard shortcuts

The drill loop is keyboard-first so you can stay typing/talking without reaching for the mouse:

Keys	Action
`⌘` / `Ctrl` + `Enter`	Submit the typed answer (works from inside the textarea)
`n`	Next drill
`Shift` + `R`	Retry the current drill (creates a fresh attempt on the same drill)
`e`	End session
`p`	Toggle pressure mode

Single-key shortcuts (n, e, p) are suppressed while any input, textarea, select, or contentEditable element has focus, so typing the answer never triggers them. The hint row sits right under the action buttons (data-testid="shortcuts-hint").
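
The suppression rule can be sketched with structural types so it does not depend on the DOM. `FocusTargetLike`, `KeyLike`, and `actionForKey` are hypothetical names; treating Shift + R as suppressed inside inputs is an assumption, since the text above only names n, e, and p.

```typescript
// Minimal shapes matching the parts of Element and KeyboardEvent used here.
interface FocusTargetLike {
  tagName: string;
  isContentEditable: boolean;
}

interface KeyLike {
  key: string;
  shiftKey: boolean;
  metaKey: boolean;
  ctrlKey: boolean;
}

type ShortcutAction = "submit" | "next" | "retry" | "end" | "pressure";

function isTypingTarget(t: FocusTargetLike | null): boolean {
  if (!t) return false;
  const tag = t.tagName.toUpperCase();
  return t.isContentEditable || tag === "INPUT" || tag === "TEXTAREA" || tag === "SELECT";
}

function actionForKey(e: KeyLike, target: FocusTargetLike | null): ShortcutAction | null {
  // Cmd/Ctrl + Enter submits even from inside the textarea.
  if (e.key === "Enter" && (e.metaKey || e.ctrlKey)) return "submit";
  // Everything else is ignored while the user is typing.
  if (isTypingTarget(target)) return null;
  if (e.metaKey || e.ctrlKey) return null;
  if (e.shiftKey) return e.key.toLowerCase() === "r" ? "retry" : null;
  switch (e.key) {
    case "n":
      return "next";
    case "e":
      return "end";
    case "p":
      return "pressure";
    default:
      return null;
  }
}
```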

Audio troubleshooting

If voice connects but you hear nothing, check the following, in order of likelihood:

Browser autoplay is silently rejecting the <audio> element's play(). Click anywhere on the page after voice connects — that's a user gesture. If it still won't play, open dev tools and check for NotAllowedError.
Your output device is muted or routed somewhere unexpected. The Coach (audio) transcript will still update — if it does, audio is arriving, you just can't hear it.
Stale dev server. Use one Vite instance; if you have multiple tabs on different ports (5173, 5174, …) you may be running older code that pre-dates the autonomy nudge or voice-first flow.

Testing & smoke

Six layers, fastest to slowest:

Layer	Command	What it proves
Unit + route tests	`pnpm -r test`	rotation engine, offline grader, AND HTTP routes (session ownership, draft activation, dry-run grader, tool-call dispatch, rubric editing). 20 tests. Runs Express in-process on an ephemeral port, no network.
REST drill loop	`pnpm smoke:drill-loop`	end-to-end loop over HTTP for N drills with the offline grader — verifies rotation produces variety, weakness state moves, mixed verdicts. Boots its own backend on an isolated DB.
Browser drill loop	`pnpm smoke:browser`	exercises App.tsx in Chromium (no mic): Start → type answer → Submit → grade panel renders → Next drill → question changes.
Realtime WebRTC	`pnpm smoke:realtime`	full voice path: launches Chromium with `--use-file-for-fake-audio-capture` against a Mumbli WAV, asserts the model connects, ASR transcript appears, and at least 1 backend tool gets dispatched. Requires `OPENAI_API_KEY`.
Realtime multi-turn	`pnpm smoke:realtime:multi`	same harness, longer wait (~90 s), asserts ≥ 2 distinct tool calls — proves the agent runs the actual drill loop (e.g. `submit_answer_transcript` then `grade_attempt`) rather than just calling `get_next_drill` once and stopping.
Realtime autonomy	`pnpm smoke:realtime:loop`	strictest — wait up to ~2 min, asserts ≥ 3 total tool calls including `get_next_drill`. Proves the agent calls `submit_answer_transcript` → `grade_attempt` → `get_next_drill` autonomously. Verifies LOCAL.md §18 ("backend owns curriculum, model drives it").

Run everything offline in one shot — exactly what CI runs:

pnpm smoke:all --offline-only   # doctor + verify:drills --strict + build + tests + REST smoke + browser smoke

Or run all smokes (offline + realtime) with a pass/fail summary:

pnpm smoke:all                 # 10 layers, ~5 minutes; needs OPENAI_API_KEY
pnpm smoke:all --offline-only  # skip the 4 realtime smokes

Sample output:

▶ dev:doctor…✓ dev:doctor (0.8s)
▶ verify:drills --strict…✓ verify:drills --strict (0.5s)
▶ build…✓ build (1.4s)
▶ test (backend unit + route, frontend pure)…✓ test (backend unit + route, frontend pure) (2.0s)
▶ smoke:drill-loop…✓ smoke:drill-loop (1.3s)
▶ smoke:browser…✓ smoke:browser (2.7s)
▶ smoke:realtime…✓ smoke:realtime (62.0s)
▶ smoke:realtime:multi…✓ smoke:realtime:multi (67.8s)
▶ smoke:realtime:loop…✓ smoke:realtime:loop (73.9s)
▶ smoke:realtime:end…✓ smoke:realtime:end (77.2s)

10/10 passed

You can run the drill linter on its own:

pnpm verify:drills
# verify:drills OK — 51 drills validated across 21 files

CI uses the same command. Each layer can also be run on its own; the realtime smokes are listed separately because they need OPENAI_API_KEY:

pnpm -r test                  # unit + route tests
pnpm smoke:drill-loop         # offline REST loop
pnpm smoke:browser            # offline browser loop
pnpm smoke:realtime           # online realtime (>=1 tool call)
pnpm smoke:realtime:multi     # online (>=2 distinct tool names)
pnpm smoke:realtime:loop      # online (>=3 calls incl. get_next_drill)
pnpm smoke:realtime:end       # online ("Stop" → end_session_summary)

Useful overrides

# point smokes at an already-running stack instead of starting one
USE_EXISTING_BACKEND=1 USE_EXISTING_FRONTEND=1 pnpm smoke:browser

# specific Mumbli WAV (otherwise picks the latest >32 KB)
REALTIME_SMOKE_AUDIO="/absolute/path/sample.wav" pnpm smoke:realtime

# show the browser
HEADLESS=0 pnpm smoke:realtime

# don't fail the realtime smoke if the agent skips tool calls
REALTIME_SMOKE_REQUIRE_TOOL=0 pnpm smoke:realtime

# how long to wait for the agent's first tool call (default 20s)
REALTIME_SMOKE_TOOL_WAIT_MS=30000 pnpm smoke:realtime

The realtime smoke output includes a screenshot path and a tail of recent Realtime server events so you can confirm response.output_audio.done, input_audio_buffer.speech_stopped, and transcription deltas all arrived.
