Important
Work in progress — no stable release yet. Johnny is being built in
the open by one person; the code on main is the only version right
now, and the UX, APIs, and docs are still moving. When a polished cut
is ready, it will be published on the
Releases page — watch
this repo to be notified. Until then, expect rough edges, breaking
changes, and a few half-wired pieces.
Single-user AI assistant that joins Google Meet meetings, transcribes, and optionally speaks within configured constraints.
🌐 Landing page: https://mekedron.github.io/Johnny/
See tasks/prd-johnny-google-meet-ai-meeting-bot.md for the full PRD.
backend/ FastAPI app (Python, managed by uv)
frontend/ SvelteKit UI (TypeScript, managed by pnpm)
- uv for Python dependency management
- pnpm and Node.js 20+ for the frontend
- Docker (with Compose) for the full stack
Backend:
cd backend
uv sync # install dependencies
uv run uvicorn app.main:app --reload --port 8000 # start API

Frontend:
cd frontend
pnpm install # install dependencies
pnpm dev # start dev server on :5173

Once both are running, open http://localhost:5173 and the home page will fetch
GET /health from the backend to confirm wiring.
The fastest path is the interactive setup wizard. From backend/:
uv sync
uv run johnny-setup # or: uv run python -m johnny.wizard

The wizard checks prerequisites, generates FERNET_KEY, walks you through
Google OAuth, downloads local models (faster-whisper / Piper / Ollama),
registers providers via the API, runs smoke tests, and opens the UI. It is
re-runnable and supports a --non-interactive answers.yaml mode for CI.
To do it manually instead, the complete stack (API, worker, frontend, PostgreSQL+pgvector, Redis) runs via Docker Compose:
cp .env.example .env # fill in secrets as needed
docker compose up

The api is reachable at http://localhost:8000, the frontend at
http://localhost:5173. Postgres listens on the internal compose network only;
connect via docker compose exec postgres psql -U johnny.
Once .env is filled in and the stack is up, verify everything works:
cd backend
uv run johnny-smoke --project-root ..

The smoke test prints one PASS / SKIP / FAIL row per check (compose health,
migrations, Fernet, Google OAuth config, provider credentials, local model
dirs, Ollama reachability, Docker launcher, WS upgrade, frontend) and exits
non-zero if any non-SKIP check failed. See docs/SETUP_LOCAL.md §15 for the
full reference; manual setup walkthrough is in the same document.
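
The exit rule described above (SKIP rows never fail the run, any FAIL does) can be sketched in a few lines. This is an illustration only, not the johnny-smoke source; the `Status` enum, the function names, and the concrete exit value `1` are assumptions.

```python
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    SKIP = "SKIP"
    FAIL = "FAIL"


def exit_code(results: dict[str, Status]) -> int:
    """Return 0 unless at least one check failed; SKIP rows are ignored."""
    # The value 1 is an assumption; the README only promises "non-zero".
    return 1 if any(status is Status.FAIL for status in results.values()) else 0


def format_rows(results: dict[str, Status]) -> list[str]:
    """Render one row per check, padding names so the statuses line up."""
    width = max((len(name) for name in results), default=0)
    return [f"{name.ljust(width)}  {status.value}" for name, status in results.items()]
```

A run whose checks are all PASS or SKIP therefore exits 0, which is what lets CI treat an absent optional dependency as harmless.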
Per-meeting bot sessions run in their own short-lived container based on
the johnny-meet-worker image (Playwright + Chromium + Xvfb + PulseAudio).
The image is built but not started by default — the session scheduler
spawns one container per active Meet via the Docker SDK.
Build it:
docker compose --profile meet-worker build meet-worker

Verify the A/V environment standalone:
docker run --rm johnny-meet-worker:latest # prints "self-check OK"

Dedicated adapters for the hosted APIs. Configure each via the Providers
page; only api_key is required, all other options have sensible defaults.
- OpenAI (`openai`): defaults to `gpt-4o-mini`; override `model` for `gpt-4o`, `o1-mini`, or anything else OpenAI hosts.
- Anthropic (`anthropic`): defaults to `claude-3-5-haiku-20241022`; override `model` for `claude-3-5-sonnet-20241022`, `claude-opus-4-7`, etc. Optional `max_tokens` (default 1024) and `anthropic_version` (default `2023-06-01`).
- Gemini (`gemini`): defaults to `gemini-1.5-flash`; override `model` for `gemini-1.5-pro`, `gemini-2.0-flash`, etc. Supports native JSON mode via `response_format` (sets `responseMimeType` + `responseSchema`).

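
As a rough sketch of how "only api_key is required, everything else has a default" could be resolved, the snippet below merges the documented defaults with user overrides. The `DEFAULTS` table mirrors the values listed above; `resolve_config` is a hypothetical helper, not a function from the Johnny codebase.

```python
# Defaults copied from the provider list above.
DEFAULTS = {
    "openai": {"model": "gpt-4o-mini"},
    "anthropic": {
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 1024,
        "anthropic_version": "2023-06-01",
    },
    "gemini": {"model": "gemini-1.5-flash"},
}


def resolve_config(provider: str, **overrides) -> dict:
    """Merge user-supplied options over the provider defaults (hypothetical helper)."""
    if provider not in DEFAULTS:
        raise ValueError(f"unknown provider: {provider}")
    if not overrides.get("api_key"):
        raise ValueError("api_key is required")
    # Later keys win, so explicit overrides replace the defaults.
    return {**DEFAULTS[provider], **overrides}
```

For example, `resolve_config("anthropic", api_key="...", model="claude-3-5-sonnet-20241022")` keeps `max_tokens` and `anthropic_version` at their defaults while swapping the model.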
The openai-compatible LLM adapter targets any OpenAI-compatible chat
completions endpoint. Configure via the Providers page with base_url
and model:
- vLLM (Qwen): `base_url=http://vllm:8000/v1`, `model=Qwen/Qwen2.5-7B-Instruct`
- Ollama (Llama): `base_url=http://ollama:11434/v1`, `model=llama3.1:8b`

Set tool_format=hermes for Hermes-style fine-tunes that emit
<tool_call>{...}</tool_call> markers instead of OpenAI-native
tool_calls.
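
To make the `tool_format=hermes` behaviour concrete, here is a minimal sketch of converting Hermes-style markers into OpenAI-shaped `tool_calls` entries. It assumes the usual Hermes payload shape (`{"name": ..., "arguments": {...}}`); the function name and the `call_<n>` id scheme are illustrative, and the actual adapter may differ.

```python
import json
import re

# Non-greedy match so several <tool_call> blocks in one reply are found separately.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_hermes_tool_calls(text: str) -> tuple[str, list[dict]]:
    """Split a model reply into plain content and OpenAI-style tool_calls."""
    calls: list[dict] = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed JSON inside the marker: skip this call
        if not isinstance(payload, dict) or "name" not in payload:
            continue
        calls.append(
            {
                "id": f"call_{len(calls)}",  # id scheme is an assumption
                "type": "function",
                "function": {
                    "name": payload["name"],
                    # OpenAI-native tool calls carry arguments as a JSON string.
                    "arguments": json.dumps(payload.get("arguments", {})),
                },
            }
        )
    # Markers (including malformed ones) are removed from the visible content.
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls
```

Keeping the output in the OpenAI-native shape means downstream code can treat Hermes fine-tunes and native tool callers identically.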
Several STT/TTS runtimes run as host sidecars — native macOS processes outside Docker (Apple MLX / CoreML / espeak need hardware the arm64 Linux api container can't reach). The api talks to them over host.docker.internal:<port>. You rarely manage them by hand:
- `./run.sh` auto-starts every available sidecar after the Docker stack is up, so a saved sidecar-runtime works immediately.
- `./stop.sh` stops them before tearing the stack down (so an in-flight call drains into a clean error).
- A missing toolchain is a soft skip, never a hard failure: no `swift` → `parakeet-coreml` is reported `SKIPPED` and the rest still come up.
- Opt out per-sidecar with `JOHNNY_DISABLED_SIDECARS` (comma-separated `<provider>-<backend>` keys), e.g. `JOHNNY_DISABLED_SIDECARS=parakeet-coreml,kokoro-mlx ./run.sh`.

One umbrella drives them all:
./scripts/start-sidecars.sh start # start every enabled sidecar (idempotent)
./scripts/start-sidecars.sh status # one line per sidecar: running | stopped | disabled | unavailable
./scripts/start-sidecars.sh stop # stop every running sidecar
./scripts/start-sidecars.sh restart # stop + start
./scripts/start-sidecars.sh --help # full help

Standardised per-launcher CLI. Every scripts/start-<provider>-sidecar.sh accepts the same commands so the umbrella can dispatch uniformly (run --help on any of them for the full block):

| Aspect | Contract |
|---|---|
| Commands | `start [<backend>]`, `stop [<backend>]`, `restart [<backend>]`, `status`, `logs [<backend>]`, `--help`. Omitting `<backend>` means "every backend this provider has". |
| Env vars | `<PROVIDER>_<BACKEND>_PORT` / `_HOST` / `_MODEL` (e.g. `PARAKEET_MLX_PORT=8765`, `PIPER_HTTP_PORT=8775`), plus `JOHNNY_SIDECAR_LOG_DIR` (default `.validation/`). |
| Exit codes | 0 ok · 1 failure · 2 bad usage · 3 toolchain unavailable (→ umbrella prints SKIPPED) · 4 port conflict. |
| Logs / PIDs | `.validation/<provider>-<backend>-sidecar.{log,pid}`: one naming rule, so `status` and `/sidecars/health` discover everything by globbing one directory. |

The current Parakeet mlx / coreml positional form still works as a transitional alias for start mlx / start coreml (it prints a one-line deprecation note). The shared contract is regression-checked by ./scripts/check-sidecar-cli.sh.
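
Because every launcher writes its PID file under the same naming rule, discovery reduces to a single glob. The sketch below illustrates that idea in Python; `discover_sidecars` is a hypothetical helper, and the real `status` command and `/sidecars/health` endpoint may work differently.

```python
from pathlib import Path

PID_SUFFIX = "-sidecar.pid"


def discover_sidecars(log_dir: Path) -> dict[str, int]:
    """Map each <provider>-<backend> key to the PID recorded in its pid file."""
    found: dict[str, int] = {}
    for pid_file in sorted(log_dir.glob("*" + PID_SUFFIX)):
        key = pid_file.name[: -len(PID_SUFFIX)]
        try:
            found[key] = int(pid_file.read_text().strip())
        except ValueError:
            continue  # unreadable PID: treat the sidecar as not discovered
    return found
```

Calling `discover_sidecars(Path(".validation"))` would list whatever the launchers have recorded; checking that each PID is still alive is left out of this sketch.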
Live status in the UI. The api exposes GET /sidecars/health (probe all known sidecars) and GET /sidecars/health?url=<base> (probe one). The Providers modal uses it to badge the Runtime picker as "sidecar running" or "sidecar offline" (with a hint to start it via ./scripts/start-sidecars.sh start) the moment you pick a sidecar runtime, so you see why a test would fail before clicking Test.
The Parakeet STT provider supports three runtimes selectable from the Providers page (Settings → Providers → NVIDIA Parakeet → Runtime). Pick whichever fits your speed / ops trade-off.
| Runtime | Where it runs | Latency for 5 s audio (warm) | Setup |
|---|---|---|---|
| `in-container` (default) | api container, PyTorch / NeMo on CPU | ~1 s | Click Install package on the provider card. NeMo + deps go to `~/.johnny/parakeet-packages` (~3 GB). The model is now cached at process scope, so only the first `/stt_test` after an api restart pays the multi-second `from_pretrained` cost. |
| `mlx-sidecar` | macOS host, Apple MLX (Metal GPU) | ~100 ms | `./scripts/start-parakeet-sidecar.sh start mlx`. Runs `sidecars/parakeet-mlx/server.py` under uv on port 8765. First launch downloads `mlx-community/parakeet-tdt-0.6b-v3` from HuggingFace. |
| `coreml-sidecar` | macOS host, Swift + FluidAudio (CoreML on the Apple Neural Engine) | ~150 ms target (matches VoiceInk) | `./scripts/start-parakeet-sidecar.sh start coreml`. Runs `sidecars/parakeet-coreml/.build/release/parakeet-coreml-sidecar` on port 8766. Requires Xcode command-line tools (`xcode-select --install`); first build fetches FluidAudio + Hummingbird and compiles the binary. |

Why sidecars exist: inside the arm64 Linux api container, PyTorch has no MPS / CoreML / ANE access — the model runs on CPU regardless of the device knob. Sidecars run natively on the macOS host and the api container POSTs raw PCM to them over host.docker.internal. Wire protocol is documented in each sidecar's README.
Sidecar management:
./scripts/start-parakeet-sidecar.sh start mlx # start MLX sidecar (:8765)
./scripts/start-parakeet-sidecar.sh start coreml # start CoreML sidecar (:8766)
./scripts/start-parakeet-sidecar.sh status # which sidecars are up
./scripts/start-parakeet-sidecar.sh stop # stop any running sidecar

Health checks: curl http://localhost:8765/health (MLX) / curl http://localhost:8766/health (CoreML); both return {"ready": true, "model_id": "..."} once loaded.
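
For scripting, the same health check can be done from Python with the standard library. This is a sketch under the assumption that "ready" means an HTTP 200 whose JSON body has `"ready": true`; the helper names are hypothetical.

```python
import http.client
import json
import urllib.error
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True only for a JSON object whose "ready" field is exactly true."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("ready") is True


def sidecar_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Probe <base_url>/health; any connection problem counts as not ready."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and parse_health(response.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False
```

For example, `sidecar_ready("http://localhost:8765")` would check the MLX sidecar and `sidecar_ready("http://localhost:8766")` the CoreML one.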
The Local Piper TTS provider supports three runtimes selectable from the Providers page (Settings → Providers → Local Piper → Runtime). TTS pays its cost on every conversation turn (unlike STT, which is pre-loaded), so the runtime choice directly drives how fast Johnny starts speaking.
| Runtime | Where it runs | Time-to-first-audio (warm) | Setup |
|---|---|---|---|
| `subprocess` (default) | api container, fresh `piper` CLI per call | ~200–400 ms (cold every turn) | None. Safe single-step-debug baseline; identical to the historical behaviour. |
| `persistent-subprocess` | api container, warm in-process `PiperVoice` (ONNX session) cached at module scope | ~40–60 ms | None. The first synth per voice pays the ~700 ms load; every later turn is warm. This is the meeting-latency win: pick it for real meetings. |
| `http-sidecar` | macOS host, piper sidecar process | ~80–120 ms (network round-trip + warm synth) | `./scripts/start-piper-sidecar.sh start`. Runs `sidecars/piper-http/server.py` under uv on port 8775; the api POSTs to `http://host.docker.internal:8775`. Set the Sidecar URL field. |

Why persistent-subprocess is in-process, not a child process: piper-tts 1.x (the Python rewrite installed via the local-tts extra) dropped the old C++ piper's --json-input streaming CLI and its --http server. There is no long-running piper to pipe into and no upstream HTTP server to point at, so the warm path keeps PiperVoice warm inside the api process, and the http-sidecar is a thin FastAPI wrapper around the same library (see sidecars/piper-http/README.md).
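
The "warm at module scope" idea can be illustrated without Piper itself. In the sketch below, `load_voice` is a stand-in for the expensive model load (it is not the piper-tts API), and `functools.lru_cache` plays the role of the module-level cache keyed by voice.

```python
import functools
import time


def load_voice(voice_id: str) -> dict:
    """Stand-in for an expensive model load; counts how often it runs."""
    load_voice.calls += 1
    return {"voice": voice_id, "loaded_at": time.monotonic()}


load_voice.calls = 0


@functools.lru_cache(maxsize=None)
def get_voice(voice_id: str) -> dict:
    """Return the cached voice, loading it on first use only."""
    return load_voice(voice_id)
```

The first `get_voice("some-voice")` call pays the load; every later call with the same id returns the same object immediately. Note that `lru_cache` does not prevent two threads from loading the same voice concurrently on a cold start, so a real implementation might add a lock.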
Sidecar management:
./scripts/start-piper-sidecar.sh start # build venv + launch on :8775
./scripts/start-piper-sidecar.sh status # is the sidecar up?
./scripts/start-piper-sidecar.sh stop # stop it

Health check: curl http://localhost:8775/health → {"ready": true, "voice": "...", "backend": "piper"} once the default voice is loaded.
Kokoro is an 82 M-parameter Apache-2.0 TTS model — multi-voice (American / British English plus Spanish, French, Hindi, Italian, Japanese, Portuguese and Mandarin), small enough to run on CPU, with cleaner prosody than most models its size. Every voice lives in the single checkpoint, so switching voice is instant and needs no per-voice install (unlike Piper). It emits 24 kHz mono float audio; the adapter converts to S16LE and resamples to the canonical 16 kHz bridge format. Pick the runtime from Settings → Providers → Kokoro → Runtime.
| Runtime | Where it runs | Time-to-first-audio (warm) | Setup |
|---|---|---|---|
| `in-container` (default) | api container, `kokoro` in the api process; warm `KPipeline` (`KModel`) cached at module scope keyed by (model, language) | well under 200 ms warm | Requires the `kokoro` library in the api image. The first synth per language pays the multi-second model load; later turns are warm. CPU-only inside the container; for Apple-Silicon acceleration use the MLX sidecar. |
| `mlx-sidecar` | macOS host, Kokoro under Apple MLX (Metal GPU) via `mlx-audio` | ~80–120 ms (network round-trip + warm synth) | `./scripts/start-kokoro-sidecar.sh start mlx`. Runs `sidecars/kokoro-mlx/server.py` under uv on port 8772; the api POSTs to `http://host.docker.internal:8772`. |
| `http-sidecar` | host process (CPU, or a CUDA GPU on a Linux box), upstream Kokoro | depends on host hardware | `./scripts/start-kokoro-sidecar.sh start http`. Runs `sidecars/kokoro-http/server.py` under uv on port 8773; the api POSTs to `http://host.docker.internal:8773`. The non-MLX out-of-container path. |

Both sidecars speak the same wire protocol as sidecars/piper-http (POST /synthesize JSON in, raw PCM + X-Sample-Rate out), so a single api-side adapter drives all three runtimes. Voices encode language + gender in the first two letters (af/am American, bf/bm British, plus the other-language sets); non-English voices use espeak-ng for grapheme-to-phoneme, so install it on whichever host runs the model (brew install espeak-ng).
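
The format conversion mentioned in the Kokoro overview (24 kHz mono float in, 16 kHz S16LE out) can be sketched with the standard library alone. This is a simplified illustration, not the adapter's code: it uses plain linear interpolation with no anti-aliasing filter, and both function names are hypothetical.

```python
import struct


def resample_linear(samples: list[float], src_rate: int = 24000, dst_rate: int = 16000) -> list[float]:
    """Resample by linear interpolation (no low-pass filter; illustration only)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def float_to_s16le(samples: list[float]) -> bytes:
    """Clip to [-1, 1] and pack as signed 16-bit little-endian PCM."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

One second of 24 kHz input (24000 floats) becomes 16000 samples, or 32000 bytes, after `float_to_s16le(resample_linear(samples))`. Production code would normally low-pass filter before downsampling to avoid aliasing.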
Sidecar management:
./scripts/start-kokoro-sidecar.sh start mlx # start MLX sidecar (:8772)
./scripts/start-kokoro-sidecar.sh start http # start generic sidecar (:8773)
./scripts/start-kokoro-sidecar.sh status # which sidecars are up
./scripts/start-kokoro-sidecar.sh stop # stop any running sidecar

Health checks: curl http://localhost:8772/health (MLX) / curl http://localhost:8773/health (HTTP); both return {"ready": true, ...} once the model has loaded.
A successful HTTP round-trip is not proof of audible speech: a runtime can return 200 OK with empty or all-zero PCM and the user simply hears nothing (this is exactly how a broken Kokoro MLX sidecar once failed silently). johnny-tts-smoke guards against that — it drives every saved TTS provider × every runtime it supports × the first available voice through /play_sample and asserts the audio is actually audible (non-trivial byte count, plausible duration for the text, and a peak amplitude above the silence floor). It prints one PASS / SKIP / FAIL row per cell and exits non-zero on any FAIL; a sidecar that is offline or a voice that is not installed is a SKIP, never a FAIL.
docker compose exec api johnny-tts-smoke # against the running stack
# or, without rebuilding after a source edit:
docker compose exec api python -m johnny.smoketest.tts_cli

piper subprocess PASS 93 ms, 18400 bytes, peak 0.31
piper persistent-subprocess PASS 47 ms, 18120 bytes, peak 0.30
piper http-sidecar SKIP piper sidecar unreachable: start ./scripts/start-piper-sidecar.sh
kokoro mlx-sidecar FAIL 0 ms, 0 bytes, peak 0.00 -- no audible output
The same audible-or-not verdict rides back on the /play_sample response headers (X-TTS-Audible, X-TTS-Audio-Bytes, X-TTS-Audio-Ms, X-TTS-Peak), so the Providers page warns inline when a Play sample click returns silence instead of leaving the user guessing.
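
A minimal version of the audibility check might look like the following. It assumes 16-bit little-endian mono PCM at 16 kHz; the thresholds are illustrative placeholders, and the "plausible duration for the text" test is simplified to a fixed minimum duration. None of the names below come from the johnny-tts-smoke source.

```python
import struct
from dataclasses import dataclass


@dataclass
class Verdict:
    audible: bool
    audio_bytes: int
    audio_ms: int
    peak: float  # peak amplitude normalised to 0.0..1.0


def assess_pcm(
    pcm: bytes,
    sample_rate: int = 16000,
    min_bytes: int = 1000,        # assumed threshold
    min_ms: int = 200,            # assumed threshold
    silence_floor: float = 0.01,  # assumed threshold
) -> Verdict:
    """Decide whether S16LE mono PCM contains audible sound."""
    n = len(pcm) // 2  # two bytes per sample; a trailing odd byte is ignored
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    peak = max((abs(s) for s in samples), default=0) / 32768
    audio_ms = n * 1000 // sample_rate
    audible = len(pcm) >= min_bytes and audio_ms >= min_ms and peak > silence_floor
    return Verdict(audible, len(pcm), audio_ms, peak)
```

An all-zero buffer of any length fails on the peak test, which is precisely the "200 OK but silent" case the smoke test is meant to catch.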
The voice pipeline runs over a swappable transport. The default,
LocalAudioTransport wrapping the meet-worker's PulseAudio bridge,
is selected automatically. To run the pipeline inside a LiveKit room
instead, set JOHNNY_TRANSPORT=livekit together with the LiveKit connection settings:
JOHNNY_TRANSPORT=livekit \
LIVEKIT_URL=wss://livekit.example \
LIVEKIT_TOKEN=<join-token> \
LIVEKIT_ROOM=<room-name> \
LIVEKIT_IDENTITY=johnny-bot

johnny.voice_pipeline.create_transport_from_env() reads JOHNNY_TRANSPORT
and returns either LocalAudioTransport (default local) or
LiveKitTransport (livekit); the pipeline doesn't change. The LiveKit
SDK (pip install livekit) is only required when this flag is set.
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
-e LIVEKIT_KEYS="devkey: secret" \
livekit/livekit-server --dev
JOHNNY_LIVEKIT_SMOKE_URL=ws://localhost:7880 \
JOHNNY_LIVEKIT_SMOKE_TOKEN=<token-minted-with-livekit-cli> \
uv run pytest -k livekit_smoke -v

Backend (from backend/):
uv run pytest
uv run ruff check
uv run mypy

Frontend (from frontend/):
pnpm typecheck
pnpm lint

This project uses beads (bd) for
issue tracking. Run bd prime for the full workflow reference.