Real-time audio captioning with speaker diarization.
- Backend: Python/FastAPI with mlx-whisper (transcription) + diart (diarization)
- Frontend: Next.js 14 + React + Tailwind v4 + shadcn/ui
- Communication: WebSocket for real-time streaming

Prerequisites:
- Python 3.11+
- Node.js 18+
- uv (Python package manager)
- ffmpeg (for audio processing)
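
Before running the setup commands, it can help to confirm that the tools listed above are actually installed. The following is a minimal sketch, not part of the project: the script and the function name `missing_prerequisites` are hypothetical, and it only checks that each executable is on `PATH` and that the running interpreter is Python 3.11 or newer. It does not verify the Node.js version.

```python
import shutil
import sys

# Command-line tools named in the list above; each is looked up on PATH.
REQUIRED_TOOLS = ["node", "uv", "ffmpeg"]
MIN_PYTHON = (3, 11)


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, every check passed.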
Install ffmpeg on macOS:
brew install ffmpeg

Set up and start the backend:
cd backend
uv sync
uv run python -m superwords.download_models # Download ML models
uv run uvicorn superwords.main:app --reload --port 8000

Set up and start the frontend:
cd frontend
npm install
npm run dev

Usage:
- Start the backend server (port 8000)
- Start the frontend dev server (port 3000)
- Open http://localhost:3000
- Click the microphone button to start recording
- Speak into your microphone; transcripts appear in real time
- Click stop to end recording
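
If the page at http://localhost:3000 does not load or no transcripts appear, a first step is to check that both servers from the steps above are listening. The sketch below is an illustration only: the helper `is_port_open` is hypothetical and not part of the project, and it confirms only that something accepts TCP connections on ports 8000 and 3000, not that it is the superwords backend or frontend.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    for name, port in [("backend", 8000), ("frontend", 3000)]:
        state = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```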

Project structure:
superwords/
├── backend/
│ ├── src/superwords/
│ │ ├── main.py # FastAPI app
│ │ ├── api/websocket.py # WebSocket endpoint
│ │ └── services/ # Transcription & diarization
│ └── models/ # ML models (gitignored)
├── frontend/
│ ├── app/ # Next.js pages
│ ├── components/ # React components
│ └── hooks/ # Custom hooks
└── README.md
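
The `services/` directory is described as handling transcription and diarization, but this README does not show how the two outputs are combined. One common approach is to give each transcript segment the speaker whose diarization turn overlaps it the most in time. The sketch below illustrates that idea under stated assumptions: `Segment`, `overlap`, and `assign_speakers` are hypothetical names, the timestamps are in seconds, and nothing here reflects the actual output formats of mlx-whisper or diart or the project's real implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # transcript text, or a speaker name for diarization turns


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time range shared by a and b (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(transcript, turns, unknown="unknown"):
    """Pair each transcript segment with the speaker whose turn overlaps it most."""
    captions = []
    for seg in transcript:
        # default=None handles the case where no diarization turns exist yet.
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        if best is None or overlap(seg, best) == 0.0:
            speaker = unknown
        else:
            speaker = best.label
        captions.append((speaker, seg.label))
    return captions


if __name__ == "__main__":
    transcript = [Segment(0.0, 2.0, "hello there"), Segment(2.0, 5.0, "hi, how are you")]
    turns = [Segment(0.0, 2.2, "speaker0"), Segment(2.2, 6.0, "speaker1")]
    for speaker, text in assign_speakers(transcript, turns):
        print(f"[{speaker}] {text}")
```

Segments with no overlapping turn are labeled `unknown` rather than guessed, which keeps a late or missing diarization result from attributing text to the wrong speaker.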