enjalot/superwords

Superwords

Real-time audio captioning with speaker diarization.

Architecture

  • Backend: Python/FastAPI with mlx-whisper (transcription) + diart (diarization)
  • Frontend: Next.js 14 + React + Tailwind v4 + shadcn/ui
  • Communication: WebSocket for real-time streaming
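Combining transcription with diarization means attaching a speaker label to each piece of transcribed text. A minimal sketch of one common approach, which assigns each transcript segment to the speaker turn it overlaps most in time, is shown here. The `Segment` and `Turn` shapes and the `label_segments` function are hypothetical names for illustration; they are not taken from this repository, mlx-whisper, or diart, and the actual backend may align the two streams differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical shape; times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization speaker turn (hypothetical shape; times in seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds shared by two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="unknown"):
    """Attach to each segment the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append(
            {"speaker": best_speaker, "start": seg.start, "end": seg.end, "text": seg.text}
        )
    return labeled
```

Dictionaries like the ones returned here serialize directly to JSON, which is one plausible payload to stream to the frontend over the WebSocket. Segments that overlap no speaker turn fall back to the `unknown` label rather than being dropped.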

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • uv (Python package manager)
  • ffmpeg (for audio processing)

Install ffmpeg on macOS:

brew install ffmpeg
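Before running the setup steps, it can help to confirm that the prerequisites are present. The script below is a small sketch, not part of the repository: `missing_prerequisites` is a hypothetical helper that checks the running Python version and looks for `uv`, `node`, `npm`, and `ffmpeg` on the PATH. It does not verify that Node.js is version 18 or newer.

```python
import shutil
import sys

# Command-line tools the setup steps rely on.
REQUIRED_TOOLS = ("uv", "node", "npm", "ffmpeg")


def missing_prerequisites(which=shutil.which, python_version=sys.version_info):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < (3, 11):
        problems.append("Python 3.11+ is required")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `which` and `python_version` parameters exist only so the check can be exercised without changing the real environment.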

Setup

Backend

cd backend
uv sync
uv run python -m superwords.download_models  # Download ML models
uv run uvicorn superwords.main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev

Usage

  1. Start the backend server (port 8000)
  2. Start the frontend dev server (port 3000)
  3. Open http://localhost:3000
  4. Click the microphone button to start recording
  5. Speak into your microphone; transcripts appear in real time
  6. Click stop to end recording
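If the page loads but no transcripts appear, a first thing to check is whether both servers from steps 1 and 2 are accepting connections. The following sketch uses only the Python standard library; `is_listening` and `SERVERS` are hypothetical names, and the ports are the ones listed in the steps above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the usage steps: backend on 8000, frontend dev server on 3000.
SERVERS = {"backend": 8000, "frontend": 3000}

if __name__ == "__main__":
    for name, port in SERVERS.items():
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the WebSocket endpoint itself is healthy.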

Project Structure

superwords/
├── backend/
│   ├── src/superwords/
│   │   ├── main.py           # FastAPI app
│   │   ├── api/websocket.py  # WebSocket endpoint
│   │   └── services/         # Transcription & diarization
│   └── models/               # ML models (gitignored)
├── frontend/
│   ├── app/                  # Next.js pages
│   ├── components/           # React components
│   └── hooks/                # Custom hooks
└── README.md

About

A game engine for using speech to text
