Self-hosted AI transcription and intelligent note-taking platform
Documentation • Quick Start • Screenshots • Docker Hub • Releases
Speakr transforms your audio recordings into organized, searchable, and intelligent notes. Built for privacy-conscious groups and individuals, it runs entirely on your own infrastructure, ensuring your sensitive conversations remain completely private.
Speakr turns a recording into organized, searchable, shareable knowledge. Here is the pipeline:
- Flexible input - record from your microphone, your computer's system or browser-tab audio, or both mixed together; or drag and drop existing files. A per-OS setup guide and a virtual-device picker surface Pulse / PipeWire monitors, BlackHole, VB-Cable, Voicemeeter, and Stereo Mix as inputs.
- Long sessions - in-app recordings stream to the server during capture, so sessions can run for hours and survive a page reload.
- Hands-off intake - a watched "black hole" folder auto-imports and processes any audio dropped into it.
- Bring your own engine - self-hosted WhisperX (recommended; it is what enables the speaker features below), OpenAI, Mistral / Voxtral, or any custom ASR webservice. The right connector is auto-detected from your configuration.
- Speaker diarization - automatic who-said-what labeling (WhisperX, or OpenAI's diarizing models).
- Voice profiles - recognize the same person across different recordings via voice embeddings (requires the WhisperX ASR backend).
- Custom vocabulary and hotwords (most effective with the WhisperX backend) - bias the transcriber toward names, jargon, and acronyms it would otherwise mishear; configurable globally or per tag / folder.
- Synced playback - click any line to jump to that moment, follow-along highlighting during playback, and a chat-style bubble view.
- Language support - automatic language detection plus a quick-pick of 11 common languages.
- Summaries - generated automatically, with prompts you can fully customize per recording, tag, or folder (including reusable prompt variables).
- Event extraction - surface action items and calendar-worthy events from a transcript.
- Per-recording chat - ask questions about a single recording in a floating, dockable panel.
- Inquire Mode - semantic search and natural-language chat across your entire library at once.
- Folders and bulk operations to keep a large library tidy.
- Smart tags that carry their own AI prompt and ASR settings - and stack, so multiple tags layer their instructions.
- Retention policies with auto-deletion and per-recording protection from cleanup.
- Automated export to templated files when a recording finishes.
- Multi-user with Single Sign-On against any OIDC provider (Keycloak, Azure AD, Google, Auth0, Pocket ID).
- Groups with group-scoped tags that auto-share recordings to every member.
- Granular internal sharing (view / edit / reshare) and admin-controlled, secure public links.
- REST API v1 with a Swagger UI, for automation tools (n8n, Zapier, Make) and dashboards.
- Signed webhooks - HMAC-signed, SSRF-guarded, retrying outbound notifications on recording lifecycle events.
- Usage budgets for LLM tokens and transcription minutes, per user.
Speakr is also an installable Progressive Web App - mobile-first, offline-capable, with a phone share-target - and ships light/dark themes, an incognito mode, and a UI translated into seven languages.
Different people use Speakr's collaboration and retention features in different ways:
| Use Case | Setup | What It Does |
|---|---|---|
| Family memories | Create "Family" group with protected tag | Everyone gets access to trips and events automatically, recordings preserved forever |
| Book club discussions | "Book Club" group, tag monthly meetings | All members auto-share discussions, can add personal notes about what resonated |
| Work project group | Share individually with 3 teammates | Temporary collaboration, easy to revoke when project ends |
| Daily group standups | Group tag with 14-day retention | Auto-share with group, auto-cleanup of routine meetings |
| Architecture decisions | Engineering group tag, protected from deletion | Technical discussions automatically shared, preserved permanently as reference |
| Client consultations | Individual share with view-only permission | Controlled external access, clients can't accidentally edit |
| Research interviews | Protected tag + Obsidian export | Preserve recordings indefinitely, transcripts auto-import to note-taking system |
| Legal consultations | Group tag with 7-year retention | Automatic sharing with legal group, compliance-based retention |
| Sales calls | Group tag with 1-year retention | Whole sales group learns from each call, cleanup after sales cycle |
Tags with custom prompts transform raw recordings into exactly what you need:
- Recipe recordings: Record yourself cooking while narrating - tag with "Recipe" to convert messy speech into formatted recipes with ingredient lists and numbered steps
- Lecture notes: Students tag lectures with "Study Notes" to get organized outlines with concepts, examples, and definitions instead of raw transcripts
- Code reviews: "Code Review" tag extracts issues, suggested changes, and action items in technical language developers can use directly
- Meeting summaries: "Action Items" tag ignores discussion and returns just decisions, tasks, and deadlines
Stack multiple tags to layer instructions:
- "Recipe" + "Gluten Free" = Formatted recipe with gluten substitution suggestions
- "Lecture" + "Biology 301" = Study notes format focused on biological terminology
- "Client Meeting" + "Legal Review" = Client requirements plus legal implications highlighted
The order can matter - start with format tags, then add focus tags for best results.
- Obsidian/Logseq: Enable auto-export to write completed transcripts directly to your vault using your custom template - no manual export needed
- Documentation wikis: Map auto-export to your wiki's import folder for seamless transcript publishing
- Content creation: Create SRT subtitle templates from your audio recordings for podcasts or video content
- Project management: Extract action items with custom tag prompts, then auto-export for automated task creation
# Create project directory
mkdir speakr && cd speakr
# Download docker-compose configuration:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/docker-compose.example.yml -O docker-compose.yml
# Download the environment template:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.transcription.example -O .env
# Configure your API keys and launch
nano .env
docker compose up -d
# Access at http://localhost:8899Lightweight image: Use
learnedmachine/speakr:litefor a smaller image (~725MB vs ~4.4GB) that skips PyTorch. All features work normally — only Inquire Mode's semantic search falls back to basic text search.
Required API Keys:
TRANSCRIPTION_API_KEY- For speech-to-text (OpenAI) orASR_BASE_URLfor self-hostedTEXT_MODEL_API_KEY- For summaries, titles, and chat (OpenRouter or OpenAI)
Speakr uses a connector-based architecture that auto-detects your transcription provider:
| Option | Setup | Speaker Diarization | Voice Profiles |
|---|---|---|---|
| OpenAI Transcribe | Just API key | Yes (gpt-4o-transcribe-diarize) |
No |
| WhisperX ASR | GPU container | Yes (best quality) | Yes |
| Mistral Voxtral | Just API key | Yes (built-in) | No |
| VibeVoice ASR | Self-hosted (vLLM) | Yes (built-in) | No |
| Legacy Whisper | Just API key | No | No |
Simplest setup (OpenAI with diarization):
TRANSCRIPTION_API_KEY=sk-your-openai-key
TRANSCRIPTION_MODEL=gpt-4o-transcribe-diarizeBest quality (Self-hosted WhisperX):
ASR_BASE_URL=http://whisperx-asr:9000
ASR_RETURN_SPEAKER_EMBEDDINGS=true # Enable voice profilesRequires WhisperX ASR Service container with GPU.
Mistral Voxtral (cloud diarization):
TRANSCRIPTION_CONNECTOR=mistral
TRANSCRIPTION_API_KEY=your-mistral-key
TRANSCRIPTION_MODEL=voxtral-mini-latestVibeVoice ASR (self-hosted, no cloud dependency):
TRANSCRIPTION_CONNECTOR=vibevoice
TRANSCRIPTION_BASE_URL=http://your-vllm-server:8000
TRANSCRIPTION_MODEL=vibevoiceRequires VibeVoice served via vLLM with GPU.
PyTorch 2.6 Users: If you encounter a "Weights only load failed" error with WhisperX, add
TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=trueto your ASR container. See troubleshooting for details.
View Full Installation Guide →
Complete documentation is available at murtaza-nasir.github.io/speakr
- Getting Started - Quick setup guide
- User Guide - Learn all features
- Admin Guide - Administration and configuration
- Troubleshooting - Common issues and solutions
- FAQ - Frequently asked questions
The first non-patch release in the v0.8 line. Three big user-facing themes: capturing audio is now multi-platform and properly documented, the mobile app is a first-class member of the design system, and the upload modal stops feeling like a desktop card pasted onto a phone. Full release notes: release_notes_v0.9.0.md.
System Audio & Multi-Input Recording
- Per-OS help guide auto-opens for the right platform (macOS BlackHole + Multi-Output Device, Windows "Share system audio", Linux pavucontrol +
pactl module-virtual-sourceone-liner) - New Input devices picker: pick a primary mic AND an optional "Also mix in" secondary device; Web Audio mixes both into one track for capturing both sides of a meeting
- Toggle to disable Chrome's echo cancellation / noise suppression / auto-gain (needed for monitor-source capture)
- Virtual audio device discovery (BlackHole, Loopback, VB-Cable, Voicemeeter, Stereo Mix, Pulse / PipeWire monitors)
- Privacy notes section flags the trade-offs honestly with concrete mitigations
Stats Tab
- New per-recording tab: total length, speaker count, turns, words at the top; per-speaker time / % / turns / words / WPM table; silence row
- Available on desktop right-rail tabs and mobile bottom-nav More overflow
Upload Modal Redesign
- Real modal overlay (not full-screen takeover), progressive disclosure of Options behind a chip summary, inline file preview with duration probe, sticky modal-footer Upload action, last-used tag/folder/language auto-restore with clearable chips, calmer recording buttons
- Mobile: full-width bottom-sheet with drag-to-dismiss
Mobile UI
- Bottom navigation (Summary / Transcript / Chat / More), contextual icons in the chevron row, edge-to-edge content, sticky speaker pills, sticky editor Cancel/Save footer, audio player polish (volume slider rotation fix, popover anchored upward), progress queue as a bottom sheet anchored above the player
Inquire mode "+ New Recording" now opens the upload modal directly via ?upload=1 instead of dumping you on the list.
Design system unification brought 22 modals onto shared .modal-* primitives, .btn + .field everywhere, dark-mode select theming, header consolidation, sidebar redesign, floating dockable chat panel.
Backend & infra: Webhooks Phase 1–3 with HMAC + retry + SSRF guard, server-side recording sessions (hours-long ceiling, resume-on-reload), IDOR fixes for folder / tag ownership, eager-loading and batch query performance work.
Localization refreshed across en, fr, de, es, ru, zh, pt-BR.
Older releases: see the GitHub Releases page for tagged versions, or the release history on the docs site for narrative changelog entries going back to earlier v0.x lines.
View Full Screenshot Gallery →
- Backend: Python/Flask with SQLAlchemy
- Frontend: Vue.js 3 with Tailwind CSS
- AI/ML: OpenAI Whisper, OpenRouter, Ollama support
- Database: SQLite (default) or PostgreSQL
- Deployment: Docker, Docker Compose
- Speaker voice profiles with AI-powered identification (v0.5.9)
- Group workspaces with shared recordings (v0.5.9)
- PWA enhancements with offline support and background sync (v0.5.10)
- Multi-user job queue with fair scheduling (v0.6.0)
- SSO integration with OIDC providers (v0.7.0)
- Token usage tracking and per-user budgets (v0.7.2)
- Connector-based transcription architecture with auto-detection (v0.8.0)
- Comprehensive REST API with Swagger UI documentation (v0.8.0)
- Video retention with in-browser video playback (v0.8.11)
- Parallel uploads with duplicate detection (v0.8.11)
- Fullscreen video mode with live subtitles (v0.8.14)
- Custom vocabulary and transcription hints (v0.8.14)
- Quick language switching for transcription
- Automated workflow triggers
- Plugin system for custom integrations
- End-to-end encryption option
This project is dual-licensed:
-
GNU Affero General Public License v3.0 (AGPLv3)
Speakr is offered under the AGPLv3 as its open-source license. You are free to use, modify, and distribute this software under the terms of the AGPLv3. A key condition of the AGPLv3 is that if you run a modified version on a network server and provide access to it for others, you must also make the source code of your modified version available to those users under the AGPLv3.
- You must create a file named
LICENSE(orCOPYING) in the root of your repository and paste the full text of the GNU AGPLv3 license into it. - Read the full license text carefully to understand your rights and obligations.
- You must create a file named
-
Commercial License
For users or organizations who cannot or do not wish to comply with the terms of the AGPLv3 (for example, if you want to integrate Speakr into a proprietary commercial product or service without being obligated to share your modifications under AGPLv3), a separate commercial license is available.
Please contact speakr maintainers for details on obtaining a commercial license.
You must choose one of these licenses under which to use, modify, or distribute this software. If you are using or distributing the software without a commercial license agreement, you must adhere to the terms of the AGPLv3.
We welcome contributions to Speakr! There are many ways to help:
- Bug Reports & Feature Requests: Open an issue
- Discussions: Share ideas and ask questions
- Documentation: Help improve our docs
- Translations: Contribute translations for internationalization
By submitting a pull request, you agree to our Contributor License Agreement (CLA). This ensures we can maintain our dual-license model (AGPLv3 and Commercial). You retain copyright ownership of your contribution — the CLA simply grants us permission to include it in both the open source and commercial versions of Speakr. Our bot will post a reminder when you open a PR.
See our Contributing Guide for complete details on:
- How the CLA works and why we need it
- Step-by-step contribution process
- Development setup instructions
- Coding standards and best practices





