millet

Millet is a meeting transcription, summarization, and PDF output tool. It's named after the Ottoman millet system — the legal framework of communal autonomy that, in 1493, made it possible for two Sephardic Jewish brothers to establish Istanbul's first printing press, just one year after their expulsion from Spain. Like the millets it's named after, this tool operates under its own rules, on your own machine, within the broader vezir ecosystem.

Formerly known as meetscribe. PyPI distribution name: millet-pipeline (the bare millet slot on PyPI is held by an unrelated dormant 2021 package). See CHANGELOG.md for the rename details and full release history.

Meeting transcription with speaker diarization, AI-generated summaries, and professional PDF output.

Records dual-channel audio (your mic + system audio) from any meeting app and produces diarized transcripts using WhisperX + pyannote-audio. Works fully offline with local models, or can optionally use cloud APIs (OpenRouter, Claude Max, Tinfoil) for higher-quality summaries. A summarization preset selector picks one of three backends per run: high-quality (Sonnet 4.6), confidential (DeepSeek V4 Pro inside a hardware-attested Tinfoil TEE; prompts are readable only inside the secure enclave, and the resulting PDF carries a red CONFIDENTIAL watermark on every page), or alternative (Kimi K2.6 via OpenRouter).

Works with any meeting app

Because millet captures system audio at the OS level, it works with every voice/video call application:

Zoom
Google Meet
Microsoft Teams
Slack (huddles and calls)
Discord
Signal (voice and video calls)
Telegram (voice and video calls)
WhatsApp (desktop voice and video calls)
Keet (P2P calls)
Jitsi Meet
Webex
Skype
FaceTime (via browser)
GoTo Meeting
RingCentral
Amazon Chime
BlueJeans

Any app that plays audio through your system speakers will work -- including browser-based meetings and standalone desktop clients.

Features

Dual-channel audio capture -- records your mic (left channel) and remote participants (right channel) simultaneously via PipeWire/PulseAudio + ffmpeg
WhisperX transcription -- fast batched inference with openai/whisper-large-v3-turbo, word-level timestamps via wav2vec2 alignment
Multilingual -- auto-detects the language, or you can set it manually; supports English, German, Turkish, French, Spanish, Farsi, and 90+ other languages
Speaker diarization -- pyannote-audio identifies who said what, with automatic YOU/REMOTE labeling from the dual-channel signal
AI meeting summaries -- local LLMs via Ollama, or cloud APIs via OpenRouter / Claude Max / Tinfoil TEE, with automatic fallback between backends (preset-aware: when a preset is explicitly selected the fallback is disabled so the chosen privacy/quality level is honored)
Summarization presets -- --summary-preset high-quality | confidential | alternative resolves to a (backend, model) pair; the confidential preset routes to a Tinfoil TEE-attested DeepSeek V4 Pro so prompts cannot be seen by the model provider or cloud operator
CONFIDENTIAL PDF watermark -- sessions summarized via the tinfoil backend get a red CONFIDENTIAL header + footer on every page (auto-detected from summary.backend, survives relabeling)
Voiceprint speaker recognition -- automatically identifies speakers across meetings using voice embedding profiles
Meeting sync -- push transcripts and summaries to any Git repository on a configurable schedule
Professional PDF output -- summary + full transcript in a clean, page-numbered PDF with full Unicode support (DejaVu Sans) and RTL for Farsi
Multiple output formats -- .txt, .srt, .json, .summary.md, .pdf
Structured YAML frontmatter -- every .summary.md carries a typed schema (action items, decisions, participants, topics, language, duration) plus a matching .frontmatter.json sidecar, ready for indexers and downstream tooling like vezir
GTK3 GUI widget -- small always-on-top window with record/stop, timer, and one-click access to results
CLI -- millet record, millet transcribe, millet run, millet gui, millet label, millet enroll, millet sync, millet ingest, millet devices, millet check
Per-session folders -- each recording gets its own organized directory
Offline-first -- after initial model download, core features work without internet; cloud backends are optional upgrades

Quick start

# Install from PyPI
pip install millet-pipeline

# Set your HuggingFace token (required for speaker diarization)
export HF_TOKEN=hf_your_token_here

# Record a meeting, then auto-transcribe + summarize when you stop
millet run
# Press Ctrl+C when the meeting ends

Requirements

millet runs in two configurations:

Linux desktop (full pipeline: record + transcribe + label + sync)

Linux with PipeWire or PulseAudio (for system-audio capture)
NVIDIA GPU with CUDA (8 GB+ VRAM recommended; CPU mode available but slower)
Python 3.10+, ffmpeg
HuggingFace token (free) for the diarization model
Ollama (optional) for local AI summaries

macOS Apple Silicon (post-capture pipeline: transcribe + label + sync)

M1 / M2 / M3 Mac running macOS
Python 3.10+, ffmpeg
pip install 'millet-pipeline[mlx]' to auto-select MLX Whisper for ASR
HuggingFace token, Ollama as above
Note: millet record / millet run (audio capture) require Linux. On a Mac, feed in audio captured elsewhere via millet transcribe <file.wav>, or use vezir to run a Mac as a server with Linux/Android thin clients providing the recordings.

See REQUIREMENTS.md for full hardware/software details.

Installation

1. System dependencies

# Ubuntu / Pop!_OS / Debian
sudo apt install ffmpeg pulseaudio-utils

# Fedora
sudo dnf install ffmpeg pulseaudio-utils

2. Install millet

# From PyPI (recommended)
pip install millet-pipeline

# Optional: pull the Tinfoil TEE SDK to enable the Confidential preset
pip install 'millet-pipeline[tee]'

# From source
git clone https://github.com/pretyflaco/millet
cd millet
pip install -e .

This creates the millet command in your PATH. The [tee] extra adds the tinfoil Python SDK (≈ 2 MB). Set TINFOIL_API_KEY to use the --summary-preset confidential route; see Summarization presets below.

3. HuggingFace token (for speaker diarization)

Create a free account at https://huggingface.co
Accept the model terms at https://huggingface.co/pyannote/speaker-diarization-community-1
Create a read token at https://huggingface.co/settings/tokens
Set it:

export HF_TOKEN=hf_your_token_here
# Add to ~/.bashrc for persistence:
echo 'export HF_TOKEN=hf_your_token_here' >> ~/.bashrc

4. Ollama (optional, for AI summaries)

Install from https://ollama.com, then pull the default summary model:

ollama pull qwen3.5:9b

5. Verify setup

millet check

Usage

Check audio devices

millet devices

Record a meeting

Start recording before or during your meeting:

millet record

Press Ctrl+C when the meeting ends. A 10-second drain buffer ensures all audio is captured. Recordings are saved to ~/millet-recordings/.

Options:

-o /path -- save recordings elsewhere
--virtual-sink -- create isolated virtual sink (avoids capturing notification sounds)
--mic <source> -- specify mic source (use millet devices to find names)
--monitor <source> -- specify monitor source

Transcribe a recording

millet transcribe ~/millet-recordings/meeting-20260312-140000/meeting-20260312-140000.wav

Options:

-m large-v3-turbo -- Whisper model (default: large-v3-turbo; also: base, medium, large-v2)
-l auto -- language code or auto to auto-detect (default: auto; e.g. en, de, tr, fa)
--asr-backend auto -- ASR backend: auto, whisperx, or mlx. On Apple Silicon with mlx-whisper installed, auto uses MLX Whisper for ASR. MLX only replaces the transcription step; millet still requires WhisperX for audio loading, alignment, and diarization.
--mlx-model <repo-or-path> -- MLX Whisper model path/repo (default: maps large-v3-turbo to mlx-community/whisper-large-v3-turbo)
--device cuda -- cuda or cpu. Default: auto-detected — cpu on Apple Silicon (since macOS has no CUDA), cuda elsewhere.
--torch-device mps -- optional PyTorch device for alignment/diarization; useful with MLX ASR or CPU ASR on Apple Silicon.
--compute-type float16 -- float16 or int8 for lower VRAM (default: float16)
-b 16 -- batch size, reduce if running low on VRAM (default: 16)
--min-speakers 2 / --max-speakers 6 -- hint for number of speakers
--no-diarize -- skip speaker diarization
--no-summarize -- skip AI summary generation
--summary-backend openrouter -- summary backend (ollama, openrouter, claudemax, tinfoil, openai)
--summary-model <model> -- model for summary (default: per-backend)
--skip-alignment -- skip word-level alignment (useful if alignment model is unavailable)
--mixdown mono|dual -- stereo mixdown mode (default: mono). Use dual for headphone setups where mic and system audio don't bleed into each other (see below)

Dual-channel mode for headphone users

If you use headphones, your mic captures only your voice while the system channel captures only the remote participants. In this setup the default mono mixdown creates a ~20× energy imbalance that causes WhisperX to suppress the quieter voice.

Use --mixdown dual to transcribe each channel independently:

millet transcribe --mixdown dual ~/millet-recordings/meeting-20260312-140000/

This skips diarization entirely (channel identity = speaker identity) and labels segments as YOU (mic) or REMOTE (system). Default --mixdown mono behavior is unchanged -- use it when your speakers play into the room and both voices appear on both channels.

Record + transcribe in one shot

millet run

Records until Ctrl+C, then automatically transcribes, generates a summary, and produces a PDF. Takes all options from both record and transcribe (including --mixdown dual).

Launch the GUI widget

millet gui

A small always-on-top window with:

Record / Stop button
Live timer and file size
Status indicator (Recording, Flushing, Transcribing, Summarizing, Done)
"Open PDF" and "Open Folder" buttons after completion

When 2 or more speakers are detected, a speaker labeling dialog appears before the results are saved. Each speaker is shown with their channel and a sample line of text. If voice profiles exist, confident matches are shown automatically. Enter a real name or leave blank to keep the auto-assigned label (YOU, REMOTE_1, etc.).

If meeting sync is configured and the recording matches a scheduled meeting, a sync confirmation prompt appears with Push / Skip buttons.

Label speakers after the fact

millet label ~/millet-recordings/meeting-20260313-214133

For each speaker in the recording, millet label:

Shows a table of all speakers (label, channel, segment count, sample text)
Plays a short audio clip from that speaker's channel (requires ffplay)
Prompts you to enter a real name (press Enter to keep the existing label)
Regenerates all outputs (.txt, .srt, .json, .summary.md, .pdf) with the new names

With --auto, voice profiles are used to automatically identify known speakers. Confident matches are applied without prompting; only unrecognized speakers get the interactive prompt:

millet label --auto ~/millet-recordings/meeting-20260313-214133

Options:

--auto -- auto-label using voice profiles (see Voiceprint speaker recognition)
--no-audio -- skip audio playback, just show text samples
--no-summary -- use find-and-replace on the existing summary instead of re-running the summary model
--summary-backend / --summary-model -- override summary backend and model for regeneration

Output

Each recording gets its own session directory:

~/millet-recordings/meeting-20260312-140000/
    meeting-20260312-140000.wav                 # Stereo audio (16kHz)
    meeting-20260312-140000.session.json        # Recording metadata
    meeting-20260312-140000.ffmpeg.log          # ffmpeg capture log
    meeting-20260312-140000.txt                 # Plain text transcript
    meeting-20260312-140000.srt                 # Subtitle format
    meeting-20260312-140000.json                # Full detail (word-level timestamps)
    meeting-20260312-140000.summary.md          # AI meeting summary with YAML frontmatter
    meeting-20260312-140000.summary.meta.json   # Summary backend/model + timing metadata
    meeting-20260312-140000.frontmatter.json    # Structured frontmatter (schema_version 1)
    meeting-20260312-140000.pdf                 # Professional PDF (summary + transcript)

Example .txt output:

[00:00:12 --> 00:00:18] YOU: So the main issue we're seeing is with the API rate limiting.
[00:00:19 --> 00:00:25] REMOTE_1: Right, I think we should implement exponential backoff.
[00:00:26 --> 00:00:31] YOU: Agreed. Can you also look at caching the responses?
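
For downstream scripts, the .txt lines can be parsed with a small regular expression. The sketch below is not part of millet; the pattern is inferred from the sample lines above, and the names Segment, parse_line, and talk_time are illustrative only.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output: [HH:MM:SS --> HH:MM:SS] SPEAKER: text
LINE_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2}) --> (\d{2}):(\d{2}):(\d{2})\] ([^:]+): (.*)$"
)


class Segment(NamedTuple):
    start: int    # seconds from the start of the recording
    end: int      # seconds from the start of the recording
    speaker: str
    text: str


def parse_line(line: str) -> Optional[Segment]:
    """Parse one transcript line; return None if it does not match the pattern."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    h1, m1, s1, h2, m2, s2 = (int(g) for g in m.groups()[:6])
    start = h1 * 3600 + m1 * 60 + s1
    end = h2 * 3600 + m2 * 60 + s2
    return Segment(start, end, m.group(7), m.group(8))


def talk_time(lines) -> dict:
    """Sum speaking time in seconds per speaker label."""
    totals = {}
    for line in lines:
        seg = parse_line(line)
        if seg is not None:
            totals[seg.speaker] = totals.get(seg.speaker, 0) + (seg.end - seg.start)
    return totals
```

Calling talk_time(open(path, encoding="utf-8")) on a transcript gives a rough per-speaker talk-time breakdown. For word-level timestamps, the .json output is the better source.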

Structured frontmatter

Every .summary.md ships with a typed YAML frontmatter block plus a matching .frontmatter.json sidecar. The schema is intentionally small in v1 so downstream consumers can rely on it:

---
schema_version: 1
type: meeting
title: Q2 Pricing Discussion
date: "2026-03-17T14:00:00+00:00"
duration: PT42M17S
language: en
participants:
  - name: YOU
    role: null
    channel: mic
  - name: Alice
    role: null
    channel: system
topics:
  - pricing
  - onboarding
action_items:
  - assignee: Alice
    task: Send pricing doc
    due: Friday
    status: open
decisions:
  - text: Run pricing experiment at $99/mo
    topic: pricing
source:
  session_id: meeting-20260312-140000
  audio_sha256: null
---
## Meeting Overview
...

## Key Topics Discussed
...

schema_version: 1 is what every consumer should pin against. The JSON sidecar contains the exact same dict for tools that don't want to parse YAML. [vezir](https://github.com/pretyflaco/vezir) 0.2.0+ reads these files directly to build a queryable index over your meetings.
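
A consumer of the JSON sidecar might look roughly like the sketch below. It assumes only the field names shown in the example above; the helper names (load_frontmatter, open_action_items, duration_seconds) are hypothetical, and the duration parser handles only the hour/minute/second form seen in the example.

```python
import json
import re
from pathlib import Path

SUPPORTED_SCHEMA = 1

# ISO 8601 durations in the form shown above, e.g. PT42M17S or PT1H5M.
# Fractional seconds and day components are deliberately not handled.
_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(value: str) -> int:
    """Convert a PT..H..M..S duration string to whole seconds."""
    m = _DURATION_RE.match(value)
    if m is None:
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def load_frontmatter(path) -> dict:
    """Load a .frontmatter.json sidecar and refuse unknown schema versions."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if data.get("schema_version") != SUPPORTED_SCHEMA:
        raise ValueError(f"unexpected schema_version: {data.get('schema_version')!r}")
    return data


def open_action_items(frontmatter: dict) -> list:
    """Return the action items whose status is 'open'."""
    return [
        item for item in frontmatter.get("action_items", [])
        if item.get("status") == "open"
    ]
```

Pinning on schema_version up front means a future schema change fails loudly instead of silently producing wrong results.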

Backfilling existing sessions

Sessions recorded before millet 0.7.0 don't carry frontmatter. Re-extract it for one or more sessions with:

# Re-run the LLM to produce frontmatter; idempotent (skips sessions
# whose .summary.meta.json already records data_extracted=true).
millet ingest ~/millet-recordings/meeting-2026*

# Force re-extraction even when frontmatter is already present:
millet ingest --force ~/millet-recordings/meeting-20260312-140000

# Preview without invoking the LLM:
millet ingest --dry-run ~/millet-recordings/meeting-2026*

millet ingest accepts the same --summary-backend / --summary-model / --ollama-singlepass flags as millet transcribe and regenerates the PDF by default (--no-pdf to skip).
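
The idempotence rule described above (skip a session when its .summary.meta.json records data_extracted=true, unless --force is given) can be sketched as follows. This is an illustration, not millet's actual code: the function name needs_ingest is hypothetical, and the sidecar path assumes the file is named after the session directory as in the Output listing.

```python
import json
from pathlib import Path


def needs_ingest(session_dir, force: bool = False) -> bool:
    """Decide whether frontmatter extraction should run for a session folder."""
    session = Path(session_dir)
    # Assumed layout: <session>/<session>.summary.meta.json
    meta_path = session / f"{session.name}.summary.meta.json"
    if force or not meta_path.exists():
        return True
    try:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # Unreadable metadata: assume extraction has to be redone.
        return True
    return meta.get("data_extracted") is not True
```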

AI summary

millet generates a structured meeting summary with:

Overview
Key topics discussed
Action items (with owners when mentioned)
Decisions made
Open questions / follow-ups

Supported models

Model	Size	Speed	Notes
`qwen3.5:9b`	6.6 GB	~18-35s	Default -- best balance of quality and speed
`gemma3:12b`	8.1 GB	~15s	Fastest
`qwen3:14b`	9.3 GB	~39s	Good quality
`glm-4.7-flash`	19 GB	~37s	Must use thinking-off mode (handled automatically)

Change the model:

millet run --summary-model gemma3:12b

Disable summaries:

millet run --no-summarize

Summary backends

millet supports five backends with automatic fallback:

Backend	Setup	Cost	Quality	Privacy
`ollama` (default)	`ollama serve` + `ollama pull qwen3.5:9b`	Free	Good	Fully local
`openrouter`	Set `OPENROUTER_API_KEY`	Pay-per-use	Excellent	Cloud (model-provider-visible)
`claudemax`	Run claude-max-api-proxy on localhost:3457	Claude Max subscription	Excellent	Cloud (Anthropic-visible)
`tinfoil`	`pip install 'millet-pipeline[tee]'`, set `TINFOIL_API_KEY` (or drop a key file at `~/models/tinfoil/tinfoil.txt`)	~$0.009/meeting	Excellent (DeepSeek V4 Pro)	Hardware-attested TEE — prompts not visible to provider/operator
`openai`	Set `MEETSCRIBE_OPENAI_BASE_URL`	Varies	Varies	Depends on endpoint

The openai backend works with any OpenAI-compatible API — Lemonade, LiteLLM, vLLM, text-generation-webui, LocalAI, or any self-hosted endpoint.

The tinfoil backend runs inference inside a hardware-attested TEE (AMD SEV-SNP or Intel TDX, depending on the model). The model provider can't see the prompts, the cloud operator can't see the prompts, and the integrity is checked against an attestation report on every request. ~$0.009 per meeting; latency ~66 s for a 30-min recording on DeepSeek V4 Pro.

# Use OpenRouter
millet run --summary-backend openrouter --summary-model anthropic/claude-sonnet-4.6

# Use any OpenAI-compatible endpoint
export MEETSCRIBE_SUMMARY_BACKEND=openai
export MEETSCRIBE_OPENAI_BASE_URL=http://localhost:8000/v1
export MEETSCRIBE_SUMMARY_MODEL=your-model-name
# Optional: export MEETSCRIBE_OPENAI_API_KEY=your-key

# Or set via environment variables
export MEETSCRIBE_SUMMARY_BACKEND=openrouter
export MEETSCRIBE_SUMMARY_MODEL=anthropic/claude-sonnet-4.6

If the configured backend is unavailable, millet automatically tries the next one in the fallback chain: claudemax → tinfoil → openrouter → ollama. The openai backend is opt-in only and never participates in the fallback chain.

When a preset is explicitly selected (see Summarization presets below), this fallback is disabled for that run: the chosen preset's backend either succeeds or the whole summarization step fails loudly with a non-zero exit. This protects the privacy/quality promise of the confidential preset (a silent tinfoil → openrouter fallback would defeat the entire point of choosing TEE-attested inference).
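
The fallback rules can be pictured with the sketch below. It is a reading of the description above, not millet's implementation: it assumes "the next one in the chain" means the backends listed after the configured one, and that a configured openai backend gets no fallback at all. The names backends_to_try and summarize_with_fallback, and the call_backend callable, are hypothetical.

```python
FALLBACK_CHAIN = ["claudemax", "tinfoil", "openrouter", "ollama"]


def backends_to_try(configured: str, preset_selected: bool = False) -> list:
    """Return the ordered list of backends to attempt for one run."""
    # An explicit preset disables fallback; openai is not in the chain,
    # so (by assumption) it is tried on its own as well.
    if preset_selected or configured not in FALLBACK_CHAIN:
        return [configured]
    start = FALLBACK_CHAIN.index(configured)
    return FALLBACK_CHAIN[start:]


def summarize_with_fallback(transcript, configured, call_backend, preset_selected=False):
    """Try each backend in order; return (backend, summary) from the first success.

    call_backend(backend, transcript) is a stand-in for the real backend call
    and is expected to raise on failure.
    """
    errors = {}
    for backend in backends_to_try(configured, preset_selected):
        try:
            return backend, call_backend(backend, transcript)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors[backend] = exc
    # Fail loudly so the caller can exit non-zero.
    raise RuntimeError(f"all summary backends failed: {errors}")
```

With a preset selected, the list has exactly one entry, so a failure surfaces as an error instead of a quiet downgrade to a different privacy level.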

Summarization presets

A preset is a friendly name that resolves to a concrete (backend, model) pair. Set it via --summary-preset on transcribe, run, label, gui, or ingest, or via the MEETSCRIBE_SUMMARY_PRESET env var.

Preset	Backend	Model	Use case
`high-quality`	`claudemax`	`claude-sonnet-4-6`	Default for users with a Claude Max subscription; highest summary quality
`confidential`	`tinfoil`	`deepseek-v4-pro`	Meetings where prompts must not be retained or trained on; hardware-attested TEE
`alternative`	`openrouter`	`moonshotai/kimi-k2.6`	Low-cost cloud option (~$0.017/meeting); useful when claudemax credentials are unavailable

# Select a preset for a single run
millet transcribe ~/millet-recordings/today/today.wav --summary-preset confidential

# Or set per-session via env
export MEETSCRIBE_SUMMARY_PRESET=high-quality
millet run

When a preset is set, --summary-backend and --summary-model overrides are honored within that preset (e.g. --summary-preset confidential --summary-model deepseek-v3 swaps the model but keeps the TEE backend).
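
Preset resolution with overrides can be sketched as a small lookup. The mapping below is copied from the preset table; the function name resolve_preset and its signature are hypothetical, not millet's API.

```python
# (backend, model) pairs taken from the preset table.
PRESETS = {
    "high-quality": ("claudemax", "claude-sonnet-4-6"),
    "confidential": ("tinfoil", "deepseek-v4-pro"),
    "alternative": ("openrouter", "moonshotai/kimi-k2.6"),
}


def resolve_preset(preset: str, backend=None, model=None) -> tuple:
    """Resolve a preset name to (backend, model), applying explicit overrides."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    default_backend, default_model = PRESETS[preset]
    # An explicit override wins; otherwise the preset's default is kept.
    return (backend or default_backend, model or default_model)
```

For example, resolve_preset("confidential", model="deepseek-v3") keeps the tinfoil backend and swaps only the model.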

Two-pass local summarization

When the ollama backend is selected (the default), millet runs two LLM calls instead of one:

Pass 1 (extraction) — pulls topics, actions, decisions, and open questions out of the transcript as plain numbered lists, using a context window sized to the full transcript.
Pass 2 (formatting) — takes the much smaller extracted data and organizes it into the canonical Markdown structure with a fixed 8K context window.

This dramatically improves format compliance and reduces hallucinations on 20B-class local models (gpt-oss:20b, qwen3.6:27b) compared to a single-pass call, at the cost of one additional LLM call (~30–90s extra). Cloud backends (claudemax, openrouter, openai) remain single-pass — they already produce well-structured output in one shot.

To opt out and use the previous single-pass behavior:

millet run --ollama-singlepass
# Or via environment:
export MEETSCRIBE_OLLAMA_SINGLEPASS=1

The .summary.meta.json sidecar records per-pass timings (pass1_seconds, pass2_seconds, pass1_chars) when two-pass was used.
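
The two-pass flow can be sketched as below. Everything here is illustrative: llm(prompt, num_ctx) is a hypothetical callable wrapping the model call, the prompt strings are placeholders rather than millet's real prompts, the 4-characters-per-token estimate is a rough heuristic, and treating pass1_chars as the length of the pass 1 output is an assumption.

```python
import time

FORMAT_CONTEXT_TOKENS = 8192  # the fixed 8K window used for pass 2

# Placeholder instructions, not the prompts shipped with millet.
PASS1_INSTRUCTIONS = (
    "Extract topics, actions, decisions, and open questions "
    "as plain numbered lists.\n\n"
)
PASS2_INSTRUCTIONS = "Organize these notes into the meeting summary Markdown.\n\n"


def estimate_context_tokens(text: str, chars_per_token: int = 4, headroom: int = 1024) -> int:
    """Rough context size: about 4 characters per token plus room for the reply."""
    return len(text) // chars_per_token + headroom


def two_pass_summary(transcript: str, llm):
    """Run extraction, then formatting; return (summary, timing metadata)."""
    prompt1 = PASS1_INSTRUCTIONS + transcript
    t0 = time.perf_counter()
    # Pass 1: context window sized to the full transcript.
    extracted = llm(prompt1, estimate_context_tokens(prompt1))
    t1 = time.perf_counter()
    # Pass 2: the extracted notes are much smaller, so a fixed window suffices.
    summary = llm(PASS2_INSTRUCTIONS + extracted, FORMAT_CONTEXT_TOKENS)
    t2 = time.perf_counter()
    meta = {
        "pass1_seconds": t1 - t0,
        "pass2_seconds": t2 - t1,
        "pass1_chars": len(extracted),
    }
    return summary, meta
```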

See docs/local-model-evaluation.md for the full evaluation that motivated this design, including known failure modes of local 20B-class models.

Customizing the prompt

The summarization prompt lives in meet/prompts/summarize_system.md. Edit it to change the summary format, add domain-specific instructions, or tune for your preferred model. No Python changes needed.

Voiceprint speaker recognition

millet can automatically identify speakers across meetings using voice embeddings. After you label speakers in one meeting, their voice profiles are stored and matched against future recordings.

# Build profiles from already-labeled sessions
millet enroll ~/millet-recordings/meeting-20260330-*

# Auto-label speakers in future meetings using voice profiles
millet label --auto ~/millet-recordings/meeting-20260401-093000

Profiles are stored in ~/.config/meet/speaker_profiles.json and improve with each labeled session (running average of embeddings).
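
The running-average idea can be illustrated with plain Python lists. This is a sketch under stated assumptions, not the code or file format millet uses: the profile layout ({"embedding": [...], "count": n}), the cosine-similarity matching, and the 0.7 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_profile(profile: dict, embedding) -> dict:
    """Fold one new embedding into a profile's running average."""
    n = profile["count"]
    averaged = [(p * n + e) / (n + 1) for p, e in zip(profile["embedding"], embedding)]
    return {"embedding": averaged, "count": n + 1}


def best_match(profiles: dict, embedding, threshold: float = 0.7):
    """Return the best-matching profile name, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine_similarity(profile["embedding"], embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Because each labeled session only nudges the stored average, a single noisy recording has limited effect on an established profile.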

Meeting sync

Push meeting artifacts to a Git repository on a configurable schedule.

# Create an example config
millet sync --init-config
# Edit ~/.config/meet/sync_config.json with your repo URL and schedule

# Push a session manually
millet sync ~/millet-recordings/meeting-20260331-110038_STANDUP

# View configured schedule
millet sync --list-schedule

When the GUI detects a matching scheduled meeting, it prompts for confirmation before syncing. Sessions that don't match the schedule are skipped by default; in the CLI, pass --force to sync an unmatched session.

You can also configure a team_members list and min_team_members threshold in sync_config.json to require that a minimum number of recognized speakers are present before offering to sync.
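
The threshold check amounts to counting how many recognized speakers are on the team list. A minimal sketch, assuming speaker names are compared as exact strings; the function name enough_team_members is hypothetical, while team_members and min_team_members mirror the config keys mentioned above.

```python
def enough_team_members(recognized_speakers, team_members, min_team_members: int) -> bool:
    """True when at least min_team_members recognized speakers are on the team list."""
    # Sets remove duplicates, so a speaker who appears twice is counted once.
    present = set(recognized_speakers) & set(team_members)
    return len(present) >= min_team_members
```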

Multilingual support

millet auto-detects the spoken language by default (Whisper large-v3-turbo supports 99 languages). You can also set it explicitly:

millet run --language de       # German
millet run --language tr       # Turkish
millet run --language fr       # French
millet run --language es       # Spanish
millet run --language fa       # Farsi (Persian)
millet run --language auto     # Auto-detect (default)

How it works

Transcription: The same Whisper model handles all languages -- no extra download or VRAM cost. When set to auto, the detected language is used for alignment and all downstream steps.
Speaker diarization: Completely language-agnostic (based on voice characteristics, not speech content).
AI summary: When a non-English language is detected, the summary prompt instructs the LLM to write the summary in the same language as the transcript.
PDF output: Uses DejaVu Sans for full Unicode coverage (Latin, Cyrillic, Greek, Turkish special characters, etc.). Farsi uses Noto Naskh Arabic with RTL text reshaping.

Tested languages

Language	Code	Alignment model	PDF font	Notes
English	`en`	wav2vec2 (torchaudio)	DejaVu Sans
German	`de`	VoxPopuli (torchaudio)	DejaVu Sans
French	`fr`	VoxPopuli (torchaudio)	DejaVu Sans
Spanish	`es`	VoxPopuli (torchaudio)	DejaVu Sans
Turkish	`tr`	wav2vec2 (HuggingFace)	DejaVu Sans	~1.2 GB alignment model download
Farsi	`fa`	wav2vec2 (HuggingFace)	Noto Naskh Arabic	~1.2 GB alignment model download, RTL

Farsi RTL requirements

Farsi uses right-to-left text. For proper PDF rendering, install the optional RTL dependencies:

pip install arabic-reshaper python-bidi
# Or with the optional extra:
pip install "millet-pipeline[rtl]"

Without these libraries, Farsi text will appear in the PDF but glyphs may not be joined correctly and reading order may be wrong.

Virtual sink mode

By default, millet record captures all system audio (including notification sounds, music, etc.). For cleaner recordings, use --virtual-sink:

millet record --virtual-sink

This creates an isolated audio sink. Route your meeting app's audio to it:

Open pavucontrol (PulseAudio Volume Control)
Go to the "Playback" tab
Find your browser or meeting app
Change its output to "Meet-Capture"

You'll still hear the meeting through your normal speakers via automatic loopback.

VRAM usage

With an NVIDIA GPU (12 GB VRAM):

Model	Transcription	+ Diarization	Recommended batch_size
large-v3-turbo	~4 GB	~7 GB total	16
medium	~3 GB	~6 GB total	16
base	~1 GB	~4 GB total	16

If you hit OOM errors:

Reduce --batch-size to 4 or 8
Use --compute-type int8
Use a smaller model (--model medium or --model base)
Use --device cpu as a last resort

How it works

[Meeting App] --> [PipeWire/PulseAudio] --> [ffmpeg dual-channel capture] --> meeting.wav
                                                                                  |
                  [WhisperX: faster-whisper + wav2vec2 alignment + pyannote diarization]
                                                                                  |
                                      [Ollama LLM summary]     [Diarized transcript]
                                              |                         |
                                        .summary.md          .txt / .srt / .json
                                              |                         |
                                              +--------> .pdf <---------+

Capture: Records your mic (left channel) and system audio (right channel) simultaneously into a single stereo WAV file at 16 kHz.

Transcribe: Runs the WhisperX pipeline -- batched Whisper transcription, wav2vec2 forced alignment for word-level timestamps, and pyannote speaker diarization. Dual-channel energy analysis maps speakers to YOU or REMOTE.

Summarize: Sends the transcript to the configured summary backend (a local Ollama model by default), which extracts a structured summary.

PDF: Combines the summary and full transcript into a professional page-numbered PDF document.
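
The dual-channel energy analysis mentioned in the Transcribe step can be pictured as comparing the loudness of the two channels over a segment. The sketch below is illustrative only: the function names, the 2× ratio, and the AMBIGUOUS fallback label are assumptions, not millet's actual logic.

```python
import math


def rms(samples) -> float:
    """Root-mean-square level of a sequence of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def label_segment(left, right, ratio: float = 2.0) -> str:
    """Label one segment from its mic (left) and system-audio (right) samples."""
    mic, system = rms(left), rms(right)
    if mic > system * ratio:
        return "YOU"        # the mic channel clearly dominates
    if system > mic * ratio:
        return "REMOTE"     # the system-audio channel clearly dominates
    return "AMBIGUOUS"      # similar energy on both channels, e.g. speaker bleed
```

When speakers play into the room, both voices reach both channels and more segments land in the ambiguous band, which is where diarization does the real work.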

CUDA NVRTC note

The pyannote diarization model requires CUDA NVRTC for JIT compilation. If your CUDA driver version doesn't match the installed libnvrtc-builtins version, millet automatically creates a compatibility symlink. This happens transparently on first use.

If you still see NVRTC errors:

export LD_LIBRARY_PATH=$HOME/.local/lib/cuda:$LD_LIBRARY_PATH

Limitations

Overlapping speech is not handled well (Whisper limitation)
Speaker labels default to role-based (YOU, REMOTE_1, REMOTE_2) — use millet label or the GUI dialog to assign real names
Diarization accuracy varies with audio quality and number of speakers
Audio capture (millet record, millet run) requires Linux with PulseAudio or PipeWire. Transcription, labeling, summarization, and sync work on both Linux (CUDA) and macOS Apple Silicon (MLX Whisper + MPS) as of v0.6.0.
Windows is not supported.
Local 20B-class summary models (e.g. gpt-oss:20b) can hallucinate on transcripts dominated by very short low-information utterances ("yes", "okay") and may exceed the default 600s timeout on very large (>100 KB) non-English transcripts. For these cases configure a cloud backend (claudemax / openrouter) — the fallback chain takes over automatically. See docs/local-model-evaluation.md.

FAQ

Is there a GUI? Yes — run millet gui for a small always-on-top GTK3 widget with Record/Stop, live timer, status indicator, and one-click access to the resulting PDF and session folder. See Launch the GUI widget for details.

Does it work on Windows / macOS? System-audio recording requires Linux (PulseAudio / PipeWire). The post-capture pipeline (millet transcribe, millet label, millet sync, etc.) works on macOS Apple Silicon as of v0.6.0 — install with pip install 'millet-pipeline[mlx]'. Windows is not supported.

Can I run a Mac as a transcription server? Yes — see vezir, the team-scale wrapper around millet. A Mac can act as the GPU server with Linux laptops or the Android client providing the audio.

Can I use it without a GPU? Yes, with --device cpu, but transcription will be 5–20× slower depending on the Whisper model. See VRAM usage.

Contributing

git clone https://github.com/pretyflaco/millet
cd millet
pip install -e '.[dev]'
/usr/bin/python3 -m pytest tests/

Pull requests welcome. Please run the test suite before submitting.

Changelog

See CHANGELOG.md for release history.

License

GPL-3.0
