STT — Speech to Text

Double-tap a key. Talk. Tap once. Your words appear.

System-wide voice dictation for macOS, Windows, and Linux. No cloud lock-in — uses Groq's free Whisper API (or fully offline MLX Whisper on Apple Silicon).

How it feels

You're typing in any app. Hands on the keyboard.

  Step 1 — Double-tap ⌘ (or Ctrl on Win/Linux)
  ┌──────────────────────────────────────────────────────────────┐
  │                                                              │
  │                       Listening...                           │
  │                                                              │
  └──▬▬▬▬▬▬▬▬▬▬▬▬▬ soft green glow at the bottom ▬▬▬▬▬▬▬▬▬▬▬▬──┘

  Step 2 — Speak. Live text streams in as you talk.
  ┌──────────────────────────────────────────────────────────────┐
  │                                                              │
  │       "let's ship this feature by friday"  ...               │
  │                                                              │
  └──▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬──┘

  Step 3 — Single-tap ⌘ (or Ctrl). Final text pastes at your cursor.
  ┌──────────────────────────────────────────────────────────────┐
  │                                                              │
  │     "Let's ship this feature by Friday."  ✓                  │
  │                                                              │
  └──────────────────── green flash ─────────────────────────────┘

The flow

sequenceDiagram
    participant U as You
    participant K as Keyboard hook
    participant M as Microphone
    participant G as Groq Whisper
    participant A as Your app

    U->>K: Double-tap ⌘ / Ctrl
    K->>M: Start capture
    Note over M: Live preview every 2s
    M->>G: Stream audio chunks
    G-->>K: "let's ship..."
    U->>K: Single tap ⌘ / Ctrl
    K->>M: Stop
    M->>G: Final audio
    G-->>K: "Let's ship this feature by Friday."
    K->>A: Paste at cursor (⌘V / Ctrl+V)
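
The live-preview step in the diagram could be sketched as a background loop that re-transcribes the audio captured so far at a fixed interval. This is a minimal sketch and not the project's actual code: `get_audio_so_far`, `transcribe`, and `show_preview` are hypothetical callables, and only the 2-second interval comes from the diagram.

```python
import threading

LIVE_INTERVAL = 2.0  # seconds between preview refreshes (value from the diagram)


def live_preview_loop(stop_event, get_audio_so_far, transcribe, show_preview,
                      interval=LIVE_INTERVAL):
    """Refresh the preview every `interval` seconds until `stop_event` is set.

    The three callables are hypothetical stand-ins for the capture buffer,
    the speech-to-text backend, and the on-screen overlay.
    """
    # Event.wait() returns False on timeout and True once the event is set,
    # so the loop exits promptly when recording stops.
    while not stop_event.wait(interval):
        audio = get_audio_so_far()
        if len(audio) == 0:
            continue  # nothing captured yet
        try:
            show_preview(transcribe(audio))
        except Exception as exc:
            # A failed preview should not end the recording.
            show_preview(f"[preview unavailable: {exc}]")
```

In practice this would probably run in a daemon thread started on the double-tap, for example `threading.Thread(target=live_preview_loop, args=(stop, get_audio, transcribe, show), daemon=True).start()`, with `stop.set()` called on the single tap.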

Install

Pick your OS:

🍎 macOS

Easiest — download .app:

1. Grab STT-macOS.zip from releases.
2. Unzip → drag STT.app to /Applications.
3. Right-click → Open (bypasses the unsigned-app warning).
4. Paste your Groq API key.
5. Allow Mic + Accessibility in System Settings.

From source:

git clone https://github.com/murataslan1/stt
cd stt/macos
./install.sh
python3 stt.py

🪟 Windows

From source:

git clone https://github.com/murataslan1/stt
cd stt\windows
pip install -r requirements.txt
python stt_windows.py

Build standalone .exe:

build.bat

→ dist\STT.exe is self-contained. Double-click to run.

🐧 Linux

git clone https://github.com/murataslan1/stt
cd stt/linux
./install.sh
python3 stt_linux.py

Handles apt / dnf / pacman automatically.

X11 works out of the box. Wayland: see caveats below.

Usage

| Action | macOS | Windows | Linux |
|---|---|---|---|
| Start recording | Double-tap `⌘` | Double-tap `Ctrl` | Double-tap `Ctrl` |
| Stop & paste | Single-tap `⌘` | Single-tap `Ctrl` | Single-tap `Ctrl` |

That's it. No menu, no clicks. Just the key.

On macOS, the menu bar item lets you switch between Groq (fast, cloud) and Local (offline MLX) modes, or change your API key.

API key

First launch shows this:

┌─────────────────────────────────────────┐
│  Welcome to STT                         │
│                                         │
│  Enter your Groq API key                │
│  (free at console.groq.com/keys)        │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ gsk_...                           │  │
│  └───────────────────────────────────┘  │
│                                         │
│           [ Save & Start ]              │
└─────────────────────────────────────────┘

The key is stored at ~/.config/stt/settings.json (chmod 600 recommended). You can also set GROQ_API_KEY as an environment variable.

Get a free key: https://console.groq.com/keys (takes 30 seconds)

Skip the key? macOS falls back to local MLX Whisper (downloads ~1.5GB once, runs fully offline).
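
If you want to script the key setup, the lookup described above might look roughly like this. It is a sketch under assumptions: the JSON field name `groq_api_key` is hypothetical (check your own settings.json), and giving the environment variable priority over the file is a guess, not documented behavior.

```python
import json
import os
from pathlib import Path

SETTINGS_PATH = Path.home() / ".config" / "stt" / "settings.json"
KEY_FIELD = "groq_api_key"  # hypothetical field name, not confirmed by the README


def _read_settings(path):
    """Return the settings dict, or {} if the file is missing or malformed."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def save_api_key(key, path=SETTINGS_PATH):
    """Store the key, keeping other settings, readable by the owner only."""
    path = Path(path)
    settings = _read_settings(path)
    settings[KEY_FIELD] = key
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 600 from the start, then enforce it in case
    # the file already existed with looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(settings, fh, indent=2)
    os.chmod(path, 0o600)


def load_api_key(path=SETTINGS_PATH, env=None):
    """Return the key from GROQ_API_KEY if set, else from settings.json, else None."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if key:
        return key
    return _read_settings(path).get(KEY_FIELD) or None
```

A `None` result is the case where macOS would fall back to local MLX Whisper.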

Architecture

flowchart LR
    A[Double-tap<br/>detection] --> B{Recording?}
    B -->|no| C[Start<br/>InputStream]
    B -->|yes| D[Stop &<br/>transcribe]
    C --> E[Bottom-bar<br/>overlay]
    C --> F[Live loop<br/>every 2s]
    F -->|Groq API| G[Live text]
    G --> E
    D -->|Groq API| H[Final text]
    H --> I[Clipboard]
    I --> J[Synthesize<br/>⌘V / Ctrl+V]
    J --> K[Paste at<br/>cursor ✓]
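
The "Recording?" branch in the flowchart amounts to a two-state toggle. The sketch below shows one way to express it; the class name and every callable passed in are hypothetical, and the real entry files may be organized quite differently.

```python
class DictationController:
    """Toggle between idle and recording, mirroring the flowchart.

    Every callable is a hypothetical stand-in for a platform-specific piece.
    """

    def __init__(self, start_capture, stop_capture, transcribe, copy, paste):
        self.start_capture = start_capture  # opens the input stream
        self.stop_capture = stop_capture    # closes it and returns the audio
        self.transcribe = transcribe        # audio -> text
        self.copy = copy                    # text -> clipboard
        self.paste = paste                  # synthesizes the paste shortcut
        self.recording = False

    def on_trigger(self):
        """Call on the hotkey gesture: double-tap when idle, single tap when recording."""
        if not self.recording:
            self.start_capture()
            self.recording = True
            return None
        self.recording = False
        text = self.transcribe(self.stop_capture()).strip()
        if text:  # skip the paste when nothing was recognized
            self.copy(text)
            self.paste()
        return text
```

Keeping the platform pieces behind plain callables is a design choice for the sketch: it lets the same toggle logic sit on top of AppKit, Tkinter, or a test double.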

Key files per platform:

| Piece | macOS | Windows | Linux |
|---|---|---|---|
| Entry | `macos/stt.py` | `windows/stt_windows.py` | `linux/stt_linux.py` |
| Key hook | `NSEvent.addGlobalMonitor` | `pynput` | `pynput` (X11) |
| Overlay | AppKit `NSWindow` | Tkinter `Canvas` | Tkinter `Canvas` |
| Clipboard | `NSPasteboard` | `pyperclip` | `xclip` / `wl-copy` |
| Paste key | `Quartz.CGEvent` ⌘V | `pynput` Ctrl+V | `xdotool` / `wtype` |
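
For the Linux column, choosing between the X11 and Wayland tools might look like the following. This is an illustrative sketch: the function names are made up, the session check via `XDG_SESSION_TYPE` / `WAYLAND_DISPLAY` is an assumption about how detection could work, and the command lines are the usual invocations of these tools rather than anything confirmed from `stt_linux.py`.

```python
import os
import shutil
import subprocess


def is_wayland(env=None):
    """Guess the session type from the environment (heuristic, not exhaustive)."""
    env = os.environ if env is None else env
    return env.get("XDG_SESSION_TYPE") == "wayland" or bool(env.get("WAYLAND_DISPLAY"))


def linux_paste_commands(wayland):
    """Return (copy_cmd, paste_cmd) as argument lists for the session type."""
    if wayland:
        # wl-copy reads the text from stdin; wtype presses Ctrl, types v, releases Ctrl.
        return ["wl-copy"], ["wtype", "-M", "ctrl", "v", "-m", "ctrl"]
    # xclip reads the text from stdin; xdotool sends the key combination.
    return ["xclip", "-selection", "clipboard"], ["xdotool", "key", "ctrl+v"]


def paste_text(text, wayland=None):
    """Copy `text` to the clipboard, then synthesize Ctrl+V."""
    wayland = is_wayland() if wayland is None else wayland
    copy_cmd, paste_cmd = linux_paste_commands(wayland)
    for cmd in (copy_cmd, paste_cmd):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} is not installed")
    subprocess.run(copy_cmd, input=text.encode("utf-8"), check=True)
    subprocess.run(paste_cmd, check=True)
```

Returning the commands as lists, instead of running them inline, keeps the tool selection easy to inspect without touching the real clipboard.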

Tweaking

Open the entry file for your OS — config is at the top:

DOUBLE_TAP_WINDOW = 0.4   # max seconds between taps
LIVE_INTERVAL = 2.0       # live transcription refresh
PREVIEW_LINGER = 3.0      # how long final text stays on screen
GROQ_MODEL = "whisper-large-v3-turbo"

Different hotkey? Change the key check in handle_event (macOS) or _key_listener (Win/Linux).
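
If you do change the key check, the timing logic behind `DOUBLE_TAP_WINDOW` can stay the same. A minimal sketch of that logic follows; the class is hypothetical and is not the code inside `handle_event` or `_key_listener`. A real hook would likely also need to ignore presses where the modifier is part of a chord such as ⌘C.

```python
import time

DOUBLE_TAP_WINDOW = 0.4  # max seconds between taps (same value as the config above)


class DoubleTapDetector:
    """Report a double-tap when two taps land within `window` seconds."""

    def __init__(self, window=DOUBLE_TAP_WINDOW, clock=time.monotonic):
        self.window = window
        self.clock = clock  # injectable so the timing can be tested
        self._last_tap = None

    def tap(self):
        """Register one tap; return True if it completes a double-tap."""
        now = self.clock()
        if self._last_tap is not None and now - self._last_tap <= self.window:
            self._last_tap = None  # reset so a third tap starts a new pair
            return True
        self._last_tap = now
        return False
```

`time.monotonic` is used instead of `time.time` so that system clock adjustments cannot produce a false double-tap.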

Different STT engine? Replace transcribe_audio() — takes a numpy float32 array, returns a string.
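
As a rough illustration of such a replacement, the sketch below wraps an arbitrary backend in a `transcribe_audio`-shaped function. Assumptions are flagged in the comments: `backend` is a hypothetical callable that accepts WAV bytes, the audio is taken to be 1-D mono, and the 16 kHz sample rate is a guess that should match whatever the entry file records at. It uses only the standard library and iterates sample by sample; with numpy available, a vectorized conversion would be faster.

```python
import io
import struct
import wave

SAMPLE_RATE = 16000  # assumed capture rate, not confirmed by the README


def float_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    # Clip out-of-range samples, then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def make_transcribe_audio(backend):
    """Build a transcribe_audio(audio) -> str around a hypothetical `backend`.

    `backend` is any callable that takes WAV bytes and returns text.
    """
    def transcribe_audio(audio):
        if len(audio) == 0:
            return ""  # nothing recorded, nothing to send
        return str(backend(float_to_wav_bytes(audio))).strip()
    return transcribe_audio
```

For example, `transcribe_audio = make_transcribe_audio(my_engine)` would give a drop-in function, where `my_engine` is whatever client you want to try.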

Troubleshooting

macOS: nothing happens when I double-tap ⌘

Grant Accessibility permission: System Settings → Privacy & Security → Accessibility → toggle on for STT.app (or Terminal if running from source).

Then restart the app.

macOS: "STT.app is damaged and can't be opened"

Gatekeeper is blocking the unsigned app. Run once:

xattr -dr com.apple.quarantine /Applications/STT.app

Linux Wayland: key hook doesn't fire

pynput can't grab global keys on most Wayland compositors by design. Options:

1. Switch to X11 session at login (GDM/SDDM session picker).
2. Use a compositor shortcut → bind Super+Space (or whatever) to kill -USR1 $(pgrep -f stt_linux.py) and patch the script to toggle on that signal.
3. Run under Xorg — most distros still ship an X session.
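
For the compositor-shortcut option, the patch could be as small as a SIGUSR1 handler that sets a flag for the main loop to pick up. This is a sketch of the idea, not a tested patch for `stt_linux.py`; the class name is hypothetical and how you wire it into the script's loop is up to you.

```python
import signal


class SignalToggle:
    """Turn SIGUSR1 into a flag that the main loop can poll."""

    def __init__(self):
        self.pending = False

    def install(self, signum=signal.SIGUSR1):
        # Must be called from the main thread, as signal.signal() requires.
        signal.signal(signum, self._handler)

    def _handler(self, signum, frame):
        # Keep the handler tiny: record the request and return.
        self.pending = True

    def consume(self):
        """Return True once for each pending toggle request."""
        if self.pending:
            self.pending = False
            return True
        return False
```

With a Tkinter overlay, one plausible way to use it is a periodic `root.after(100, poll)` callback that calls `consume()` and starts or stops recording when it returns True. The periodic callback also gives Python a chance to run the signal handler while the Tk main loop is otherwise idle.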

The paste side works fine on Wayland (uses wl-copy + wtype / ydotool).

Live transcription is blank / errors in logs

Probably an invalid Groq key. Menu bar → Set Groq API Key... (macOS), or edit ~/.config/stt/settings.json directly.

Audio is too quiet / not picking up

Check the default input device: System Settings → Sound → Input on macOS, the Sound panel on Windows, or pavucontrol on Linux. The sounddevice library records from the system default.

Privacy

- Audio streams to Groq only while you're recording. Nothing persists server-side (per Groq's privacy policy).
- Local MLX mode (macOS) keeps audio on-device. Zero network.
- Only your API key is stored on disk (~/.config/stt/settings.json).

Project layout

stt/
├── macos/
│   ├── stt.py                    Main app (AppKit + MLX fallback)
│   ├── STT.app/                  Double-clickable wrapper
│   ├── install.sh
│   └── requirements.txt
├── windows/
│   ├── stt_windows.py            Main app (Tkinter + pynput)
│   ├── build.bat                 PyInstaller → standalone .exe
│   └── requirements.txt
└── linux/
    ├── stt_linux.py              Main app (Tkinter + pynput)
    ├── install.sh                apt / dnf / pacman auto-detect
    ├── stt.desktop               Menu entry
    └── requirements.txt

License

MIT — do whatever, attribution appreciated.

Contributions welcome. If you build a Windows/Linux binary, open a PR or attach it to the release.
