starfield17/CapsWriter

CapsWriter-Offline (v2.5)

demo

Hold CapsLock, speak, release, and the text appears.

CapsWriter-Offline is a fully offline speech input tool built primarily for Windows.

🚀 What’s New

v2.5-alpha

  • Initial support for Qwen3-ASR-1.7B
    • Works for both microphone dictation and file transcription in this fork
    • When the model does not return real timestamps, the app falls back to the existing approximate timeline path for subtitle and JSON generation
    • Decoder-side Vulkan acceleration is enabled by default and typically needs about 1.6 GB of VRAM
    • If your GPU drops memory clocks while idle, cold-start latency can rise to around 300 ms
    • Locking memory clocks with nvidia-smi -lmc 9000 can reduce short-clip latency to around 100 ms on hardware such as the RTX 5050

v2.4

  • Improved Fun-ASR-Nano-GGUF support with DirectML encoder acceleration and better FP16 defaults
  • Server-side Fun-ASR-Nano now uses its own hot-server.txt hotword context file
  • Spoken punctuation like “comma”, “period”, and “new line” can be converted automatically
  • Added decoder temperature handling to avoid edge-case repetition loops
  • Improved server-side alphabet spelling merge behavior
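The spoken-punctuation conversion added in v2.4 can be pictured as a phrase-to-symbol substitution pass over the recognized text. The sketch below is illustrative only and is not the project's actual converter (the real one also handles Chinese punctuation words and context); the phrase table is an assumption:

```python
# Illustrative sketch of spoken-punctuation conversion
# (not CapsWriter's actual implementation).
import re

SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "new line": "\n",
}

def convert_spoken_punctuation(text: str) -> str:
    # Replace longer phrases first so "question mark" is matched
    # before any shorter overlapping phrase.
    for phrase in sorted(SPOKEN_PUNCTUATION, key=len, reverse=True):
        symbol = SPOKEN_PUNCTUATION[phrase]
        # Match the phrase as whole words, absorbing the space before it.
        text = re.sub(rf"\s*\b{re.escape(phrase)}\b", symbol, text)
    return text
```

For example, "hello comma world period" becomes "hello, world.".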

v2.3

  • Added Fun-ASR-Nano-GGUF support
  • Refactored large-file transcription to async streaming
  • Improved Chinese/English spacing cleanup
  • Improved server cleanup after abnormal disconnects

✨ Core Features

  • Speech input: hold CapsLock or mouse side button X2, speak, release, and insert text immediately
  • File transcription: drag audio/video files onto the client and generate .srt, .txt, and .json
  • ITN formatting: convert spoken number patterns into clean written forms
  • Server hotword context: store domain-specific terms in hot-server.txt to help Fun-ASR-Nano recognize context
  • Hotword replacement: use hot.txt for phoneme-based fuzzy matching and forced replacement
  • Rule replacement: use hot-rule.txt for regex-based or direct text replacement
  • Rectify history: keep correction history in hot-rectify.txt to help LLM polishing
  • LLM roles: route text to roles such as assistant or translate when the recognized text starts with that role name
  • Tray menu: manage hotwords, copy results, or clear LLM memory from the tray icon
  • Client/server split: run the model on one machine and the lightweight client on another if needed
  • Diary archive: save recognized sentences by date
  • Audio archive: save recorded audio locally for privacy and traceability
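The hot-rule.txt feature described above (regex-based or direct text replacement) can be pictured as an ordered list of pattern/replacement rules applied to the recognized text. The rule list and function below are hypothetical illustrations; the actual hot-rule.txt file format may differ:

```python
# Illustrative sketch of rule-based replacement
# (hypothetical rules; the real hot-rule.txt format may differ).
import re

# Each rule is (regex_pattern, replacement), applied top to bottom.
RULES = [
    (r"\bcaps writer\b", "CapsWriter"),   # direct text replacement
    (r"(\d+)\s*degrees\b", r"\1°"),       # regex with a capture group
]

def apply_rules(text: str) -> str:
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```

For example, "open caps writer at 25 degrees" becomes "open CapsWriter at 25°".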

The project is designed around four ideas: offline, fast, accurate, and highly configurable. The goal is a smooth voice-input workflow that still works without cloud access, installation-heavy deployment, or network connectivity.

LLM roles can use local models through Ollama or remote APIs through providers such as OpenAI-compatible services.

💻 Platform Support

The project is mainly targeted at Windows 10/11 (64-bit).

  • Linux: not officially tested or packaged
  • macOS: currently unsupported because low-level keyboard-hook support is limited and system permissions are restrictive

🎬 Quick Start

  1. Install the VC++ runtime.
  2. Download the app from Latest Release.
  3. Download the model package from Models Release and extract it into the matching subfolder under models/.
  4. Launch start_server.exe.
  5. Launch start_client.exe.
  6. Hold CapsLock or mouse side button X2 and start speaking.

🎤 Model Options

Select the speech model in config.toml through server.model_type:

  • qwen3_asr: built-in punctuation, acceptable CPU speed, very fast on discrete GPUs, excellent accuracy
  • fun_asr_nano: built-in punctuation, fast on CPU, very fast on discrete GPUs, top-tier accuracy
  • sensevoice: built-in punctuation, extremely fast on CPU, strong multilingual support
  • paraformer: external punctuation model, extremely fast on CPU, high accuracy

⚙️ Configuration

All runtime settings live in the root config.toml.

  • Edit [[client.shortcuts]] to change keyboard or mouse triggers
  • Set hold_mode = false for press-once / press-again recording
  • Toggle llm_enabled to enable or disable LLM post-processing
  • Change server.model_type to switch ASR backends
  • Tune model-specific acceleration flags under [models.*]
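Putting the settings above together, a trimmed config.toml might look like the sketch below. Only server.model_type, hold_mode, llm_enabled, [[client.shortcuts]], and the [models.*] tables are named in this README; the other key names and their placement are assumptions, so check the shipped config.toml for the real schema:

```toml
# Trimmed, illustrative config.toml (key names beyond those cited in
# this README are assumptions; consult the shipped file).
[server]
model_type = "qwen3_asr"   # or "fun_asr_nano", "sensevoice", "paraformer"

[client]
llm_enabled = true         # toggle LLM post-processing

[[client.shortcuts]]
key = "caps lock"          # keyboard or mouse trigger
hold_mode = true           # false = press once to start, again to stop

[models.qwen3_asr]
vulkan_enable = true       # model-specific acceleration flag
```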

🛠️ FAQ

Q: Why does nothing happen when I press the key?
A: Make sure the start_client.exe console process is still running. If you want to type into an elevated application, run the client with administrator privileges too.

Q: Why is there no recognition output?
A: Check the recorded audio inside the dated year/month/assets folder. Make sure the microphone is actually recording and that Windows microphone permissions are enabled.

Q: Can I use GPU acceleration?
A: Fun-ASR-Nano and Qwen3-ASR can use GPU acceleration. If your integrated GPU performs worse than CPU, disable the model-specific dml_enable or vulkan_enable flags in config.toml.

Q: File transcription is too slow on low-end hardware. What can I do?
A: Try these in order:

  1. Use sensevoice or paraformer if you need the fastest CPU path.
  2. Disable dml_enable or vulkan_enable for Qwen3-ASR / Fun-ASR-Nano if GPU acceleration hurts more than it helps.
  3. Lower model thread counts where supported.
  4. Lock GPU memory clocks with nvidia-smi -lmc 9000 if you are optimizing for short clips on NVIDIA hardware.

Q: Fun-ASR-Nano quality is unstable on my integrated GPU. Why?
A: Some integrated GPUs have poor behavior with Vulkan FP16 accumulation in llama.cpp. If you see degraded output, disable vulkan_enable for that model and run the decoder on CPU.

Q: How do hotwords work?
A: hot-server.txt is used for server-side Fun-ASR-Nano context enhancement. hot.txt and hot-rule.txt are client-side replacement sources. hot-rectify.txt stores correction history used by the LLM pipeline.

Q: How do I use LLM roles?
A: Start your spoken command with the role name. For example, if you have a role named translate, saying "translate, the weather is great today" sends the recognized text through the translation role instead of producing direct output.
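The role-prefix routing described above can be pictured as a check on the first word of the recognized text. This is a minimal sketch under that assumption, not the project's actual dispatch code, and the role names are examples:

```python
# Illustrative role-prefix routing (hypothetical role set;
# not CapsWriter's actual dispatcher).
ROLES = {"assistant", "translate"}

def route(text: str) -> tuple[str, str]:
    """Return (role, payload); role is "" for direct text output."""
    first, _, rest = text.partition(" ")
    # Strip a trailing comma after the role word, as in
    # "translate, the weather is great today".
    role = first.rstrip(",，").lower()
    if role in ROLES:
        return role, rest.lstrip()
    return "", text
```

For example, route("translate, the weather is great today") yields ("translate", "the weather is great today"), while text with no role prefix passes through unchanged.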

Q: How do I choose the LLM model behind a role?
A: Role defaults and overrides are defined through config.toml plus the LLM/ role entry modules. You can point roles to local Ollama models or remote API providers.

Q: Can an LLM role read selected text on screen?
A: Yes. If a role enables enable_read_selection, the app can capture the current selection with Ctrl+C and pass it into the LLM context before processing your voice command.

Q: How do I hide the console window?
A: Use the tray menu to hide it.

Q: How do I start it automatically on boot?
A: Press Win+R, open shell:startup, and place shortcuts for the client and server there.

❤️ Credits

This project builds on several excellent open-source projects.

Special thanks to modern AI coding assistants and to the users who supported the project.
