feat: add `post_transcription_hook` for user-defined text transforms (#167)
Introduce `post_transcription_hook` config option. When set to a shell command, every transcription is piped through it after preprocessing and before paste. Stdin receives the transcription; non-empty stdout replaces it. Empty stdout leaves text unchanged, enabling fire-and-forget observers.

Subsumes the narrower dictation-tag use case while opening the same hook up to arbitrary transforms (tag wrapping, logging, profanity filters, notifications, per-utterance analytics) without further config surface or upstream patches.

Examples (user config):

    # Wrap in <dictation> tag for LLM consumers:
    "post_transcription_hook": "sed 's|.*|<dictation>&</dictation>|'"

    # Archive to file, leave text unchanged:
    "post_transcription_hook": "tee -a ~/.local/share/hyprwhspr/log.txt >/dev/null"

    # Count filler words, notify, leave text unchanged:
    "post_transcription_hook": "~/.local/bin/filler-coach"

Metadata is exposed via env vars HYPRWHSPR_MODEL and HYPRWHSPR_BACKEND. The hook runs under a 5 s timeout; on timeout, non-zero exit, or any subprocess error the original text is preserved — a broken hook must never silently eat a dictation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
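The contract above can be sketched in a few lines of Python. This is a minimal sketch of the described behavior, not hyprwhspr's actual code — the function name, signature, and exact `subprocess.run` call shape are assumptions:

```python
import os
import subprocess

HOOK_TIMEOUT_S = 5  # per the commit message: hooks get 5 seconds


def run_post_transcription_hook(text: str, command: str,
                                model: str = "", backend: str = "") -> str:
    """Pipe `text` through a user shell command; on any failure, return `text`."""
    if not command:
        return text  # hook unset: nothing to do
    env = dict(os.environ, HYPRWHSPR_MODEL=model, HYPRWHSPR_BACKEND=backend)
    try:
        result = subprocess.run(
            command,
            shell=True,          # user command may use pipes, chaining, ~ expansion
            input=text,
            capture_output=True,
            text=True,
            timeout=HOOK_TIMEOUT_S,
            env=env,
        )
    except Exception:
        return text  # timeout or subprocess error: never eat a dictation
    if result.returncode == 0 and result.stdout:
        return result.stdout  # non-empty stdout replaces the transcription
    return text  # empty stdout or non-zero exit: leave text unchanged
```

Run with the `tee` example from the config, this appends the text to the log file and, because `tee`'s output is redirected to `/dev/null`, stdout is empty and the original text is handed back unchanged.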
Hey, thanks. This is so cool. I will review this shortly. Absolutely no obligation, but... Would you be willing to write or generate a "blog post" type thing that explains what you're doing here and how from a holistic perspective? I'd love to put it on the project website in a featured way.
Thanks very much, I'll be glad to see if you think this fits in with the package. Sure, I'd be willing to draft something and share. I'm embarrassed to learn that I hadn't seen the hyprwhspr web page before, always/only GH, even though the link is plainly there. It's really good. By default I'd attempt to match the three excellent existing pieces on length and general tone. The angle that occurs to me is to pick up from your piece on how dictation is the future of programming. A greater and greater share of my input to the computer is hyprwhspr-recognized speech, and it already does the 80/20 things now -- try to print where the focus is, push onto the clipboard -- but if there's a mechanism to hook in your own code, it opens the door to integrating that dictated speech into your workflow and computing setup however you'd like. My imagination is probably limited on this front, but so far I've been using it to: (1) annotate the text such that the LLM can see it's been dictated, so I can coach the LLM about words/phrases it consistently mishears, (2) log all the outputs for later study/review (although that also shows up in cliphist), and (3) this idiosyncratic application that notifies me when it hears a lot of (my particular) filler words. And you could imagine running some server, perhaps with some agentic loop listening, that catches your speech and follows your directions -- one's own personal, tunable, and, to your point, as private as you care to make it -- Alexa-like.
Dang, that is so cool. Thanks for sharing. Alright, we're definitely good to merge here. Absolutely no rush or pressure on any blog type thing. I'm happy to tune it to match the general tenor of the website. Though I do think what you're doing is very novel and interesting, and the way you're going about it, I think many others might benefit from it. ✌️
Adds a `post_transcription_hook` config: a shell command every transcription is piped through after preprocessing, before paste.

```json
{ "post_transcription_hook": "~/.local/bin/my-hook" }
```

or

```json
{ "post_transcription_hook": "sed 's|.*|<dictation>&</dictation>|'" }
```

- `HYPRWHSPR_MODEL` and `HYPRWHSPR_BACKEND` are exported to the hook.
- When unset, the hook method returns immediately after a config lookup — no subprocess, no string allocation, the original `text` reference is handed back. Semantically identical to how it is now.

## Why
Most of my hyprwhspr usage is dictating text into LLMs, and I'd wanted to wrap the output in `<dictation>...</dictation>` so the model knows it's ASR output and can be lenient about homophones and proper nouns. Then I can coach it in `CLAUDE.md` about how to interpret `<dictation>` tags. I do use a custom `whisper_prompt` and `word_overrides`, and thank you for making those available, but I've found this nicely complements those mechanisms.

Once the hook existed, two more uses fell out for free: archiving transcriptions to a log, and a filler-word script that pings `notify-send` when there are too many, like, filler words.

Different shape from `record capture` (#163) — that's a pull model for one-off wrappers; this is a push model, running inline with injection and able to mutate the text.

## Alternatives considered
- A `transcription_llm_tag` config covering just the `<dictation>` wrapping case — narrower, but the diff is about the same size as this, and this route seemed more useful.

## Notes
- `subprocess.run` is already used 15× in `text_injector.py`; no new imports.
- Errors are non-fatal (`except Exception` → `print(...)` → fall through).
- `shell=True` is the one novel pattern in the file. Deliberate — pipes, chaining, and `~` expansion are the point, and the command comes from `config.json` (same trust level as the rest of it). Commented at the call site.

Docs entry mirrors `### Clipboard behavior`.

Thanks again for the work on this package.
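As a postscript, for anyone curious about the third example in the commit message: a `filler-coach` hook can stay tiny because printing nothing tells the hook to leave the transcription unchanged. This is a hypothetical sketch — the word list, threshold, and `notify-send` call are my own choices, not part of the PR:

```python
#!/usr/bin/env python3
"""Hypothetical ~/.local/bin/filler-coach: count filler words, notify, print nothing."""
import re
import subprocess
import sys

FILLERS = {"like", "um", "uh", "basically", "actually"}  # assumed word list
THRESHOLD = 3  # assumed: notify at 3 or more fillers per utterance


def count_fillers(text: str) -> int:
    """Count occurrences of filler words in `text`, case-insensitively."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for w in words if w in FILLERS)


def main(stream) -> None:
    n = count_fillers(stream.read())
    if n >= THRESHOLD:
        # Fire-and-forget desktop notification; failures are ignored.
        subprocess.run(["notify-send", "filler-coach", f"{n} filler words"],
                       check=False)
    # Nothing is written to stdout, so the dictated text passes through untouched.


if __name__ == "__main__":
    main(sys.stdin)
```

Mark it executable and point `post_transcription_hook` at it; since stdout stays empty, the hook behaves as a pure observer.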