feat: add post_transcription_hook for user-defined text transforms #167

Merged
goodroot merged 2 commits into goodroot:main from mmacpherson:local/post-transcription-hook
Apr 25, 2026

@mmacpherson
Contributor

Adds a post_transcription_hook config: a shell command every transcription is piped through after preprocessing, before paste.

{ "post_transcription_hook": "~/.local/bin/my-hook" }

or

{ "post_transcription_hook": "sed 's|.*|<dictation>&</dictation>|'" }
  • stdin = the transcription
  • non-empty stdout replaces it; empty stdout leaves it unchanged (observer-only hooks)
  • HYPRWHSPR_MODEL and HYPRWHSPR_BACKEND exported to the hook
  • 5 s timeout; on timeout, non-zero exit, or any subprocess error the original text is preserved

When unset, the hook method returns immediately after a config lookup — no subprocess, no string allocation, original text reference handed back. Semantically identical to how it is now.
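
The contract above can be sketched roughly as follows. This is an illustration, not the merged code: the function name `run_post_transcription_hook` and its parameters are hypothetical, and stripping trailing newlines from the hook's stdout is an assumption the PR text does not confirm.

```python
import os
import subprocess


def run_post_transcription_hook(text, hook, model="", backend="", timeout=5.0):
    """Pipe `text` through the shell command `hook`.

    Returns the hook's stdout when it is non-empty, otherwise the original
    text. Any failure (timeout, non-zero exit, subprocess error) also
    returns the original text.
    """
    if not hook:
        # Hook unset: no subprocess, the same string object is handed back.
        return text

    # Expose metadata to the hook, as described in the PR.
    env = dict(os.environ, HYPRWHSPR_MODEL=model, HYPRWHSPR_BACKEND=backend)
    try:
        result = subprocess.run(
            hook,
            shell=True,  # deliberate: pipes, chaining and ~ expansion
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )
    except Exception as exc:  # includes subprocess.TimeoutExpired
        print(f"post_transcription_hook failed: {exc}")
        return text

    if result.returncode != 0:
        print(f"post_transcription_hook exited with {result.returncode}")
        return text

    # Assumption: trailing newlines added by tools such as sed are dropped.
    replaced = result.stdout.rstrip("\n")
    return replaced if replaced else text
```

With this sketch, an observer hook such as `cat >/dev/null` leaves the text untouched, while the `sed` example from the description returns the wrapped string.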

Why

Most of my hyprwhspr usage is dictating text into LLMs, and I'd wanted to wrap the output in <dictation>...</dictation> so the model knows it's ASR output and can be lenient about homophones and proper nouns. Then I can coach it in CLAUDE.md about how to interpret <dictation> tags. I do use a custom whisper_prompt and word_overrides, and thank you for making those available, but I've found this nicely complements those mechanisms.

Once the hook existed, two more uses fell out for free: archiving transcriptions to a log, and a filler-word script that pings notify-send when there are too many, like, filler words.
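
For a sense of what an observer-only hook might look like, here is a minimal sketch of a filler-word counter. It is not the author's script: the word list, the threshold, and the function names are all hypothetical. It reads the transcription, sends a desktop notification through `notify-send` when the count crosses the threshold, and prints nothing so the transcription passes through unchanged.

```python
import re
import subprocess
import sys

# Hypothetical filler list and threshold; tune to your own habits.
FILLERS = ("um", "uh", "like", "you know", "basically")
THRESHOLD = 3


def count_fillers(text, fillers=FILLERS):
    """Return {filler: count} for whole-word, case-insensitive matches."""
    lowered = text.lower()
    counts = {}
    for filler in fillers:
        hits = len(re.findall(rf"\b{re.escape(filler)}\b", lowered))
        if hits:
            counts[filler] = hits
    return counts


def notify(summary, body):
    """Best-effort desktop notification; a missing notify-send is ignored."""
    try:
        subprocess.run(["notify-send", summary, body], check=False)
    except OSError:
        pass


def main(stream=sys.stdin, send=notify):
    """Count fillers in the transcription read from `stream`."""
    counts = count_fillers(stream.read())
    total = sum(counts.values())
    if total >= THRESHOLD:
        detail = ", ".join(f"{word} x{n}" for word, n in sorted(counts.items()))
        send("Filler words", f"{total} in this utterance: {detail}")
    # Nothing is written to stdout, so the transcription is left unchanged.
    return total
```

Installed as an executable script, the file would end with a call to `main()` and be referenced from `post_transcription_hook` by its path.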

Different shape from record capture (#163) — that's a pull model for one-off wrappers; this is a push model running inline with injection and able to mutate the text.

Alternatives considered

  • systemd path units: these need an observable signal (hyprwhspr writes to a socket, not a stable file, per #163), and they would fire after injection, so they can observe but not transform. That rules out the <dictation> wrapping case.
  • Dedicated transcription_llm_tag config — narrower, but the diff is about the same size as this, and this route seemed more useful
  • List of hooks — considered, but a single entry point keeps the config surface minimal; a user who wants to do N things can write one script that does N things.
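
As an illustration of the "one script that does N things" point, a single hook could chain several steps internally. The sketch below is an assumption about how a user might structure such a script; the step names and the `run_steps` helper are hypothetical and not part of hyprwhspr.

```python
import sys
from pathlib import Path


def archive(text, log_path):
    """Append the transcription to a log file and return it unchanged."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return text


def wrap_dictation(text):
    """Wrap the transcription in <dictation> tags for LLM consumers."""
    return f"<dictation>{text}</dictation>"


def run_steps(text, steps):
    """Apply each step in order; a failing step is skipped, not fatal."""
    for step in steps:
        try:
            text = step(text)
        except Exception as exc:
            # Report on stderr so stdout stays reserved for the result.
            print(f"hook step failed: {exc}", file=sys.stderr)
    return text
```

A script built this way would read `sys.stdin`, pass the text through `run_steps` with something like `[lambda t: archive(t, Path.home() / ".local/share/hyprwhspr/log.txt"), wrap_dictation]`, and write the result to stdout.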

Notes

  • subprocess.run is already used 15× in text_injector.py; no new imports
  • error handling matches the existing ydotool/wtype/hyprctl wrappers (except Exception → print(...) → fall through)
  • shell=True is the one novel pattern in the file. Deliberate — pipes/chaining/~ expansion are the point, and the command is from config.json (same trust level as the rest of it). Commented at the call site.

Docs entry mirrors ### Clipboard behavior.

Thanks again for the work on this package.

Introduce `post_transcription_hook` config option. When set to a shell
command, every transcription is piped through it after preprocessing
and before paste. Stdin receives the transcription; non-empty stdout
replaces it. Empty stdout leaves text unchanged, enabling fire-and-
forget observers.

Subsumes the narrower dictation-tag use case while opening the same
hook up to arbitrary transforms (tag wrapping, logging, profanity
filters, notifications, per-utterance analytics) without further
config surface or upstream patches.

Examples (user config):
  # Wrap in <dictation> tag for LLM consumers:
  "post_transcription_hook": "sed 's|.*|<dictation>&</dictation>|'"

  # Archive to file, leave text unchanged:
  "post_transcription_hook": "tee -a ~/.local/share/hyprwhspr/log.txt >/dev/null"

  # Count filler words, notify, leave text unchanged:
  "post_transcription_hook": "~/.local/bin/filler-coach"

Metadata exposed via env vars HYPRWHSPR_MODEL and HYPRWHSPR_BACKEND.
Hook runs under a 5 s timeout; on timeout, non-zero exit, or any
subprocess error the original text is preserved — a broken hook must
never silently eat a dictation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@mmacpherson requested a review from goodroot as a code owner on April 24, 2026, 23:04

@goodroot
Owner

Hey, thanks. This is so cool.

I will review this shortly.

Absolutely no obligation, but...

Would you be willing to write or generate a "blog post" type thing that explains what you're doing here and how from a holistic perspective?

I'd love to put it on the project website in a featured way.

@mmacpherson
Contributor Author

Thanks very much, I'll be glad to see if you think this fits in with the package.

Sure, I'd be willing to draft something and share. I'm embarrassed to learn that I hadn't seen the hyprwhspr web page before, always/only GH, even tho the link is plainly there. It's really good. By default I'd attempt to match the three excellent existing pieces on length and general tone.

The angle that occurs to me is to pick up from your piece on how dictation is the future of programming. A greater and greater share of my input to the computer is hyprwhspr-recognized speech, and it already does the 80/20 things now -- try to print where the focus is, push onto the clipboard -- but a mechanism to hook in your own code opens the door to integrating that dictated speech into your workflow and computing setup however you'd like. My imagination is probably limited on this front, but so far I've been using it to: (1) annotate the text so that the LLM can see it's been dictated and I can coach the LLM about words/phrases it consistently mishears, (2) log all the outputs for later study/review (although that also shows up in cliphist), and (3) run an idiosyncratic application that notifies me when it hears a lot of (my particular) filler words. And you could imagine running some server, perhaps with some agentic loop listening, that catches your speech and follows your directions -- your own personal, tunable, Alexa-like assistant that is, to your point, as private as you care to make it.

@goodroot
Owner

goodroot commented Apr 25, 2026

Dang, that is so cool. Thanks for sharing.

Alright, we're definitely good to merge here. Absolutely no rush or pressure on any blog type thing. I'm happy to tune it to match the general tenor of the website. Though I do think what you're doing is very novel and interesting, and the way you're going about it, I think many others might benefit from it. ✌️

@goodroot merged commit c56be99 into goodroot:main on Apr 25, 2026