nerd-dictation like output #382

microeconatusyd · 2026-04-05T07:20:51Z

microeconatusyd
Apr 5, 2026

Hi,
Thanks for the program. I tried it with the default settings on Linux. It works, but I was looking for something like nerd-dictation, where (I think using xdotool) keystrokes for anything spoken are inserted on the fly. Here, the insertion appears to happen only when I stop the recording. I wonder if your program can do the same. The only reason I was looking to move away from nerd-dictation is that its backend is Vosk, not Whisper.

I am happy to try any experimental versions if you choose to include this feature.

Thanks and kind regards
Murali
murali.agastya@gmail.com

jatinkrmalik · 2026-04-06T01:15:11Z

jatinkrmalik
Apr 6, 2026
Maintainer

Hi @murali-agastya, thanks for the feedback — this is actually something we are actively working on right now!

We are building real-time streaming transcription as an experimental feature. It will show text as you speak (similar to nerd-dictation's live output) rather than waiting until you stop recording.

However, this requires significant architectural changes to the recognition pipeline:

Streaming engine integration: The current pipeline is designed for batch recognition (record → process → inject). Streaming requires a fundamentally different approach where audio is processed in overlapping chunks and partial results are emitted continuously.
Deduplication buffer: When processing overlapping audio segments, you inevitably get duplicate text. We are implementing a Local Agreement (LA-2) transcript buffer that confirms words only when they appear consistently across consecutive recognition passes — this prevents flickering and duplicate text injection.
Engine-specific handling: Vosk has a native PartialResult() API for streaming, while Whisper requires a sliding-window approach with the dedup buffer. Both paths need separate implementations.
Thread safety: The current text injection pipeline assumes one-shot delivery. Streaming means partial results arrive on a background thread while the UI needs to stay responsive.
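
For readers unfamiliar with the agreement rule mentioned above, it can be sketched roughly as follows. This is only an illustration and not the code in the PR: the class name `LocalAgreementBuffer`, its methods, and the simplifying assumption that every recognition pass returns a hypothesis for the whole utterance so far are all hypothetical.

```python
from typing import List


class LocalAgreementBuffer:
    """Hypothetical LA-2 style buffer (illustrative only).

    A word is committed once two consecutive recognition passes
    agree on it, so unstable trailing words are never injected.
    """

    def __init__(self) -> None:
        self.committed: List[str] = []  # words already confirmed and injected
        self.previous: List[str] = []   # unconfirmed tail from the last pass

    def update(self, hypothesis: str) -> List[str]:
        """Feed the latest full hypothesis; return newly confirmed words."""
        words = hypothesis.split()
        # Assumption: the hypothesis restarts from the beginning of the
        # utterance, so the committed prefix can be skipped by length.
        tail = words[len(self.committed):]

        # Longest common prefix between this pass and the previous pass.
        agreed: List[str] = []
        for new_word, old_word in zip(tail, self.previous):
            if new_word != old_word:
                break
            agreed.append(new_word)

        self.committed.extend(agreed)
        self.previous = tail[len(agreed):]
        return agreed

    def flush(self) -> List[str]:
        """When recording stops, accept whatever is still unconfirmed."""
        remaining = self.previous
        self.committed.extend(remaining)
        self.previous = []
        return remaining


if __name__ == "__main__":
    buf = LocalAgreementBuffer()
    for text in ["hello", "hello world", "hello word is",
                 "hello world is", "hello world is big"]:
        print(repr(text), "->", buf.update(text))
    print("flush ->", buf.flush())
```

Only the words returned by `update()` would be sent to the text injector; a word that changes between passes (such as "word" versus "world" in the demo) stays unconfirmed until two consecutive passes agree on it.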

All of this is behind a setting toggle (experimental_streaming) and off by default. The feature will ship as experimental while we validate accuracy and stability across both Vosk and Whisper engines.

We have an open PR in progress — once it is ready for testing, we will circle back here so you can try it out. Appreciate your patience!

0 replies

0xHertz · 2026-06-03T01:30:31Z

0xHertz
Jun 3, 2026

I would also like to try and test this feature. Please let me know when it's ready. Thanks for your work. mr.kechen@outlook.com

0 replies

cmoney113 · 2026-06-11T07:22:17Z

cmoney113
Jun 11, 2026

Why do developers use a four-year-old, piece of shit ASR? Whisper is, quite literally, the worst ASR available, yet it continues to show up in apps like this one. It blows my mind. I saw you recommend 15 gigs of VRAM for Whisper medium--15 gigs!! SenseVoice Small is 100 times the ASR Whisper is, and it is ~200 MB and uses 300 MB of VRAM. Absolute misguided lunacy. And your streaming implementation wouldn't need insane chunking hacks (see: simulstream) just to get something resembling realtime working. You could just use, oh I don't know, one of the 15 actually MODERN ASRs? What the hell are you doing? Seriously. Use Nemotron-realtime. qwen3-asr. There are so many possibilities, and you choose a four-year-old piece of shit relic. Awesome.

I guess I'll just dominate this space when I publish mine, since all "devs" seem to think is out there is Whisper. Take a gander at hf from time to time--it'll do you some good.

5 replies

Felipe-53 Jun 11, 2026
Sponsor

Your comment seems to have valuable insight, but you chose to do so in a rude and aggressive way. Too bad.

If you don't like the project, don't use it; simple. The internet has too much rage and hate already. This is one guy making an open-source project, and he does not need to satisfy what you think should be the way to do things. As I said in the beginning, perhaps you have valuable insight, but the way you did it was terrible.

If you're not happy, you either 1) comment in a respectful way or 2) don't bother at all: go build your own or whatever.

Wish you well.

jatinkrmalik Jun 11, 2026
Maintainer

Hi @cmoney113, thanks for sharing the technical suggestions. I can tell you have strong opinions here, and I do appreciate the pointers to newer ASR options. That kind of insight is useful, and I’ll look into the models and approaches you mentioned.

That said, the tone of your comment was unnecessarily rude.

Vocalinux is an open-source project that I’m building in my free time. I don’t claim to have deep experience with every ASR model or every possible architecture. I’m learning, discovering things, and implementing improvements as the project evolves. That is also the whole ethos of open source: people come in, share ideas, contribute code, test things, and help make the project better.

I welcome strong technical feedback, but please share it respectfully. Insults and hostility don’t make the feedback more useful, they just make collaboration harder.

If you have concrete recommendations, benchmarks, integration notes, or even a PR, they would be welcome. I hope future feedback can focus on improving the project without the anger. These are ultimately small technical disagreements on the internet, and none of us benefits from turning them into personal attacks.

Thanks again for the useful parts of your comment.

ashwinbhy Jun 12, 2026

Imagine typing out a whole essay crying about VRAM just to announce you don't understand how open-source development works. If you spent half as much time building your supposedly "dominant" app as you do whining in comments, you might actually have a repository worth looking at. Go build it and leave the adults to work.

Tanaykmr Jun 12, 2026

@cmoney113 as Gandhi said: "Be the change you want to see in the world".
Maybe raise a PR now, rather than crashing out?

cmoney113 Jun 12, 2026

Ya know, you have a point. I have deep technical experience with ASR/TTS models. I fully reverse-engineered kokoro and got it speaking 72 languages that Wired is doing an expose about. I apologize for my unnecessary rudeness. Really. I suppose I had a bad day.

It's just, picture you're like a master painter and you look around and everyone is freely painting houses and public structures with lead paint. That's what this Whisper thing feels like to me.

I really can help you solve these issues, and quite easily. I am intimately familiar with all current models, backends, which are realtime, which are not, how to solve the focus problem, and so forth. Let me know if you would like to chat.
