feat!: major performance & accuracy improvements in speech-to-text module by IgorSwat · Pull Request #1132 · software-mansion/react-native-executorch

IgorSwat · 2026-05-08T08:26:37Z

Description

This PR introduces several changes to the speech-to-text module based on Whisper models:

CoreML integration - models re-exported to the CoreML backend, bringing a significant performance upgrade for iOS devices.
New streaming algorithm - eliminates duplicates in streaming output, resulting in a major quality improvement of the live streaming mode.
Changes in demo apps - removed the faulty 'voice mode' screen from the LLM demo app, and refactored the speech-to-text screen in the 'speech' app by adding the new CoreML models to the selection bar and changing the default model for iOS devices.
Minor code improvements in the speech-to-text module.

Introduces a breaking change?

Yes
No

Change: removes predefined constants for quantized models.
Justification: the quantized models differ very slightly from the original ones, introducing unnecessary complexity in this case.

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Run demo app to test the live streaming mode.

Screenshots

Related issues

#1124

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

msluszniak · 2026-05-08T15:30:33Z

Also, if this PR adds a breaking change, please describe it directly below the "Introduces a breaking change?" section in the PR body.

msluszniak · 2026-05-19T15:06:19Z

Side note: after merging the PR with TTS and rebasing, please make sure that the native tests still work here after all the changes.

chmjkb

approved by accident

chmjkb

tested the demo app and works like a charm for iOS, thank u!

chmjkb · 2026-05-20T12:39:45Z

+const WHISPER_TINY_EN_TOKENIZER = `${URL_PREFIX}-whisper-tiny.en/${VERSION_TAG}/tokenizer.json`;
+const WHISPER_TINY_EN_MODEL_XNNPACK = `${URL_PREFIX}-whisper-tiny.en/${VERSION_TAG}/xnnpack/whisper_tiny_en_xnnpack_fp32.pte`;
+const WHISPER_TINY_EN_MODEL_COREML = `${URL_PREFIX}-whisper-tiny.en/${VERSION_TAG}/coreml/whisper_tiny_en_coreml_fp32.pte`;


We used to handle backend selection automatically, as is done for style transfer. Not a big problem, as this will likely be rewritten in the model registry PR. cc @msluszniak

Yeah, I will handle it in my PR.

msluszniak · 2026-05-20T13:38:00Z

Native tests fail to configure on this rebased branch — tests/CMakeLists.txt:265 still references models/speech_to_text/whisper/HypothesisBuffer.cpp, which this PR deletes:

CMake Error at CMakeLists.txt:137 (add_executable):
  Cannot find source file:

    .../models/speech_to_text/whisper/HypothesisBuffer.cpp

Call Stack (most recent call first):
  CMakeLists.txt:261 (add_rn_test)

Please drop the HypothesisBuffer.cpp source line (and any HypothesisBuffer.h includes in the speech-to-text test) so bash run_tests.sh builds again.

msluszniak

Tested green on Android (demo app + native tests). A few suggestions inline, plus a few that touch lines unchanged by this PR — listed here since the lines aren't part of the diff and can't be commented inline:

src/types/stt.ts:10,12,14 — SpeechToTextModelName union still lists 'whisper-tiny-en-quantized' | 'whisper-base-en-quantized' | 'whisper-small-en-quantized'. The quantized constants are deleted in this PR, so these literals now type-check but cannot be constructed from any built-in. Worth dropping in the same breaking-change.
common/rnexecutorch/models/speech_to_text/whisper/ASR.cpp:217 (preexisting) — divisor tokens.size() + 1 matches neither a literal mean (scores.size()) nor OpenAI Whisper's formula (len(full_seq) + 1, where full_seq includes SOT prefix and EOT). Worth picking one explicitly. For reference, whisper.cpp uses sum_logprobs / result_len (no +1) — src/whisper.cpp:6602-6603.
common/rnexecutorch/models/speech_to_text/whisper/ASR.cpp:308 (preexisting) — std::mt19937 gen((std::random_device{}())) lives inside the autoregressive sampling loop, so random_device is consulted and a fresh Mersenne state is constructed for every sampled token. Hoist to a member (or static thread_local) seeded once per generate().
common/rnexecutorch/models/speech_to_text/SpeechToText.h:38 (preexisting) — transcribeStringOnly is declared but never defined or referenced anywhere in the package; dead API surface, safe to drop.

Non-blocking — feel free to fold what you want into this PR or a follow-up.

The method was declared in SpeechToText.h but never defined or referenced anywhere in the package. Removing it cleans up the public API surface.

insertAudioChunk's overflow path was overwriting memory_.toCommit on each cap-hit. Two cap-hits before the next process() call silently dropped the first batch. Append instead of assign.
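The difference between the two behaviours can be sketched as follows. This is a minimal illustration, not the code from OnlineASR.cpp: the `Memory` struct, the element type of `toCommit`, and both helper functions are assumptions made for the example; only the member name `toCommit` comes from the commit message.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the streaming memory; only `toCommit` is named in the PR.
struct Memory {
  std::vector<std::string> toCommit;
};

// Old behaviour (assumed shape): each cap-hit replaces whatever is still pending,
// so a second cap-hit before process() drops the first batch.
inline void stashAssign(Memory &memory, std::vector<std::string> batch) {
  memory.toCommit = std::move(batch);
}

// Fixed behaviour (assumed shape): pending words stay and the new batch is appended.
inline void stashAppend(Memory &memory, std::vector<std::string> batch) {
  memory.toCommit.insert(memory.toCommit.end(),
                         std::make_move_iterator(batch.begin()),
                         std::make_move_iterator(batch.end()));
}
```

With two consecutive cap-hits, `stashAssign` leaves only the second batch in `toCommit`, while `stashAppend` keeps both in arrival order.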

The previous tokens.size() + 1 matched neither a literal mean (would be scores.size()) nor OpenAI Whisper's formula (len(full_seq) + 1, where full_seq includes the SOT prefix and EOT). Align with whisper.cpp, which divides by the number of summed log-probs.
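As a rough sketch of the normalization described here, the helper below divides the summed log-probabilities by the number of values that were summed. The function name `averageLogProb`, the `float` element type, and the handling of an empty input are assumptions for illustration; the actual code in ASR.cpp may differ.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper: mean of the per-token log-probabilities.
// The divisor is scores.size(), i.e. the number of summed values, with no "+ 1".
inline float averageLogProb(const std::vector<float> &scores) {
  if (scores.empty()) {
    return 0.0f; // assumption: an empty sequence yields 0 rather than dividing by zero
  }
  const float sum = std::accumulate(scores.begin(), scores.end(), 0.0f);
  return sum / static_cast<float>(scores.size());
}
```

For three tokens with log-probabilities -1, -2 and -3, this gives -2, whereas dividing by `size() + 1` would give -1.5 and make short sequences look more confident than they are.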

random_device was consulted and a fresh Mersenne state constructed for every sampled token. Seed once per generate() call instead.
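A minimal sketch of the hoisting pattern, under simplifying assumptions: `sampleTokens` and `generate` are hypothetical names, and the weights are fixed for the whole call, whereas a real decoder would recompute the distribution for every token. The point is only where the engine is constructed.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sampling loop: the engine is passed in, so the loop body
// never touches std::random_device or constructs a new Mersenne Twister state.
inline std::vector<int> sampleTokens(const std::vector<double> &weights,
                                     std::size_t count, std::mt19937 &gen) {
  std::discrete_distribution<int> dist(weights.begin(), weights.end());
  std::vector<int> tokens;
  tokens.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    tokens.push_back(dist(gen)); // reuses the same engine state for every token
  }
  return tokens;
}

// Hypothetical entry point: the engine is seeded exactly once per call.
inline std::vector<int> generate(const std::vector<double> &weights,
                                 std::size_t count) {
  std::mt19937 gen(std::random_device{}());
  return sampleTokens(weights, count, gen);
}
```

Passing the engine by reference also makes the loop testable with a fixed seed, which is not possible when the engine is created inside the loop.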

The whisper-*-en-quantized constants are removed in this PR, but the SpeechToTextModelName union still accepted those literals — type-safe to pass, runtime-failing to use. Drop them from the union as part of the same breaking-change.

The header had bool enableTimestamps; the .cpp uses bool verbose (which matches the JS-side DecodingOptions.verbose). Rename here for consistency.

The streaming loop called sleep_for(timeout) unconditionally between inferences, so streamStop() could not take effect until the pending pause expired (the final flush was delayed by up to the full timeout). Replace it with a condition_variable wait that streamStop() signals; inserts intentionally do not wake the loop, preserving the throttle.
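The interruptible pause can be sketched roughly as below. The `StreamThrottle` class and its method names are hypothetical and only illustrate the pattern; the real streaming loop presumably keeps this state next to its other members.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical throttle used between streaming inferences.
class StreamThrottle {
public:
  // Blocks for up to `timeout` and returns early only when stop() is called.
  // Returns true if a stop was requested, false if the full pause elapsed.
  bool waitOrStop(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return stopRequested_; });
  }

  // Intended to be called from the stop path: sets the flag under the lock,
  // then wakes the waiting loop so the final flush can run immediately.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopRequested_ = true;
    }
    cv_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stopRequested_ = false;
};
```

Audio inserts would simply not call `notify_all()`, so they never shorten the pause and the inference rate stays throttled. The predicate overload of `wait_for` also guards against spurious wakeups, because the wait only ends early when the flag is actually set.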

msluszniak

Looks good. I added some minor improvements and tested everything on Android. If you want, you can retest the demo app on iOS; up to you. Overall, great job on this one :))

chmjkb · 2026-05-20T17:49:09Z

I tested the demos on iOS before those changes and they worked well; guess I'll retest tomorrow.

chmjkb

tested demo apps on iOS, works fine

IgorSwat requested review from benITo47, chmjkb and msluszniak May 8, 2026 08:26

IgorSwat added the model and improvement labels May 8, 2026

msluszniak assigned IgorSwat May 8, 2026

msluszniak requested changes May 8, 2026

IgorSwat changed the title ~~feat: major performance & accuracy improvements in speech-to-text module~~ feat!: major performance & accuracy improvements in speech-to-text module May 8, 2026

msluszniak reviewed May 11, 2026

IgorSwat force-pushed the @is/speech-to-text-ultimate branch from c5d3c14 to a91344c Compare May 19, 2026 11:17

chmjkb approved these changes May 20, 2026

chmjkb requested changes May 20, 2026

msluszniak reviewed May 20, 2026

Comment thread packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/ASR.cpp Outdated

msluszniak reviewed May 20, 2026

Comment thread ...ages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/OnlineASR.cpp Outdated

IgorSwat force-pushed the @is/speech-to-text-ultimate branch from 6191212 to 02113ff Compare May 20, 2026 12:30

chmjkb approved these changes May 20, 2026

IgorSwat added 11 commits May 20, 2026 17:45

Optimal streaming algorithm

bc31cd5

Revert back to 100ms refresh rate

92b3f29

Add CoreML whisper models

35290db

Update model urls

7473f01

Change default model for iOS devices

9b90ea3

Add explicit timeout parameter

9af8124

Concurrency fixes & automatic cleaunp

f7849fc

Update urls & audio-api

3bf68bf

Apply review suggestions

27769d4

Rebase with main

c5b142d

Minor fixes

6bba141

IgorSwat force-pushed the @is/speech-to-text-ultimate branch from 02113ff to 6bba141 Compare May 20, 2026 15:46

Fix broken test build

1aebae6

msluszniak reviewed May 20, 2026

msluszniak mentioned this pull request May 20, 2026

feat(constants)!: switch URLs to v0.9.0 layout + add MODEL_REGISTRY #1148

Open

10 tasks

Mateusz Słuszniak added 7 commits May 20, 2026 19:22

chore(stt): drop unused transcribeStringOnly declaration

88185d5

The method was declared in SpeechToText.h but never defined or referenced anywhere in the package. Removing it cleans up the public API surface.

fix(stt): preserve pending committed words in OnlineASR

44f6931

insertAudioChunk's overflow path was overwriting memory_.toCommit on each cap-hit. Two cap-hits before the next process() call silently dropped the first batch. Append instead of assign.

perf(stt): hoist mt19937 out of the sampling loop

7f46540

random_device was consulted and a fresh Mersenne state constructed for every sampled token. Seed once per generate() call instead.

chore(stt): align stream() declaration with definition

0605950

The header had bool enableTimestamps; the .cpp uses bool verbose (which matches the JS-side DecodingOptions.verbose). Rename here for consistency.

msluszniak approved these changes May 20, 2026

chmjkb approved these changes May 21, 2026

IgorSwat added 2 commits May 21, 2026 09:59

docs: simplify & update STT docs

ef92351

Merge branch '@is/speech-to-text-ultimate' of https://github.com/soft…

d1321b3

…ware-mansion/react-native-executorch into @is/speech-to-text-ultimate

IgorSwat merged commit d3182ce into main May 21, 2026
5 checks passed

IgorSwat deleted the @is/speech-to-text-ultimate branch May 21, 2026 08:20
