
Conversation

@msluszniak (Member) commented Jan 21, 2026

Description

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS simulator
  • Android simulator
  • iOS device
  • Android device

Testing instructions

Run the demo app in apps/speech and run transcription in both timestamping and regular mode.
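For reference, a minimal usage sketch of the two modes, based on the signature in the diff further down; the function name transcribe, the loadAudio helper, and the timestamps option name are assumptions, not confirmed by this PR:

// Sketch only: transcribe, loadAudio, and the timestamps option name are
// hypothetical; the return type Promise<string | Word[]> comes from the diff.
const waveform: Float32Array = await loadAudio('sample.wav');

// Regular mode: resolves to the plain transcription string.
const text = await transcribe(waveform);

// Timestamping mode: resolves to a list of Word objects instead.
const words = await transcribe(waveform, { timestamps: true });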

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

msluszniak self-assigned this Jan 21, 2026
msluszniak added the feature label Jan 21, 2026
msluszniak marked this pull request as draft Jan 21, 2026 14:58
msluszniak linked an issue Jan 21, 2026 that may be closed by this pull request
msluszniak marked this pull request as ready for review Jan 21, 2026 18:43
  waveform: Float32Array | number[],
  options: DecodingOptions = {}
- ): Promise<string> {
+ ): Promise<string | Word[]> {
Contributor

How about returning a single type instead of a type union? I checked the OpenAI docs, and for word-level timestamping they do something like this:

{
  "task": "transcribe",
  "language": "english",
  "duration": 8.470000267028809,
  "text": "The beach was a popular spot on a hot summer day. People were swimming in the ocean, building sandcastles, and playing beach volleyball.",
  "words": [
    {
      "word": "The",
      "start": 0.0,
      "end": 0.23999999463558197
    },
    ...
    {
      "word": "volleyball",
      "start": 7.400000095367432,
      "end": 7.900000095367432
    }
  ],
  "usage": {
    "type": "duration",
    "seconds": 9
  }
}

This is likely familiar to the user if they have ever used the OpenAI API, and the user doesn't have to merge the words themselves when using timestamps.
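A rough sketch of what that single structured type could look like on this library's side; the word, start, end, text, and words fields follow the OpenAI example above, but the interface names are assumptions, not part of this PR:

interface Word {
  word: string;
  start: number; // start time in seconds
  end: number;   // end time in seconds
}

// Hypothetical result shape; the name TranscriptionResult is an assumption.
interface TranscriptionResult {
  text: string;
  words?: Word[]; // present only when word-level timestamps are requested
}

With this shape, the signature in the diff above would return Promise<TranscriptionResult> instead of Promise<string | Word[]>.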

Member Author

OK, so we want to always return the plain transcription and additionally a list of Words if needed, right?

Member Author

And the second question: does OpenAI always return both the timestamps and the full transcription, or is it optional, as we have it right now?

Contributor

I think it's optional, so we only return the timestamps if needed. It makes sense to me to match the structure they're returning exactly.
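Under that agreement, the decoder could branch roughly as follows; this is a sketch assuming a timestamps flag on DecodingOptions, the TranscriptionResult shape sketched above, and a hypothetical internal decodeWords helper:

async function transcribe(
  waveform: Float32Array | number[],
  options: DecodingOptions = {}
): Promise<TranscriptionResult> {
  // decodeWords is a hypothetical internal decoder returning Word[].
  const words = await decodeWords(waveform);
  const text = words.map((w) => w.word).join(' ');
  // Attach word-level timestamps only when explicitly requested,
  // mirroring the optional words field in the OpenAI response.
  return options.timestamps ? { text, words } : { text };
}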


Labels

feature: PRs that implement a new feature

Development

Successfully merging this pull request may close these issues.

Add speech to text timestamping
