Add timestamping to speech to text #742
base: main
packages/react-native-executorch/common/rnexecutorch/host_objects/JsiConversions.h
...eact-native-executorch/common/rnexecutorch/models/speech_to_text/stream/OnlineASRProcessor.h
...ages/react-native-executorch/common/rnexecutorch/models/speech_to_text/types/ProcessResult.h
packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h
packages/react-native-executorch/src/hooks/natural_language_processing/useSpeechToText.ts
packages/react-native-executorch/src/hooks/natural_language_processing/useSpeechToText.ts
  waveform: Float32Array | number[],
  options: DecodingOptions = {}
- ): Promise<string> {
+ ): Promise<string | Word[]> {
How about returning a single type instead of a type union? I checked the OpenAI docs, and for word-level timestamping they return something like this:
{
  "task": "transcribe",
  "language": "english",
  "duration": 8.470000267028809,
  "text": "The beach was a popular spot on a hot summer day. People were swimming in the ocean, building sandcastles, and playing beach volleyball.",
  "words": [
    {
      "word": "The",
      "start": 0.0,
      "end": 0.23999999463558197
    },
    ...
    {
      "word": "volleyball",
      "start": 7.400000095367432,
      "end": 7.900000095367432
    }
  ],
  "usage": {
    "type": "duration",
    "seconds": 9
  }
}
This is likely familiar to anyone who has used the OpenAI API, and users don't have to merge the words themselves when using timestamps.
OK, so we want to always return the plain transcription and, additionally, a list of Words if needed, right?
And the second question: does OpenAI always return both timestamps and the full transcription, or is this optional, as we have it right now?
I think it's optional, so we only return timestamps if needed. It makes sense to me to match the structure they return exactly.
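To make the suggestion in this thread concrete, here is a minimal sketch of what a single return type could look like. The names `TranscriptionResult` and `toTranscriptionResult` are hypothetical and not part of this PR or the library; the `word`/`start`/`end` fields mirror the OpenAI response quoted above, and joining words with a single space is an assumption about how the decoder emits them.

```typescript
// Hypothetical types sketching the single-return-type idea from this thread.
// Field names mirror the OpenAI response quoted above; this is not the
// library's actual API.
interface Word {
  word: string;
  start: number; // seconds from the start of the audio
  end: number; // seconds from the start of the audio
}

interface TranscriptionResult {
  text: string; // always present
  words?: Word[]; // present only when timestamps were requested
}

// Assumed helper: builds the result from already-decoded words.
// Joining with a single space is an assumption, not the PR's behavior.
function toTranscriptionResult(
  words: Word[],
  withTimestamps: boolean
): TranscriptionResult {
  const text = words
    .map((w) => w.word.trim())
    .filter((w) => w.length > 0)
    .join(' ');
  return withTimestamps ? { text, words } : { text };
}

const decoded: Word[] = [
  { word: 'The', start: 0.0, end: 0.24 },
  { word: 'beach', start: 0.24, end: 0.6 },
];

const plain = toTranscriptionResult(decoded, false);
const timed = toTranscriptionResult(decoded, true);
```

With this shape, callers always read `result.text` and check `result.words` only when they asked for timestamps, so no narrowing of a `string | Word[]` union is needed.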
Description
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Run the demo app in apps/speech and run transcription in both timestamping and regular modes.
Screenshots
Related issues
Checklist
Additional notes