Replies: 6 comments 8 replies
-
Could you describe which model you are using?
-
Could you try https://huggingface.co/spaces/k2-fsa/generate-subtitles-for-videos
-
It looks like the model above (csukuangfj/sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04) has timestamp info with it, but the file here: https://github.com/k2-fsa/sherpa-onnx/blob/1a43d1e37f2a65a7326e75be4607b4996f9737a8/sherpa-onnx/python/sherpa_onnx/offline_recognizer.py
Q1: Is there a recipe in the icefall project that I can use to produce a model that outputs timestamp info? For example, the OpenAI endpoint lets you adjust the granularity down to the word level: https://platform.openai.com/docs/guides/speech-to-text/timestamps. My ultimate goal is to have an endpoint that lets the user change the granularity of the timestamps.
Q2: For the last few months I have been training zipformer models. Is there a flag I can activate, or a setting I can change in the code, so that a trained zipformer returns timestamps?
-
All models from icefall support timestamps. Just decode as usual and you will get the timestamps for each token from the recognition result. (Hint: s.result.tokens gives the list of decoded tokens.) You can inspect s.result to see what fields are available.
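A minimal sketch of reading timestamps with the sherpa-onnx Python API. The model filenames are placeholders for whatever transducer you exported from icefall, and soundfile is only used here to load a mono 16 kHz wav:

```python
import soundfile as sf
import sherpa_onnx

# Placeholder model files: point these at your own exported zipformer/transducer.
recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    encoder="encoder.onnx",
    decoder="decoder.onnx",
    joiner="joiner.onnx",
    tokens="tokens.txt",
)

# Assumes a mono wav; soundfile returns float32 samples and the sample rate.
samples, sample_rate = sf.read("test.wav", dtype="float32")

s = recognizer.create_stream()
s.accept_waveform(sample_rate, samples)
recognizer.decode_stream(s)

print(s.result.text)        # full transcript
print(s.result.tokens)      # list of decoded tokens
print(s.result.timestamps)  # start time (in seconds) of each token

# List the public fields available on the result object.
print([f for f in dir(s.result) if not f.startswith("_")])
```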
-
Do you want timestamps at the token level, word level, or sentence level?
-
Yeah, I think this is what Nadira wants (she has an idea to highlight word by word, and sentence/chunk by sentence). I noticed that with the subtitles example above, it's super easy for Xin to get the chunk timestamps. I wonder if our models can output word-level timestamps directly so they can just use that.
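If the model is a BPE-based zipformer from icefall, the token-level timestamps can also be merged into word-level ones on the client side. A rough sketch, assuming the SentencePiece convention where a leading "▁" marks the start of a new word (this grouping is my own, not something sherpa-onnx does for you):

```python
def tokens_to_words(tokens, timestamps):
    """Merge BPE tokens and their start times into (word, start_time) pairs.

    Assumes SentencePiece-style tokens where a leading '▁' marks a new word;
    each word's timestamp is taken from its first token.
    """
    words = []
    for tok, ts in zip(tokens, timestamps):
        if tok.startswith("▁") or not words:
            words.append([tok.lstrip("▁"), ts])
        else:
            words[-1][0] += tok
    return [(w, t) for w, t in words]

# Example with the result from the sketch above:
# for word, start in tokens_to_words(s.result.tokens, s.result.timestamps):
#     print(f"{start:.2f}\t{word}")
```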
-
Hi,
Is there a way to produce timestamps in sherpa-onnx for offline models?
I am currently running the offline version of sherpa-onnx and my model produces decoded text: I send a wav file and get back the transcript for the audio.
I also want to output the timestamps per word or per sentence.
Here is the whisper output when I tested it:
[00:00.000 --> 00:12.280] Okay. All right. Well, good evening, everybody. Welcome. Elon, thanks for being here.
[00:12.280 --> 00:13.920] Thank you for having me.
I want to be able to do something similar.
Thanks
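For reference, a hedged sketch of turning word-level (or token-level) timestamps from the replies above into segment lines similar to the whisper output; the pause threshold and the use of the audio duration as the last segment's end time are my own assumptions:

```python
def format_ts(t: float) -> str:
    """Format seconds as MM:SS.mmm, matching the whisper-style lines above."""
    minutes, seconds = divmod(t, 60.0)
    return f"{int(minutes):02d}:{seconds:06.3f}"


def to_segments(words, times, audio_duration, max_gap=0.8):
    """Group (word, start_time) pairs into segments, splitting whenever the
    pause before a word exceeds max_gap seconds. A segment's end time is
    approximated by the start of the next segment (or the audio duration)."""
    segments = []
    cur_words, cur_start = [], None
    for i, (word, start) in enumerate(zip(words, times)):
        if cur_start is None:
            cur_words, cur_start = [word], start
        elif start - times[i - 1] > max_gap:
            segments.append((cur_start, start, " ".join(cur_words)))
            cur_words, cur_start = [word], start
        else:
            cur_words.append(word)
    if cur_words:
        segments.append((cur_start, audio_duration, " ".join(cur_words)))
    return segments


# Tiny made-up example; in practice words/times would come from the token
# timestamps above and audio_duration from len(samples) / sample_rate.
words = ["Okay.", "All", "right.", "Thank", "you"]
times = [0.0, 0.5, 0.9, 12.3, 12.6]
for seg_start, seg_end, text in to_segments(words, times, audio_duration=13.9):
    print(f"[{format_ts(seg_start)} --> {format_ts(seg_end)}] {text}")
```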