Replies: 3 comments 4 replies
-
Hi @vedvasu, you might be interested in PyCTCDecode; see: https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/CTC/train_with_wav2vec.py#L108 and https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/CTC/hparams/downsampled/train_hf_wavlm_signal_downsampling.yaml It provides the frame of each token. Unfortunately, it only works for CTC models.
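For context on how frame-level timestamps fall out of CTC decoding: the beams returned by PyCTCDecode's `decode_beams` include per-word frame index spans, which you can convert to seconds using your model's frame stride. As a library-free illustration of the same idea (a minimal sketch, not SpeechBrain or PyCTCDecode code; the toy vocabulary and the 20 ms stride are assumptions you'd replace with your model's actual values), here is a greedy CTC decode that keeps the frame index of each emitted token and groups tokens into word-level time spans:

```python
# Minimal sketch: greedy CTC collapse that tracks frame indices, then groups
# character tokens into words with (start_s, end_s) spans.
# Assumptions (not SpeechBrain code): a toy character vocabulary and a
# hypothetical 20 ms frame stride; use your model's real stride instead.
import numpy as np

BLANK = 0
VOCAB = {1: "a", 2: "b", 3: " "}  # toy vocabulary for illustration
FRAME_STRIDE_S = 0.02             # assumed 20 ms per output frame

def ctc_tokens_with_frames(log_probs):
    """Greedy CTC decode: collapse repeats, drop blanks,
    and return (token, frame_index) pairs."""
    best = np.argmax(log_probs, axis=-1)
    out, prev = [], BLANK
    for t, idx in enumerate(best):
        if idx != BLANK and idx != prev:
            out.append((VOCAB[int(idx)], t))
        prev = idx
    return out

def word_timestamps(tokens):
    """Group character tokens into words with (word, start_s, end_s) spans."""
    words, cur, start, end = [], "", None, None
    for ch, t in tokens:
        if ch == " ":
            if cur:
                words.append((cur, start * FRAME_STRIDE_S, end * FRAME_STRIDE_S))
            cur, start = "", None
        else:
            if not cur:
                start = t
            cur += ch
            end = t
    if cur:
        words.append((cur, start * FRAME_STRIDE_S, end * FRAME_STRIDE_S))
    return words

if __name__ == "__main__":
    # Toy posterior whose argmax path is: a a _ b _ <space> a _ (8 frames).
    best_path = [1, 1, 0, 2, 0, 3, 1, 0]
    log_probs = np.eye(4)[best_path]
    print(word_timestamps(ctc_tokens_with_frames(log_probs)))
```

With a real model you would feed the CTC log-probabilities (one row per frame) into the same kind of loop, and set the stride to the hop length times the encoder's downsampling factor divided by the sample rate.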
-
Hello, I'm using the provided code and PyCTCDecode, but I'm only getting words. How can I get the timestamps at test time?
-
Hey folks,
I have been using the SpeechBrain transformer recipe to run predictions with a speech-to-text model trained on the LibriSpeech dataset. Is there a way to get word-level timestamps/alignments along with the transcripts?
Ref: https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transformer
Currently, the transcripts are returned as a list of predicted words.