Replies: 3 comments 4 replies
-
Hi @vedvasu, you might be interested in PyCTCDecode; see: https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/CTC/train_with_wav2vec.py#L108 and https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/CTC/hparams/downsampled/train_hf_wavlm_signal_downsampling.yaml It provides the frame of each token. Unfortunately, it only works for CTC models.
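For context on how frame-level timestamps fall out of CTC decoding: the beams returned by PyCTCDecode's `decode_beams` include per-word frame index spans, which you can convert to seconds using your model's frame stride. As a library-free illustration of the same idea (a minimal sketch, not SpeechBrain or PyCTCDecode code; the toy vocabulary and the 20 ms stride are assumptions you'd replace with your model's actual values), here is a greedy CTC decode that keeps the frame index of each emitted token and groups tokens into word-level time spans:

```python
# Minimal sketch: greedy CTC collapse that tracks frame indices, then groups
# character tokens into words with (start_s, end_s) spans.
# Assumptions (not SpeechBrain code): a toy character vocabulary and a
# hypothetical 20 ms frame stride; use your model's real stride instead.
import numpy as np

BLANK = 0
VOCAB = {1: "a", 2: "b", 3: " "}  # toy vocabulary for illustration
FRAME_STRIDE_S = 0.02             # assumed 20 ms per output frame

def ctc_tokens_with_frames(log_probs):
    """Greedy CTC decode: collapse repeats, drop blanks,
    and return (token, frame_index) pairs."""
    best = np.argmax(log_probs, axis=-1)
    out, prev = [], BLANK
    for t, idx in enumerate(best):
        if idx != BLANK and idx != prev:
            out.append((VOCAB[int(idx)], t))
        prev = idx
    return out

def word_timestamps(tokens):
    """Group character tokens into words with (word, start_s, end_s) spans."""
    words, cur, start, end = [], "", None, None
    for ch, t in tokens:
        if ch == " ":
            if cur:
                words.append((cur, start * FRAME_STRIDE_S, end * FRAME_STRIDE_S))
            cur, start = "", None
        else:
            if not cur:
                start = t
            cur += ch
            end = t
    if cur:
        words.append((cur, start * FRAME_STRIDE_S, end * FRAME_STRIDE_S))
    return words

if __name__ == "__main__":
    # Toy posterior whose argmax path is: a a _ b _ <space> a _ (8 frames).
    best_path = [1, 1, 0, 2, 0, 3, 1, 0]
    log_probs = np.eye(4)[best_path]
    print(word_timestamps(ctc_tokens_with_frames(log_probs)))
```

With a real model you would feed the CTC log-probabilities (one row per frame) into the same kind of loop, and set the stride to the hop length times the encoder's downsampling factor divided by the sample rate.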
-
Hello, I'm using the provided code and PyCTCDecode, but I'm only getting words. How can I get the timestamps at test time?
-
Hey folks,
I have been using the SpeechBrain transformer recipe to run predictions with a speech-to-text model trained on the LibriSpeech dataset. Is there a way to get word-level timestamps/alignments along with the transcripts?
Ref: https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transformer
Currently, the transcripts are returned as a list of predicted words.