Added Feature: Prefix decoding for wav2vec2 models #11606
Conversation
Problem faced currently: I created a custom KenLM model and tried to run the code, but it stops without throwing any error at line
Wuhuhu! This is an amazing contribution @deepang17 - Super exciting to merge this notebook :-) And yes, it would be great if you could add a section to the README.md that explains how to use your script + maybe with some results (using prefix decoding vs. not using it on e.g. Timit_asr and/or Librispeech evaluation - kinda like you already did above). I'm also very happy to help you run some evals!
Longterm we could even think about merging this into src/transformers/models/wav2vec2/ - but for now this is great!
Thank you for the appreciation. I will do the required changes to
@deepang17 What is the status of this PR?
You can fix it by replacing
Can you please publish a Google Colab or a bash script to do the installation? I couldn't figure out where to make the change you suggested in the build; I've used the Google Colab example from flashlight.
```python
with torch.no_grad():
    logits = model(input_values).logits
```
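The snippet above only computes frame-level logits; turning them into a token sequence is what the decoding strategies discussed in this PR differ on. As a baseline for comparison, here is a minimal sketch of the standard CTC greedy rule (collapse repeats, then drop blanks) in plain Python - a toy illustration, not the PR's implementation:

```python
def ctc_greedy_decode(token_ids, blank_id=0):
    """Collapse consecutive repeated ids, then drop blanks (CTC greedy rule)."""
    out = []
    prev = None
    for t in token_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# e.g. applied to the per-frame argmax of the logits:
ctc_greedy_decode([0, 7, 7, 0, 7, 4, 4, 0])  # → [7, 7, 4]
```

Note that the blank between the two 7s is what keeps them from being collapsed into one token - this is exactly the ambiguity that prefix decoding handles more carefully by tracking blank/non-blank probabilities per prefix.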
```python
target_dictionary = [t for t in processor.tokenizer.get_vocab().keys()]
```
Testing W2lViterbiDecoder, I figured out that this list must be ordered by the original token index.
```suggestion
vocab = processor.tokenizer.get_vocab()
target_dictionary = sorted(vocab.keys(), key=lambda k: vocab[k])
```
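To make the difference concrete, here is the suggested sort applied to a hypothetical vocabulary (the tokens and indices below are made up for illustration; a real wav2vec2 tokenizer's `get_vocab()` returns a token-to-index dict of the same shape):

```python
# Hypothetical vocab; get_vocab() maps token -> index, but dict iteration
# order is not guaranteed to follow the indices.
vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "|": 3, "a": 4, "e": 5, "t": 6}

# Sorting by the index value guarantees target_dictionary[i] is the token
# with id i, which is what the decoder expects.
target_dictionary = sorted(vocab.keys(), key=lambda k: vocab[k])
print(target_dictionary)  # → ['<pad>', '<s>', '</s>', '|', 'a', 'e', 't']
```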
@deepang17 Thank you for your amazing work! Viterbi decoding works well, but KenLM decoding fails with the following error.
@deepang17 Do you know this error? It happens exactly when passing the dict obtained from flashlight.lib.text.dictionary.create_word_dict as an argument to flashlight.lib.text.flashlight_lib_text_decoder.KenLM.
@deepang17 - do you have updates regarding the README.md script? :-) I can take over the PR by next week otherwise!
Hello @patrickvonplaten, Sorry for the delay. I was occupied due to some personal issues. I am on the verge of completing the README.md and will commit the update soon.
@deepang17 Any updates?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This PR seems to have been stuck for quite some time now. Is anyone interested in finishing / testing this PR? Otherwise it might be better to start fresh with a blog post / colab that explains how to build a complete end-to-end ASR system - cc @anton-l
@patrickvonplaten
Hey @hbasafa, I'm now working on this topic full time. We will most likely foster a closer collaboration between pyctcdecode and Transformers. Here is a github repo that shows how to use
Nice one! I will check it out. As I was in a hurry, I've already used this code, which could be easily installed via pip. Thank you for sharing!
hi @patrickvonplaten - this is great news. Where is the best place to follow your progress?
This PR: #14339 It all depends a bit on how fast we can merge a
What does this PR do?
Added the code for prefix decoding for wav2vec2-based models.
Fixes #11283
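For readers unfamiliar with the technique this PR adds: prefix decoding keeps a beam of candidate prefixes and tracks, for each, the probability of ending in a blank vs. a non-blank frame. The following is a toy, self-contained sketch of CTC prefix beam search in plain Python (in the style of Hannun's algorithm) - it is an illustration of the idea only, not the code added by this PR, and the `labels`/`blank` conventions are assumptions:

```python
import math
from collections import defaultdict

def ctc_prefix_beam_search(log_probs, labels, blank=0, beam_width=4):
    """Toy CTC prefix beam search over a (T, V) grid of log-probabilities.

    labels[i] is the symbol for vocabulary index i; index `blank` is the
    CTC blank. Returns the highest-probability decoded string.
    """
    # Each beam maps prefix -> (p_blank, p_non_blank): probability of the
    # prefix with the last frame being blank vs. non-blank.
    beams = {(): (1.0, 0.0)}
    for t in range(len(log_probs)):
        next_beams = defaultdict(lambda: (0.0, 0.0))
        probs = [math.exp(lp) for lp in log_probs[t]]
        for prefix, (p_b, p_nb) in beams.items():
            for v, p in enumerate(probs):
                if v == blank:
                    # Blank extends the same prefix, ending in blank.
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (nb_b + (p_b + p_nb) * p, nb_nb)
                else:
                    new_prefix = prefix + (v,)
                    nb_b, nb_nb = next_beams[new_prefix]
                    if prefix and prefix[-1] == v:
                        # Repeated symbol: extending requires a blank in
                        # between, so only the blank-ending mass extends...
                        next_beams[new_prefix] = (nb_b, nb_nb + p_b * p)
                        # ...while the non-blank mass stays on the old prefix.
                        sb, snb = next_beams[prefix]
                        next_beams[prefix] = (sb, snb + p_nb * p)
                    else:
                        next_beams[new_prefix] = (nb_b, nb_nb + (p_b + p_nb) * p)
        # Prune to the top beam_width prefixes by total probability.
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: sum(kv[1]), reverse=True)[:beam_width])
    best = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(labels[v] for v in best)
```

A language model plugs into this loop by rescoring `new_prefix` each time a word boundary is crossed - which is where the KenLM integration discussed in this thread comes in.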
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@patrickvonplaten @patil-suraj