Added Feature: Prefix decoding for wav2vec2 models #11606

Closed · wants to merge 5 commits into from

Conversation

@deepang17 deepang17 commented May 6, 2021

What does this PR do?

Added the code for prefix decoding for wav2vec2 based models.

Fixes #11283

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@patrickvonplaten @patil-suraj

@deepang17 (Author) commented May 6, 2021

  • Currently the code supports prefix decoding without an LM (the no-LM baseline is sketched below). I am still working on integrating the KenLM version.

Problem faced currently: I created a custom KenLM build and tried to run the code, but it stops without throwing any error at the line results = self.decoder.decode(emissions_ptr, T, N). I am currently trying to fix it. (RESOLVED)

  • Shall I create a .sh or .txt file with guidance on how to install the flashlight dependencies?
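That no-LM baseline amounts to greedy (best-path) CTC decoding over the wav2vec2 logits. A minimal sketch, assuming the facebook/wav2vec2-base-960h checkpoint used elsewhere in this thread and a dummy waveform in place of real audio:

```python
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Stand-in for a real 16 kHz mono waveform
speech = np.zeros(16_000, dtype=np.float32)
input_values = processor(speech, sampling_rate=16_000, return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits  # (batch, time, vocab)

# Greedy (best-path) CTC decoding: argmax per frame, then let the tokenizer
# collapse repeats and strip blank tokens.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```

Prefix (beam-search) decoding with an LM instead keeps several candidate prefixes per frame and rescores them with the language model, which is what the flashlight decoders in this PR provide.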

@deepang17 deepang17 changed the title ADDED FEATURE: Prefix decoding for wav2vec2 models [WIP] ADDED FEATURE: Prefix decoding for wav2vec2 models May 6, 2021
@deepang17 (Author) commented

Performance:
Model: facebook/wav2vec2-base-960h
Dataset: timit_asr, clean, test[:5%]
Viterbi decoding: WER 0.115
KenLM decoding: WER 0.098
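These WER values are obtained by comparing the decoder output against the reference transcriptions; a minimal sketch of the metric computation with the datasets library (illustrative strings, not taken from the PR script):

```python
from datasets import load_metric

wer_metric = load_metric("wer")

# Illustrative inputs: in practice, predictions come from the decoder output
# and references from the dataset's ground-truth transcriptions.
predictions = ["a cat sat on the mat"]
references = ["the cat sat on the mat"]

print("wer:", wer_metric.compute(predictions=predictions, references=references))
```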

@deepang17 deepang17 changed the title [WIP] ADDED FEATURE: Prefix decoding for wav2vec2 models ADDED FEATURE: Prefix decoding for wav2vec2 models May 7, 2021
@patrickvonplaten (Contributor) commented May 13, 2021

Wuhuhu! This is an amazing contribution @deepang17 - Super exciting to merge this notebook :-) And yes, it would be great if you could add a section to the README.md that explains how to use your script + maybe with some results (using Prefix decoding vs. not using it on e.g. Timit_asr and/or Librispeech evaluation - kinda like you already did above). I'm also very happy to help you run some evals!

@patrickvonplaten patrickvonplaten left a comment

Longterm we could even think about merging this into src/transformers/models/wav2vec2/ - but for now this is great!

@patrickvonplaten patrickvonplaten changed the title ADDED FEATURE: Prefix decoding for wav2vec2 models Added Feature: Prefix decoding for wav2vec2 models May 13, 2021
@deepang17 (Author) commented May 13, 2021

Thank you for the appreciation. I will make the required changes to README.md and push a commit soon.

@samuelazran commented May 18, 2021

> Problem faced currently: I created a custom KenLM build and tried to run the code, but it stops without throwing any error at the line results = self.decoder.decode(emissions_ptr, T, N). I am currently trying to fix it. (RESOLVED)

@deepang17
Did you push that fix? I've tried your code and it crashes at "self.decoder.decode". What was your fix?

What is the status of this PR?

@deepang17 (Author) commented

> Did you push that fix? I've tried your code and it crashes at "self.decoder.decode". What was your fix?

You can fix it by replacing !cmake .. -DCMAKE_BUILD_TYPE=Release -DKENLM_MAX_ORDER=20 -DCMAKE_POSITION_INDEPENDENT_CODE=ON with !cmake ..

@samuelazran commented

> You can fix it by replacing !cmake .. -DCMAKE_BUILD_TYPE=Release -DKENLM_MAX_ORDER=20 -DCMAKE_POSITION_INDEPENDENT_CODE=ON with !cmake ..

Can you please publish a Google Colab or a bash script to do the installation? I couldn't figure out where to make the change you suggested in the build; I've used the Google Colab example from flashlight.

Review comment on the changed code:

    with torch.no_grad():
        logits = model(input_values).logits

    target_dictionary = [t for t in processor.tokenizer.get_vocab().keys()]

@joaoalvarenga joaoalvarenga commented Jun 4, 2021

Testing W2lViterbiDecoder, I figured out that this list must be ordered by the original token index.

Suggested change:

    -target_dictionary = [t for t in processor.tokenizer.get_vocab().keys()]
    +vocab = processor.tokenizer.get_vocab()
    +target_dictionary = sorted(vocab.keys(), key=lambda k: vocab[k])
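The ordering matters because the emission matrix has one column per token id and the flashlight decoders index their symbol set by position, so the list handed to the decoder must line up with the tokenizer's token ids; iterating over get_vocab().keys() gives no such guarantee.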

@tommy19970714 commented Jun 7, 2021

@deepang17 Thank you for your amazing work!
I made a Google Colab to reproduce this pull request. @samuelazran you can check it here:
https://colab.research.google.com/drive/1HHEBS3I4biQ8ZDyfJDtHi4E4onOtYe46?usp=sharing

Viterbi decoding works well, but KenLM decoding has the following error.

  File "run_wav2vec2_eval_with_lm.py", line 292, in <module>
    main()
  File "run_wav2vec2_eval_with_lm.py", line 281, in main
    results = selected_dataset.map(map_to_result)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1606, in map
    desc=desc,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 176, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/fingerprint.py", line 397, in wrapper
    out = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1911, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1826, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "run_wav2vec2_eval_with_lm.py", line 265, in map_to_result
    decoder = W2lKenLMDecoder(eval_args, target_dictionary)
  File "run_wav2vec2_eval_with_lm.py", line 201, in __init__
    self.lm = KenLM(args.kenlm_model, self.word_dict)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. flashlight.lib.text.flashlight_lib_text_decoder.KenLM(path: str, usr_token_dict: fl::lib::text::Dictionary)

Invoked with: None, <flashlight.lib.text.flashlight_lib_text_dictionary.Dictionary object at 0x7fe0ef7294b0>

@deepang17 Do you know this error? The second argument passed to flashlight.lib.text.flashlight_lib_text_decoder.KenLM is exactly the dict obtained from flashlight.lib.text.dictionary.create_word_dict.
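For what it's worth, the "Invoked with: None, ..." line in the traceback indicates that args.kenlm_model ended up as None, so the KenLM constructor receives no model path. A minimal sketch of the expected call, based only on the constructor signature shown in the error message (file names are placeholders, not from the PR):

```python
from flashlight.lib.text.dictionary import create_word_dict, load_words
from flashlight.lib.text.decoder import KenLM

# lexicon.txt maps words to their token spellings; lm.bin is a KenLM binary/arpa model
lexicon = load_words("lexicon.txt")
word_dict = create_word_dict(lexicon)

# KenLM(path: str, usr_token_dict: Dictionary) -- the path must point to an existing LM file
lm = KenLM("lm.bin", word_dict)
```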

@patrickvonplaten (Contributor) commented

@deepang17 - do you have updates regarding the README.md script? :-) I can take over the PR by next week otherwise!

@deepang17 (Author) commented

Hello @patrickvonplaten, sorry for the delay. I was occupied due to some personal issues. I am on the verge of completing the README.md and will commit the updated version soon.

@tommy19970714 commented

@deepang17 Any updates?

@shiva1393 commented

@deepang17 Any updates?

@github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Aug 1, 2021
@patrickvonplaten (Contributor) commented

This PR seems to have been stuck for quite some time now. Is anyone interested in finishing / testing it?

Otherwise it might be better to start fresh with a blog post / Colab that explains how to build a complete end-to-end ASR system - cc @anton-l

@github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Sep 8, 2021
@hbasafa commented Nov 2, 2021

@patrickvonplaten
I'm in! I've searched this topic and it seems there is no official implementation yet; it would be great to add this feature. If it is still in the backlog, I would be happy to contribute. Looking forward to hearing from you!

@patrickvonplaten (Contributor) commented

Hey @hbasafa,

I'm now working on this topic full time.

We will most likely foster a closer collaboration between pyctcdecode and Transformers. Here is a GitHub repo that shows how to use pyctcdecode with Wav2Vec2 for LM-supported decoding. It works quite well with KenLM.
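A rough sketch of the pyctcdecode + Wav2Vec2 approach mentioned above (the checkpoint name, LM path, and alpha/beta values are illustrative, not taken from this PR):

```python
import numpy as np
import torch
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Labels must be ordered by token id (same point as the review comment above).
vocab = processor.tokenizer.get_vocab()
labels = sorted(vocab.keys(), key=lambda k: vocab[k])

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="lm.arpa",  # placeholder path to a KenLM arpa/binary model
    alpha=0.5,                   # LM weight (illustrative)
    beta=1.0,                    # word insertion bonus (illustrative)
)

# Stand-in for a real 16 kHz mono waveform
speech = np.zeros(16_000, dtype=np.float32)
input_values = processor(speech, sampling_rate=16_000, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits[0].cpu().numpy()

print(decoder.decode(logits))
```

Compared with the flashlight-based decoder in this PR, pyctcdecode is pure Python and pip-installable, which sidesteps the cmake/flashlight build issues discussed earlier in the thread.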

@deepang17 deepang17 deleted the wav2vec2contribution branch November 4, 2021 17:35
@hbasafa commented Nov 5, 2021

Nice one! I will check it out.

As I was in a hurry, I've already used this code, which can be installed easily via pip.
The code sample is also provided there.
Now I'm also focusing on adding other decoding strategies there.

Thank you for sharing, @patrickvonplaten!

@machakos23 commented

> We will most likely foster a closer collaboration between pyctcdecode and Transformers.

hi @patrickvonplaten - this is great news. Where is the best place to follow your progress?

@patrickvonplaten (Contributor) commented Nov 15, 2021

This PR: #14339

It all depends a bit on how fast we can merge a load_from_hf_hub function into pyctcdecode.

Successfully merging this pull request may close these issues:

Beam search decoding and language model integration for Wav2Vec2ForCTC models