[Wav2Vec2] PyCTCDecode Integration to support language model boosted decoding #14339
Conversation
I like the design as suggested. Just have one comment on the `save_pretrained` method.
""" | ||
|
||
self.feature_extractor.save_pretrained(save_directory) | ||
self.tokenizer.save_pretrained(save_directory) |
The decoder should also be saved here, I think.
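A minimal sketch of what that could look like (the attribute names mirror the diff above; `save_to_dir` on the pyctcdecode decoder is an assumption, not confirmed API in this PR):

```python
def save_pretrained(self, save_directory):
    # persist all components so that from_pretrained can restore the full processor
    self.feature_extractor.save_pretrained(save_directory)
    self.tokenizer.save_pretrained(save_directory)
    # assumption: pyctcdecode's BeamSearchDecoderCTC exposes save_to_dir(...)
    self.decoder.save_to_dir(save_directory)
```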
Thanks for writing down your thoughts. I agree with everything said here, and pyctcdecode looks like a great tool, happy to use it.
I would not advocate for a `Wav2Vec2ProcessorWithLM`, however. I'd vote to have a single `Wav2Vec2Processor` instead. I guess you decided to split them as you didn't want to add unnecessary overhead to `Wav2Vec2Processor` for users that did not want to use the language model?
In that case, I'd favor either:
- Loading the LM on the fly when calling the `decode` method with the language model (maybe as a new `decode_with_lm` method? see the sketch after this comment), or
- Passing an additional argument `with_lm_decoding` (either to the `__init__`, to prevent on-the-fly instantiation, or to the `decode`).
I personally think it would be less awkward from a user's perspective to have everything bundled in a single processor.
Overall, super down and excited for this PR! Let's get those improvements on WER!
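As an illustration of the lazy-loading option above (all names here are hypothetical, not actual transformers API):

```python
class Wav2Vec2Processor:
    def __init__(self, feature_extractor, tokenizer, lm_path=None):
        self.feature_extractor = feature_extractor
        self.tokenizer = tokenizer
        self._lm_path = lm_path
        self._lm_decoder = None  # heavy LM decoder is created lazily

    def decode_with_lm(self, logits):
        # hypothetical helper: build the decoder on first use, then cache it
        if self._lm_decoder is None:
            self._lm_decoder = build_lm_decoder(self._lm_path)
        return self._lm_decoder.decode(logits)
```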
Not convinced by grouping everything together in one class which will sometimes have an additional method that works and sometimes not (depending on the env). The code is probably also going to be hard to read if all the imports have to be contained in a conditional import block.
Taking previous suggestions into account, I think combining CTC decoders with the existing `Wav2Vec2Processor` is a good idea. This workflow seems pretty natural to me:

processor = Wav2Vec2Processor(feat_extractor, tokenizer)
processor.decode(logits)
>> Warning: `lm_decoder` is not specified, using a greedy decoder

lm_decoder = Wav2Vec2LMDecoder("kenlm/librispeech-100h-4gram")
processor = Wav2Vec2Processor(feat_extractor, tokenizer, lm_decoder=lm_decoder)
processor.decode(logits)

(sneaky suggestion to rename `ctc_decoder` here :))
return self.ctc_decoder.decode(logits.numpy())

@contextmanager
def as_target_processor(self):
Since the processor now has 3 modes of operation, maybe this design can be deprecated in favor of `encode()` and `decode()`?
Regarding the PyCTCDecode integration I think it makes sense to start step-by-step and simply add a `Wav2Vec2ProcessorWithLM` first. Once this PR is approved we can move forward with this PR.
Thanks a lot for the feedback regarding the design choices @sgugger, @LysandreJik and @anton-l. I agree more with @sgugger here, but I think both implementations are valid and have pros and cons. To give some more background for a better design decision:
IMO, the main reason why I think a new class would be better is that the class will be very experimental and will most likely change in the future (adding other backend libraries, other language models, ...). Language model support for decoding is by no means always necessary or needed for ASR, so I can see lots of people just keeping `Wav2Vec2Processor`. There are some other reasons why I prefer `Wav2Vec2ProcessorWithLM` as well.
Both classes would have the exact same API and can be replaced one-for-one. So for users wanting to decode with a language model, everything is bundled in a single class, namely `Wav2Vec2ProcessorWithLM`. I don't see a huge advantage in having only a single `Wav2Vec2Processor`.
When I said on the fly, I meant it would be loaded the first time; every subsequent operation would use the previously loaded model. But I don't have a strong opinion, and I understand your perspective. Good for me to go with the new class!
After looking through the previous PRs concerning the processor design: I'm interested in discussing (perhaps not in this PR) how we can evolve the current processing design, since only the feature extractor is universally required for speech models now, and the tokenizer and LM can be applied separately, depending on the target task.
Real-world demo example for a SOTA Spanish wav2vec2 model: https://huggingface.co/patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm -> seems to give a nice 10-20% WER improvement.
Final user API:

import torch
import torchaudio.functional as F
-from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
from datasets import load_dataset

ds = load_dataset("common_voice", "es", split="test", streaming=True)
sample = next(iter(ds))
resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()

model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
-processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
+processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")

input_values = processor(resampled_audio, return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits

-prediction_ids = torch.argmax(logits, dim=-1)
-transcription = processor.batch_decode(prediction_ids)
+transcription = processor.batch_decode(logits.cpu().numpy()).text
print(transcription)
@dataclass
class Wav2Vec2DecoderWithLMOutput(ModelOutput):
I'm planning on providing more outputs in the future for time-stamped word outputs, etc.
From what I understood from your comment here #14487 (review) you'd rather we not abstract model outputs, which I understand. I think this should be the case here too, right?
Ah, just understood that `text` was a single output and not an overload of a previously defined output. You can ignore my comment :)
cls._set_language_model_attribute(decoder, attribute, value)

# make sure that decoder's alphabet and tokenizer's vocab match in content
missing_decoder_tokens = cls.get_missing_alphabet_tokens(decoder, tokenizer)
aggressive check to make sure the model's vocabulary matches the decoder's alphabet
Thanks for adding this! I've left some comments as there is some cleaning up to do in some docstrings, the setup and the install instructions in the various CI jobs.
@@ -83,6 +83,7 @@ jobs:
     - run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece,torch-speech,vision]
     - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
     - run: pip install tensorflow_probability
+    - run: pip install https://github.com/kpu/kenlm/archive/master.zip
What is this package? The import error indicates to do `pip install pyctcdecode` later on and does not give any instruction to install this.
`pyctcdecode` optionally depends on `kenlm` if the user would like to use a `kenlm` language model. In the future, there will probably be more language models that don't require `kenlm`.
So IMO, it's the responsibility of the `pyctcdecode` package to throw a good error in case a user requests `pyctcdecode` with a `kenlm` language model. However, since at the moment the only language model support is based on `kenlm`, I can also throw a nice error message on our side.
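For example, a nice error on our side could be raised with a small import guard (a sketch only, not the code in this PR; the install URL is the one used in the CI config above):

```python
def require_kenlm():
    # fail early with an actionable message when the optional kenlm backend is missing
    try:
        import kenlm  # noqa: F401
    except ImportError:
        raise ImportError(
            "Using a kenlm language model for decoding requires the `kenlm` package. "
            "You can install it with: "
            "`pip install https://github.com/kpu/kenlm/archive/master.zip`"
        )
```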
Is there a plan to add a real Python package for `kenlm`? This is a bit heavy :-(
Hmm, I'm really not sure - the `kenlm` repo doesn't seem to be super active: https://github.com/kpu/kenlm. It is, however, by far the most used library for language-model-supported ASR. Flashlight uses it (https://github.com/flashlight/flashlight/tree/main/bindings/python#dependencies), among many other libraries.
Espnet uses it as well: https://github.com/espnet/espnet/blob/master/espnet/nets/scorers/ngram.py
@@ -281,6 +287,7 @@ jobs:
     - run: pip install --upgrade pip
     - run: pip install .[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]
     - run: pip install tensorflow_probability
+    - run: pip install https://github.com/kpu/kenlm/archive/master.zip
Do we need it in the TF tests?
It's a processor and therefore framework-independent - it's written in pure Python.
.circleci/config.yml

@@ -701,6 +717,7 @@ jobs:
     - v0.4-{{ checksum "setup.py" }}
     - run: pip install --upgrade pip
     - run: pip install .[torch,testing,sentencepiece,onnxruntime]
+    - run: pip install https://github.com/kpu/kenlm/archive/master.zip
Why do we need it in the ONNX tests?
removed it from ONNX
docs/source/model_doc/wav2vec2.rst

@@ -143,3 +136,19 @@ FlaxWav2Vec2ForPreTraining

.. autoclass:: transformers.FlaxWav2Vec2ForPreTraining
    :members: __call__

Wav2Vec2 specific outputs
In all other doc pages, those go before the models, so let's leave them here for now. We can decide to switch them to after the models later, but then for all models together?
Sounds good - reverting this
Great integration! Although I feel like pyctcdecode's "magic options" can be documented a bit more verbosely, so I left some suggestions :)
def get_missing_alphabet_tokens(decoder, tokenizer):
    # we need to make sure that all of the tokenizer's tokens, except the
    # special tokens, are present in the decoder's alphabet; retrieve the
    # missing alphabet tokens from the decoder
    tokenizer_vocab_list = [t.lower() for t in tokenizer.get_vocab().keys()]
`t.lower()` won't allow us to have an all-uppercase vocab (e.g. the official LibriSpeech LMs for eval are uppercase: https://www.openslr.org/11)
Totally right! Thanks a lot for catching this! Not sure what I was thinking here :D
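A sketch of a case-robust version (assuming the decoder exposes its labels via `decoder._alphabet.labels`, as pyctcdecode does internally; not the final code of this PR):

```python
def get_missing_alphabet_tokens(decoder, tokenizer):
    # compare tokens as-is instead of forcing lower case, so that
    # all-uppercase vocabs (e.g. the official LibriSpeech LMs) keep working
    alphabet = set(decoder._alphabet.labels)  # assumed pyctcdecode internal
    special_tokens = set(tokenizer.all_special_tokens)
    vocab = set(tokenizer.get_vocab())
    return sorted(t for t in vocab - special_tokens if t not in alphabet)
```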
Yes, we should definitely make a notebook about it!
@LysandreJik @sgugger - I think it's ready for a final review. I've now made sure that this LM-boosted ASR is only tested in the TF, Flax and PT tests, but not in the ONNX & Hub tests. I've also added integration tests for TF and Flax. @sgugger I don't really see how to get rid of `kenlm` here.
Thanks for containing the addition of kenlm. The documentation on how to run the tests locally should get an update, as we are very, very far now from just needing a plain pip install.
LGTM, great work @patrickvonplaten
Draft to integrate pyctcdecode into 🤗 Transformers
This is a short doc to explain all the important aspects of a possible integration of pyctcdecode into 🤗 Transformers.
What is LM-boosted Decoding?
In LM-boosted decoding, an acoustic model (Wav2Vec2) is trained on some speech data, and independently of this training a language model (e.g. a KenLM n-gram) is trained on some text in the same language as the speech data. Then, during evaluation, the language model supports the acoustic model in predicting the transcribed words via beam search decoding. To be more precise, the output (log-)probability matrix of the acoustic model - a [timesteps x log-prob for each subword token] matrix - is fed into a beam search decoder, and by means of a language model (P(subword token | prev subword token)) the overall best subword token sequence is chosen using a beam search algorithm.
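Concretely, with pyctcdecode this looks roughly as follows (the vocabulary, LM path, and logits are placeholders):

```python
import numpy as np
from pyctcdecode import build_ctcdecoder

# vocabulary of the acoustic model, in the order of its output dimension (placeholder)
labels = ["<pad>", " ", "a", "b", "c"]

# attach a KenLM n-gram so the beam search can score candidate word sequences
decoder = build_ctcdecoder(labels, kenlm_model_path="path/to/ngram.arpa")

# [timesteps x vocab] log-probability matrix, as produced by Wav2Vec2ForCTC (placeholder)
logits = np.log(np.random.dirichlet(np.ones(len(labels)), size=50))

transcription = decoder.decode(logits)
```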
Why do we need LM-boosted Decoding for Speech?

LM-boosted decoding is still the, or one of the, state-of-the-art approaches for ASR systems in terms of word-error-rate (WER) performance. The other upcoming approach is end-to-end, where the language model is learned together with the acoustic model. The advantage of LM-boosted decoding is that the language model can be trained on text data independently of the acoustic model. The disadvantage is that the decoding becomes a separate, non-differentiable step outside the model (it is not implemented in torch.nn).

Why pyctcdecode?
We could implement the whole CTC beam search algorithm ourselves in `transformers` or a separate library, but it would look very similar to already existing libraries, and in the spirit of open source it's usually better to improve existing libraries together instead of duplicating work. There are three libraries for CTC beam search decoding that I analysed. Given this analysis, and given that the spirit of `transformers` is readability and ease of contribution, 2.) makes by far the most sense to be considered for an integration into `transformers` IMO. It would be great if we manage to collaborate well with https://github.com/kensho-technologies/pyctcdecode on design choices and integrations, but in the worst-case scenario (if for some reason our vision differs too strongly from https://github.com/kensho-technologies/pyctcdecode) we can also fork the repo and shape it to how we would need it - it has an MIT license. However, the library looks quite nice to me and I'm also confident that we can start a fruitful collaboration that both `pyctcdecode` and we can profit from.

Integration into Hugging Face's `transformers`

A couple of important requirements apply for a nice integration with `transformers`. Keeping in mind that LM-boosted decoding requires the output log-probs of the acoustic model (`Wav2Vec2ForCTC`) as well as a dictionary and a language model, there are two clean ways of integrating the feature IMO:
1.) We add a new `Wav2Vec2CTCDecoder` class that replaces the `Wav2Vec2CTCTokenizer` and can be used just like `Wav2Vec2CTCTokenizer` within `Wav2Vec2Processor`. Since this class would require the vocabulary of `Wav2Vec2CTCTokenizer`, we would probably have to add a `self.tokenizer = Wav2Vec2CTCTokenizer(...)` attribute in `Wav2Vec2CTCDecoder`, which would create a bit too much abstraction IMO (Wav2Vec2Processor -> Wav2Vec2CTCDecoder -> Wav2Vec2Tokenizer).

2.) We add a new `Wav2Vec2ProcessorWithLM` class that replaces `Wav2Vec2Processor`. It essentially just adds a `self.decoder = ...` to `Wav2Vec2Processor`, and the `batch_decode()` and `decode()` methods now run LM-boosted decoding instead of the previous "tokenizer-only" decoding.

=> IMO 2.) is the better approach, as it requires less abstraction and is also "safer" in that we can simply say that `Wav2Vec2ProcessorWithLM` is an experimental class that can be used to replace `Wav2Vec2Processor`.
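The essence of 2.) in code, as a simplified sketch (not the final implementation in this PR):

```python
class Wav2Vec2ProcessorWithLM:
    def __init__(self, feature_extractor, tokenizer, decoder):
        self.feature_extractor = feature_extractor
        self.tokenizer = tokenizer
        self.decoder = decoder  # e.g. a pyctcdecode BeamSearchDecoderCTC

    def batch_decode(self, logits):
        # LM-boosted beam search instead of the tokenizer's argmax decoding
        return [self.decoder.decode(l) for l in logits]
```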
This PR implements more or less everything that is required on the `transformers` side for 2.).

So the change in API that I'm aiming for would look as follows:
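A sketch based on the final user API shown earlier in the thread (`audio` is a placeholder input):

```python
import numpy as np
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

audio = np.zeros(16_000, dtype=np.float32)  # placeholder: one second of silence at 16 kHz

model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")

input_values = processor(audio, return_tensors="pt").input_values
logits = model(input_values).logits
transcription = processor.batch_decode(logits.detach().numpy()).text
```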
Thinking a bit ahead here, IMO it would also be totally fine to have both a `Wav2Vec2Processor` and a `Wav2Vec2ProcessorWithLM` work correctly with an `AutoProcessor` class. We could just add a new `processor_type` attribute to the `config.json` so that the correct processor class is loaded depending on the `config.json` of the model. We could use a similar general design (ideally even a bit cleaner) as is used here.
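For illustration, the dispatch could look roughly like this (the `processor_type` attribute is the proposal above; the lookup mechanics are hypothetical):

```python
import transformers
from transformers import AutoConfig

model_id = "patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm"
config = AutoConfig.from_pretrained(model_id)

# hypothetical: config.json carries e.g. "processor_type": "Wav2Vec2ProcessorWithLM";
# fall back to the plain processor when the attribute is absent
class_name = getattr(config, "processor_type", "Wav2Vec2Processor")
processor = getattr(transformers, class_name).from_pretrained(model_id)
```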
Feature additions to pyctcdecode for target API

It would be great if, together with `pyctcdecode`, we could add an optional `from_hf_hub(...)` functionality for their BeamSearchDecoder class(es). This should be pretty simple to do with `huggingface_hub` and should in general also make it much easier for `pyctcdecode` to load and save models online (for free). This is to be discussed.

In a first step, it would be easiest to focus on fully supporting download and upload of KenLM language models for seamless KenLM n-gram boosted decoding. KenLM n-gram boosted decoding yielded some nice improvements in my experiments here.
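A sketch of what downloading a KenLM model from the Hub could look like with `huggingface_hub` (repo id, filename, and vocabulary are placeholders):

```python
from huggingface_hub import hf_hub_download
from pyctcdecode import build_ctcdecoder

labels = ["<pad>", " ", "a", "b", "c"]  # placeholder vocabulary

# fetch the n-gram file from a (placeholder) Hub repo and build the decoder
lm_path = hf_hub_download(repo_id="username/kenlm-4gram", filename="4gram.arpa")
decoder = build_ctcdecoder(labels, kenlm_model_path=lm_path)
```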
In a next step, we could then look into support for transformer LM models in pyctcdecode (making `pyctcdecode`'s beam search compatible with our `AutoModelForCausalLM` models) and also add `load_from_hub(...)` functionality for this in `pyctcdecode`.

Other possible improvements could include checks that the `logits` and sampling rate of the model match what the decoder expects.