
Added mask_time_prob and mask_time_length arguments to wav2vec2 pretraining script and readme #19997

Conversation

@mpierrau (Contributor) commented Nov 1, 2022

What does this PR do?

This PR was requested by @patrickvonplaten following my question and the ensuing discussion in the Discord ask-for-help channel, under the title "Wav2vec2 - why is mask_time_prob=0.05?".

This PR adds the arguments mask_time_prob and mask_time_length to the examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py script, and adds a corresponding example of their use to the README.md.

mask_time_prob is a variable that describes two things, depending on context:

  1. during pre-training, the fraction of the encoded feature vectors to be masked for the contrastive learning task
  2. during fine-tuning, the masking probability used to imitate SpecAugment

In this script, we are considering it in the context of 1).

mask_time_length is a variable describing the length (in number of frames) of each applied mask span. It is added for completeness.
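
For reference, here is a minimal sketch of what the added CLI arguments could look like in the script's argument parser. This is illustrative only; the exact help strings and wiring in the PR may differ, and no defaults are set so that the model config values apply unless the user overrides them (see also the note under Background below).

```
import argparse

parser = argparse.ArgumentParser()
# Sketch of the two new arguments; leaving default=None lets the values from
# the model config take effect unless the user passes the flags explicitly.
parser.add_argument(
    "--mask_time_prob",
    type=float,
    default=None,
    help="Fraction of encoded feature vectors to mask for the contrastive task during pre-training.",
)
parser.add_argument(
    "--mask_time_length",
    type=int,
    default=None,
    help="Length (in frames) of each masked span.",
)
```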

Background

In the original wav2vec 2.0 article, the masking setup corresponds to mask_time_prob=0.65, which (due to overlapping spans) results in an effective masking of approximately 49% of the feature vectors during pre-training. mask_time_length corresponds to the variable M in the article and is set to 10 there.
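
As a rough sanity check of that ~49% figure, the following small simulation estimates the effective coverage under simplified assumptions (span starts drawn uniformly and independently, fixed sequence length; the actual masking logic in transformers differs in details such as sampling without replacement):

```
import numpy as np

def effective_masking(mask_time_prob=0.65, mask_time_length=10, seq_len=500, n_trials=1000, seed=0):
    """Estimate the fraction of frames actually masked once spans overlap."""
    rng = np.random.default_rng(seed)
    coverage = []
    for _ in range(n_trials):
        mask = np.zeros(seq_len, dtype=bool)
        # Nominally mask_time_prob * seq_len frames, spread over spans of
        # mask_time_length frames each.
        num_spans = int(mask_time_prob * seq_len / mask_time_length)
        starts = rng.integers(0, seq_len - mask_time_length, size=num_spans)
        for start in starts:
            mask[start : start + mask_time_length] = True
        coverage.append(mask.mean())
    return float(np.mean(coverage))

print(effective_masking())  # roughly 0.48, i.e. about half the frames end up masked
```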

However, the config file of wav2vec2-base has mask_time_prob=0.05. This is because the model is usually used for fine-tuning rather than (continued) pre-training, and for fine-tuning 0.05 is a better hyperparameter value (see Appendix B of the wav2vec 2.0 article). This is a bit confusing.

The config file of the wav2vec2-base-v2 model, which was used in Patrick's experiments (see the speech-pretraining README), confirms that mask_time_prob=0.65 was indeed used for pre-training.

The values 0.65 and 10 are set as default values for the DataCollatorForWav2Vec2Pretraining class defined in the script (since users may extract this class from the script), but no defaults are given in the argparser. The argument values are also specified in the wav2vec2-base and wav2vec2-base-v2 model configs, and if defaults were set in the argparser, the model config values would never be applied, even though applying them may be desired. Hence, a parser argument only takes effect if it is explicitly specified when executing the script (see the sketch below).
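
A minimal sketch of that precedence logic, assuming the usual argparse namespace `args` and a model config `config` loaded earlier in the script (names are illustrative, not the exact code in the PR):

```
# Only override the model config when the user passed the flag explicitly;
# otherwise keep the value from e.g. the wav2vec2-base-v2 config.
mask_time_prob = config.mask_time_prob if args.mask_time_prob is None else args.mask_time_prob
mask_time_length = config.mask_time_length if args.mask_time_length is None else args.mask_time_length
```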

I believe this PR also raises a bigger question, namely whether mask_time_prob should be split into two different variables to avoid confusion in the future.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Participants in discussion on Discord: @osanseviero @patrickvonplaten

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sgugger (Collaborator) commented Nov 1, 2022

cc @sanchit-gandhi

@patrickvonplaten (Contributor) left a comment

This looks good to me, but it would be good to get a review from @sanchit-gandhi here as well.

@sanchit-gandhi (Contributor) left a comment

Hey @mpierrau! Sorry for the long delay in getting you a final review. Thanks for digging into this and documenting the motivations so well - the change is most welcome!

Regarding your question:

whether mask_time_prob should be split into two different variables to avoid confusion in the future

I would say 'no' to this. At heart, mask_time_prob controls the same behaviour for pre-training and fine-tuning. It's simply the interpretation of this behaviour that is different:

  • For pre-training, we're masking latents for our contrastive loss
  • For fine-tuning, we're applying SpecAug in the latent space

-> the behaviour is the same in both cases (masking in the latent space); only the interpretation of what the masking means for training differs

What we could do is clarify the meaning of this variable in the docs (with something similar to the above) - this would help clear the confusion without introducing extra variables and the potential for breaking changes!
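
As a purely illustrative example of that point (the values come from the discussion above, not from any particular checkpoint):

```
from transformers import Wav2Vec2Config

# The same config fields drive both behaviours; only the interpretation differs.
pretraining_config = Wav2Vec2Config(mask_time_prob=0.65, mask_time_length=10)  # masking latents for the contrastive loss
finetuning_config = Wav2Vec2Config(mask_time_prob=0.05, mask_time_length=10)   # SpecAugment-style masking in the latent space
```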

@github-actions bot commented Dec 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@mpierrau (Contributor, Author) commented Dec 8, 2022

Hey! I'm not sure what the next step is here; like I said, this is my first PR :) Some checks failed and I'm not sure why. All tests passed before I committed @sanchit-gandhi's suggestions, but it seems they are mainly failing due to timeouts?

@sgugger (Collaborator) commented Dec 8, 2022

You will need to rebase your PR on main for the tests to pass, as your branch does not have the fixes for the last release of TensorFlow.

@sanchit-gandhi (Contributor) commented

Hey @mpierrau! Exciting to see that you've picked up this PR again! Let me know if you need any final help - we're close to merging now!

As Sylvain mentioned, you'll need to rebase onto main to fix the failing tests (see step 5 in this guide: https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request, and just remember to force push as described there 😉)

gante and others added 20 commits December 15, 2022 15:32
* Fixing the doctests failures.

* Fixup.
…in pytorch cpu/cuda amp autocast (huggingface#20289)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Add docstrings for canine model

* Update CanineForTokenClassification

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add ResNetBackbone

* Define channels and strides as property

* Remove file

* Add test for backbone

* Update BackboneOutput class

* Remove strides property

* Fix docstring

* Add backbones to SHOULD_HAVE_THEIR_OWN_PAGE

* Fix auto mapping name

* Add sanity check for out_features

* Set stage names based on depths

* Update to tuple

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
- simplifies the device checking test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix the doc to specify that add_prefix_space = False

* add correct expected output
* [ASR Examples] Update README for seq2seq

* add language info

* add training results

* re-word
* Add padding transformation

* Add in upstream changes

* Update tests & docs

* Code formatting tuples in docstring
* Add AnyPrecisionAdamW optimizer

* Add optim_args argument to TrainingArgs

* Add tests for AnyPrecisionOptimizer

* Change AnyPrecisionAdam default params to float32

* Move default_anyprecision_kwargs in trainer test

* Rename AnyPrecisionAdamW
… consistency. (huggingface#20280)

* [Proposal] Breaking change `zero-shot-object-detection` for improved
consistency.

This is a proposal to modify the output of `zero-shot-object-detection`
to provide better alignment with other pipelines.

The output is now strictly the same as `object-detection` whereas before
it would output lists of lists.

The name `candidate_labels` is used throughout for consistency with
other `zero-shot` pipelines.

The pipeline is changed to `ChunkPipeline` to support batching cleanly.

This removes all the lists and list of lists shenanigans, it's now a
matter of the base pipeline handling all this not this specific one.

**Breaking change**: It removes the possibility of complex calls like `pipe(images = [image1, image2],
text_queries=[candidates1, candidates2])` in favour of only
`pipe([{"image": image1, "candidate_labels": candidates1}, {"image": image2, "candidate_labels": candidates2}])`
when dealing with lists and/or datasets.
We could keep them, but it would add a lot of complexity to the code
base; since the pipeline is rather young, I'd rather break compatibility to keep the
code simpler, but we can revert this.

**Breaking change**: The name of the argument is now `image` instead of
`images`, since it expects only one image by default. This is revertible
like the previous change.

**Breaking change**: The output type is now simplified and flattened:

`pipe(inputs) == [{**object1}, {**object2}]`
instead of the previous
`pipe(inputs) == [[{**object1}, {**object1}], [{**object2}]]`
where the different instances would be grouped by candidate labels
within lists.
IMHO this is not really desirable, since it would output empty lists and
only adds superfluous indirection compared to
`zero-shot-object-detection`.

It is relatively change-free in terms of the results; it does change the
computation, however, since the batching is now handled by the pipeline
itself. It **did** change the results for the small models, so there
seems to be a real difference in how the models handle this.

* Fixing the doctests.

* Behind is_torch_available.
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* remove double brackets

* oops get other bracket
amyeroberts and others added 25 commits December 15, 2022 15:32
* Add require_vision decorator

* Fixup

* Use requires_backends

* Add requires_backend to utils functions
* Add templates for gpt-sw3

* Add templates for gpt-sw3

* Added sentencepiece tokenizer

* intermediate commit with many changes

* fixed conflicts

* Init commit for tokenization port

* Tokenization progress

* Remove fast tokenizer

* Clean up and rename spm.model -> spiece.model

* Remove TF -> PT conversion script template, Clean up Megatron -> PT script

* Optimize encode & decode performance

* added new attention

* added new attention

* attention for gpt-sw3 working

* attention good

* Cache is now working

* fixed attention mask so that it works with causal attention

* fixed badbmm bug for cpu and caching

* updated config with correct parameters

* Refactor and leave optimizations as separate functions to avoid breaking expected functionality

* Fix special tokens mapping for both tokenizers

* cleaning up of code and comments

* HF compatible attention outputs

* Tokenizer now passing tests, add documentation

* Update documentation

* reverted back to base implementation after checking that it is identical to pretrained model

* updated gpt-sw3 config

* updated conversion script

* aligned parameters with gpt-sw3 config

* changed default scale_attn_by_inverse_layer_idx to true

* removed flag from conversion script

* added temporary model path

* reverted back to functioning convert script

* small changes to default config

* updated tests for gpt-sw3

* make style, make quality, minor cleanup

* Change local paths to testing online repository

* Change name: GptSw3 -> GPTSw3

* Remove GPTSw3TokenizerFast references

* Use official model repository and add more model sizes

* Added reference to 6.7b model

* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel

* Remove pointers to non-existing TFGPTSw3

* Add GPTSw3 to docs/_toctree.yml

* Remove TF artifacts from GPTSw3 in __init__ files

* Update README:s with 'make fix-copies'

* Add 20b model to archive list

* Add documentation for GPT-Sw3

* Fix typo in documentation for GPT-Sw3

* Do 'make fix-copies' again after having updated docs

* Fix some typos in docs

* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Resolve comments from PR feedback

* Resolve more comments from PR feedback, also set use_cache=True in convert script

* Add '# Copied from' comments for GPTSw3 modeling

* Set 'is_parallelizable = False'

* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate

* Remove parallelize in mdx

* make style, make quality

* Update GPTSw3Config default values and corresponding documentation

* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available

* Make style, make quality

* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'

* make fix-copies

* Remove GPTSw3 modeling classes

* make style, make quality

* Add GPTSw3 auto-mappings for other GPT2 heads

* Update docs/source/en/model_doc/gpt-sw3.mdx

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove old TODO-comment

* Add example usage to GPTSw3Tokenizer docstring

* make style, make quality

* Add implementation details and example usage to gpt-sw3.mdx

Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…ingface#20731)

* Disambiguate test for required_input in tokenization base file.

* Add test for size
* Add decorator for flaky tests

* Fix up
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add `keep_in_fp32_modules` support

* pass it as class attribute

* few modifs

- make tests `slow`
- fix logic

* better logic

* fix failing test

* `bfloat16` support

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix

* simplify tests

* simplify tests

* fix test

* modify message

* more checks

* fix failing tests

* add more conditions

- add `is_accelerate_available`
- fixes pipleine tests that failed

* add suggestions

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix failing `bnb` test

* add last safety checker

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the pipeline test regarding TF

* Fix the pipeline test regarding TF

* update comment

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix AdamWeightDecay for TF

* Fix AdamWeightDecay for TF

* make fixup
…ngface#20728)

`image = to_channel_dimension_format(image, ChannelDimension.LAST)`
is redundant as this same conversion is also applied in to_pil_image().

This redundant call actually makes the training fail in rare cases.
The problem can be reproduced with the following code snippet:
```
import torch

from transformers.models.clip import CLIPFeatureExtractor
vision_processor = CLIPFeatureExtractor.from_pretrained('openai/clip-vit-large-patch14')
images = [
    torch.rand(size=(3, 2, 10), dtype=torch.float),
    torch.rand(size=(3, 10, 1), dtype=torch.float),
    torch.rand(size=(3, 1, 10), dtype=torch.float)
]
for image in images:
    processed_image = vision_processor(images=image, return_tensors="pt")['pixel_values']
    print(processed_image.shape)
    assert processed_image.shape == torch.Size([1, 3, 224, 224])
```

The last image has a height of 1 pixel.
The second call to to_channel_dimension_format() will transpose the image, and the height
dimension is then wrongly treated as the channels dimension.
Because of this, the following normalize() step results in an
exception.
* Add first draft

* Add out_features attribute to config

* Add corresponding test

* Add Dinat backbone

* Add BackboneMixin

* Add Backbone mixin, improve tests

* Fix embeddings

* Fix bug

* Improve backbones

* Fix Nat backbone tests

* Fix Dinat backbone tests

* Apply suggestions

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* added model resources for xlm-roberta

* added model resources for xlm-roberta

* resolve suggested changes

* add resources to xlm-roberta
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…ce#20758)

Uninstall torch_tensorrt for now

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Improve tests

* Improve TF tests

* Apply suggestion

* Fix test

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
fix: Fix the issue where the Trainer cannot use the use_legacy_prediction_loop argument

Resolves the AttributeError: 'PredictionOutput' object has no attribute 'num_samples' that occurs when the use_legacy_prediction_loop argument is used, i.e. when predicting with prediction_loop during the predict stage

Co-authored-by: ZhouHang <zhouhang@idataway.com>
* weight -> weights

* model embedding resize does not work with both v2 and normal

* remove useless test
* Replaces xxx_required with requires_backends

* Fixup
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add Swin backbone

* Remove line

* Add code example

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Even more validation.

* Fixing order.
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
@mpierrau (Contributor, Author) commented

Hey, sorry, I somehow managed to miss the force push flag anyway... I hope it still works?

@sanchit-gandhi (Contributor) commented Dec 15, 2022

Hey @mpierrau! Unfortunately the commit history gets messed up after a rebase + non-force push. Not to worry though! Let's open a new PR with your changes in favour of this one. You can create a new branch and copy over the relevant file (run_pretraining_...):

```
git checkout -b new-branch-mask-time-prob
git restore --source adding-mask_time_prob-args-to-wav2vec2-pretraining-script -- /path/to/relevant/file
```

You can then commit, rebase, and force push to origin to open a new PR with just the required changes.

@sanchit-gandhi (Contributor) commented

Closing in favour of #20985.
