Added mask_time_prob and mask_time_length arguments to wav2vec2 pretraining script and readme #19997
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
This looks good to me, but would be good to get a review from @sanchit-gandhi as well here
Hey @mpierrau! Sorry for the long delay in getting you a final review. Thanks for digging into this and documenting the motivations so well - the change is most welcome!
Regarding your question:

> if `mask_time_prob` should be split into two different variables to avoid confusion in the future
I would say 'no' to this. At heart, `mask_time_prob` controls the same behaviour for pre-training and fine-tuning. It's simply the interpretation of this behaviour that is different:
- For pre-training, we're masking latents for our contrastive loss
- For fine-tuning, we're applying SpecAug in the latent space
-> this is the same behaviour for both (masking in the latent space), just different interpretations of what they mean for training
What we could do is clarify the meaning of this variable in the docs (with something similar to the above) - this would help clear the confusion without introducing extra variables and the potential for breaking changes!
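The point that pre-training and fine-tuning share one masking operation can be illustrated with a small numpy sketch. This approximates the span sampling and is not the library's actual `_compute_mask_indices` implementation; the span-count heuristic and the zero mask embedding are assumptions for illustration.

```python
import numpy as np

def compute_span_mask(seq_len, mask_prob, mask_len, rng):
    """Sketch of wav2vec2-style time masking: sample span starts so that
    roughly mask_prob * seq_len frames are covered, each span mask_len
    frames long. Spans may overlap, so effective coverage is lower."""
    num_spans = int(mask_prob * seq_len / mask_len)
    starts = rng.integers(0, seq_len - mask_len + 1, size=num_spans)
    mask = np.zeros(seq_len, dtype=bool)
    for s in starts:
        mask[s : s + mask_len] = True
    return mask

rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 8))        # (time, feature) latent sequence
mask = compute_span_mask(100, 0.65, 10, rng)

# The replacement of masked latents is the same operation in both regimes;
# only the reading differs:
#  - pre-training: masked positions become targets for the contrastive loss
#  - fine-tuning:  the same masking acts as SpecAugment regularisation
mask_emb = np.zeros(8)                     # stand-in for a learned embedding
masked_latents = np.where(mask[:, None], mask_emb, latents)

print(f"frames masked: {int(mask.sum())} / {mask.size}")
```

The same `mask` (and the same replacement) feeds both interpretations, which is why a single `mask_time_prob` suffices.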
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey! I'm not sure what the next step is here; like I said, this is my first PR :) Some checks failed, and I'm not sure why. All tests passed before I committed @sanchit-gandhi's suggestions, but it seems they are mainly failing due to timeouts?
You will need to rebase your PR on main for the tests to pass, as your branch does not have the fixes for the last release of TensorFlow.
Hey @mpierrau! Exciting to see that you've picked-up this PR again! Let me know if you need any final help - we're close to merging now! As Sylvain has mentioned, you'll need to rebase onto main to fix the failing tests (see 5. in this guide: https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request, just remember to force push as detailed 😉)
Hey, sorry, I somehow managed to miss the force push flag anyways... I hope it still works?
Hey @mpierrau! Unfortunately the commit history gets messed up after a rebase + non-force push. Not to worry though! Let's open a new PR with your changes in favour of this one. You can create a new branch and copy over the relevant file (…). You can then commit, rebase, and force push to origin to open a new PR with just the required changes.
Closing in favour of #20985.
What does this PR do?
This PR was requested by @patrickvonplaten following my question and the ensuing discussion in the Discord ask-for-help channel under the title "Wav2vec2 - why is mask_time_prob=0.05?".
This PR adds the arguments `mask_time_prob` and `mask_time_length` to the `examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py` script and the corresponding example usage in the `README.md`.

`mask_time_prob` is a variable describing two things, depending on context: 1) for pre-training, the proportion of latent feature vectors masked as targets for the contrastive loss, and 2) for fine-tuning, the proportion masked by SpecAugment in the latent space. In this script, we are considering it in the context of 1).
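In either context, `mask_time_prob` together with the span length determines how much of the latent sequence gets masked; because sampled spans can overlap, the effective masked fraction comes out below `mask_time_prob` itself. A quick Monte Carlo sketch of that effect (assuming uniformly sampled, possibly overlapping spans, which simplifies the real sampler):

```python
import random

def effective_coverage(seq_len, mask_prob, mask_len, trials, seed=0):
    """Estimate the fraction of frames actually masked when span starts
    are sampled uniformly and spans may overlap (a simplification of the
    real sampling procedure)."""
    rng = random.Random(seed)
    num_spans = int(mask_prob * seq_len / mask_len)
    total = 0.0
    for _ in range(trials):
        masked = set()
        for _ in range(num_spans):
            s = rng.randrange(seq_len - mask_len + 1)
            masked.update(range(s, s + mask_len))
        total += len(masked) / seq_len
    return total / trials

cov = effective_coverage(seq_len=1000, mask_prob=0.65, mask_len=10, trials=200)
print(f"effective coverage ≈ {cov:.2f}")  # roughly 0.48
```

With `mask_prob=0.65` and `mask_len=10` this lands near the ~49% effective masking that the paper's pretraining setup produces.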
`mask_time_length` is a variable describing the length (in number of frames) of each applied mask. It is added for completeness.

Background
In the original wav2vec 2.0 article, the variable `mask_time_prob` is set to `0.65`, which (due to overlap between masked spans) results in an effective masking of approximately 49% of the feature vectors during pretraining. `mask_time_length` corresponds to the M variable in the article and is set to 10 there.

However, when considering the config file of wav2vec2-base, one finds that `mask_time_prob=0.05`. This is because this model is usually used for finetuning, not for (continued) pretraining, and for finetuning `0.05` is a better hyperparameter value (see Appendix B of the wav2vec 2.0 article). This is a bit confusing.

By considering the config file of the `wav2vec2-base-v2` model, which was used during Patrick's experimentation (see the speech-pretraining README), one finds that indeed `mask_time_prob=0.65` was used for pretraining.

The values `0.65` and `10` are set as default values for the `DataCollatorForWav2Vec2Pretraining` class defined in the script (as this class may be extracted from the script by users), but no defaults are given in the argparser: the argument values are also specified in the wav2vec2-base and wav2vec2-base-v2 model configs, and if defaults were set in the argparser, the model config values would never be applied, even when that is what the user wants. Hence, a parser argument is only relevant if it is explicitly specified when executing the script.

I believe this PR may also raise a bigger question, which is whether `mask_time_prob` should be split into two different variables to avoid confusion in the future.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
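The argparser scheme described above (no defaults, so model-config values apply unless a flag is passed explicitly) can be sketched as follows; `Config` here is a hypothetical stand-in for the real model config, not the actual `Wav2Vec2Config`:

```python
import argparse

# Hypothetical simplified config standing in for the wav2vec2 model config.
class Config:
    mask_time_prob = 0.65
    mask_time_length = 10

parser = argparse.ArgumentParser()
# default=None means an unset flag defers to the model config,
# while an explicitly passed flag overrides it.
parser.add_argument("--mask_time_prob", type=float, default=None)
parser.add_argument("--mask_time_length", type=int, default=None)

# Simulate running with only --mask_time_prob given on the command line.
args = parser.parse_args(["--mask_time_prob", "0.3"])
config = Config()

mask_time_prob = (
    args.mask_time_prob if args.mask_time_prob is not None
    else config.mask_time_prob
)
mask_time_length = (
    args.mask_time_length if args.mask_time_length is not None
    else config.mask_time_length
)
print(mask_time_prob, mask_time_length)  # 0.3 10
```

The explicit flag wins for `mask_time_prob`, while `mask_time_length` falls back to the config value, matching the behaviour the description argues for.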
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Participants in discussion on Discord: @osanseviero @patrickvonplaten