Update with forked from #1

Merged: 956 commits, Mar 20, 2023
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Feb 23, 2023

  1. Fix 2 quicktour file doctest (#21742)

    * Update expected output values - as Hub repo files are updated
    
    * Update expected output values - as librosa was bumped from 0.9.2 to 0.10.0 on the CI docker image
    
    * fix
    
    * update one more
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Feb 23, 2023 · 36a6a1a
  2. [GPTNeo] Fix gradient checkpointing bug (#21733)

    * fix bug
    
    * forward contrib credits from discussions
    
    * change logic
    
    ---------
    
    Co-authored-by: edbeeching <edbeeching@users.noreply.github.com>
    younesbelkada and edbeeching committed Feb 23, 2023 · 78a93d1
  3. Commit 1d4b797
  4. Skip test_log_level for now

    ydshieh committed Feb 23, 2023 · aa3787c
  5. Added Type Hints for modeling_tf_encoder_decoder.py (#21673)

    * Ran Black formatting
    
    * Added imports and reformatted
    
    * Update src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py
    
    ---------
    
    Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    Batese2001 and Rocketknight1 committed Feb 23, 2023 · 0ffa22f
  6. Auto api Value Error addition to Troubleshoot (#21708)

    * troubleshooting guide: added an error description for missing auto-mapping
    
    * minor polishing
    
    * changed the example
    
    * Apply suggestions from code review
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/troubleshooting.mdx
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    3 people committed Feb 23, 2023 · 04d90ac
  7. [deepspeed tests] fix issues introduced by #21700 (#21769)

    * [deepspeed tests] fix issues introduced by #21700
    
    * fix
    
    * fix
    stas00 committed Feb 23, 2023 · 6330626

Commits on Feb 24, 2023

  1. Graphormer fix (#21699)

    * Removed useless check for backend
    
    * fix style check for graphormer
    
    * Reverted change and corrected requires_backend for cython
    
    * code qual
    clefourrier committed Feb 24, 2023 · 4446b6b
  2. fix: Change is_last chunk calc and add conditional break in chunk_iter (#21612)
    
    * fix: Change is_last chunk calc and add conditional break
    
    * format fix
    
    * account for 0 and full stride_rights, add comment
    
    * add new test
    
    * make style
    
    * update slow whisper asr test timestamps
    
    * use nested_simplify on output and round timestamp to hundredths place
    Connor Henderson committed Feb 24, 2023 · 279008a
  3. [Flax] adding support for batch norm layers (#21581)

    * [flax] adding support for batch norm layers
    
    * fixing bugs related to pt+flax integration
    
    * cleanup, batchnorm support in sharded pt to flax
    
    * support for batchnorm tests in pt+flax integration
    
    * simplifying checking batch norm layer
    Shubhamai committed Feb 24, 2023 · f7ca656
  4. [Examples] Generalise run audio classification for log-mel models (#21756)
    
    * [Examples] Generalise run audio classification for log-mel models
    
    * batch feature extractor
    
    * make style
    sanchit-gandhi committed Feb 24, 2023 · 1348924
  5. Different behavior in DistilBERT when using "inputs_embeds" (#21752)

    * Different behavior in DistilBERT when using "inputs_embeds"
    Fixes #21089
    
    * fix failing test
    ArthurZucker committed Feb 24, 2023 · 14f3320 (a short usage sketch for inputs_embeds follows after this day's list)
  6. Commit 75bd49f
  7. [Whisper] Add SpecAugment (#21298)

    * Return and rescale attention_mask
    
    * Add SpecAugment to Whisper modeling
    
    * Fix test
    
    * Update docstring
    
    * Add SpecAug related parameters to model config
    
    * Add the _mask_input_features function to doc
    
    * Fix quality
    
    * Apply suggestions from code review
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Remove dev comments
    
    * Add test
    
    * Resolve conflict
    
    * feat: mask {feature, time} prob fast tests
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    4 people committed Feb 24, 2023 · c8545d2
  8. Fix-ci-whisper (#21767)

    * fix history
    
    * input_features instead of input_ids for the TFWhisper doctest
    
    * use translate instead of transcribe
    ArthurZucker committed Feb 24, 2023 · 087436c
  9. Generate - update cookie cutters to not initialize cache with training and gradient checkpointing (#21759)

    gante committed Feb 24, 2023 · 440f397
  10. [time series] updated expected values for integration test. (#21762)

    * updated expected
    
    * prediction_length fix
    
    * prediction_length default value
    
    * default prediction_length 24
    
    * revert back prediction_length default
    
    * move prediction_length test
    kashif committed Feb 24, 2023 · ba0e370
  11. [GPT2, ProphetNet] Fix gradient checkpointing bug (#21772)

    * fix gradient checkpointing bug
    
    * fix gradient checkpointing bug
    
    * ran make fix-copies
    
    * fixed bug
    
    * fixed bug
    yhl48 committed Feb 24, 2023 · 59c1d5b
  12. Commit 3dae0d7
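Regarding commit 5 above (DistilBERT and "inputs_embeds"): a minimal usage sketch of what that option is for, assuming the equivalence the fix is meant to guarantee; the checkpoint name is only an example, and this is not the PR's own test code.

```python
import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

input_ids = tokenizer("Hello world", return_tensors="pt")["input_ids"]

# Look up the token embeddings ourselves and pass them as `inputs_embeds`;
# position embeddings are still added inside the model.
embeds = model.get_input_embeddings()(input_ids)
out_ids = model(input_ids=input_ids).last_hidden_state
out_embeds = model(inputs_embeds=embeds).last_hidden_state

# After the fix the two paths are expected to agree.
print(torch.allclose(out_ids, out_embeds, atol=1e-5))
```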

Commits on Feb 25, 2023

  1. Fix resume_from_checkpoint for deepspeed (#21735)

    * Fix resume_from_checkpoint for deepspeed
    
    Fix resume_from_checkpoint for deepspeed, by ensuring that the deepspeed engine is the one to load the checkpoint.
    
    * Empty commit to trigger CI
    
    * Removed deepspeed skipping 
    
    Removed deepspeed skipping inside the _load_from_checkpoint function, as it is obsolete
    
    * another adjustment
    
    * Trigger CI
    
    * trigger circleci
    
    * style
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    Co-authored-by: Stas Bekman <stas@stason.org>
    3 people committed Feb 25, 2023 · 9ddf4f4 (a minimal sketch of the idea behind this fix follows below)
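A minimal sketch of the idea behind the resume fix above: when DeepSpeed wraps the model, resuming should go through the engine's own checkpoint loader rather than loading a plain state dict into the module. This is an illustrative helper, not the actual Trainer code; the file name and function are assumptions.

```python
import os
import torch

def resume_from_checkpoint(checkpoint_dir, model, deepspeed_engine=None):
    """Sketch: prefer the DeepSpeed engine's loader when it wraps the model."""
    if deepspeed_engine is not None:
        # DeepSpeed restores module, optimizer and scheduler state together,
        # including ZeRO-partitioned optimizer states.
        load_path, _client_state = deepspeed_engine.load_checkpoint(checkpoint_dir)
        if load_path is None:
            raise ValueError(f"Failed to resume from {checkpoint_dir}")
    else:
        # Plain PyTorch path: load the saved state dict into the model.
        state_dict = torch.load(
            os.path.join(checkpoint_dir, "pytorch_model.bin"), map_location="cpu"
        )
        model.load_state_dict(state_dict)
```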

Commits on Feb 27, 2023

  1. [examples/summarization] deal with max_length and num_beams (#21740)

    * Override the decoding parameters of Seq2SeqTrainer
    
    * Fix quality
    
    * Fix max_length parameter
    
    * Fix quality
    
    * Remove redundant parameter max_length
    
    * Separate the preprocess of train and validation to use different max_target_length
    bofenghuang committed Feb 27, 2023 · 3c0ce60
  2. Fix type in gpt2 config docstring (#21782)

    Fix docstring gpt2 config
    WeberJulian committed Feb 27, 2023 · a369836
  3. Fix en documentation typos (#21799)

    * fix wrong url
    
    * typos in english documentation
    tpaviot committed Feb 27, 2023 · ba2a5f1
  4. [FX tracer] Make concrete_args from outside available (#21775)

    make concrete_args from outside available
    lygztq committed Feb 27, 2023 · 2ea1ef9
  5. [Pipeline] Add zero shot audio classification pipeline (#21600)

    * add pipeline
    
    * update init
    
    * add zero shot to init
    
    * update inits and correct checkpoints
    
    * update base to support input features
    
    * add tests
    
    * Update src/transformers/pipelines/zero_shot_audio_classification.py
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    
    * Update src/transformers/pipelines/zero_shot_audio_classification.py
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    
    * update pipeline code
    
    * use tiny checkpoint
    
    * nits and expected value with tiny model
    
    * style
    
    * last nit on tests values
    
    * fix styling
    
    * fix collate fn that was casting to float
    
    * update
    
    ---------
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    ArthurZucker and younesbelkada committed Feb 27, 2023 · cc44e72
  6. [torch] remove deprecated uint8 in favor of bool (#21384)

    * uint8 -> bool
    
    * fix copies
    
    * style
    
    * update test modeling comment when checking attention buffers
    
    * style
    
    * use logical not on random mask instead of subtraction with 1
    
    * remove torch uint8
    
    * quality
    
    * remove modified modeling utils
    
    * Update based on review
    
    Co-authored-by: sgugger <sylvain.gugger@gmail.com>
    
    ---------
    
    Co-authored-by: sgugger <sylvain.gugger@gmail.com>
    ArthurZucker and sgugger committed Feb 27, 2023 · c51dc4f
  7. [tests] add accelerate marker (#21743)

    * add `accelerate` marker
    
    * add to docs
    
    * Update docs/source/en/testing.mdx
    younesbelkada committed Feb 27, 2023 · 831f314
  8. Fix PyTorch Perceiver PerceiverFourierPositionEncoding with fp16 (#21787)
    
    * fix perceiver fp16
    
    * hopefully fix tests
    fxmarty committed Feb 27, 2023 · ebf84f0
  9. Fix nn.init.trunc_normal_ call on torch.float16 data (#21789)

    fix nn.init.trunc_normal_ call on half data
    fxmarty committed Feb 27, 2023 · 0c7f93f
  10. Fix gradient checkpointing bug in gptneox (#21815)

    * Fix gradient checkpointing bug in gptneox
    
    * Remove use_cache block
    KMFODA committed Feb 27, 2023 · 7811bf7
  11. Commit 92dfceb
  12. Fix quality with ruff==0.0.253 (#21828)

    fix quality with ruff 0.0.253
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Feb 27, 2023 · f95f60c
  13. introduce logger.warning_once and use it for grad checkpointing code (#21804)
    
    * logger.warning_once
    
    * style
    stas00 committed Feb 27, 2023 · c7f3abc (a minimal sketch of the warning-once idea follows at the end of this day's list)
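Regarding commit 13 above: `logger.warning_once` exists so that warnings emitted on every forward pass (such as the gradient-checkpointing/use_cache notice) are printed only once. A minimal sketch of how such a helper can be built with `functools.lru_cache`; this is an assumption about the mechanism, not a copy of the library code.

```python
import functools
import logging

logging.basicConfig()
logger = logging.getLogger(__name__)

@functools.lru_cache(None)
def warning_once(message: str) -> None:
    # lru_cache remembers every message already seen, so repeated calls
    # with the same string become no-ops after the first warning.
    logger.warning(message)

for _ in range(3):
    warning_once("`use_cache=True` is incompatible with gradient checkpointing; setting `use_cache=False`")
# The warning is printed once despite three calls.
```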

Commits on Feb 28, 2023

  1. Rename MobileViTModelTest to TFMobileViTModelTest (#21825)

    Let's give TF a bit more love ❤️ 🙏
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Feb 28, 2023 · a9dd124
  2. Fix gradient checkpointing bug BioGpt (#21844)

    Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>
    saswatmeher and saswatmeher committed Feb 28, 2023 · 50644cf
  3. Commit 50db741
  4. Fix gradient checkpointing bug in git (#21818)

    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    KMFODA and sgugger committed Feb 28, 2023 · e07a3d9
  5. Fix gradient checkpointing imagegpt (#21816)

    * Fix gradient checkpointing bug in gptneox
    
    * Fix gradient checkpointing bug in modeling_imagegpt.py
    
    * Revert gpt neox changes
    
    ---------
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    KMFODA and sgugger committed Feb 28, 2023 · 4fe744f
  6. Fix tf random token masking probability in data collator (#21834)

    * fix tf random mask tokens probability
    
    * fix tf random mask tokens probability in collator for language modelling
    anruijian committed Feb 28, 2023 · 2d506ea (a sketch of the standard masking split follows at the end of this day's list)
  7. [T5] Fix torchquant issue (#21843)

    * fix torchquant issue
    
    * add tests
    younesbelkada committed Feb 28, 2023 · ae9230a
  8. [Blip2] Add Blip2Model (#21817)

    * add v1
    
    * add `Blip2Model`
    
    - add relevant functions
    - add tests
    - add on automapping
    
    * fix docs
    
    * fix doctest
    younesbelkada committed Feb 28, 2023 · b8de7e4
  9. Fix the issue of blip model returning loss even when the label is not provided (#21811)
    
    * Fix the issue of blip model returning loss even when the label is not provided
    
    * Fix ruff failure
    
    * Incorporate PR feedbacks
    
    * Incorporate PR feedbacks
    
    * Incorporate PR feedbacks
    
    * Incorporate PR feedbacks
    raghavanone committed Feb 28, 2023 · eec7604
  10. [GPTJ] Fix gradient checkpointing bug (#21794)

    * If applied, this commit fixes generate bug in gptj
    
    * Remove extra same code block
    
    * formatting and test fix
    
    * Conflict fix and declaration error fix
    
    ---------
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    krypticmouse and sgugger committed Feb 28, 2023 · 31fa2b6
  11. Add: task guide for zero shot object detection (#21829)

    * zero shot object detection part 1
    
    * added batch prediction section
    
    * added image guided object detection section
    
    * make style
    
    * added the task guide to the TOC
    
    * minor polishing
    
    * Apply suggestions from code review
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
    
    * added embedded owlvit demo
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * minor fix
    
    * make style
    
    ---------
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    4 people committed Feb 28, 2023 · 6ca8445
  12. Make Slack CI reporting stronger (#21823)

    * Use token
    
    * Avoid failure
    
    * better error
    
    * Fix
    
    * fix style
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Feb 28, 2023 · aab895c
  13. [Blip2] Fix Blip-2 multi gpu (#21707)

    * fix blip multi gpu
    
    * fix
    
    * final changes
    
    * adapt suggestions
    
    * fix failing slow test
    
    * forward contrib credits from testing and suggestions
    
    * reformat
    
    ---------
    
    Co-authored-by: akkikiki <akkikiki@users.noreply.github.com>
    younesbelkada and akkikiki committed Feb 28, 2023 · 7f4f8b9
  14. Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval (#21684)
    
    * Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval
    
    * minor fix return_dict
    
    * implement test for loss computation
    
    ---------
    
    Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
    Co-authored-by: Tiep Le <tiep.le@intel.com>
    3 people committed Feb 28, 2023 · 4cb5ffa
  15. 🔥Rework pipeline testing by removing PipelineTestCaseMeta 🚀 (#21516)

    * Add PipelineTesterMixin
    
    * remove class PipelineTestCaseMeta
    
    * move validate_test_components
    
    * Add for ViT
    
    * Add to SPECIAL_MODULE_TO_TEST_MAP
    
    * style and quality
    
    * Add feature-extraction
    
    * update
    
    * raise instead of skip
    
    * add tiny_model_summary.json
    
    * more explicit
    
    * skip tasks not in mapping
    
    * add availability check
    
    * Add Copyright
    
    * A way to disable irrelevant tests
    
    * update with main
    
    * remove disable_irrelevant_tests
    
    * skip tests
    
    * better skip message
    
    * better skip message
    
    * Add all pipeline task tests
    
    * revert
    
    * Import PipelineTesterMixin
    
    * subclass test classes with PipelineTesterMixin
    
    * Add pipeline_model_mapping
    
    * Fix import after adding pipeline_model_mapping
    
    * Fix style and quality after adding pipeline_model_mapping
    
    * Fix one more import after adding pipeline_model_mapping
    
    * Fix style and quality after adding pipeline_model_mapping
    
    * Fix test issues
    
    * Fix import requirements
    
    * Fix mapping for MobileViTModelTest
    
    * Update
    
    * Better skip message
    
    * pipeline_model_mapping could not be None
    
    * Remove some PipelineTesterMixin
    
    * Fix typo
    
    * revert tests_fetcher.py
    
    * update
    
    * rename
    
    * revert
    
    * Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests
    
    * style and quality
    
    * test fetcher for all pipeline/model tests
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Feb 28, 2023 · 871c31a
  16. Improve TF weight loading, especially PT crossloading (#21792)

    * First commit for the improved PT-TF weight loading
    
    * Remove workarounds from TFEncoderDecoder tests
    
    * Allow a custom weight renaming function in from_pretrained and use that to clean up EncoderDecoder
    
    * make fixup
    
    * First attempt at visionencoderdecoder
    
    * Disable tensorfloat32 in tests to get consistent outputs
    
    * Quick fix to tf_vision_encoder_decoder tests
    
    * make fixup
    
    * Update Blenderbot tests
    
    * Remove unused arg in modeling_tf_opt
    
    * load_tf_sharded_weights had strict=True! This meant transfer learning was impossible, so I'm setting it to False.
    
    * Support prefixes when loading sharded TF checkpoints
    
    * make fixup
    
    * Add test to load sharded models with a weight prefix
    
    * Fix sharded weight loading test
    
    * Add a test for transfer from a sharded checkpoint
    
    * make fixup
    
    * Add test to check that crossloading from PT with a prefix works
    
    * Refactor from_pretrained in the encoderdecoder classes
    
    * Refactor from_pretrained in the encoderdecoder classes
    
    * missmatched -> mismatched
    
    * Explicitly check for None
    
    * No comments showing my very impressive and attractive knowledge of Py3.9+
    
    * Disable TF32 across all TF tests
    Rocketknight1 committed Feb 28, 2023 · acfb714
  17. Fix flaky test for log level (#21776)

    * Fix flaky test for log level
    
    * Fix other flaky test
    sgugger committed Feb 28, 2023 · b29e2dc
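Regarding commit 6 above (random token masking probability in the TF data collator): the conventional masked-LM recipe selects ~15% of tokens, turns 80% of them into the mask token, replaces 10% with a random token, and leaves 10% unchanged; the TF bug concerned the probability used for the random-replacement branch. A minimal PyTorch sketch of the reference split, illustrative only and not the library's TF implementation:

```python
import torch

def mask_tokens(inputs, mask_token_id, vocab_size, mlm_probability=0.15):
    """Sketch of the usual 80/10/10 masked-LM split on a batch of token ids."""
    labels = inputs.clone()
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # only compute loss on selected tokens

    # 80% of the selected tokens -> [MASK]
    indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    inputs[indices_replaced] = mask_token_id

    # 10% of the selected tokens -> random token (half of the remaining 20%)
    indices_random = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~indices_replaced
    )
    random_tokens = torch.randint(vocab_size, labels.shape, dtype=torch.long)
    inputs[indices_random] = random_tokens[indices_random]

    # the remaining 10% of selected tokens are left unchanged
    return inputs, labels
```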

Commits on Mar 1, 2023

  1. prepare for “__floordiv__ is deprecated and its behavior will change in a future version of pytorch” (#20211)
    
    * rounding_mode = "floor"  instead of // to prevent behavioral change
    
    * add other TODO
    
    * use `torch_int_div` from pytorch_utils
    
    * same for tests
    
    * fix copies
    
    * style
    
    * use relative imports when needed
    
    * Co-authored-by: sgugger <sylvain.gugger@gmail.com>
    ArthurZucker committed Mar 1, 2023 · 44e3e3f (a minimal sketch of the floor-division change follows at the end of this day's list)
  2. [ConvBert] Fix #21523 (#21849)

    * fix reshaping
    Fixes #21523
    
    * add test
    
    * styling
    
    * last fixes
    
    * Update src/transformers/models/convbert/modeling_convbert.py
    
    * code quality
    ArthurZucker committed Mar 1, 2023 · b599b19
  3. Commit 5e6cd51
  4. Fix gradient checkpointing bug Bart (#21866)

    Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>
    saswatmeher and saswatmeher committed Mar 1, 2023 · 72e9ca7
  5. [deepspeed] check whether model is NLP one instead of counting on input type (#21800)
    
    * trying to figure out whether model is NLP
    
    * drop my changes and apply easier fix
    
    * trying to handle all int input types
    
    * fix logic
    
    ---------
    
    Co-authored-by: Stas Bekman <stas@stason.org>
    izapolsk and stas00 committed Mar 1, 2023 · f71873c
  6. Change the way tensor is reshaped in BartAttention (from .view to .reshape) (#21860)
    
    * Change the .view call to .reshape
    
    * Change the .view call to .reshape to all the copies from bart attention
    
    * Fix copies and style
    
    * Fix copies and style
    
    * Fix copies and style
    
    * Fix copies and style
    
    * Fix copies and style
    
    * Revert unnecessary changes
    
    * Revert unnecessary changes
    
    * Revert unnecessary changes
    
    * Revert unnecessary changes
    raghavanone committed Mar 1, 2023 · ebd5258
  7. Italian translation of community.mdx (#21871)

    Italian translation of community.mdx gh-17459
    lorenzobalzani committed Mar 1, 2023 · 619d831
  8. [Blip] Fix blip doctest (#21868)

    fix blip doctest
    younesbelkada committed Mar 1, 2023 · 72787c5
  9. Removed BLIP mention from the troubleshooting guide (#21872)

    removed BLIP mention from the troubleshooting guide
    MKhalusova committed Mar 1, 2023 · 9c1d598
  10. update FSDP and add XLA-FSDP documentation (#21812)

    * update FSDP and add XLA-FSDP documentation
    
    * resolving comments
    
    * minor update
    
    * fix xla-fsdp docs
    pacman100 committed Mar 1, 2023 · 571dd69
  11. Commit 3eba1dd
  12. Add a utility file to get information from test files (#21856)

    * Add a utility file to get information from test files
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 1, 2023 · 53735d7
  13. Add check for different embedding types in examples (#21881)

    * Add check for different embedding types in examples
    
    * Correctly update summarization example
    Rocketknight1 committed Mar 1, 2023 · 1d3a1cc
  14. Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights (#21879)
    
    apply normal_ after assigning weight as nn.Parameter to avoid unnecessary initialization computation
    twaka committed Mar 1, 2023 · 45e1109
  15. Add TFVisionTextDualEncoder (#21873)

    * Temporary commit to stash everything so far
    
    * Temporary commit to stash everything so far
    
    * stash commit
    
    * Refactor from_pretrained
    
    * Fix final test, make fixup
    
    * Update dummies
    
    * Add model to TEST_FILES_WITH_NO_COMMON_TESTS
    
    * Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
    
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
    
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
    
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
    
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Add TFVisionTextDualEncoder to utils/documentation_tests.txt
    
    * make fixup
    
    ---------
    
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    Rocketknight1 and gante committed Mar 1, 2023 · f7c618e
  16. Add ALIGN to transformers (#21741)

    Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
    alaradirik committed Mar 1, 2023 · 269b054
  17. Fix Gradient checkpointing bug BigBird (#21882)

    Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>
    saswatmeher and saswatmeher committed Mar 1, 2023 · 4edfd2d
  18. Fix WhisperModelTest (#21883)

    * force on the same device
    
    * fix tests
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 1, 2023 · 36ee128
  19. Fix test_load_default_pipelines_pt for ClapModel (#21886)

    * fix tests
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 1, 2023 · 89359e4
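Regarding commit 1 above (the `__floordiv__` deprecation): the change replaces tensor `//` with an explicit floor division helper. The helper name `torch_int_div` comes from the commit message; the body shown here is an assumption about how such a helper is typically written.

```python
import torch

def torch_int_div(tensor, divisor):
    # Explicit floor division avoids the deprecated tensor `//` behaviour.
    return torch.div(tensor, divisor, rounding_mode="floor")

positions = torch.arange(10)
# Before: bucket = positions // 4  (emits a deprecation warning on older torch)
bucket = torch_int_div(positions, 4)
print(bucket)  # tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
```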

Commits on Mar 2, 2023

  1. fix checkpoint (#21874)

    ArthurZucker committed Mar 2, 2023 · 43299c6
  2. [Refactor] Relative imports wherever we can (#21880)

    * initial commit
    
    * update
    
    * second batch
    
    * style
    
    * fix imports
    
    * fix relative import on pipeline
    ArthurZucker committed Mar 2, 2023 · 633e5e8
  3. [ZAC] fix ci daily (#21893)

    add correct revision after model was overwritten
    ArthurZucker committed Mar 2, 2023 · c256bc6
  4. Use PyAV instead of Decord in examples (#21572)

    * Use PyAV instead of Decord
    
    * Get frame indices
    
    * Fix number of frames
    
    * Update src/transformers/models/videomae/image_processing_videomae.py
    
    * Fix up
    
    * Fix copies
    
    * Update timesformer doctests
    
    * Update docstrings
    amyeroberts committed Mar 2, 2023 · 3412f59
  5. Add inputs_embeds functionality when generating with BioGPT (#21889)

    * initial commit to add inputs_embeds to generation
    
    * formatting
    sidkiblawi committed Mar 2, 2023 · edbb37f
  6. [T5 doc] Fix confusing documentation about d_kv (#21896)

    * Confusing documentation in T5
    
    * Fix confusing documentation in T5 configuration file
    ArthurZucker committed Mar 2, 2023 · b48c7f7
  7. [Whisper] Add rescaling function with do_normalize (#21263)

    * add `zero_mean_unit_var_norm` function
    
    * normalize before MEL computation
    
    * fixup
    
    * add simple test
    
    * quality
    
    * Update tests/models/whisper/test_feature_extraction_whisper.py
    
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * fixup
    
    * use attention masks if padding was applied
    
    * Update based on review
    
    Co-authored-by: bofeng huang <bofenghuang7@gmail.com>
    
    ---------
    
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    Co-authored-by: bofeng huang <bofenghuang7@gmail.com>
    3 people committed Mar 2, 2023 · c87654d
  8. Commit 648d0de
  9. [GPT-J] add deprecation warning (#21869)

    * add deprecation warning
    
    * remove pos ids from args docstring
    
    * fix failing test
    ArthurZucker committed Mar 2, 2023 · fb76994
  10. Commit b6f47b5
  11. Fix gradient checkpointing bug LED (#21840)

    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    KMFODA and sgugger committed Mar 2, 2023 · 7e6dd66
  12. Fix gradient checkpointing bug M2M 100 (#21841)

    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    KMFODA and sgugger committed Mar 2, 2023 · b405b62
  13. Fix gradient checkpointing bug marian (#21842)

    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    KMFODA and sgugger committed Mar 2, 2023 · d9e28d9
  14. Mark pipeline tests to skip them easily (#21887)

    * Mark pipeline tests to skip them easily
    
    * Mark the mixin as pipeline test
    
    * Update src/transformers/testing_utils.py
    
    Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
    sgugger and ydshieh committed Mar 2, 2023 · 50a8ed3
  15. Clean up auto mapping names (#21903)

    * add new test
    
    * fix after new test
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 2, 2023 · 99ba36e
  16. Prophetnet batch dimension inversion fix (#21870)

    * decoder forward pass is working
    
    * no model has forward pass returning attentions
    
    * decoder ngram changed to not mix batch size
    
    * current basic forward pass returns identical result
    
    * passed test_model attentions
    
    * passed test_encoder_decoder_model_generate
    
    * passed test_headmasking
    
    * removed old block
    
    * removed comments bug/fixme
    
    * removed bug comments
    
    * applied styling
    
    * applied fix-copies
    
    * applied ngram forward comments
    
    * corrected dimension notation
    
    * applied styling and comment fixes
    
    * changed asserts for raise ValueError
    
    * changed question gen test
    
    * updated hidden_states integration test
    
    * applied styling
    kiansierra committed Mar 2, 2023 · 6bf8853
  17. Make schedulers picklable by making lr_lambda fns global (#21768)

    * Make schedulers picklable by making lr_lambda fns global
    
    * add unused _get_constant_schedule_lr_lambda arg
    
    * remove unneeded _get_constant_schedule_lr_lamda
    
    * add test
    
    * make style
    
    * rebase, remove torch dep, put lambda back
    
    * repo-consistency and style
    Connor Henderson committed Mar 2, 2023 · 8e5a1b2 (a minimal sketch of the picklable-scheduler pattern follows at the end of this day's list)
  18. Refactor whisper asr pipeline to include language too. (#21427)

    * [WIP] whisper refacto to support language output.
    
    * Handling merges.
    
    * A bit more cleanup and comments.
    
    * Many improvements.
    
    Lots of details everywhere.
    
    * Cleanup old code and tests.
    
    * Handle lone timestamp tokens (just recover when something bad happens).
    
    * Adding return_language example.
    
    * No ffmpeg.
    
    * Hmm.
    
    * Some corrections.
    
    * Both fast and slow.
    
    * New black.
    
    * Update src/transformers/models/whisper/tokenization_whisper.py
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/whisper/tokenization_whisper.py
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Remove print.
    
    * Undoing tests modifications.
    
    * Smaller test modifications.
    
    * Rename.
    
    * Remove maxDiff.
    
    ---------
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Narsil and ArthurZucker committed Mar 2, 2023 · 1325459
  19. Add Blip and Blip2 for pipeline tests (#21904)

    * fix
    
    * add to tests
    
    * style and quality
    
    * add missing
    
    ---------
    
    Co-authored-by: NielsRogge <NielsRogge@users.noreply.github.com>
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    3 people committed Mar 2, 2023 · e6de918
  20. Temporarily skip 3 tests in BridgeTowerModelTest (#21908)

    skip for now
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 2, 2023 · 88e5c51
  21. Faster zero shot image (#21897)

    * Make ZeroShotImageClassificationPipeline faster
    
    The pipeline makes separate calls to model for each candidate label.
    This commit combines all labels into one call.
    Original code takes more than 60 seconds to process one image and 1000
    candidate labels. Updated code takes less than 2 seconds.
    
    * implement batching
    
    * code formatting
    
    * Creating an even faster zero-shot-image-classification.
    
    Unfortunately super tailored towards CLIP.
    
    Co-Authored-By: Yessen Kanapin <yessen@deepinfra.com>
    
    * Quality.
    
    * Cleanup.
    
    * Order different on the CI it seems.
    
    * Cleanup.
    
    * Quality.
    
    ---------
    
    Co-authored-by: Yessen Kanapin <yessen@deepinfra.com>
    Narsil and yessen-deepinfra committed Mar 2, 2023 · b2a41d2
  22. [time series] Add Time series inputs tests (#21846)

    * initial test of inputs
    
    * added test for generation
    
    * remove asserts
    
    * fixed test
    
    * Update tests/models/time_series_transformer/test_modeling_time_series_transformer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    kashif and NielsRogge committed Mar 2, 2023 · db979f7
  23. Avoid modeling tests run in pipeline CI jobs (#21911)

    * rework is_pipeline_test
    
    * bring back 3 tests
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 2, 2023 · 9f5bfe1
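Regarding commit 17 above (making schedulers picklable): locally defined `lr_lambda` closures cannot be pickled, which breaks saving a scheduler or shipping it to another process; binding a module-level function with `functools.partial` keeps the schedule configurable and picklable. A minimal sketch of the pattern; function names here are illustrative, not the library's.

```python
import pickle
from functools import partial
from torch.optim.lr_scheduler import LambdaLR

def _linear_warmup_lambda(current_step, *, num_warmup_steps):
    # Module-level function: picklable, unlike a lambda defined inside the factory.
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    return 1.0

def get_linear_warmup_schedule(optimizer, num_warmup_steps):
    # partial() of a module-level function replaces the old local lambda.
    lr_lambda = partial(_linear_warmup_lambda, num_warmup_steps=num_warmup_steps)
    return LambdaLR(optimizer, lr_lambda)

# A plain `lambda step: ...` here would fail to pickle; the partial works:
pickle.dumps(partial(_linear_warmup_lambda, num_warmup_steps=100))
```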

Commits on Mar 3, 2023

  1. Commit 37e0974
  2. faster forward following what is done for images (#21906)

    * faster forward following what is done for images
    
    * add missing licence
    ArthurZucker committed Mar 3, 2023 · dcec327
  3. Commit e407b5a
  4. Commit 99a6234
  5. Commit c82bd37
  6. Update model_split_percents for WhisperModelTest (#21922)

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 3, 2023 · fa9d2ad
  7. Use large VM for repo_utils_job (#21928)

    upgrade to large VM
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 3, 2023 · b05e0be
  8. Cleanup more auto mapping names (#21909)

    * fix auto 2
    
    * fix auto 2
    
    * fix task guide issue
    
    * fix
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 3, 2023 · 02a77fa
  9. feat: filter try/except when looking at custom code (#21914)

    * feat: filter try/except
    
    * Update src/transformers/dynamic_module_utils.py
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    zanussbaum and sgugger committed Mar 3, 2023 · c5a1ff9
  10. Fix AlignModelTest tests (#21923)

    * fix
    
    * fix
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 3, 2023 · d4306da
  11. Avoid failure in check_repo.py due to missing backends (#21930)

    * Update utils/check_repo.py
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update utils/check_repo.py
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    3 people committed Mar 3, 2023 · 8c40ba7
  12. Fix wrong documentation about DataCollator padding defaults (#21919)

    * Fix wrong documentation about DataCollator padding defaults
    
    * Fix styling
    substanc3-dev committed Mar 3, 2023 · 956ae62
  13. [Flan-UL2] Add-flan-ul2 (#21929)

    * add doc and readme
    
    * add model docs
    
    * update toctree and fix copies
    
    * update
    
    * update doc file
    
    * fix
    
    * add FLAN-UL2 to configuration mapping
    
    * fixup
    
    * Apply suggestions from code review
    
    * more clarification
    
    ---------
    
    Co-authored-by: younesbelakda <younesbelkada@gmail.com>
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    3 people committed Mar 3, 2023 · 82aac00
  14. Update README logo (#21933)

    gary149 committed Mar 3, 2023 · c5fe06c
  15. [CLAP] Support batched inputs for CLAP. Fixes pipeline issues (#21931)

    * fix pipeline
    
    * fix feature_extraction clap
    
    * you can now batch the `is_longer` attribute
    
    * add tests
    
    * fixup
    
    * add expected scores
    
    * comment on is_longer
    ArthurZucker committed Mar 3, 2023 · 718e9d7
  16. [Whisper] Fix feature normalization in WhisperFeatureExtractor (#21938)
    
    Fix feature normalization in WhisperFeatureExtractor
    bofenghuang committed Mar 3, 2023 · 003a7cc

Commits on Mar 4, 2023

  1. Commit f932ee6
  2. Commit f12c74f
  3. Commit 6386eb9
  4. Commit 6feb39b
  5. Fixed gradient_checkpointing/use_cache bug in blenderbot (#21833)

    * Fixed gradient_checkpointing/use_cache bug in blenderbot
    
    * Update modeling_blenderbot.py
    
    * Added back if statement
    
    * Formatted using black
    Batese2001 committed Mar 4, 2023 · 699a229 (a minimal sketch of the gradient_checkpointing/use_cache guard follows below)
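Item 5 above, like the many "Fix gradient checkpointing bug" commits throughout this PR, addresses the same incompatibility: gradient checkpointing recomputes the forward pass during backprop, so caching past key/values must be turned off while training with it. A minimal, self-contained sketch of the guard these fixes add inside a decoder forward; it is illustrative and not copied from any one model file.

```python
import logging

logging.basicConfig()
logger = logging.getLogger(__name__)

def resolve_use_cache(gradient_checkpointing: bool, training: bool, use_cache: bool) -> bool:
    """Sketch of the guard: disable the KV cache when checkpointing during training,
    since the checkpointed re-run of the forward pass would conflict with cached values."""
    if gradient_checkpointing and training and use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing; setting `use_cache=False`."
        )
        return False
    return use_cache

print(resolve_use_cache(gradient_checkpointing=True, training=True, use_cache=True))  # False
```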

Commits on Mar 6, 2023

  1. Update expected values in XLMProphetNetModelIntegrationTest (#21957)

    update values
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 6, 2023 · fcf8134
  2. [CI] Fix ci (#21940)

    * fix `get_proposal_pos_embed`
    
    * fix order
    
    * style
    
    * zero shot simplify test
    
    * add approximate values for zero shot audio classification
    ArthurZucker committed Mar 6, 2023 · bc33fbf
  3. Disable DDP for neuron (#21953)

    Disable DDP for neuron
    
    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-42-72.us-west-2.compute.internal>
    sangeethabal and EC2 Default User committed Mar 6, 2023 · 0bb1729
  4. Fix bert issue (#21963)

    Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>
    saswatmeher and saswatmeher committed Mar 6, 2023 · 934d0b8
  5. [Generate] Fix gradient_checkpointing and use_cache bug for BLOOM (#21956)
    
    Step 1 - Change use_cache fix
    asrimanth committed Mar 6, 2023 · f3c75f8
  6. Add missing parameter definition in layoutlm config (#21960)

    Four parameters in the `LayoutLM` config were missing definitions; added their definitions (copied from BertConfig).
    Atomnp committed Mar 6, 2023 · 64d95c4
  7. Use larger atol in torch.allclose for some tests (#21966)

    Use larger atol
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 6, 2023 · 9474abd
  8. Add TF contrastive image text finetuning example (#21939)

    * Initial commit
    
    * stash commit
    
    * Add model checkpointing and pushing
    
    * Fix model name inference
    
    * Update README
    
    * Update README
    
    * Remove a couple of Torch references
    
    * Update copyright date
    
    * make fixup
    
    * Update PushToHubCallback args!
    
    * Remove the torch summary
    
    * Add strategy.scope
    Rocketknight1 committed Mar 6, 2023 · 5d8efc7 (a minimal strategy.scope sketch follows at the end of this day's list)
  9. Update expected values for test_xglm_sample (#21975)

    update expected values for xglm
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 6, 2023 · f2a2616
  10. Commit 4f84ded
  11. Commit 451263b
  12. Fix gradient checkpointing bug in BlipText (#21978)

    Make Format
    KMFODA committed Mar 6, 2023 · 4a545d1
  13. Commit de496ef
  14. Commit 0ce5236
  15. docs: improve clarity for language modeling (#21952)

    * docs: improve clarity for clm/mlm
    
    * docs: remove incorrect explanation
    
    * docs: remove incorrect explanation
    
    ---------
    
    Co-authored-by: pdhall99 <pdhall99>
    pdhall99 committed Mar 6, 2023 · 31e3c6c
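Regarding commit 8 above (the TF contrastive image-text example), whose last step adds `strategy.scope`: in TensorFlow, model variables must be created inside a distribution strategy's scope for them to be replicated across devices. A minimal sketch of the pattern; the model class and checkpoint name are placeholders, not taken from the example script.

```python
import tensorflow as tf
from transformers import TFAutoModel

# MirroredStrategy uses all visible GPUs and falls back to CPU if none are found.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Creating (or loading) the model inside the scope places its variables
    # under the strategy, so fit()/train_step can run on every replica.
    model = TFAutoModel.from_pretrained("distilbert-base-uncased")
    optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
    model.compile(optimizer=optimizer)
```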

Commits on Mar 7, 2023

  1. Update Jukebox tests (#21984)

    * update expected values for jukebox
    
    * update expected values for jukebox
    
    * update expected values for jukebox
    
    * update expected values for jukebox
    
    * update expected values for jukebox
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 7, 2023 · 5b28b78
  2. Add check before int casting for PIL conversion (#21969)

    * Add check before int casting for PIL conversion
    
    * Line length
    
    * Tidier logic
    amyeroberts committed Mar 7, 2023 · 4063fd9
  3. Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens (#21959)
    
    * Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens
    
    * fix docs
    
    * Empty commit
    
    * formatting
    eladsegal committed Mar 7, 2023 · eec46b4
  4. [DETR, YOLOS] Fix device bug (#21974)

    * Fix integration test
    
    * Add test
    
    * Add test
    NielsRogge committed Mar 7, 2023 · 95408e9
  5. Remove unneeded casts to bool (#21983)

    Remove cast to Bool
    regisss committed Mar 7, 2023 · 10bcbca
  6. Update notification_service.py (#21992)

    * better check
    
    * better check
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 7, 2023 · 99c5c60
  7. Skip test_multi_gpu_data_parallel_forward for some model tests (#21991)
    
    skip test_multi_gpu_data_parallel_forward for some model tests
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 7, 2023
    Configuration menu
    Copy the full SHA
    9402788 View commit details
    Browse the repository at this point in the history
  8. [Whisper] Add model for audio classification (#21754)

    * [Whisper] Add model for audio classification
    
    * make fix-copies
    
    * add to docs
    
    * add docstring
    
    * empty returns
    
    * add code example
    
    * switch to fleurs
    
    * stick everything on one line
    sanchit-gandhi committed Mar 7, 2023
    Configuration menu
    Copy the full SHA
    7c39318 View commit details
    Browse the repository at this point in the history
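    A hedged sketch of the new head in use; loading the plain "openai/whisper-base" checkpoint into it leaves the classifier weights randomly initialized, so in practice the model would be fine-tuned first (the commit's own code example targets FLEURS):

    ```python
    from transformers import AutoFeatureExtractor, WhisperForAudioClassification

    # Encoder weights come from the checkpoint; the projector/classifier head is
    # freshly initialized and would need fine-tuning before real use.
    feature_extractor = AutoFeatureExtractor.from_pretrained("openai/whisper-base")
    model = WhisperForAudioClassification.from_pretrained("openai/whisper-base", num_labels=10)
    ```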
  9. Stop requiring Torch for our TF examples! (#21997)

    * Stop requiring Torch for our TF examples!
    
    * Slight tweak to logging in the example itself
    Rocketknight1 committed Mar 7, 2023
    Configuration menu
    Copy the full SHA
    d128f2f View commit details
    Browse the repository at this point in the history
  10. [TF] Fix creating a PR while pushing in TF framework (#21968)

    * add create pr arg
    
    * style
    
    * add test
    
    * fixup
    
    * update test
    
    * last nit fix typo
    
    * add `is_pt_tf_cross_test` marker for the tests
    ArthurZucker committed Mar 7, 2023
    Configuration menu
    Copy the full SHA
    2156662 View commit details
    Browse the repository at this point in the history
  11. [DETR and friends] Remove is_timm_available (#21814)

    * First draft
    
    * Fix to_dict
    
    * Improve conversion script
    
    * Update config
    
    * Remove timm dependency
    
    * Fix dummies
    
    * Fix typo, add integration test
    
    * Upload 101 model as well
    
    * Remove timm dummies
    
    * Fix style
    
    ---------
    
    Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
    NielsRogge and Niels Rogge committed Mar 7, 2023
    Configuration menu
    Copy the full SHA
    dde718e View commit details
    Browse the repository at this point in the history
  12. [Time-Series] informer model (#21099)

    * added informer to gitignore
    
    * added informer to gitignore
    
    * WIP informer2020
    
    * added checking that instantiate works
    
    * added config using gluonTS by kashif
    
    * WIP config
    
    * adding informeConfig. need to remove FeatureEmbedder
    
    * done InformerConfig, but need to change the names
    
    * Done informer model init. working on enc-dec
    
    * added things to address, after reading again enc-dec in the paper
    
    * done modeling - checking initialization work
    
    * added informer to gitignore
    
    * WIP informer2020
    
    * added checking that instantiate works
    
    * added config using gluonTS by kashif
    
    * WIP config
    
    * adding informeConfig. need to remove FeatureEmbedder
    
    * done InformerConfig, but need to change the names
    
    * Done informer model init. working on enc-dec
    
    * added things to address, after reading again enc-dec in the paper
    
    * done modeling - checking initialization work
    
    * moved enc-dec init to InformerEncoder/Decoder init
    
    * added 'init_std' to config, now model init works!
    
    * WIP conversion script, and added code sources
    
    * WIP conversion script: loading original informer pth works
    
    * WIP conversion script: change defaults in the config
    
    * WIP conversion script: supporting Informer input embedding
    
    * WIP conversion script: added parameters for the informer embed
    
    * WIP conversion script: change dim_feedforward=2048
    
    * WIP conversion script: remove unused args for loading checkpoint
    
    * just cleaning up
    
    * DataEmbedding removed, after thinking with Kashif
    
    * working on forward pass
    
    * WIP forward pass: trying to establish working batch for forward pass
    
    * cleaning and finalizing
    
    * adding HF names and docs
    
    * init after cleaning works
    
    * WIP in tests
    
    * added docs for the informer specific args
    
    * fix style
    
    * undo change
    
    * cleaning informer, now need to work only enc-dec
    
    * initial enc-dec classes
    
    * added encoder and decoder
    
    * added todo
    
    * add todos for conv_layers
    
    * added decoder docs from vanilla
    
    * added encoder docs from vanilla
    
    * remove encoder decoder from the original informer
    
    * removed AttentionLayer from the original paper
    
    * removed TriangularCausalMask, same as decoder_attention_mask
    
    * initial sparse attention
    
    * use conv_layers
    
    * fixed test_config test
    
    * fix parenthesis when iterating zip(layers, conv_layers)
    
    * error found in prob attention, added sizes as comments
    
    * fix sizes
    
    * added proposal for q_reduce indexing, and remove unused
    
    * WIP ProbMask, and changed factor=2 for testing
    
    * remove unused libs for this PR for creating the env
    
    * fix checking the attn_weights.size() after bmm
    
    * Q_reduce: changed from torch.gather to simple slicing
    
    * WIP calculate final attn_output
    
    * finish adding v_aggregated, attn_output ready
    
    * changed tgt_len to u in attention_mask, need to fix the size error
    
    * comment attention_mask for encoder, and fix if cond for v_agg
    
    * added ProbMask support (wip), removed old original code
    
    * finished ProbMask 😃
    
    * Revert "remove unused libs for this PR for creating the env"
    
    This reverts commit 11a081e.
    
    * fixes
    
    * make style
    
    * fix initial tests
    
    * fix more tests
    
    * dry
    
    * make style
    
    * remove unused files
    
    * style
    
    * added integration tests
    
    * fix num_static_real_features
    
    * fix header
    
    * remove unused function
    
    * fix example
    
    * fix docs
    
    * Update src/transformers/models/informer/configuration_informer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/informer/modeling_informer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/informer/configuration_informer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/informer/configuration_informer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/informer/configuration_informer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/informer/configuration_informer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * fixes for reviewer
    
    * use prediction_length from model
    
    * fix style
    
    * fixed informer.mdx
    
    * added to index
    
    * updated readme
    
    * undo
    
    * make fix-copies
    
    * typo
    
    * fix copy
    
    * added Informer to toctree
    
    * in order
    
    * fixed comments
    
    * remove unneeded new lines in docs
    
    * make static real and cat optional
    
    * fix use of distil conv layers
    
    * fixed integration test
    
    * added checkpoint for convlayer
    
    * make fix-copies
    
    * updated from time series model
    
    * make fix-copies
    
    * copy decoder
    
    * fix unit tests
    
    * updated scaling config
    
    * fix integration tests
    
    * IGNORE_NON_TESTED
    
    * IGNORE_NON_AUTO_CONFIGURED
    
    * IGNORE_NON_AUTO_CONFIGURED
    
    * updated check configs
    
    * fix formatting
    
    * undo change from time series
    
    * prediction_length should not be None
    
    * align with the blog: prettify ProbSparse and change attention_factor to sampling_factor
    
    * make style
    
    * make fix-copies
    
    * niels CR: update contributed by
    
    * niels CR: update configuration_informer.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * niels CR: update kashif -> huggingface
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * niels CR: `sampling_factor` only relevant when `attention_type`=prob
    
    * make style
    
    * fixed U_part: added multiplication by `L_Q`
    
    * fixed bug: remove `is not None` from `if config.distil`
    
    * fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check
    
    * fix integration tests
    
    * updated model hub
    
    * do not shift as in training
    
    * undo
    
    * fix make-copies
    
    * make fix-copies
    
    * added `if prediction_length is None`
    
    * changed `ProbSparseAttention` to `InformerProbSparseAttention`
    
    * changed `V_sum` -> `v_mean_dim_time`
    
    * changed `ConvLayer` to `InformerConvLayer` and fixed `super()`
    
    * TimeSeriesTransformer->Informer in decoder's Copied from
    
    * more descriptive in ProbSparse
    
    * make style
    
    * fix copied from
    
    * Revert "added `if prediction_length is None`"
    
    This reverts commit b4cbddf.
    
    * fixed indent
    
    * use InformerSinusoidalPositionalEmbedding
    
    * make fix-style
    
    * fix from #21860
    
    * fix name
    
    * make fix-copies
    
    * use time series utils
    
    * fix dec num_heads
    
    * docstring
    
    * added time series util doc
    
    * _import_structure
    
    * formatting
    
    * changes from review
    
    * make style
    
    * fix docs
    
    * fix doc
    
    * removed NegativeLogLikelihood
    
    ---------
    
    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    3 people committed Mar 7, 2023
    Configuration menu
    Copy the full SHA
    8abe493 View commit details
    Browse the repository at this point in the history
  13. Update tiny model creation script and some others files (#22006)

    * Update 1
    
    * Update 2
    
    * Update 3
    
    * Update 4
    
    * Update 5
    
    * Update 6
    
    * Update 7
    
    * Update 8
    
    * Update 9
    
    * Update 10
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 7, 2023
    Configuration menu
    Copy the full SHA
    b338414 View commit details
    Browse the repository at this point in the history

Commits on Mar 8, 2023

  1. Generate - add 1 to cur_len to make up the new beam length (#21993)

    * add 1 to cur_len to make up the new beam length
    
    cur_len is 1 token shorter than the length of the sequence whose best_sum_logprobs is the numerator (see the arithmetic sketch after this entry).
    
    * cur_len+=1 before check if beam hyp is done
    
    * format code
    
    * reformat with black
    
    ---------
    
    Co-authored-by: Chiming <chiming@biomap.com>
    jimmieliu and Chiming committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    c1f8559 View commit details
    Browse the repository at this point in the history
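    Illustrative arithmetic only (made-up numbers, not the library's beam-search code) for why the length used in the normalization must count the token that was just appended:

    ```python
    # best_sum_logprobs already includes the newest token, so dividing by a
    # length that is one short penalizes the hypothesis too much.
    def normalized_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
        return sum_logprobs / (length ** length_penalty)

    sum_logprobs = -10.0  # log-probability of a 5-token continuation
    cur_len = 4           # one behind the sequence that produced sum_logprobs

    print(normalized_score(sum_logprobs, cur_len))      # -2.5 (buggy)
    print(normalized_score(sum_logprobs, cur_len + 1))  # -2.0 (after the fix)
    ```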
  2. VideoMAE doctest - use valid dummy pixel values (#22022)

    Use valid dummy pixel values
    amyeroberts committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    4130e70 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    bbd9499 View commit details
    Browse the repository at this point in the history
  4. Update AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 (#22023)
    
    fix
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    dfe9a31 View commit details
    Browse the repository at this point in the history
  5. [bnb] Fix bnb error message (#22026)

    * fix error message
    
    * make style
    younesbelkada committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    edea08a View commit details
    Browse the repository at this point in the history
  6. [WIP] Add BridgeTowerForContrastiveLearning (#21964)

    * Add BridgeTower for ITC
    
    * Fix review feedback
    
    * Rename BridgeTowerForITC, cleanup
    
    * Fix style and quality
    
    * implement tests
    
    ---------
    
    Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
    Co-authored-by: Tiep Le <tiep.le@intel.com>
    3 people committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    de81adf View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    a5392ee View commit details
    Browse the repository at this point in the history
  8. Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline (#22031)
    
    add tokenize_kwargs doc in the FeatureExtractionPipeline
    anruijian committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    b427b26 View commit details
    Browse the repository at this point in the history
  9. [examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py (#21942)
    
    * Add specaugment to run_speech_recognition_seq2seq.py
    
    * Remove useless argument: text_column
    
    * Fix quality
    
    * Update return_attention_mask condition
    
    * Update specaugment arguments only for whisper models
    
    * Remove SpecAugment arguments from ModelArguments, only leave default values for simplicity
    
    * Apply suggestions from code review
    
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * Update apply_spec_augment only for whisper models
    
    * Apply suggestions from code review
    
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * Rename return_attention_mask to forward_attention_mask to avoid confusion with wav2vec2 models
    
    ---------
    
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    bofenghuang and sanchit-gandhi committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    6192549 View commit details
    Browse the repository at this point in the history
  10. fixes the gradient checkpointing of whisper (#22019)

    * fixing
    
    * Update modeling_whisper.py
    
    * Update modeling_whisper.py
    
    * Update src/transformers/models/whisper/modeling_whisper.py
    
    ---------
    
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    soma2000-lang and gante committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    9983950 View commit details
    Browse the repository at this point in the history
  11. Avoid text_config_dict and vision_config_dict being saved for CLIP-like models (#22035)
    
    * Avoid text_config_dict and vision_config_dict being saved
    
    * for other CLIP-like models
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    bcc8d30 View commit details
    Browse the repository at this point in the history
  12. Mark all BridgeTower tests slow for now (#22039)

    * slow me
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    1cbac68 View commit details
    Browse the repository at this point in the history
  13. Bug fix: token classification pipeline while passing offset_mapping (#22034)
    
    fix slow tokenizers with passing offset_mapping
    cceyda committed Mar 8, 2023
    Configuration menu
    Copy the full SHA
    3ec8171 View commit details
    Browse the repository at this point in the history

Commits on Mar 9, 2023

  1. Update ALIGN docs (#22025)

    * Fix typos and add code examples, resources
    alaradirik committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    2055d73 View commit details
    Browse the repository at this point in the history
  2. [21737][T5]: Fix gradient checkpoint bug (#22036)

    * [21737][T5]: Fix gradient checkpoint bug
    
    * [21737][T5]: Fix gradient checkpoint bug
    
    * [21737][T5]: Fix gradient checkpoint bug
    
    * Update src/transformers/models/mt5/modeling_mt5.py
    
    * Update src/transformers/models/t5/modeling_t5.py
    
    ---------
    
    Co-authored-by: njindal <njindal@adobe.com>
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    3 people committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    1a77a1a View commit details
    Browse the repository at this point in the history
  3. Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it (#22045)
    
    In ZSH, not using ' ' around pip install fails
    
    Running 
    ```
    pip install transformers[torch]
    ```
    in the default ZSH terminal will fail with the error `zsh: no matches found: transformers[torch]`
    
    The solution is to wrap the installation path in ' ' like 
    ```
    pip install 'transformers[torch]'
    ```
    
    Relevant StackOverflow: https://stackoverflow.com/questions/30539798/zsh-no-matches-found-requestssecurity
    shaun-scale committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    81cd655 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    6847743 View commit details
    Browse the repository at this point in the history
  5. Remove set_access_token usage + fail tests if FutureWarning (#22051)

    * Remove set_access_token usage + fail tests if FutureWarning
    
    * do not fail on FutureWarning in CI
    
    ---------
    
    Co-authored-by: testbot <lucainp@hf.co>
    Wauplin and testbot committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    923110b View commit details
    Browse the repository at this point in the history
  6. Show the number of huggingface_hub warnings in CI report (#22054)

    * show hfh warnings
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    90a7c95 View commit details
    Browse the repository at this point in the history
  7. Return analysis for hyperparameter_search with Ray backend (#22040)

    * return analysis for hyperparameter_search with ray backend
    
    * Revert "return analysis for hyperparameter_search with ray backend"
    
    This reverts commit cd51790.
    
    * add run_summary attribute to BestRun and return analysis for ray backend
    
    * fix typo
    
    * add doc for run_summary for ray backend
    anruijian committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    04bfac8 View commit details
    Browse the repository at this point in the history
  8. pt-to-tf model architecture override (#22055)

    * Add an argument to pt-to-tf to allow overriding the model class
    
    * make fixup
    
    * Minor fix to error message
    
    * Remove unused extra conversion from the script
    Rocketknight1 committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    fdf8409 View commit details
    Browse the repository at this point in the history
  9. rm $ symbol from code block from contributing.md (#22057)

    rm $ symbol from code block 
    
    Removed the $ symbol from the code block to make copy-pasting easier.
    kamalkraj committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    d0c19b3 View commit details
    Browse the repository at this point in the history
  10. [deepspeed] offload + non-cpuadam optimizer exception (#22043)

    * [deepspeed] offload + non-cpuadam optimizer exception
    
    * flip
    
    * revert min version
    stas00 committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    ec24132 View commit details
    Browse the repository at this point in the history
  11. Edit the docstring of image_processing_donut to match code (#22033)

    * Edit the docstring of `image_processing_donut` to match code
    
    * improve style
    
    * more style improvement after installing quality
    vermouthmjl committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    8434cb8 View commit details
    Browse the repository at this point in the history
  12. Skip 3 tests for WhisperEncoderModelTest (#22060)

    * skip 3 tests
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    ab81d31 View commit details
    Browse the repository at this point in the history
  13. Add setters by type of args to TrainingArguments (#21570)

    * Add setters by type of args to TrainingArguments
    
    * Define more setters
    sgugger committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    7a2b915 View commit details
    Browse the repository at this point in the history
  14. Update tiny model creation script (#22058)

    Update the script
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    6d9031f View commit details
    Browse the repository at this point in the history
  15. Fix case when using --gradient_accumulation_steps with DDP disabled. (#22007)
    
    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-42-72.us-west-2.compute.internal>
    sangeethabal and EC2 Default User committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    1a5fc30 View commit details
    Browse the repository at this point in the history
  16. Add a progress bar for the total download of shards (#22062)

    * Add a progress bar for the total download of shards
    
    * Check for no cache at all
    
    * Fix check
    sgugger committed Mar 9, 2023
    Configuration menu
    Copy the full SHA
    a9bd5df View commit details
    Browse the repository at this point in the history

Commits on Mar 10, 2023

  1. Fix gradient checkpointing bug in Speech2Text (#22079)

    * Fix gradient checkpointing bug in Speech2Text
    
    * Update modeling_speech_to_text.py
    
    * Update modeling_speech_to_text_2.py
    KMFODA committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    b927335 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    eee195b View commit details
    Browse the repository at this point in the history
  3. [GPT2] Propose fix for #21080 (#21853)

    * Make sure position ids are masked
    
    * test that padded input produces the same results
    
    * fix failing tests
    
    * fixup
    
    * fix batch test
    ArthurZucker committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    a3fef89 View commit details
    Browse the repository at this point in the history
  4. Fix small typo in flan-ul2.mdx (#22068)

    * Update flan-ul2.mdx
    
    * Update flan-ul2.mdx
    kevin51jiang committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    ade26bf View commit details
    Browse the repository at this point in the history
  5. Generate - Fix broken documentation links (#22078)

    fix broken links
    gante committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    7014fc3 View commit details
    Browse the repository at this point in the history
  6. Fix gradient checkpointing bug in Speecht5 (#22080)

    * Fix gradient checkpointing bug in Speecht5
    
    * Update modeling_speech_to_text.py
    
    * Update src/transformers/models/speech_to_text/modeling_speech_to_text.py
    
    * Fix change errors
    
    ---------
    
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    KMFODA and gante committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    419d979 View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    a70da86 View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    2f4cdd9 View commit details
    Browse the repository at this point in the history
  9. GPT-J specific half precision on CPU note (#22086)

    * re: #21989
    
    * update re: #21989
    
    * removed cpu option
    
    * make style
    MKhalusova committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    bdec276 View commit details
    Browse the repository at this point in the history
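    A small sketch of the usage the note covers, assuming a CUDA device is available; half precision halves the memory footprint but is meant for GPU inference, so on CPU the model should stay in float32:

    ```python
    import torch
    from transformers import AutoModelForCausalLM

    # fp16 weights for GPU inference; do not combine torch.float16 with CPU-only runs.
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
    ).to("cuda")
    ```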
  10. Fix imports of TF MobileViT (#22065)

    * Fix imports of TF MobileViT
    
    * Fix copies
    sgugger committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    499770c View commit details
    Browse the repository at this point in the history
  11. Revert "[GPT2] Propose fix for #21080" (#22093)

    Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure
    
    This reverts commit a3fef89.
    ydshieh committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    2f32066 View commit details
    Browse the repository at this point in the history

Commits on Mar 11, 2023

  1. [Whisper] Remove embed_tokens from encoder docstring (#21996)

    * [Whisper] Remove embed_tokens from encoder docstring
    
    * new line to retrigger CI
    
    * remove new line
    sanchit-gandhi committed Mar 11, 2023
    Configuration menu
    Copy the full SHA
    b90fbc7 View commit details
    Browse the repository at this point in the history

Commits on Mar 13, 2023

  1. Add AutoModelForZeroShotImageClassification (#22087)

    Adds AutoModelForZeroShotImageClassification to transformers
    alaradirik committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    32e3466 View commit details
    Browse the repository at this point in the history
  2. add new model of MGP-STR (#21418)

    * add new model of MGP-STR
    
    * fix the check failings
    
    * remove torch and numpy from mgp_tokenization
    
    * remove unused import from modeling_mgp_str
    
    * add test_processing_mgp_str
    
    * rm test_processing_mgp_str.py
    
    * add test_processing_mgp_str
    
    * add test_processing_mgp_str
    
    * add test_processing_mgp_str
    
    * rm test_processing_mgp_str and add softmax outs to model
    
    * rm test_processing_mgp_str and add softmax outs to model
    
    * rewrite the code of mgp-str according to PR suggestions
    
    * rewrite the code of mgp-str according to PR suggestions
    
    * add new model of MGP-STR
    
    * fix the check failings
    
    * remove torch and numpy from mgp_tokenization
    
    * remove unused import from modeling_mgp_str
    
    * add test_processing_mgp_str
    
    * rm test_processing_mgp_str.py
    
    * add test_processing_mgp_str
    
    * add test_processing_mgp_str
    
    * add test_processing_mgp_str
    
    * rm test_processing_mgp_str and add softmax outs to model
    
    * rewrite the code of mgp-str according to PR suggestions
    
    * rewrite the code of mgp-str according to PR suggestions
    
    * remove representation_size from MGPSTRConfig
    
    * reformat configuration_mgp_str.py
    
    * format test_processor_mgp_str.py
    
    * add test for tokenizer and complete model/processer test and model file
    
    * rm unnecessary tuple in modeling_mgp_str
    
    * reduce hidden_size/layers/label_size in test_model
    
    * add integration tests and change MGPSTR to Mgpstr
    
    * add test for logit values
    
    * reformat test model file
    
    ---------
    
    Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
    wdp-007 and yue kun committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    102b5ff View commit details
    Browse the repository at this point in the history
  3. Add pr_checks.mdx Italian translation (#17459) (#22116)

    * Add pr_checks.mdx Italian translation (#17459)
    
    * Updated pr_checks.mdx Italian translation (#17459)
    alexcalabrese committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    0c88376 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    d0876a0 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    4c14c1f View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    0768c5e View commit details
    Browse the repository at this point in the history
  7. Added big_models.mdx italian translation #17600 (#22115)

    * updated toctree
    
    * italian translation big_model.mdx
    
    * italian translation big_models
    nickprock committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    dd3a058 View commit details
    Browse the repository at this point in the history
  8. [Blip2] skip accelerate test (#22124)

    skip accelerate test
    younesbelkada committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    6652e7d View commit details
    Browse the repository at this point in the history
  9. Configuration menu
    Copy the full SHA
    c1db6a3 View commit details
    Browse the repository at this point in the history
  10. Configuration menu
    Copy the full SHA
    ef74e7e View commit details
    Browse the repository at this point in the history
  11. Fix gradient checkpointing bug in trocr (#22126)

    * Fix gradient checkpointing bug in trocr
    
    * Fix format
    
    * Update src/transformers/models/trocr/modeling_trocr.py
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    KMFODA and younesbelkada committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    e61081e View commit details
    Browse the repository at this point in the history
  12. Zero-shot image classification task guide (#22132)

    * WIP
    
    * WIP
    
    * manual inference example
    
    * make style
    
    * Apply suggestions from code review
    
    Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
    MKhalusova and alaradirik committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    8def252 View commit details
    Browse the repository at this point in the history
  13. Configuration menu
    Copy the full SHA
    6cb5132 View commit details
    Browse the repository at this point in the history
  14. Adding Type Hints to TF_Pegasus model (#21941)

    * Adding Type Hints to TF_Pegasus model
    
    * Updated some parameters per maintainer comments
    mollerup23 committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    a096eac View commit details
    Browse the repository at this point in the history
  15. Add a new script to check model testers' config (#22063)

    * Add script
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    54ee56b View commit details
    Browse the repository at this point in the history
  16. Update configuration_align.py (projected_dim=640) (#22139)

    Update configuration_align.py
    
    updated projected_dim=640 from 512 in arguments of AlignConfig
    bishmdl76 committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    9879723 View commit details
    Browse the repository at this point in the history
  17. [Whisper] add get_input_embeddings to `WhisperForAudioClassification` (#22133)
    
    * add `get_input_embeddings` to `WhisperForAudioClassification`
    
    * add common tests
    
    * fix another common test
    
    * Update tests/models/whisper/test_modeling_whisper.py
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * fix style
    
    ---------
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    younesbelkada and ArthurZucker committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    d979cf6 View commit details
    Browse the repository at this point in the history
  18. Trainer: let generate pick its inputs (#22108)

    * Let generate pick its inputs
    
    * fix squad seq2seq example
    gante committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    e16cbe8 View commit details
    Browse the repository at this point in the history
  19. Configuration menu
    Copy the full SHA
    1c801d6 View commit details
    Browse the repository at this point in the history
  20. [trainer] fix bug in grad accum with multiple epochs (#22098)

    * [trainer] fix bug in grad accum
    
    * comment out debug
    
    * fix one-off
    
    * rename counter
    stas00 committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    5b85add View commit details
    Browse the repository at this point in the history
  21. [deepspeed docs] Activation Checkpointing (#22099)

    * [deepspeed docs] Activation Checkpointing
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update deepspeed.mdx
    
    ---------
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    stas00 and sgugger committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    618697e View commit details
    Browse the repository at this point in the history
  22. Remove backend check for torch.compile (#22140)

    * Remove backend enforcment for torch.compile
    
    * Update error
    
    * Update src/transformers/training_args.py
    
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    
    * Style
    
    ---------
    
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    sgugger and stas00 committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    3a35937 View commit details
    Browse the repository at this point in the history
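    A minimal sketch of what removing the backend check enables: any backend string accepted by `torch.compile` can now be passed through, with "inductor" being the PyTorch 2.0 default:

    ```python
    import torch
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased-finetuned-sst-2-english"
    )
    # Any torch.compile backend is allowed; the default "inductor" is shown here.
    compiled_model = torch.compile(model, backend="inductor")
    ```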
  23. [Safetensors] Add explicit flag to from pretrained (#22083)

    * [Safetensors] Add explicit  flag to from pretrained
    
    * add test
    
    * remove @
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    patrickvonplaten and sgugger committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    f780557 View commit details
    Browse the repository at this point in the history
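    A hedged sketch, assuming the new flag is named `use_safetensors` (the commit title does not spell it out); it makes `from_pretrained` load the `.safetensors` weights instead of silently falling back to the pickle-based `.bin` files:

    ```python
    from transformers import AutoModel

    # Assumes the checkpoint publishes .safetensors weights; otherwise this raises
    # instead of quietly loading the .bin file.
    model = AutoModel.from_pretrained("bert-base-uncased", use_safetensors=True)
    ```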
  24. Prepare daily CI for torch 2.0.0 (#22135)

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    ba9e019 View commit details
    Browse the repository at this point in the history
  25. docs: New terms and updates to glossary (#21982)

    * Updated glossary with new terms, added abbreviations for certain terms and merged autoencoding models, autoregressive models and causal language modeling into encoder and decoder models
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Added link to 'Pipeline for inference' tutorial
    
    * Trigger CI
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * Added entry for self supervised learning, added deleted entries + fixed broken links
    
    * Update docs/source/en/glossary.mdx
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    3 people committed Mar 13, 2023
    Configuration menu
    Copy the full SHA
    101a6cd View commit details
    Browse the repository at this point in the history

Commits on Mar 14, 2023

  1. [🛠️] Fix-whisper-breaking-changes (#21965)

    * temp fix
    
    * temporary fix
    
    * update
    
    * fix tests
    
    * fixup
    
    * update based on review
    
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * update to fix tests
    
    * update docstring
    
    ---------
    
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    ArthurZucker and sanchit-gandhi committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    2beabd2 View commit details
    Browse the repository at this point in the history
  2. Move is_pipeline_test_to_skip to specific model test classes (#21999)

    * Move `is_pipeline_test_to_skip` to specific model test classes
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    6c2ad00 View commit details
    Browse the repository at this point in the history
  3. Add ConvNeXT V2 (#21679)

    * Add ConvNeXt V2 to transformers
    * TF model is separated from the PR to fix issues
    alaradirik committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    cdddfbf View commit details
    Browse the repository at this point in the history
  4. Update 2 doctest expected values for torch 2.0.0 (#22148)

    update values
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    ff88703 View commit details
    Browse the repository at this point in the history
  5. Translation Italian: perf_train_cpu and perf_train_cpu_many (#22151)

    * added translated files
    
    added perf_train_cpu and perf_train_cpu_many
    
    * updated toctree
    nickprock committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    7f5ad6c View commit details
    Browse the repository at this point in the history
  6. Fix big model inference for T5 models in float16 (#22095)

    * Fix big model inference for T5 models in float16
    
    * Apply suggestions from code review
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    
    * Style
    
    * Trigger CI with latest release
    
    ---------
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    sgugger and younesbelkada committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    b45192e View commit details
    Browse the repository at this point in the history
  7. Create MaskedImageCompletionOutput and fix ViT docs (#22152)

    * create MaskedImageCompletionOutput
    
    * fix bugs
    
    * fix bugs
    alaradirik committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    3b22bfb View commit details
    Browse the repository at this point in the history
  8. to_pil - don't rescale if int and in range 0-255 (#22158)

    * Don't rescale if int and in range 0-255
    
    * Raise value error if int values too large
    
    * Update tests/test_image_transforms.py
    
    * Update tests/test_image_transforms.py
    amyeroberts committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    c6318c3 View commit details
    Browse the repository at this point in the history
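    A simplified sketch of the check described above (not the library's exact implementation): integer arrays already in the 0-255 range are passed to PIL unchanged, larger integer values raise, and float arrays still get rescaled:

    ```python
    import numpy as np

    def needs_rescaling(image: np.ndarray) -> bool:
        if np.issubdtype(image.dtype, np.integer):
            if image.min() >= 0 and image.max() <= 255:
                return False  # already valid 8-bit pixel values
            raise ValueError("Integer image values must lie in [0, 255].")
        return True  # float images are assumed to be in [0, 1] and need rescaling

    print(needs_rescaling(np.array([[0, 128, 255]], dtype=np.uint8)))  # False
    print(needs_rescaling(np.array([[0.0, 0.5, 1.0]])))                # True
    ```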
  9. [trainer] add --optim adamw_torch_fused for pt-2.0+ (#22144)

    * [trainer] add --optim adamw_torch_fused
    
    * change optim default
    
    * deal with non-torch
    
    * revert default change; prep; add fp16/amp assert
    
    * typo
    
    * typo
    stas00 committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    085bf5c View commit details
    Browse the repository at this point in the history
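    A minimal sketch of opting into the new optimizer; it needs torch>=2.0, and the commit mentions an fp16/amp assertion around its use:

    ```python
    from transformers import TrainingArguments

    # "adamw_torch_fused" selects torch's fused AdamW kernel (PyTorch 2.0+, CUDA).
    args = TrainingArguments(output_dir="out", optim="adamw_torch_fused", fp16=True)
    ```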
  10. Revert "Enforce same behavior as PyTorch 2.0 for older versions" (#22163

    )
    
    Revert "Enforce same behavior as PyTorch 2.0 for older versions (#22136)"
    
    This reverts commit 1c801d6.
    sgugger committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    c52c528 View commit details
    Browse the repository at this point in the history
  11. v4.28.0.dev0

    sgugger committed Mar 14, 2023
    Configuration menu
    Copy the full SHA
    ebdb185 View commit details
    Browse the repository at this point in the history
  12. Configuration menu
    Copy the full SHA
    b7036f4 View commit details
    Browse the repository at this point in the history
  13. Configuration menu
    Copy the full SHA
    f732975 View commit details
    Browse the repository at this point in the history

Commits on Mar 15, 2023

  1. Fix: unfinished_sequences with correct device (#22184)

    Fix: unfinished_sequences with correct device

    The original code caused errors when running torch.jit.trace because the tensor options were incorrect; creating the tensor with torch.ones so that it picks up the correct device and dtype resolves the issue (a minimal sketch follows this entry).
    Stxr committed Mar 15, 2023
    Configuration menu
    Copy the full SHA
    7b0e2cf View commit details
    Browse the repository at this point in the history
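    A minimal sketch of the pattern the fix uses (not the exact library code): `torch.ones` pins both dtype and device to the inputs, so `torch.jit.trace` sees no implicit conversions:

    ```python
    import torch

    input_ids = torch.tensor([[101, 2023], [101, 2003]])

    # One flag per sequence, created directly with the right dtype and device.
    unfinished_sequences = torch.ones(
        input_ids.shape[0], dtype=torch.long, device=input_ids.device
    )
    print(unfinished_sequences)  # tensor([1, 1])
    ```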
  2. Configuration menu
    Copy the full SHA
    7376814 View commit details
    Browse the repository at this point in the history
  3. Regression pipeline device (#22190)

    * Fix regression in pipeline when device=-1 is passed
    
    * Add regression test
    sgugger committed Mar 15, 2023
    Configuration menu
    Copy the full SHA
    42ad693 View commit details
    Browse the repository at this point in the history
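    A short sketch of the case the regression test covers: passing `device=-1` must keep the pipeline on CPU rather than erroring out:

    ```python
    from transformers import pipeline

    # device=-1 is the long-standing "run on CPU" convention.
    classifier = pipeline("sentiment-analysis", device=-1)
    print(classifier("works on CPU again"))
    ```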
  4. Update BridgeTowerForContrastiveLearning (#22145)

    * Use return_loss for BridgeTowerForContrastiveLearning, add example
    
    * fix tests
    
    * Update example in BridgeTowerForContrastiveLearning
    
    * Update test_modeling_bridgetower.py
    
    * update model output format
    
    * minor update
    
    * Update src/transformers/models/bridgetower/modeling_bridgetower.py
    
    * make style
    
    ---------
    
    Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
    Co-authored-by: Tiep Le <tiep.le@intel.com>
    Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    5 people committed Mar 15, 2023
    Configuration menu
    Copy the full SHA
    16121ba View commit details
    Browse the repository at this point in the history
  5. t5 remove data dependency (#22097)

    * t5 remove data dependency
    
    * make style
    
    * make fix-copies
    
    ---------
    
    Co-authored-by: Prathik Rao <prathikrao@microsoft.com>
    prathikr and Prathik Rao committed Mar 15, 2023
    Configuration menu
    Copy the full SHA
    7c4999e View commit details
    Browse the repository at this point in the history

Commits on Mar 16, 2023

  1. Fix DeepSpeed CI (#22194)

    * Deal with torch-tensorrt
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    1c4a9ac View commit details
    Browse the repository at this point in the history
  2. Fix typo in Align docs (#22199)

    Fix align docs typo
    alaradirik committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    1485bd9 View commit details
    Browse the repository at this point in the history
  3. Update expected values in MgpstrModelIntegrationTest (#22195)

    Update values
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    52a57f7 View commit details
    Browse the repository at this point in the history
  4. Italian Translation of migration.mdx (#22183)

    * Translation Italian: migration
    
    * Update migration.mdx
    
    minor fixes
    
    * Update _toctree.yml
    
    * Delete migration.mdx
    
    * Add italian translation of migration.mdx
    
    * Update of migration.mdx translation and toctree
    Baelish03 committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    09922da View commit details
    Browse the repository at this point in the history
  5. LLaMA Implementation (#21955)

    * LLaMA
    
    * sharding and docs
    
    * tweak
    
    * black
    
    * inits
    
    * ruff
    
    * LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
    
    * init
    
    * no checkpoint
    
    * docs
    
    * ruff
    
    * type_vocab_size
    
    * tokenizer fixes
    
    * tokenizer fixes
    
    * Update tokenization_llama.py
    
    * Update tokenization_llama.py
    
    * Update configuration_llama.py
    
    * Update modeling_llama.py
    
    * tokenizer add_bos by default
    
    * licenses
    
    * remove decoder
    
    * norms and mlp
    
    * rope overhaul
    
    * tweaks
    
    * black
    
    * mention OPT implementation
    
    * off-by-one naming
    
    * typo
    
    * fix
    
    * tokenization fix and slicing bug
    
    * padding config
    
    * cleanup
    
    * black
    
    * update tests
    
    * undo typo
    
    * fix vocab caching logic
    
    * ruff
    
    * docbuilder
    
    * attn fix from BlackSamorez
    
    * initial feedback
    
    * typo
    
    * docs
    
    * llama case
    
    * llama case
    
    * load checkpoint docs
    
    * comment about tokenizer
    
    * tokenizer defaults
    
    * clear past_key_values if use_cache=False
    
    * last tweaks
    
    * last tweaks
    
    * last tweaks
    
    * last tweaks
    
    ---------
    
    Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
    zphang and StellaAthena committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    0041be5 View commit details
    Browse the repository at this point in the history
  6. LLaMA Implementation (#21955)

    zphang and StellaAthena committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    464d420 View commit details
    Browse the repository at this point in the history
  7. Update tiny model creation script (#22202)

    * Update UNCONVERTIBLE_MODEL_ARCHITECTURES
    
    * Deal with 2 model tester classes in single test file
    
    * Deal with 2 model tester classes in single test file
    
    * Deal with 2 model tester classes in single test file
    
    * make style and quality
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    4c5c0af View commit details
    Browse the repository at this point in the history
  8. Temporarily fix ONNX model exporting error (#21830)

    * Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143
    
    * Reduced column width
    
    * Fix formatting.
    
    * Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143"
    
    This reverts commit 6e95a10.
    
    * Fix export error.
    
    * Revert "Fix formatting."
    
    This reverts commit 8310f60.
    
    * Propagated changes made in SwinV2 to Swin2SR
    SatyaJandhyalaAtMS committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    a88a4da View commit details
    Browse the repository at this point in the history
  9. [XGLM] Add accelerate support for XGLM (#22207)

    * add `accelerate` support for XGLM
    
    * fix order
    younesbelkada committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    da3ba3a View commit details
    Browse the repository at this point in the history
  10. fixes a typo in WhisperFeatureExtractor docs. (#22208)

    * fixes a typo
    
    * .
    susnato committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    fb366b9 View commit details
    Browse the repository at this point in the history
  11. 🔥py38 + torch 2 🔥🔥🔥🚀 (#22204)

    * py38 + torch 2
    
    * increment cache versions
    
    ---------
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    5110e57 View commit details
    Browse the repository at this point in the history
  12. Hotfix for natten issue with torch 2.0.0 on CircleCI (#22218)

    fix
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 16, 2023
    Configuration menu
    Copy the full SHA
    97a3d16 View commit details
    Browse the repository at this point in the history

Commits on Mar 17, 2023

  1. Configuration menu
    Copy the full SHA
    33d033d View commit details
    Browse the repository at this point in the history
  2. fix code example in mgp-str doc (#22219)

    Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
    wdp-007 and yue kun committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    af1c864 View commit details
    Browse the repository at this point in the history
  3. Use dash==2.8.1 for now for daily CI (#22227)

    Use dash 2.8.1 for now
    
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    ydshieh and ydshieh committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    5321867 View commit details
    Browse the repository at this point in the history
  4. Depth estimation task guide (#22205)

    * added doc to toc, auto tip with supported models, mention of task guide in model docs
    
    * make style
    
    * removed "see also"
    
    * minor fix
    MKhalusova committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    42f8f76 View commit details
    Browse the repository at this point in the history
  5. LLaMA house-keeping (#22216)

    * LLaMA house-keeping
    
    * Doc links
    sgugger committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    0093402 View commit details
    Browse the repository at this point in the history
  6. fix AutoTP in deepspeed could not work for bloom (#22196)

    * fix AutoTP in deepspeed could not work for bloom
    
    Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
    
    * add a method in BloomModel to build ailib
    
    Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
    
    ---------
    
    Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
    sywangyi committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    675d2a5 View commit details
    Browse the repository at this point in the history
  7. Add LlamaForSequenceClassification (#22209)

    * Add LlamaForSequenceClassification
    
    * Update src/transformers/models/llama/modeling_llama.py
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    
    * Update src/transformers/models/llama/modeling_llama.py
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    
    * Add docstring
    
    * Add test
    
    * Add input embedding getter and setter
    
    * Remove dead code
    
    ---------
    
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    lewtun and younesbelkada committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    f251441 View commit details
    Browse the repository at this point in the history
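    A hedged sketch of the new class; it builds a tiny random-weight config because LLaMA checkpoints are not publicly hosted, whereas real use would call `from_pretrained` on converted weights:

    ```python
    from transformers import LlamaConfig, LlamaForSequenceClassification

    # Tiny illustrative config; every size below is made up for the example.
    config = LlamaConfig(
        vocab_size=1000, hidden_size=64, intermediate_size=128,
        num_hidden_layers=2, num_attention_heads=4, num_labels=2,
    )
    model = LlamaForSequenceClassification(config)
    ```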
  8. Removed .mdx extension in two links (#22230)

    removed .mdx extension
    MKhalusova committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    314cdf7 View commit details
    Browse the repository at this point in the history
  9. fix(docs): fix task guide links in model docs (#22226)

    fix(docs): task guide links in model docs
    Seb0 committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    074490b View commit details
    Browse the repository at this point in the history
  10. Fix natten (#22229)

    * Add kernel size to NATTEN's QK arguments.
    
    The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional
    argument to the QK operation to allow optional RPBs.
    
    This ends up failing NATTEN tests.
    
    This commit adds NATTEN back to circleci and adds the arguments to get
    it working again.
    
    * Force NATTEN >= 0.14.5
    alihassanijr committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    3028b20 View commit details
    Browse the repository at this point in the history
  11. Revert "Use dash==2.8.1 for now for daily CI" (#22233)

    Revert "Use `dash==2.8.1` for now for daily CI (#22227)"
    
    This reverts commit 5321867.
    ydshieh committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    bec0756 View commit details
    Browse the repository at this point in the history
  12. Configuration menu
    Copy the full SHA
    cf601b9 View commit details
    Browse the repository at this point in the history
  13. [trainer] param count for deepspeed zero3 (#22193)

    [trainer] param count for zero3
    stas00 committed Mar 17, 2023
    Configuration menu
    Copy the full SHA
    60d51ef View commit details
    Browse the repository at this point in the history