Update with forked from #1

oushu1zhangxiangxuan1 · 2023-03-20T09:29:44Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

* troubleshooting guide: added an error description for missing auto-mapping
* minor polishing
* changed the example
* Apply suggestions from code review
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/troubleshooting.mdx
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

#21612)
* fix: Change is_last chunk calc and add conditional break
* format fix
* account for 0 and full stride_rights, add comment
* add new test
* make style
* update slow whisper asr test timestamps
* use nested_simplify on output and round timestamp to hundredths place

* [flax] adding support for batch norm layers
* fixing bugs related to pt+flax integration
* cleanup, batchnorm support in sharded pt to flax
* support for batchnorm tests in pt+flax integration
* simplifying checking batch norm layer

* Return and rescale attention_mask
* Add SpecAugment to Whisper modeling
* Fix test
* Update docstring
* Add SpecAug related parameters to model config
* Add the _mask_input_features function to doc
* Fix quality
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove dev comments
* Add test
* Resolve conflict
* feat: mask {feature, time} prob fast tests
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix resume_from_checkpoint for deepspeed
  Fix resume_from_checkpoint for deepspeed, by ensuring that the deepspeed engine is the one to load the checkpoint.
* Empty commit to trigger CI
* Removed deepspeed skipping
  Removed deepspeed skipping inside the _load_from_checkpoint function, as it is obsolete
* another adjustment
* Trigger CI
* trigger circleci
* style
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

* Override the decoding parameters of Seq2SeqTrainer
* Fix quality
* Fix max_length parameter
* Fix quality
* Remove redundant parameter max_length
* Separate the preprocess of train and validation to use different max_target_length

* add pipeline
* update init
* add zero shot to init
* update inits and correct checkpoints
* update base to support input features
* add tests
* Update src/transformers/pipelines/zero_shot_audio_classification.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/pipelines/zero_shot_audio_classification.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* update pipeline code
* use tiny checkpoint
* nits and expected value with tiny model
* style
* last nit on tests values
* fix styling
* fix collate fn that was casting to float
* update
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* uint8 -> bool
* fix copies
* style
* update test modeling comment when checking attention buffers
* style
* use logical not on random mask instead of subtraction with 1
* remove torch uint8
* quality
* remove modified modeling utils
* Update based on review
  Co-authored-by: sgugger <sylvain.gugger@gmail.com>
---------
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* Use return_loss for BridgeTowerForContrastiveLearning, add example
* fix tests
* Update example in BridgeTowerForContrastiveLearning
* Update test_modeling_bridgetower.py
* update model output format
* minor update
* Update src/transformers/models/bridgetower/modeling_bridgetower.py
* make style
---------
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* LLaMA
* sharding and docs
* tweak
* black
* inits
* ruff
* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
* init
* no checkpoint
* docs
* ruff
* type_vocab_size
* tokenizer fixes
* tokenizer fixes
* Update tokenization_llama.py
* Update tokenization_llama.py
* Update configuration_llama.py
* Update modeling_llama.py
* tokenizer add_bos by default
* licenses
* remove decoder
* norms and mlp
* rope overhaul
* tweaks
* black
* mention OPT implementation
* off-by-one naming
* typo
* fix
* tokenization fix and slicing bug
* padding config
* cleanup
* black
* update tests
* undo typo
* fix vocab caching logic
* ruff
* docbuilder
* attn fix from BlackSamorez
* initial feedback
* typo
* docs
* llama case
* llama case
* load checkpoint docs
* comment about tokenizer
* tokenizer defaults
* clear past_key_values if use_cache=False
* last tweaks
* last tweaks
* last tweaks
* last tweaks
---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>

* Update UNCONVERTIBLE_MODEL_ARCHITECTURES
* Deal with 2 model tester classes in single test file
* Deal with 2 model tester classes in single test file
* Deal with 2 model tester classes in single test file
* make style and quality
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143
* Reduced column width
* Fix formatting.
* Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143"
  This reverts commit 6e95a10.
* Fix export error.
* Revert "Fix formatting."
  This reverts commit 8310f60.
* Propagated changes made in SwinV2 to Swin2SR

* fix AutoTP in deepspeed could not work for bloom
  Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add a method in BloomModel to build alibi
  Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Add LlamaForSequenceClassification
* Update src/transformers/models/llama/modeling_llama.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Add docstring
* Add test
* Add input embedding getter and setter
* Remove dead code
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add kernel size to NATTEN's QK arguments.
  The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional argument to the QK operation to allow optional RPBs. This ends up failing NATTEN tests. This commit adds NATTEN back to circleci and adds the arguments to get it working again.
* Force NATTEN >= 0.14.5

ydshieh and others added 30 commits February 23, 2023 09:41

Fix 2 quicktour file doctest (#21742)

36a6a1a

* Update expect output values - as Hub repo. files are updated
* Update expect output values - as librosa is from 0.9.2 to 0.10.0 on CI docker
* fix
* update one more
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

[GPTNeo] Fix gradient checkpointing bug (#21733)

78a93d1

* fix bug
* forward contrib credits from discussions
* change logic
---------
Co-authored-by: edbeeching <edbeeching@users.noreply.github.com>

Generate: Fix GIT batched captioning (#21738)

1d4b797

Skip test_log_level for now

aa3787c

Added Type Hints for modeling_tf_encoder_decoder.py (#21673)

0ffa22f

* Ran Black formatting
* Added imports and reformatted
* Update src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

[deepspeed tests] fix issues introduced by #21700 (#21769)

6330626

* [deepspeed tests] fix issues introduced by #21700
* fix
* fix

Graphormer fix (#21699)

4446b6b

* Removed useless check for backend
* fix style check for graphormer
* Reverted change and corrected requires_backend for cython
* code qual

[Examples] Generalise run audio classification for log-mel models (#2…

1348924

…1756)
* [Examples] Generalise run audio classification for log-mel models
* batch feature extractor
* make style

Different behavior in DistilBERT when using "inputs_embeds" (#21752)

14f3320

* Different behavior in DistilBERT when using "inputs_embeds"
  Fixes #21089
* fix failing test

[Flax] Fix erroneous kwargs being passed to generate config (#21765)

75bd49f

Fix-ci-whisper (#21767)

087436c

* fix history
* input_features instead of input ids for TFWhisper doctest
* use translate instead of transcribe

Generate - update cookie cutters to not initialize cache with trainin…

440f397

…g and gradient checkpointing (#21759)

[time series] updated expected values for integration test. (#21762)

ba0e370

* updated expected
* prediction_length fix
* prediction_length default value
* default prediction_length 24
* revert back prediction_length default
* move prediction_length test

[GPT2, ProphetNet] Fix gradient checkpointing bug (#21772)

59c1d5b

* fix gradient checkpointing bug
* fix gradient checkpointing bug
* ran make fix-copies
* fixed bug
* fixed bug

[SpeechT5] Fix HiFiGAN tests (#21788)

3dae0d7

Fix type in gpt2 config docstring (#21782)

a369836

Fix docstring gpt2 config

Fix en documentation typos (#21799)

ba2a5f1

* fix wrong url
* typos in English documentation

[FX tracer] Make concrete_args from outside available (#21775)

2ea1ef9

make concrete_args from outside available

[tests] add accelerate marker (#21743)

831f314

* add `accelerate` marker
* add to docs
* Update docs/source/en/testing.mdx

Fix PyTorch Perceiver PerceiverFourierPositionEncoding with fp16 (#…

ebf84f0

…21787)
* fix perceiver fp16
* hopefully fix tests

Fix nn.init.trunc_normal_ call on torch.float16 data (#21789)

0c7f93f

fix nn.init.trunc_normal_ call on half data

Fix gradient checkpointing bug in gptneox (#21815)

7811bf7

* Fix gradient checkpointing bug in gptneox
* Remove use_cache block

amyeroberts and others added 29 commits March 15, 2023 18:37

Revert 22152 MaskedImageCompletionOutput changes (#22187)

7376814

Revert changes

Regression pipeline device (#22190)

42ad693

* Fix regression in pipeline when device=-1 is passed
* Add regression test

t5 remove data dependency (#22097)

7c4999e

* t5 remove data dependency
* make style
* make fix-copies
---------
Co-authored-by: Prathik Rao <prathikrao@microsoft.com>

Fix DeepSpeed CI (#22194)

1c4a9ac

* Deal with torch-tensorrt
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix typo in Align docs (#22199)

1485bd9

Fix align docs typo

Update expected values in MgpstrModelIntegrationTest (#22195)

52a57f7

Update values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Italian Translation of migration.mdx (#22183)

09922da

* Translation Italian: migration
* Update migration.mdx minor fixes
* Update _toctree.yml
* Delete migration.mdx
* Add Italian translation of migration.mdx
* Update of migration.mdx translation and toctree

[XGLM] Add accelerate support for XGLM (#22207)

da3ba3a

* add `accelerate` support for XGLM
* fix order

fixes a typo in WhisperFeatureExtractor docs. (#22208)

fb366b9

* fixes a typo
* .

🔥py38 + torch 2 🔥🔥🔥🚀 (#22204)

5110e57

* py38 + torch 2
* increment cache versions
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Hotfix for natten issue with torch 2.0.0 on CircleCI (#22218)

97a3d16

fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

fix typos in llama.mdx (#22223)

33d033d

fix code example in mgp-str doc (#22219)

af1c864

Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>

Use dash==2.8.1 for now for daily CI (#22227)

5321867

Use dash 2.8.1 for now
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Depth estimation task guide (#22205)

42f8f76

* added doc to toc, auto tip with supported models, mention of task guide in model docs
* make style
* removed "see also"
* minor fix

LLaMA house-keeping (#22216)

0093402

* LLaMA house-keeping
* Doc links

Removed .mdx extension in two links (#22230)

314cdf7

removed .mdx extension

fix(docs): fix task guide links in model docs (#22226)

074490b

fix(docs): task guide links in model docs

Revert "Use dash==2.8.1 for now for daily CI" (#22233)

bec0756

Revert "Use `dash==2.8.1` for now for daily CI (#22227)" This reverts commit 5321867.

Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbeddi…

cf601b9

…ng (#22234) push

[trainer] param count for deepspeed zero3 (#22193)

60d51ef

[trainer] param count for zero3

oushu1zhangxiangxuan1 merged commit 22ead54 into oushu1zhangxiangxuan1:main Mar 20, 2023
