
Pegasus finetune script: add --adafactor #6811

Merged — 3 commits merged into huggingface:master from the pegasus-bash branch on Aug 29, 2020

Conversation

@sshleifer (Contributor) commented Aug 28, 2020

No description provided.
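
Since the PR carries no description, here is a minimal sketch, under stated assumptions, of what wiring an `--adafactor` switch into a seq2seq finetuning script typically looks like with `transformers.optimization.Adafactor`. The helper names (`add_optimizer_args`, `build_optimizer`) and the default learning rate are illustrative, not the script's actual code.

```python
# Hypothetical sketch: expose an --adafactor flag and switch the optimizer on it.
# Only the flag name comes from the PR title; the rest is assumed for illustration.
import argparse

import torch
from transformers.optimization import Adafactor


def add_optimizer_args(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("--adafactor", action="store_true",
                        help="Use Adafactor instead of AdamW.")
    parser.add_argument("--learning_rate", type=float, default=3e-5)


def build_optimizer(model: torch.nn.Module, args: argparse.Namespace):
    if args.adafactor:
        # With an explicit lr, Adafactor requires relative_step=False;
        # scale_parameter=False keeps the update scale tied to that lr.
        return Adafactor(model.parameters(), lr=args.learning_rate,
                         scale_parameter=False, relative_step=False)
    return torch.optim.AdamW(model.parameters(), lr=args.learning_rate)
```

Adafactor's factored second-moment estimates make it noticeably lighter on optimizer memory than AdamW, which is the usual motivation for offering it as a flag in large seq2seq finetuning runs.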

@codecov bot commented Aug 28, 2020

Codecov Report

Merging #6811 into master will decrease coverage by 0.10%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6811      +/-   ##
==========================================
- Coverage   79.58%   79.47%   -0.11%     
==========================================
  Files         157      157              
  Lines       28588    28586       -2     
==========================================
- Hits        22752    22719      -33     
- Misses       5836     5867      +31     
Impacted Files Coverage Δ
src/transformers/configuration_pegasus.py 100.00% <100.00%> (ø)
src/transformers/configuration_openai.py 34.28% <0.00%> (-62.86%) ⬇️
src/transformers/tokenization_albert.py 28.84% <0.00%> (-58.66%) ⬇️
src/transformers/modeling_openai.py 23.87% <0.00%> (-57.10%) ⬇️
src/transformers/modeling_tf_distilbert.py 64.47% <0.00%> (-34.36%) ⬇️
src/transformers/tokenization_marian.py 67.79% <0.00%> (-31.36%) ⬇️
src/transformers/tokenization_transfo_xl.py 20.53% <0.00%> (-21.21%) ⬇️
src/transformers/generation_utils.py 96.66% <0.00%> (-0.28%) ⬇️
src/transformers/tokenization_utils_base.py 93.76% <0.00%> (+0.27%) ⬆️
src/transformers/modeling_tf_utils.py 87.29% <0.00%> (+0.32%) ⬆️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ab21b0...67322db.

@sshleifer sshleifer merged commit 0f58903 into huggingface:master Aug 29, 2020
@sshleifer sshleifer deleted the pegasus-bash branch August 29, 2020 21:43
stas00 pushed a commit to stas00/transformers that referenced this pull request Aug 30, 2020
sgugger added a commit that referenced this pull request Aug 31, 2020
* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance, causing TPU<>CPU
communication at each step. On RoBERTa MLM, for example, the fix reduces step
time by 30%; the gain should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss across eval shards

* Fix style (#6803)

* t5 model should make decoder_attention_mask (#6800)

* [s2s] Test hub configs in self-scheduled CI (#6809)

* [s2s] round runtime in run_eval (#6798)

* Pegasus finetune script: add --adafactor (#6811)

* [bart] rename self-attention -> attention (#6708)

* [tests] fix typos in inputs (#6818)

* Fixed open in colab link (#6825)

* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)

* BR_BERTo model card (#6793)

* clearly indicate shuffle=False (#6312)

* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* [s2s README] Add more dataset download instructions (#6737)

* Style

* Patch logging issue

* Set default logging level to `WARNING` instead of `INFO`

* TF Flaubert w/ pre-norm (#6841)

* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)

* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Fix in Adafactor docstrings (#6845)

* Fix resuming training for Windows (#6847)

* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance, causing TPU<>CPU
communication at each step. On RoBERTa MLM, for example, the fix reduces step
time by 30%; the gain should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss across eval shards

* comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
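
The loss-tensor bullets in the commit message above describe deferring `tensor.item()` to every `logging_steps`, so XLA/TPU runs avoid a device-to-host sync on each step. Below is a minimal sketch of that pattern, assuming a Hugging Face-style model output with a `.loss` attribute; the loop structure and names are illustrative, not the Trainer's actual implementation.

```python
import torch


def train(model, dataloader, optimizer, logging_steps: int = 100):
    """Illustrative loop: avoid per-step .item() calls, which force a
    device->host sync (especially costly on XLA/TPU)."""
    device = next(model.parameters()).device
    tr_loss = torch.tensor(0.0, device=device)

    for step, batch in enumerate(dataloader, start=1):
        loss = model(**batch).loss  # assumes a HF-style ModelOutput with .loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        tr_loss += loss.detach()  # stays on device, no synchronization
        if step % logging_steps == 0:
            # Single sync point per logging_steps instead of one per step.
            print(f"step {step}: avg loss {(tr_loss / logging_steps).item():.4f}")
            tr_loss.zero_()
```

The design point is simply that the running loss remains a device tensor between logging events; only the periodic read-out crosses the device boundary.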
stas00 added a commit to stas00/transformers that referenced this pull request Sep 1, 2020
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
Labels: none yet
Projects: none yet
Linked issues: none yet
Participants: 1