Fix Trainer in DataParallel setting #5685

sgugger · 2020-07-11T12:27:33Z

The new output types seem to break `DataParallel`, FYI: see the comment on #5671. This is because of the line

```python
return type(out)(map(gather_map, zip(*outputs)))
```

in `scatter_gather`, which tries to reconstruct an output of the same type as ours. It fails because it passes a single iterable instead of the arguments our output classes require. AFAICT, there is no way to fix our `ModelOutput` to work with this.
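To make the failure concrete, here is a minimal pure-Python sketch that needs neither PyTorch nor multiple GPUs. `FakeClassifierOutput` and `gather_like` are hypothetical stand-ins: the first imitates a `ModelOutput`-style dataclass whose fields have no defaults, the second mimics only the container branch of the `gather_map` recursion, with lists playing the role of per-device tensors.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class FakeClassifierOutput:
    """Hypothetical stand-in for a ModelOutput subclass: a dataclass whose
    fields have no defaults, made iterable via tuple-style indexing."""
    loss: Optional[Any]
    logits: Any

    def to_tuple(self):
        # Only the non-None fields, in declaration order.
        return tuple(
            getattr(self, f.name) for f in fields(self) if getattr(self, f.name) is not None
        )

    def __getitem__(self, i):
        # Raising IndexError past the end lets zip() iterate over the object.
        return self.to_tuple()[i]


def gather_like(outputs):
    """Simplified imitation of gather_map: leaves are concatenated across
    devices, anything else is rebuilt with type(out)(map(..., zip(*outputs)))."""
    out = outputs[0]
    if isinstance(out, list):  # lists stand in for per-device tensors
        return [x for o in outputs for x in o]
    if out is None:
        return None
    # tuple() accepts one iterable; the dataclass expects one argument per field.
    return type(out)(map(gather_like, zip(*outputs)))


# Per-device results from two replicas.
tuple_outputs = [([0.5], [[1.0, 2.0]]), ([0.7], [[3.0, 4.0]])]
print(gather_like(tuple_outputs))  # ([0.5, 0.7], [[1.0, 2.0], [3.0, 4.0]])

object_outputs = [FakeClassifierOutput([0.5], [[1.0, 2.0]]),
                  FakeClassifierOutput([0.7], [[3.0, 4.0]])]
try:
    gather_like(object_outputs)
except TypeError as err:
    print(err)  # message ends with: missing 1 required positional argument: 'logits'
```

The same message appears in the title of issue #5693 below: the `map` object is bound to the first field and the second one is left unfilled, while a plain tuple is rebuilt without trouble.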

However, we have the `return_tuple` argument to fix the issue :-)
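A minimal sketch of how such a flag can be used, assuming the Trainer asks for tuples only when it runs under `DataParallel` (approximated here as `n_gpu > 1`). `toy_forward`, `prepare_inputs` and `FakeClassifierOutput` are hypothetical illustrations, not the code merged in this PR, which may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeClassifierOutput:
    # Hypothetical stand-in for a ModelOutput subclass (no field defaults).
    loss: Optional[Any]
    logits: Any


def toy_forward(x, return_tuple=False):
    """Hypothetical model forward returning the same values either as an
    output object (default) or as a plain tuple when return_tuple=True."""
    loss = sum(x)
    logits = [v * 2 for v in x]
    if return_tuple:
        return (loss, logits)
    return FakeClassifierOutput(loss=loss, logits=logits)


def prepare_inputs(inputs: Dict[str, Any], n_gpu: int) -> Dict[str, Any]:
    """Hypothetical Trainer-side step: with more than one GPU (DataParallel),
    request plain tuples, which gather can rebuild with tuple(iterable)."""
    inputs = dict(inputs)  # copy so the caller's batch is left untouched
    if n_gpu > 1:
        inputs["return_tuple"] = True
    return inputs


single = toy_forward(**prepare_inputs({"x": [1, 2]}, n_gpu=1))
multi = toy_forward(**prepare_inputs({"x": [1, 2]}, n_gpu=2))
print(single)  # FakeClassifierOutput(loss=3, logits=[2, 4])
print(multi)   # (3, [2, 4])
```

Setting the flag only on the multi-GPU path keeps the named attributes for single-device users, while the `DataParallel` path falls back to a type that `gather` knows how to rebuild.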

codecov · 2020-07-11T12:41:47Z

Codecov Report

Merging #5685 into master will decrease coverage by 0.20%.
The diff coverage is 25.00%.

```diff
@@            Coverage Diff             @@
##           master    #5685      +/-   ##
==========================================
- Coverage   78.11%   77.91%   -0.21%     
==========================================
  Files         146      146              
  Lines       25983    25987       +4     
==========================================
- Hits        20297    20247      -50     
- Misses       5686     5740      +54
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/transformers/trainer.py | `37.84% <25.00%> (-0.12%)` | ⬇️ |
| src/transformers/modeling_tf_t5.py | `44.56% <0.00%> (-46.35%)` | ⬇️ |
| src/transformers/modeling_tf_gpt2.py | `63.55% <0.00%> (-31.78%)` | ⬇️ |
| src/transformers/generation_tf_utils.py | `79.94% <0.00%> (-6.02%)` | ⬇️ |
| src/transformers/modeling_tf_utils.py | `86.92% <0.00%> (-1.97%)` | ⬇️ |
| src/transformers/modeling_openai.py | `82.31% <0.00%> (+1.28%)` | ⬆️ |
| src/transformers/modeling_tf_roberta.py | `93.36% <0.00%> (+49.37%)` | ⬆️ |
| src/transformers/modeling_tf_openai.py | `95.18% <0.00%> (+74.91%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 7fad617...05ec8f6.

thomwolf

LGTM

Fix Trainer in DataParallel setting

ef9e30e

sgugger requested review from thomwolf, julien-c and LysandreJik July 11, 2020 12:27

Fix typo

05ec8f6

thomwolf mentioned this pull request Jul 13, 2020

__init__() missing 1 required positional argument: 'logits' #5693

Closed

4 tasks

thomwolf approved these changes Jul 13, 2020


sgugger merged commit ce374ba into master Jul 13, 2020

sgugger deleted the fix_dp_model_output branch July 13, 2020 12:37

stas00 mentioned this pull request Jul 14, 2020

DataParallel fixes #5733

Merged
