Skip to content

Add drop_last arg for data loader#4757

Merged
julien-c merged 1 commit into
huggingface:masterfrom
setu4993:feature/data-collator-drop-last
Jun 4, 2020
Merged

Add drop_last arg for data loader#4757
julien-c merged 1 commit into
huggingface:masterfrom
setu4993:feature/data-collator-drop-last

Conversation

@setu4993

@setu4993 setu4993 commented Jun 4, 2020

Copy link
Copy Markdown
Contributor

Add an extra argument to TrainingArguments that would be passed on to Trainer for use in DataLoader.

I ran into a problem while using the Trainer this week and the GPU expecting the full batch size of vector inputs, and put a workaround in place in the dataset class I was using, but would be useful to have this as an optional argument.

@LysandreJik

Copy link
Copy Markdown
Member

Hi! This looks like a cool feature to add, indeed. I'm curious, what was the error you obtained because of an incomplete batch? It shouldn't raise an error if a batch is smaller than what it has previously seen.

I could see it being useful when the framework needs to trace with a given input size though, like with TPUs or with JAX.

@LysandreJik LysandreJik requested a review from julien-c June 4, 2020 14:48
@setu4993

setu4993 commented Jun 4, 2020

Copy link
Copy Markdown
Contributor Author

Hey @LysandreJik, thanks for taking a look!

This error occurred on the last step of the epoch:
RuntimeError: Gather got an input of invalid size: got [2, 1, 20, 256, 64], but expected [2, 2, 20, 256, 64] (gather at /AWS-PyTorch/torch/csrc/cuda/comm.cpp:231)

Because of the nature of the error and it occurring on the last step, my suspicion was it was because of drop_last. I implemented a workaround for it and that stopped the error from re-appearing.

@codecov

codecov Bot commented Jun 4, 2020

Copy link
Copy Markdown

Codecov Report

Merging #4757 into master will decrease coverage by 0.07%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4757      +/-   ##
==========================================
- Coverage   77.41%   77.34%   -0.08%     
==========================================
  Files         128      128              
  Lines       21059    21060       +1     
==========================================
- Hits        16302    16288      -14     
- Misses       4757     4772      +15     
Impacted Files Coverage Δ
src/transformers/trainer.py 39.62% <ø> (ø)
src/transformers/training_args.py 76.66% <100.00%> (+0.26%) ⬆️
src/transformers/data/processors/squad.py 28.66% <0.00%> (-6.37%) ⬇️
src/transformers/tokenization_utils.py 89.80% <0.00%> (-0.12%) ⬇️
src/transformers/modeling_tf_utils.py 88.96% <0.00%> (+0.32%) ⬆️
src/transformers/modeling_openai.py 82.03% <0.00%> (+1.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bf9afb...9a8fb7a. Read the comment docs.

@julien-c

julien-c commented Jun 4, 2020

Copy link
Copy Markdown
Member

I'm assuming you use GPU distribution – what method do you use for distribution? nn.DataParallel?

@setu4993

setu4993 commented Jun 4, 2020

Copy link
Copy Markdown
Contributor Author

@julien-c: Correct, it is using nn.DataParallel under the hood.

Comment thread src/transformers/training_args.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
data_loader_drop_last: bool = field(
dataloader_drop_last: bool = field(

@julien-c

julien-c commented Jun 4, 2020

Copy link
Copy Markdown
Member

It should work out of the box with torch.distributed instead of nn.DataParallel.

I have no objection to merging this though :)

@setu4993

setu4993 commented Jun 4, 2020

Copy link
Copy Markdown
Contributor Author

Hmmm, this was on AWS SageMaker, so I'll double-check how it is implemented there.

Good recommendation on the change. Also, another question: Should the arg be separate for train and eval data loaders? I assumed not, but just wanted to confirm :).

@setu4993 setu4993 force-pushed the feature/data-collator-drop-last branch from 03a90f4 to 9a8fb7a Compare June 4, 2020 22:23
@julien-c

julien-c commented Jun 4, 2020

Copy link
Copy Markdown
Member

No I can't think of a scenario where one would want to drop_last in train and not in eval (or inversely)

Thank you, merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants