
Add drop_last arg for data loader #4757

Merged

Conversation

@setu4993 (Contributor) commented Jun 4, 2020

Add an extra argument to TrainingArguments that would be passed on to Trainer for use in DataLoader.

I ran into a problem this week while using the Trainer: the GPU expected the full batch size of vector inputs. I put a workaround in place in the dataset class I was using, but it would be useful to have this as an optional argument.
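
For context, a minimal sketch of what the change looks like end to end: a boolean field on TrainingArguments that gets forwarded to torch's DataLoader. The field name below follows the rename suggested later in review, and the trainer-side wiring (the `get_train_dataloader` helper) is an assumption for illustration, not this PR's diff:

```python
from dataclasses import dataclass, field

from torch.utils.data import DataLoader, Dataset


@dataclass
class TrainingArguments:
    # ...existing fields elided...
    dataloader_drop_last: bool = field(
        default=False,
        metadata={"help": "Drop the last incomplete batch if the dataset size is not divisible by the batch size."},
    )


def get_train_dataloader(args: TrainingArguments, train_dataset: Dataset, batch_size: int) -> DataLoader:
    # The flag is simply forwarded to the DataLoader.
    return DataLoader(train_dataset, batch_size=batch_size, drop_last=args.dataloader_drop_last)
```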

@LysandreJik (Member)

Hi! This looks like a cool feature to add, indeed. I'm curious, what was the error you obtained because of an incomplete batch? It shouldn't raise an error if a batch is smaller than what it has previously seen.

I could see it being useful when the framework needs to trace with a given input size though, like with TPUs or with JAX.

@LysandreJik requested a review from julien-c on June 4, 2020 at 14:48.
@setu4993 (Contributor, Author) commented Jun 4, 2020

Hey @LysandreJik, thanks for taking a look!

This error occurred on the last step of the epoch:
RuntimeError: Gather got an input of invalid size: got [2, 1, 20, 256, 64], but expected [2, 2, 20, 256, 64] (gather at /AWS-PyTorch/torch/csrc/cuda/comm.cpp:231)

Because of the nature of the error and the fact that it occurred on the last step, my suspicion was that the incomplete final batch was to blame (i.e., drop_last was not set). I implemented a workaround for it and that stopped the error from reappearing.
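
For illustration (not part of this PR), a tiny standalone example of how an epoch ends with a short final batch unless drop_last is set:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 samples with a batch size of 4 leaves a final batch of only 2.
dataset = TensorDataset(torch.randn(10, 3))

print([x.shape[0] for (x,) in DataLoader(dataset, batch_size=4)])
# [4, 4, 2]  <- short last batch

print([x.shape[0] for (x,) in DataLoader(dataset, batch_size=4, drop_last=True)])
# [4, 4]     <- incomplete batch dropped
```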

@codecov (bot) commented Jun 4, 2020

Codecov Report

Merging #4757 into master will decrease coverage by 0.07%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #4757      +/-   ##
==========================================
- Coverage   77.41%   77.34%   -0.08%     
==========================================
  Files         128      128              
  Lines       21059    21060       +1     
==========================================
- Hits        16302    16288      -14     
- Misses       4757     4772      +15     
Impacted Files                              Coverage Δ
src/transformers/trainer.py                 39.62% <ø> (ø)
src/transformers/training_args.py           76.66% <100.00%> (+0.26%) ⬆️
src/transformers/data/processors/squad.py   28.66% <0.00%> (-6.37%) ⬇️
src/transformers/tokenization_utils.py      89.80% <0.00%> (-0.12%) ⬇️
src/transformers/modeling_tf_utils.py       88.96% <0.00%> (+0.32%) ⬆️
src/transformers/modeling_openai.py         82.03% <0.00%> (+1.35%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5bf9afb...9a8fb7a.

@julien-c (Member) commented Jun 4, 2020

I'm assuming you're using multi-GPU distribution – which method are you using? nn.DataParallel?

@setu4993 (Contributor, Author) commented Jun 4, 2020

@julien-c: Correct, it is using nn.DataParallel under the hood.
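
For context on why the short batch trips up nn.DataParallel: the input batch is chunked across the replicas, so a final batch that doesn't split evenly produces per-GPU outputs of different sizes, and the gather step then sees a shape it didn't expect (as in the [2, 1, ...] vs. [2, 2, ...] error above). A CPU-only sketch of the split, purely illustrative rather than the actual gather code:

```python
import torch

# Two replicas, full batch of 4: each GPU gets 2 samples.
full_batch = torch.randn(4, 8)
print([chunk.shape[0] for chunk in full_batch.chunk(2)])   # [2, 2]

# Short final batch of 3: the split is uneven, so the gathered
# outputs no longer match the expected per-replica size.
short_batch = torch.randn(3, 8)
print([chunk.shape[0] for chunk in short_batch.chunk(2)])  # [2, 1]
```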

@@ -133,6 +133,10 @@ class TrainingArguments:
)
tpu_metrics_debug: bool = field(default=False, metadata={"help": "TPU: Whether to print debug metrics"})

data_loader_drop_last: bool = field(
Review comment (Member) on this line:

nit:

Suggested change:
-    data_loader_drop_last: bool = field(
+    dataloader_drop_last: bool = field(

@julien-c (Member) commented Jun 4, 2020

It should work out of the box with torch.distributed instead of nn.DataParallel.

I have no objection to merging this though :)
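
For reference, a rough sketch of the torch.distributed setup being referred to, where each process builds its own DataLoader over a DistributedSampler, so there is no DataParallel-style scatter/gather of a single batch across GPUs. The launch and wiring details here are assumptions, not transformers code:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler


def setup(model: torch.nn.Module, dataset, batch_size: int, local_rank: int):
    # One process per GPU; NCCL backend for multi-GPU training.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Each rank iterates over its own shard of the data, so batches are
    # never scattered across devices and gathered back.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    return loader, ddp_model
```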

@setu4993 (Contributor, Author) commented Jun 4, 2020

Hmmm, this was on AWS SageMaker, so I'll double-check how it is implemented there.

Good recommendation on the change. Also, another question: Should the arg be separate for train and eval data loaders? I assumed not, but just wanted to confirm :).

@setu4993 force-pushed the feature/data-collator-drop-last branch from 03a90f4 to 9a8fb7a on June 4, 2020 at 22:23.
@julien-c (Member) commented Jun 4, 2020

No, I can't think of a scenario where one would want drop_last in train but not in eval (or vice versa).

Thank you, merging.
