Skip to content

Error out of default_collate for lists of unequal size #38492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

choidongyeon
Copy link

@choidongyeon choidongyeon commented May 14, 2020

Fix issue #23141

In the below example default_collate collates each element of the list. Since the second element isn't present in all samples, it is discarded:

from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        tmp = {
            "foo": np.array([1, 2, 3]),
            "bar": ["X"] * (idx+1),
        }

        return tmp

training = CustomDataset()

for batch in DataLoader(training, batch_size=2):
    print(batch)

Yields

{
  'foo': tensor(
    [
      [1, 2, 3],
      [1, 2, 3]
    ]
  ),
  'bar': [
      ('X', 'X'),
    ]
}

Based on discussion in the issue, it seems the best course of action is to error out in this case. This seems consistent with what is done for tensor elements, as seen in TensorShape.cpp line 1066 which is called when torch.stack is called. In this PR, I introduce a similar message to error out for lists.

@ssnl

@choidongyeon choidongyeon requested a review from apaszke as a code owner May 14, 2020 17:45
@choidongyeon choidongyeon changed the title Add error message for default_collate list transpose Error out of default_collate for lists of unequal size May 14, 2020
@dr-ci
Copy link

dr-ci bot commented May 14, 2020

💊 CI failures summary and remediations

As of commit d30f02b (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/ecr_gc_docker/gc.py 
Auto-merging .circleci/ecr_gc_docker/gc.py 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/codegen_validation/compare_normalized_yaml.sh 
Auto-merging .circleci/codegen_validation/compare_normalized_yaml.sh 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/lib/miniyaml.py 
Auto-merging .circleci/cimodel/lib/miniyaml.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py 
Auto-merging .circleci/cimodel/data/windows_build_definitions.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (2/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/ecr_gc_docker/gc.py 
Auto-merging .circleci/ecr_gc_docker/gc.py 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/codegen_validation/compare_normalized_yaml.sh 
Auto-merging .circleci/codegen_validation/compare_normalized_yaml.sh 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/lib/miniyaml.py 
Auto-merging .circleci/cimodel/lib/miniyaml.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py 
Auto-merging .circleci/cimodel/data/windows_build_definitions.py 
Automatic merge failed; fix conflicts and then commit the result. 

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 6 times.

Copy link
Collaborator

@ssnl ssnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a test in test_dataloader.py too? something that asserts default_collate raises is fine!

Copy link
Collaborator

@ssnl ssnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :)

@ssnl
Copy link
Collaborator

ssnl commented May 15, 2020

@pytorchbot merge this please

@pytorchbot pytorchbot added the merge-this-please Was marked for merge with @pytorchbot merge this please label May 15, 2020
@choidongyeon
Copy link
Author

Thanks for taking a look @ssnl!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 8c07a98.

@choidongyeon choidongyeon deleted the transformer-compose branch May 20, 2020 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-this-please Was marked for merge with @pytorchbot merge this please Merged open source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants