DistributedDataSampler converts NamedTuple to regular tuple #38028

Closed
malhotraa opened this issue May 7, 2020 · 5 comments
Labels
module: data parallel, oncall: distributed, triaged

Comments

malhotraa commented May 7, 2020

🐛 Bug

DistributedDataSampler converts NamedTuple to regular tuple due to this code.

To Reproduce

Steps to reproduce the behavior:

  1. Create a NamedTuple type MyTuple
  2. Return a MyTuple object from the collate_fn in your dataset
  3. During training you will find the loaded data batch is a regular tuple instead of named tuple.
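
The steps above can be illustrated without PyTorch. The following is a minimal sketch, assuming the conversion comes from a recursive scatter routine that rebuilds tuples with `zip`. `naive_scatter` and `MyTuple` are hypothetical names used only for illustration; this is not the actual PyTorch code linked in the report.

```python
from collections import namedtuple

# Hypothetical named tuple type, standing in for what a collate_fn might return.
MyTuple = namedtuple("MyTuple", ["inputs", "labels"])


def naive_scatter(obj, num_chunks):
    """Hypothetical, simplified stand-in for a scatter routine.

    Splits lists into num_chunks pieces and recurses into tuples.
    """
    if isinstance(obj, tuple) and len(obj) > 0:
        # zip(*...) always yields plain tuples, so the named tuple type is lost here.
        return list(zip(*(naive_scatter(item, num_chunks) for item in obj)))
    if isinstance(obj, list):
        # Strided split: chunk i gets elements i, i + num_chunks, ...
        return [obj[i::num_chunks] for i in range(num_chunks)]
    # Anything else is replicated once per chunk.
    return [obj for _ in range(num_chunks)]


batch = MyTuple(inputs=[1, 2, 3, 4], labels=[0, 1, 0, 1])
chunks = naive_scatter(batch, 2)

print(type(batch).__name__)      # MyTuple
print(type(chunks[0]).__name__)  # tuple
```

Each chunk still holds the right data, but field access such as `chunks[0].inputs` raises `AttributeError` because the chunk is a plain tuple.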

Expected behavior

NamedTuple data loaded using DDP shouldn't be implicitly converted to regular tuples.

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 418.39
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.4.0
[pip] torchvision==0.5.0
[conda] numpy 1.18.1 pypi_0 pypi
[conda] torch 1.4.0 pypi_0 pypi
[conda] torchvision 0.5.0 pypi_0 pypi

Additional context

n/a

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar

malhotraa (Author) commented

cc @mrshenli

mrshenli added the module: data parallel, oncall: distributed, and triaged labels May 8, 2020
mrshenli (Contributor) commented May 8, 2020

Hi @malhotraa, thanks for reporting the issue and identifying the culprit code. Would you like to contribute a PR to fix this? :)

malhotraa (Author) commented

@mrshenli for sure. will create a PR shortly.

malhotraa (Author) commented

@mrshenli I am struggling a bit with finding a clean way to check if an object is an instance of namedtuple so that it can be handled correctly by the scatter method. There is a kinda hacky check here. Would you recommend using it?
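
For reference, one common duck-typing check looks like the sketch below. It assumes that a tuple subclass exposing `_fields` and `_asdict` is a named tuple, which is a heuristic and not an official Python API. `is_namedtuple`, `scatter_preserving`, and `Batch` are hypothetical names; this is not necessarily the check linked above or the one PyTorch ended up using.

```python
from typing import NamedTuple


def is_namedtuple(obj):
    """Heuristic: named tuples are tuple subclasses with _fields and _asdict."""
    return (
        isinstance(obj, tuple)
        and hasattr(obj, "_fields")
        and hasattr(obj, "_asdict")
    )


def scatter_preserving(obj, num_chunks):
    """Hypothetical scatter that rebuilds named tuples with their original type."""
    if is_namedtuple(obj) and len(obj) > 0:
        parts = zip(*(scatter_preserving(item, num_chunks) for item in obj))
        # type(obj)(*fields) reconstructs the same named tuple class per chunk.
        return [type(obj)(*fields) for fields in parts]
    if isinstance(obj, tuple) and len(obj) > 0:
        return list(zip(*(scatter_preserving(item, num_chunks) for item in obj)))
    if isinstance(obj, list):
        # Strided split: chunk i gets elements i, i + num_chunks, ...
        return [obj[i::num_chunks] for i in range(num_chunks)]
    # Anything else is replicated once per chunk.
    return [obj for _ in range(num_chunks)]


class Batch(NamedTuple):
    inputs: list
    labels: list


chunks = scatter_preserving(Batch(inputs=[1, 2, 3, 4], labels=[0, 1, 0, 1]), 2)
print(type(chunks[0]).__name__)  # Batch
print(chunks[0].inputs)          # [1, 3]
```

The named tuple branch has to come before the plain tuple branch, since every named tuple also passes `isinstance(obj, tuple)`.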

malhotraa (Author) commented

Looks like this issue has been addressed by #44220


2 participants