AttributeError: Can't pickle local object... when using multiprocessing #1554

Closed
harpone opened this issue Jan 24, 2020 · 5 comments

Comments

@harpone

harpone commented Jan 24, 2020

🐛 Bug

This isn't really a torch_xla bug, but rather a "feature" when using distributed samplers with multiple worker processes as described here.

The problem arises if you use a lambda expression or a locally defined function inside a get_datasets-style function instead of, e.g., a proper transform class, because multiprocessing can't pickle local objects (I don't even know why multiprocessing needs to pickle anything in the first place, but I'm sure there's a good reason for it).
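For anyone hitting this outside torch_xla, here is a minimal, standalone illustration of the underlying limitation (plain Python, nothing torch-specific): pickle serializes functions by qualified name, and a lambda or function defined inside another function has no module-level name it can be looked up by.

import pickle

def get_datasets():
    # A lambda defined inside a function is a "local object": pickle can only
    # serialize functions it can re-import by module-level name, so this fails.
    transform_target = lambda y: y + 1
    return transform_target

pickle.dumps(get_datasets())
# AttributeError: Can't pickle local object 'get_datasets.<locals>.<lambda>'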

To Reproduce

Steps to reproduce the behavior:

Let's say we have
dataset_trn, loader_trn = get_datasets(args_)

and

def get_datasets(args):
    # Locally defined lambda: this is the object multiprocessing fails to pickle
    transform_target = lambda y: dict_of_stuff[y]
    # ... other setup ...
    dataset_trn = torch.utils.data.Dataset(...,
                                           target_transform=transform_target,
                                           ...)
    sampler_trn = DistributedSampler(...)
    loader_trn = torch.utils.data.DataLoader(dataset_trn, sampler=sampler_trn, ...)

    return dataset_trn, loader_trn

This will throw

AttributeError: Can't pickle local object 'get_datasets.<locals>.<lambda>'

but only when using multiprocessing-style training. The fix is to use a class for the transform instead of a lambda expression or a locally defined function (see the sketch below).
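A minimal sketch of that class-based fix, reusing the names from the snippet above (dict_of_stuff stands for whatever mapping you were using): define the transform as a module-level callable class, which worker processes can pickle by name.

class TargetTransform:
    """Picklable replacement for `lambda y: dict_of_stuff[y]`."""

    def __init__(self, mapping):
        self.mapping = mapping

    def __call__(self, y):
        return self.mapping[y]

# inside get_datasets():
#     target_transform=TargetTransform(dict_of_stuff)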

Anyway, I struggled quite a bit with this weird error (multiprocessing makes debugging pretty hard), so I'm just posting it here in case others encounter it. Not sure there's an official fix per se, so feel free to close immediately.

@dlibenzi
Collaborator

Are you creating these datasets before calling xmp.spawn() and passing them as args or globals?

@harpone
Author

harpone commented Jan 27, 2020

No, they're created for each process.
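For context, a rough sketch of the per-process pattern described here, assuming the get_datasets() from above (the function and argument names are placeholders, not torch_xla requirements):

import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index, args):
    # Each spawned process builds its own dataset and loader, so only `args`
    # needs to be picklable for xmp.spawn() itself. DataLoader worker processes
    # (num_workers > 0) may still pickle the dataset and its transforms, which
    # is where a local lambda can still fail.
    device = xm.xla_device()
    dataset_trn, loader_trn = get_datasets(args)
    ...

xmp.spawn(_mp_fn, args=(args_,), nprocs=8)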

@harpone
Author

harpone commented Jan 27, 2020

Also closing, since this isn't really a torch_xla bug...

@Rainbowman0

Are you creating these datasets before calling xmp.spawn() and passing them as args or globals?

I did exactly what you said, i.e. 'creating these datasets before calling xmp.spawn() and passing them as args or globals', and ran into the same problem. How can I solve it?

@Rainbowman0

I did this because I was using NVIDIA's DALI to speed up data loading, and I found that if I create a DataLoader in each process spawned by xmp.spawn(), DALI creates four DataLoaders instead of one, unlike PyTorch.
