AttributeError: Can't pickle local object... when using multiprocessing #1554
Comments
Are you creating these datasets before calling xmp.spawn() and passing them as args or globals?
No, they're created for each process.
also closing since not really a torch_xla bug...
I did exactly what you said, i.e. creating these datasets before calling xmp.spawn() and passing them as args or globals, and ran into the same problem. How can I solve it?
I did this because I was using Nvidia's DALI to speed up data loading, and I found that if I create a DataLoader inside each process spawned by xmp.spawn(), DALI creates four DataLoaders instead of one, unlike PyTorch.
🐛 Bug
This isn't really a torch_xla bug, but rather a "feature" when using distributed samplers with multiple worker processes as described here.
The problem arises if you're using a lambda expression or a nested function inside a generate_datasets function (or similar) instead of, e.g., a proper transformations class, because multiprocessing can't pickle local objects (I don't even have an idea why MP wants to pickle anything in the first place, but I'm sure there's a good reason for that).

To Reproduce
Steps to reproduce the behavior:
Let's say we have

dataset_trn, loader_trn = get_datasets(args_)

where get_datasets builds its transform from a lambda, and we then launch training via xmp.spawn().
This will throw

AttributeError: Can't pickle local object 'get_datasets.<locals>.<dataset_trn>'

but only when using the multiprocessing-style training. The fix is to use a class for the transform instead of a lambda expression or a nested function; see the sketch below.
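For illustration, here is a minimal, self-contained sketch of the failing pattern and the class-based fix. The dataset, the normalization constants, and the get_datasets_* names are hypothetical stand-ins rather than the actual code from this issue; the point is only that anything attached to the dataset must be picklable when child processes are started with the spawn start method, which xmp.spawn (and DataLoader workers under a spawn context) use.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    """Toy dataset; `transform` is applied to every sample."""

    def __init__(self, transform=None):
        self.data = torch.randn(100, 3)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        return self.transform(x) if self.transform is not None else x


def get_datasets_broken():
    # BROKEN: the lambda is a local object, so pickling the dataset for
    # spawned worker processes raises
    #   AttributeError: Can't pickle local object 'get_datasets_broken.<locals>.<lambda>'
    transform = lambda x: (x - 0.5) / 0.25
    dataset = MyDataset(transform=transform)
    # "spawn" forces the pickling that fork-based platforms would hide.
    loader = DataLoader(dataset, batch_size=8, num_workers=4,
                        multiprocessing_context="spawn")
    return dataset, loader


class Normalize:
    """Picklable replacement for the lambda: a module-level class whose
    instances are callable, so pickle can serialize them by reference."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def __call__(self, x):
        return (x - self.mean) / self.std


def get_datasets_fixed():
    # FIXED: the transform is an instance of a module-level class.
    dataset = MyDataset(transform=Normalize(0.5, 0.25))
    loader = DataLoader(dataset, batch_size=8, num_workers=4,
                        multiprocessing_context="spawn")
    return dataset, loader
```

Iterating over the broken loader only raises the error once the worker processes actually start; the fixed version runs fine. A functools.partial over a module-level function also pickles correctly, if a full class feels heavyweight.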
Anyway, I struggled quite a bit with this weird error (the multiprocessing machinery makes debugging pretty hard), so I'm just posting this here in case others encounter it. Not sure if there's an official fix per se, so feel free to close immediately.