PyTorch datasets don't support multiprocessing #74

Open
jeremyjordan opened this issue Jan 2, 2023 · 0 comments · May be fixed by #75
PyTorch's DataLoader has a num_workers argument that fetches items from your dataset in parallel using multiprocessing, but this requires the dataset to be picklable so Python can distribute it across worker processes.

Currently, if you try to use multiple workers with a muspy dataset, you get the following error:

AttributeError: Can't pickle local object 'Dataset.to_pytorch_dataset.<locals>.TorchRepresentationDataset'

There's more context on the pickle issue in this Stack Overflow thread.
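The underlying limitation can be shown without muspy or torch at all: pickle serializes class instances by referring to the class's qualified name, and a class defined inside a function body has no importable name. This is a minimal sketch (the function and class names here are made up for illustration):

```python
import pickle

def make_dataset_class():
    # A class defined inside a function body, analogous to
    # TorchRepresentationDataset being defined inside to_pytorch_dataset()
    class LocalDataset:
        pass
    return LocalDataset

LocalDataset = make_dataset_class()

try:
    pickle.dumps(LocalDataset())
    error_message = None
except Exception as exc:
    # e.g. "Can't pickle local object 'make_dataset_class.<locals>.LocalDataset'"
    error_message = str(exc)

print(error_message)
```

The qualified name embeds `<locals>`, so pickle cannot look the class up at unpickling time, which is exactly why DataLoader's worker processes fail.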

Here's a minimal reproducible example to test it out yourself:

import muspy
import torch

haydn = muspy.HaydnOp20Dataset("data/", download_and_extract=True).convert()
dataset = haydn.to_pytorch_dataset(representation="pianoroll")
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, num_workers=2)
batch = next(iter(dataloader))

I'm happy to open a PR with a fix for this; it mostly involves moving TorchRepresentationDataset and TorchMusicFactoryDataset outside of to_pytorch_dataset so they are defined at module level.
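The shape of that fix can be sketched as follows. This is a hypothetical illustration, not muspy's actual implementation: the torch Dataset base class and muspy's conversion call are omitted, and anything the class needs from the enclosing function (here, the representation) becomes an instance attribute instead of a closure variable:

```python
import pickle

class TorchRepresentationDataset:
    """Hypothetical module-level wrapper: picklable because pickle can
    resolve it by its qualified name at import time."""

    def __init__(self, dataset, representation):
        # State that used to be captured by closure is stored explicitly
        self.dataset = dataset
        self.representation = representation

    def __getitem__(self, index):
        # In muspy this would convert the item to the chosen representation
        return self.dataset[index]

    def __len__(self):
        return len(self.dataset)

wrapped = TorchRepresentationDataset(dataset=[1, 2, 3], representation="pianoroll")
# Round-trips through pickle, so DataLoader workers can receive it
roundtrip = pickle.loads(pickle.dumps(wrapped))
print(roundtrip.representation)
```

Because the class lives at module level, pickle serializes instances by name plus `__dict__`, which is all that multiprocessing needs to rebuild the dataset in each worker.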
