Problem with multiprocessing, custom __getstate__ with Tensors and forkserver #32351
Labels
module: multiprocessing
Related to torch.multiprocessing
module: serialization
Issues related to serialization (e.g., via pickle, or otherwise) of PyTorch objects
triaged
This issue has been looked at by a team member, triaged, and prioritized into an appropriate module
🐛 Bug
TL;DR: multiprocessing forkserver / spawn + a custom class whose
__getstate__
returns tensors gives an error.

Context
Suppose the user created a custom class which holds a (large) list of torch tensors inside.
In order to handle the large list of tensors as a single tensor (so as to overcome the system limit on the number of open file descriptors), the user implemented custom
__getstate__
and
__setstate__
methods. Pretty smart: this avoids the file-descriptor limitation without having to change the internal representation of the class!
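The pattern described above might look like the following sketch (the class name and method bodies are my own illustration, not the reporter's actual code): `__getstate__` packs the whole list into one stacked tensor, so only a single storage is involved when the object is shared.

```python
import pickle
import torch

class TensorListHolder:
    """Holds many same-shaped tensors, but pickles them as ONE tensor."""

    def __init__(self, tensors):
        self.tensors = tensors

    def __getstate__(self):
        # Pack the list into a single stacked tensor, so only one
        # storage (one file descriptor) is involved when sharing.
        return torch.stack(self.tensors)

    def __setstate__(self, stacked):
        # Unpack back into a list of views over the single storage.
        self.tensors = list(stacked.unbind(0))

# Plain pickling works fine; the failure described in this issue only
# appears when the multiprocessing machinery (forkserver/spawn) does
# the pickling.
holder = TensorListHolder([torch.full((2,), float(i)) for i in range(4)])
restored = pickle.loads(pickle.dumps(holder))
```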
Everything works fine until the multiprocessing start method is changed from
fork
to either
forkserver
or
spawn
, in which case we get the following error:

To Reproduce
The following code reproduces the problem. I illustrate the issue in two cases (which are really the same underlying problem): one when using the
DataLoader
, and another one when simply creating a new process with
forkserver
.

Expected behavior
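The reporter's repro script did not survive this scrape; a minimal sketch of the forkserver case might look like the following (class name, tensor sizes, and the worker function are my own assumptions; on affected PyTorch versions the error would surface when the child process unpickles the object):

```python
import torch
import torch.multiprocessing as mp

class Holder:
    """Holds a list of tensors; pickled as one stacked tensor."""

    def __init__(self, tensors):
        self.tensors = tensors

    def __getstate__(self):
        return torch.stack(self.tensors)

    def __setstate__(self, stacked):
        self.tensors = list(stacked.unbind(0))

def worker(holder):
    # The child must rebuild the Holder from its pickled state.
    print(len(holder.tensors))

if __name__ == "__main__":
    holder = Holder([torch.zeros(3) for _ in range(5)])
    # With the "fork" start method this works; with "forkserver" or
    # "spawn" the object is pickled across the process boundary, which
    # is where the issue reportedly triggers.
    ctx = mp.get_context("forkserver")
    p = ctx.Process(target=worker, args=(holder,))
    p.start()
    p.join()
```

The `DataLoader` variant is the same idea: put a `Holder` inside a `Dataset` and pass `multiprocessing_context="forkserver"` (or `num_workers > 0` with the forkserver start method) so the dataset is pickled into the workers.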
No error? :-)
Environment
Additional context
This has been reported in the past, see #20409, but without a repro, so it was hard to act on.
cc @ssnl