
torch.distributed.rpc package does not work well with generators and lambdas #42705

Open
frank-dong-ms-zz opened this issue Aug 6, 2020 · 1 comment
Labels
module: rpc Related to RPC, distributed autograd, RRef, and distributed optimizer triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

frank-dong-ms-zz commented Aug 6, 2020

I'm using the torch.distributed.rpc package for a distributed training POC. The rpc package itself uses pickle, and pickle does not work well with some Python features such as generators and lambdas, which imposes extra limitations. What's worse, if I try to use another Python package that relies on lambdas or generators, I have no way to combine it with the rpc package.

So my question is:

  1. Can we use another pickling package with fewer limitations, such as dill?
  2. Is there any suggested workaround, especially when using rpc together with another package that already relies on lambdas or generators?

Thanks for any suggestions.
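
For context, a minimal sketch of the underlying pickle limitation; this snippet is illustrative and not from the original report:

```python
import pickle

# Lambdas have no importable qualified name, so pickle rejects them.
try:
    pickle.dumps(lambda x: x + 1)
except (pickle.PicklingError, AttributeError) as e:
    print("lambda:", e)

# Generators carry live frame state that pickle cannot serialize.
try:
    pickle.dumps((i for i in range(3)))
except TypeError as e:
    print("generator:", e)
```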

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar @jiayisuse @agolynski

@ailzhang ailzhang added oncall: distributed Add this issue/PR to distributed oncall triage queue triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Aug 7, 2020
mrshenli (Contributor) commented Aug 7, 2020

Hey @frank-dong-ms

> Can we use another pickling package with fewer limitations, such as dill?

It might be hard to pull dill in as a dependency for PyTorch, but applications should be able to override the pickler used by RPC. See the code below:

```python
# Create _internal_rpc_pickler only once to initialize _dispatch_table only once
_internal_rpc_pickler = _InternalRPCPickler()

def serialize(obj):
    return _internal_rpc_pickler.serialize(obj)

def deserialize(binary_data, tensor_table):
    return _internal_rpc_pickler.deserialize(binary_data, tensor_table)
```
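
As a concrete illustration, here is a minimal sketch of such an override using dill. It assumes `serialize` must return a `(binary_data, tensor_table)` pair mirroring `_InternalRPCPickler`'s interface above, and that RPC serialization routes through the module-level `_internal_rpc_pickler`; neither assumption is confirmed in this thread.

```python
# Sketch only: replace the internal RPC pickler with a dill-based one.
# Assumes serialize() returns (bytes, tensor_table) and deserialize()
# receives the same pair back, mirroring _InternalRPCPickler above.
import dill
import torch.distributed.rpc.internal as rpc_internal

class DillRPCPickler:
    def serialize(self, obj):
        # dill can handle lambdas and closures; we skip the tensor-table
        # optimization and always send an empty table.
        return dill.dumps(obj), []

    def deserialize(self, binary_data, tensor_table):
        # tensor_table is unused because serialize() never fills it.
        return dill.loads(binary_data)

rpc_internal._internal_rpc_pickler = DillRPCPickler()
```

Skipping the tensor table means tensors get serialized inline by dill rather than handled out-of-band by the RPC layer, so this trades performance for flexibility.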

> Is there any suggested workaround, especially when using rpc together with another package that already relies on lambdas or generators?

Could you share an example that can repro the serde error?
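
One workaround pattern worth noting here (an illustrative sketch, not from this thread): keep the RPC target a plain module-level function and pre-serialize the lambda with dill on the caller side, so only bytes cross the RPC boundary. `run_pickled` is a hypothetical helper name.

```python
import dill
import torch.distributed.rpc as rpc

def run_pickled(payload: bytes):
    # Hypothetical module-level helper: decode a dill-encoded callable
    # on the callee and invoke it.
    fn = dill.loads(payload)
    return fn()

# Caller side: assumes RPC is initialized and a peer named "worker1" exists.
result = rpc.rpc_sync("worker1", run_pickled, args=(dill.dumps(lambda: 40 + 2),))
print(result)  # 42
```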

@mrshenli mrshenli added module: rpc Related to RPC, distributed autograd, RRef, and distributed optimizer and removed oncall: distributed Add this issue/PR to distributed oncall triage queue labels Aug 7, 2020