Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ray Core] failed to serialize namedtuple in py3.9 #25951

Closed
auderson opened this issue Jun 21, 2022 · 16 comments · Fixed by #42730
Closed

[Ray Core] failed to serialize namedtuple in py3.9 #25951

auderson opened this issue Jun 21, 2022 · 16 comments · Fixed by #42730
Assignees
Labels
bug Something that is supposed to be working; but isn't core Issues that should be addressed in Ray Core P2 Important issue, but not time-critical
Milestone

Comments

@auderson
Copy link

What happened + What you expected to happen

ray failed to serialize namedtuple in py3.9, which looks like a error from cloudpickle:
cloudpipe/cloudpickle#460

Versions / Dependencies

python 3.9.7
ray 1.13.0

not tested with other versions

Reproduction script

from typing import NamedTuple
import ray
ray.init()

class T(NamedTuple):
    a:str
    
@ray.remote
def test():
    return T('a')

ray.get(test.remote())
Traceback (most recent call last):
  File "/home/auderson/miniconda3/envs/py3.9/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-7f1d37acd5e0>", line 1, in <cell line: 1>
    ray.get(test.remote())
  File "/home/auderson/miniconda3/envs/py3.9/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/auderson/miniconda3/envs/py3.9/lib/python3.9/site-packages/ray/worker.py", line 1831, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::test() (pid=353630, ip=...)
RuntimeError: The remote function failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:
ray::test() (pid=353630, ip=...)
  File "/home/auderson/miniconda3/envs/py3.9/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle.py", line 872, in _make_skeleton_class
    skeleton_class = types.new_class(
  File "/home/auderson/miniconda3/envs/py3.9/lib/python3.9/types.py", line 77, in new_class
    return meta(name, resolved_bases, ns, **kwds)
  File "/home/auderson/miniconda3/envs/py3.9/lib/python3.9/typing.py", line 1852, in __new__
    module=ns['__module__'])
KeyError: '__module__'
ray.util.inspect_serializability(T)
================================================
Checking Serializability of <class '__main__.T'>
================================================
Out[9]: (True, set())

Issue Severity

Low: It annoys or frustrates me.

@auderson auderson added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jun 21, 2022
@tupui
Copy link
Member

tupui commented Jun 21, 2022

Hi @auderson, thank you for reporting. As you found out, this is not really an issue with ray but with cloudpickle. I am afraid there is not much to be done here. Did you try what they suggested in the issue with moving the class in its module?

Out of curiosity, why not using a dataclass or collections.namedtuple?

@tupui tupui removed the triage Needs triage (eg: priority, bug/not-bug, and owning component) label Jun 21, 2022
@auderson
Copy link
Author

auderson commented Jun 21, 2022

I don’t know the implementation detail of serialization in ray. Because ray.util.inspect_serializability reports True, so I think maybe I should report this in case there are other potential bugs.

I didn’t try their suggestion, but I guess that can work.

For your last question, using typing.NamedTuple is just my personal preference.

@scv119 scv119 added the core Issues that should be addressed in Ray Core label Jun 24, 2022
@scv119 scv119 added the P1 Issue that should be fixed within a few weeks label Jun 24, 2022
@scv119
Copy link
Contributor

scv119 commented Jun 24, 2022

@suquark is this something we documented or plan to fix?

@scv119 scv119 added this to the Core Backlog milestone Jun 24, 2022
@rkooo567
Copy link
Contributor

rkooo567 commented Jul 8, 2022

@scv119 @iycheng can you guys make sure @suquark follow up on this (or find another owner? )

@suquark
Copy link
Member

suquark commented Jul 12, 2022

@auderson Yes, this is indeed an error from cloudpickle. Currently a workaround is to move

class T(NamedTuple):
    a:str

to another python module and import it (as you mentioned).

@rkooo567 @scv119 I think we should wait for the upstream (cloudpickle) to fix it if this is not a major blocker.

ray.util.inspect_serializability(T) failed to give the right hints because this is an unusual abnormality in cloudpickle (with python3.9) and common checks would not catch it. I think we should just fix this case in cloudpickle w/o adding special rules for inspect_serializability.

I can also help improving inspect_serializability in a general way to possibly figure out more issues about this kind of objects.

@rkooo567
Copy link
Contributor

@suquark do you happen to know if cloudpickle is aware of this issue and plan to fix it?

@suquark
Copy link
Member

suquark commented Jul 25, 2022

@rkooo567 yes, cloudpickle is aware of this in cloudpipe/cloudpickle#460. But I am not sure when it is going to be fixed in the upstream.

@ericl ericl removed the Ray 2.0 label Jul 28, 2022
@hora-anyscale hora-anyscale added P2 Important issue, but not time-critical and removed P1 Issue that should be fixed within a few weeks labels Oct 25, 2022
@hora-anyscale
Copy link
Contributor

Per Triage Sync: Dependent on upstream - cloudpipe/cloudpickle#460

@GoncaloPascoal
Copy link

Looks like cloudpickle fixed this issue (v2.2.1).

@rkooo567
Copy link
Contributor

Nice. We are using the custom fork of the cloud pickle. We may need to update the version..

@rkooo567 rkooo567 assigned rkooo567 and unassigned suquark and fishbone Apr 26, 2023
@rkooo567
Copy link
Contributor

rkooo567 commented Apr 26, 2023

@suquark curious if you have bandwidth to backport cloudpickle?

sgoodfriend added a commit to sgoodfriend/rl-algo-impls that referenced this issue Jan 5, 2024
@rickyyx
Copy link
Contributor

rickyyx commented Jan 26, 2024

no longer reproducible.

@rickyyx rickyyx closed this as completed Jan 26, 2024
@moriyoshi
Copy link

It's still reproducing. The bundled version of cloudpickle didn't catch up with the upstream version the problem was fixed in.

sgoodfriend added a commit to sgoodfriend/rl-algo-impls that referenced this issue Jan 31, 2024
@rkooo567
Copy link
Contributor

@rickyyx is it possible your newest changes also fix this issue?

@rkooo567
Copy link
Contributor

@moriyoshi we are bringing the upstream changes to the latest ray now

@rickyyx
Copy link
Contributor

rickyyx commented Jan 31, 2024

Oh - i actually ran the repro on the current ray master and wasn't able to repro it.

Ok - seeing it in master, and verify the PR fixes it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't core Issues that should be addressed in Ray Core P2 Important issue, but not time-critical
Projects
None yet
Development

Successfully merging a pull request may close this issue.