Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Ray 1.7 and 1.8 ray.init hangs when some versions of Python gRPC library is used due to global symbol export #20132

Closed
1 of 2 tasks
kiralaz opened this issue Nov 7, 2021 · 15 comments · Fixed by #20237
Closed
1 of 2 tasks
Assignees
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks
Milestone

Comments

@kiralaz
Copy link

kiralaz commented Nov 7, 2021

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core

What happened + What you expected to happen

This is not a contribution.

After upgrading to Ray 1.7, we noticed that when we import pybinded C++ code that uses a different version of grpc than comes with ray, we consistently get segfaults.

Turns out this is due to Ray exporting its grpc symbols so they are global and available to things like _streaming.so. This seems to have been added in #18490

When I comment out *grpc*; at

(so the symbols are not exported) and build ray 1.7.0 locally, our code does not segfault.

Looking at #18870, it seems like *absl*; caused a similar issue and has already been removed, but @ashione mentions in #18870 (comment) that *grpc*; might be needed for some ray/streaming tests to pass.

Would it be possible for someone to look into this and remove grpc; ? I think it would be preferable to not export global symbols for commonly used libs like grpc and absl, to avoid causing issues for anyone using ray alongside C extensions built with other versions of these libs.

This is currently blocking my team from upgrading to ray 1.7+.

Thanks!

Versions / Dependencies

ray==1.7.0

Reproduction script

N/A

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kiralaz kiralaz added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 7, 2021
@rkooo567
Copy link
Contributor

rkooo567 commented Nov 7, 2021

Cc @scv119 I think we should handle this (and probably fine streaming tests not passing)

@rkooo567 rkooo567 added this to the Core Backlog milestone Nov 7, 2021
@scv119 scv119 added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 7, 2021
@kiralaz kiralaz changed the title [Bug] [Bug] grpc symbols are exported globally Nov 8, 2021
@ashione
Copy link
Member

ashione commented Nov 8, 2021

@rkooo567 This #20136 might fix symbol-wise issuse. I put duplicated symbols into _streaming.so.

@rkooo567
Copy link
Contributor

rkooo567 commented Nov 8, 2021

Oh nice. Let’s merge it and see if it fixes the issue

@fishbone
Copy link
Contributor

fishbone commented Nov 8, 2021

@ashione assign this issue to you since you are working on this.

@rkooo567
Copy link
Contributor

Hey @kiralaz can you check the latest master if this is resolved? We merged a PR that could solve the issue

@kiralaz
Copy link
Author

kiralaz commented Nov 10, 2021

Hey @kiralaz can you check the latest master if this is resolved? We merged a PR that could solve the issue

Thanks @rkooo567! #20136 looks like it should fix my issue.

Though when I tried python3 setup.py install from ray/python on master, I got

In file included from src/ray/util/filesystem.cc:15:0:
bazel-out/k8-opt/bin/_virtual_includes/ray_util/ray/util/filesystem.h:17:10: fatal error: filesystem: No such file or directory
 #include <filesystem>
          ^~~~~~~~~~~~
compilation terminated.

For the bazel build of some ray_pkg targets:

subprocess.CalledProcessError: Command '['bazel', 'build', '--verbose_failures', '--', '//:ray_pkg', '//cpp:ray_cpp_pkg']' returned non-zero exit status 1.

@rkooo567
Copy link
Contributor

I think the build requires gcc >=9.0. Can you check ur compiler version?

@kiralaz
Copy link
Author

kiralaz commented Nov 10, 2021

I see, I was on gcc 7. It's a bit tricky for me to upgrade my gcc right now, but saw the PR landed 23 hours ago so I just tried via the nightly wheel from https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp36-cp36m-manylinux2014_x86_64.whl assuming the fix made it to the latest build.

Happy to report I didn't get a segfault 🎉 😄

@rkooo567
Copy link
Contributor

Glad to hear that! I will close the issue. @scv119 @ericl we should consider having a patch release with this commit. I've seen other users who went through the same issue. Also we should consider adding code owner to the src/ray/ray_version_script.lds file.

@rkooo567 rkooo567 changed the title [Bug] grpc symbols are exported globally [Bug] Ray 1.7 and 1.8 ray.init hangs on some versions of Python gRPC dependency due to global symbol export Nov 11, 2021
@rkooo567
Copy link
Contributor

This is fixed in the master, and the change will be included in 1.9

@kiralaz
Copy link
Author

kiralaz commented Nov 11, 2021

Thank you @rkooo567 ! And @ashione for the fix 🙂

@rkooo567 rkooo567 pinned this issue Nov 11, 2021
@rkooo567 rkooo567 changed the title [Bug] Ray 1.7 and 1.8 ray.init hangs on some versions of Python gRPC dependency due to global symbol export [Bug] Ray 1.7 and 1.8 ray.init hangs when certain versions of Python gRPC library is used due to global symbol export Nov 11, 2021
@rkooo567 rkooo567 changed the title [Bug] Ray 1.7 and 1.8 ray.init hangs when certain versions of Python gRPC library is used due to global symbol export [Bug] Ray 1.7 and 1.8 ray.init hangs when some versions of Python gRPC library is used due to global symbol export Nov 11, 2021
@edoakes edoakes unpinned this issue Nov 12, 2021
@rkooo567
Copy link
Contributor

rkooo567 commented Nov 24, 2021

@edoakes Was there any special reason to unpin this btw? I think we should pin it until 1.9 is released! (It can happen quite commonly)

@edoakes
Copy link
Contributor

edoakes commented Nov 30, 2021

@rkooo567 sorry I have no idea how I did that, that was accidental

@rkooo567 rkooo567 pinned this issue Nov 30, 2021
@simon-mo
Copy link
Contributor

simon-mo commented Dec 6, 2021

@rkooo567 1.9 is release now, should we unpin?

@rkooo567
Copy link
Contributor

rkooo567 commented Dec 6, 2021

Yes! Please go ahead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants