[runtime_env] Client connection fails with "TimeoutError: CreateRuntimeEnv request failed after 5 attempts." when using working dir with different file system on head node
#19792
Comments
Found this in the logs on the Docker container. I wonder if the runtime env agent failed to start. Let me try installing the dependencies and seeing if that fixes it. (If this is the case, we also need to understand whether #19491 is working correctly.)
root@ce509e942283:/tmp/ray/session_latest/logs# cat dashboard.log
2021-10-27 19:35:20,644 INFO head.py:122 -- Dashboard head grpc address: 172.17.0.3:43225
2021-10-27 19:35:20,645 INFO dashboard.py:90 -- Setup static dir for dashboard: /usr/local/lib/python3.8/dist-packages/ray/dashboard/client/build
2021-10-27 19:35:20,648 INFO head.py:50 -- Connect to GCS at b'172.17.0.3:34535'
2021-10-27 19:35:20,650 INFO utils.py:200 -- Get all modules by type: DashboardHeadModule
2021-10-27 19:35:20,669 ERROR dashboard.py:239 -- The dashboard on node ce509e942283 failed with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/dashboard.py", line 224, in <module>
loop.run_until_complete(dashboard.run())
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/dashboard.py", line 116, in run
await self.dashboard_head.run()
File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/head.py", line 211, in run
modules = self._load_modules()
File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/head.py", line 159, in _load_modules
head_cls_list = dashboard_utils.get_all_modules(
File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/utils.py", line 206, in get_all_modules
importlib.import_module(name)
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/job/data_types.py", line 1, in <module>
from pydantic import BaseModel
ModuleNotFoundError: No module named 'pydantic' |
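For anyone triaging a similar failure, a small helper can pull the missing module names out of the session logs instead of reading each traceback by hand. This is only a sketch: the function names `find_missing_modules` and `scan_log_dir` are made up for illustration, and the log directory is the one shown in the shell prompt above, which may differ on other setups.

```python
import re
from pathlib import Path

# Matches the final line of an import failure, e.g.
# "ModuleNotFoundError: No module named 'pydantic'"
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def find_missing_modules(log_text):
    """Return module names reported missing, deduplicated, in order of first appearance."""
    seen = []
    for name in MISSING_MODULE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def scan_log_dir(log_dir):
    """Map each *.log file name in log_dir to the missing modules found in it."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        missing = find_missing_modules(path.read_text(errors="replace"))
        if missing:
            results[path.name] = missing
    return results


if __name__ == "__main__":
    # Assumed location, taken from the prompt in the log excerpt above.
    print(scan_log_dir("/tmp/ray/session_latest/logs"))
```

Only files ending in `.log` are scanned here; `.err` and `.out` files would need an extra glob pattern.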
Yup, this is just an issue of missing dependencies when only The two recent PRs cause the dashboard not to start when only
I think I'll downgrade to P1 since we have a simple workaround, which is to install |
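A quick pre-flight check on the head node can confirm whether a module the dashboard imports is actually installed before starting Ray. This is a minimal sketch, not part of Ray: `missing_modules` is a hypothetical helper, and the list only contains `pydantic` because that is the module named in the traceback above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    # Dotted names are not handled here, since find_spec imports parent packages.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "pydantic" comes from the traceback; extend the list as needed.
    missing = missing_modules(["pydantic"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```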
cc @jiaodong Let's make the |
@richardliaw do we have any plans to add test cases for |
@architkulkarni Sorry to piggyback on this issue but the repro is 95% the same: After doing the original setup in docker, and then:
After a second run of
ray_client_server_23001.err:
|
Thanks a lot for the investigation and detailed repros! This issue sounds like the right place for it, I'll continue to investigate |
The issue of |
I am still seeing this issue on macOS
|
Hi @wjrforcyber, do you have any more details about your workload? Are there any relevant logs in |
I found another issue with the exact same log here, but it had no further discussion and a bot closed it. I think it's still an M2 chip issue. |
@wjrforcyber got it, let's keep this issue for the |
Hi! Thanks for the great software! Sorry if this is the wrong issue for this. I'm hitting I wanted to pass the full env to a "blank" worker which was perhaps not super. Do you think you will add more extensive import guards or is the docker image runtime my best bet? |
Search before asking
Ray Component
Others
What happened + What you expected to happen
The following script fails when connecting to a cluster with a different file system than the client:
On my machine, it hangs for ~30 seconds, then:
Versions / Dependencies
Python 3.8
Ray commit 99a0088
macOS client, Ubuntu head node
Reproduction script
The repro runs the head node in Docker and connects the client from the host machine.
docker setup
Inside the container
In host shell
Anything else
sanity check
inside docker:
on local machine:
This works (prints "Done!").
Are you willing to submit a PR?