Problems with resource deadlock warning. #5790
Comments
#5789 is related.

cc @edoakes
This is working as designed: a warning will eventually show up, but feel free to ignore it while the cluster is still scaling up. We could add another delay, though, to reduce the false positives.
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity within the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.
The behavior seems to have changed here. Now if I run

```python
import ray

ray.init(num_gpus=2, num_cpus=2)

@ray.remote(num_gpus=1)
def f():
    return

@ray.remote(num_gpus=1)
def g():
    ray.get(f.remote())

ray.get([g.remote() for _ in range(2)])
```

it deadlocks (as it should), but there is no warning (either in stdout or in the dashboard).
The resource deadlock warning will fire spuriously (some fraction of the time) during the following workload. The reason is that it is waiting for a worker to start up.

If we then subsequently run something which actually does deadlock, it may not print any warning, because the warning is emitted at most once every `debug_dump_period_` milliseconds.
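The once-per-period suppression described above can be illustrated with a minimal sketch. This is not Ray's actual implementation; the `RateLimitedWarner` class and its `warn` method are hypothetical names used only to show why a real deadlock occurring shortly after a spurious warning would produce no second message within the same `debug_dump_period_` window.

```python
import time


class RateLimitedWarner:
    """Hypothetical sketch of a warning that fires at most once per period,
    mirroring the debug_dump_period_ behavior described in the issue."""

    def __init__(self, period_ms):
        self.period_s = period_ms / 1000.0
        # Initialize so the very first warning is always emitted.
        self.last_emit = float("-inf")

    def warn(self, msg, now=None):
        """Return msg if the warning is emitted, or None if suppressed."""
        now = time.monotonic() if now is None else now
        if now - self.last_emit >= self.period_s:
            self.last_emit = now
            return msg
        return None


w = RateLimitedWarner(period_ms=10_000)
print(w.warn("deadlock suspected", now=0.0))   # emitted
print(w.warn("deadlock suspected", now=5.0))   # suppressed: within the 10 s window
print(w.warn("deadlock suspected", now=12.0))  # emitted: window has elapsed
```

Under this model, a spurious warning triggered while a worker is merely slow to start consumes the window, and a genuine deadlock a few seconds later is silently suppressed until the period elapses.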