Deadlock can be caused by remote functions calling other remote functions and getting the results. #231

Closed
robertnishihara opened this issue Jan 27, 2017 · 3 comments
Labels
bug Something that is supposed to be working; but isn't


@robertnishihara
Collaborator

The following causes the system to hang.

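import ray
# Start Ray with a single worker process.
ray.init(num_workers=1)

@ray.remote
def f():
  return

@ray.remote
def g():
  # g occupies the only worker, so f has nowhere to run.
  ray.get(f.remote())

# Hangs: g waits on f, and f waits for a free worker.
ray.get(g.remote())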

Since there is only one worker, that worker starts executing g. When g calls f, f cannot be scheduled anywhere, because the only worker is busy executing g. But that worker will not finish executing g until f has finished executing. Hence deadlock.
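The same cycle can be reproduced without Ray. The sketch below is an analogy, not Ray code: it assumes a `concurrent.futures.ThreadPoolExecutor` with a single thread stands in for the single worker, and it adds a timeout (a choice made only for this sketch) so the example reports the deadlock instead of hanging.

```python
import concurrent.futures

# A pool with one thread plays the role of Ray's single worker (analogy only).
pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def f():
    return None

def g():
    # g is running on the only thread, so f is queued behind it.
    inner = pool.submit(f)
    try:
        # Without a timeout this call would block forever.
        inner.result(timeout=0.5)
        return "completed"
    except concurrent.futures.TimeoutError:
        return "deadlock"

outcome = pool.submit(g).result()
pool.shutdown(wait=True)  # once g returns, the thread is free to run f
print(outcome)  # deadlock
```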

I can think of two natural solutions at the moment:

  1. Detect the situation (probably in the local scheduler) and start more workers so that f can be scheduled.
  2. Make remote functions re-entrant, in the sense that when g calls ray.get, the worker can get a new task from the local scheduler, execute it, and then resume executing g.
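Solution 2 can be illustrated with a toy, single-threaded scheduler. `ToyWorker`, `submit`, and `get` are hypothetical names used only for this sketch, not Ray's API. The idea shown is that `get` drains the worker's local task queue instead of blocking, so f runs on the same worker while g is suspended inside its `get` call.

```python
from collections import deque

class ToyWorker:
    """Hypothetical single worker with a local task queue (not Ray's API)."""

    def __init__(self):
        self.queue = deque()  # pending (future, fn, args) triples
        self.log = []         # names of tasks in the order they start

    def submit(self, fn, *args):
        future = {"done": False, "value": None}
        self.queue.append((future, fn, args))
        return future

    def _run_one(self):
        future, fn, args = self.queue.popleft()
        self.log.append(fn.__name__)
        future["value"] = fn(self, *args)
        future["done"] = True

    def get(self, future):
        # Re-entrant wait: instead of blocking, run queued tasks on this
        # same worker until the awaited result is available.
        while not future["done"]:
            if not self.queue:
                raise RuntimeError("nothing left to run; result can never arrive")
            self._run_one()
        return future["value"]

def f(worker):
    return 1

def g(worker):
    # Nested call: f executes inside g's call to get().
    return worker.get(worker.submit(f)) + 1

worker = ToyWorker()
result = worker.get(worker.submit(g))
print(result, worker.log)  # 2 ['g', 'f']
```

One consequence of this design is that the nested task runs on the waiting task's call stack, so deep nesting grows the stack of a single worker rather than the number of workers.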
@robertnishihara robertnishihara added the bug Something that is supposed to be working; but isn't label Jan 27, 2017
@robertnishihara
Collaborator Author

This is being worked on in #286.

@robertnishihara
Collaborator Author

This is partially addressed by #286. However, some workloads could require an arbitrarily large number of workers to be started, so this is only a partial solution.
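Why the number of workers can grow without bound under solution 1 can be shown with a small model. It assumes that every nested task gets a freshly started worker (here a Python thread) while its parent stays blocked holding its own worker; `run_task` and `nested` are hypothetical helpers for this sketch, not code from #286 or Ray. A chain of nested calls `depth` levels deep keeps `depth + 1` workers alive at its deepest point.

```python
import threading

lock = threading.Lock()
active = 0  # workers currently alive
peak = 0    # maximum number of workers alive at once

def run_task(fn, *args):
    """Start fn on a fresh worker thread and block until it returns."""
    box = {}

    def body():
        global active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            box["value"] = fn(*args)
        finally:
            with lock:
                active -= 1

    t = threading.Thread(target=body)
    t.start()
    t.join()  # the caller stays blocked while still holding its own worker
    return box["value"]

def nested(depth):
    # Each level submits one child task and waits for it.
    if depth == 0:
        return 0
    return run_task(nested, depth - 1) + 1

result = run_task(nested, 5)
print(result, peak)  # 5 6
```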

@robertnishihara
Collaborator Author

Further discussion should probably continue in a different issue, because the problem is no longer deadlock but rather too many workers being started.
