[tune] The actor or task cannot be scheduled right now #13905
Comments
Try using
Unfortunately this does not solve the problem.
Can you give us a bit more context? What version of Ray are you using, and can you share parts of the code you're using to run your training?
I tried to use
I reduced the number of GPUs per trial and did not specify the number of CPUs per trial. It works.
So is your problem solved by this? Just for completeness' sake, how many CPUs are actually on your machine?
80 CPUs.
I see. So usually this should work. Another question would be whether you're scheduling other remote Ray jobs in your trainable.
Yeah, there are other remote Ray jobs in the trainable.
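For illustration, a hypothetical sketch of what nested remote jobs inside a Tune trainable can look like (`heavy_work` and `my_trainable` are assumed names, not taken from this thread):

```python
import ray
from ray import tune

@ray.remote(num_cpus=2)
def heavy_work(x):
    # A CPU-bound helper task launched from inside the trainable.
    return x * x

def my_trainable(config):
    # The trainable's body runs inside the resources Tune reserved for the
    # trial; the nested remote tasks below need additional CPUs on top of
    # that reservation.
    results = ray.get([heavy_work.remote(i) for i in range(4)])
    tune.report(score=sum(results))
```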
Not exactly - the 10 CPUs are reserved just for the main function of the trainable. If this main function requests more resources, you need to account for them in the trial's resource request. E.g.:
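A minimal sketch of such a call, assuming Tune's legacy dict-based `resources_per_trial` argument (`my_trainable` is a placeholder):

```python
from ray import tune

tune.run(
    my_trainable,  # placeholder for the user's trainable
    resources_per_trial={"cpu": 10, "gpu": 0.25},
)
```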
This would reserve 10 CPUs and 0.25 GPUs. See also here: https://docs.ray.io/en/latest/tune/tutorials/overview.html#how-do-i-set-resources Please note that support for this will be deprecated in the future.
That said, how you allocate the resources depends on your main function, which I don't have insight into. If you're starting a number of remote CPU workers, these resources need to be included in the trial's resource request as well.
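One way to account for such workers, assuming a Ray version that exposes `tune.PlacementGroupFactory`, is to add one bundle per remote worker (the bundle sizes here are illustrative):

```python
from ray import tune

tune.run(
    my_trainable,  # placeholder for the user's trainable
    resources_per_trial=tune.PlacementGroupFactory([
        {"CPU": 10, "GPU": 0.25},  # bundle for the trainable's main function
        {"CPU": 2},                # bundle for one remote worker it launches
        {"CPU": 2},                # bundle for a second remote worker
    ]),
)
```

The first bundle is reserved for the trainable's main function, and each additional bundle reserves resources for a worker the trainable starts.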
I get it.
You're welcome! Please feel free to re-open if any issues remain.
I think Ray needs a method to clear the queue of unwanted actors, both in the API and in the Dashboard. I can open a new issue if you like, but the reason is the same as in this issue: the actor queue fills up after model training, and you cannot make a prediction from the trained model because all CPU resources (none of which are actually in use) remain locked. In most cases this persists until at least a restart of the client Python kernel, but sometimes the master and worker Ray servers need to be restarted as well, including a case similar to the one reported by the OP, encountered while training and scoring ML models.
Notice that lowercase "cpu" and "gpu" keys do not affect the CPU / GPU settings at all... see what happens if you pass a lowercase "cpu" key in a resource dict.
Ray expects all-caps keys ("CPU" and "GPU") in that dict. So the workaround quoted below may not work at all (as in my case), and we need a well-understood method to release locked or otherwise unwanted resources (actors etc.).
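A small sketch of the key-casing difference, assuming the dict in question is a placement group bundle:

```python
from ray import tune

# Built-in resources use upper-case keys in placement group bundles:
pgf = tune.PlacementGroupFactory([{"CPU": 10, "GPU": 0.25}])

# A lower-case "cpu" key is treated as a *custom* resource named "cpu",
# so it does not reserve any actual CPU cores:
pgf_no_cpus = tune.PlacementGroupFactory([{"cpu": 10}])

# The legacy resources_per_trial dict, by contrast, uses lower-case keys:
legacy_spec = {"cpu": 10, "gpu": 0.25}
```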
I have enough resources but still get this warning:
How should I deal with this problem?
Thanks.