[Ray core] Stopped job leaks worker #44897
Comments
The issue is that the leaked idle workers are not in the
Leaked again. From the state dump, I can see two Python workers that are not idle, yet the dashboard shows them as IDLE. Both registered jobs have already been stopped, and no job is running now.
Both leaked workers have received the force-kill request. They have released all reference counts but are still waiting for in-flight tasks.
I am not sure why 908 + 90 != 1002, but there is no
From the source code: https://github.com/ray-project/ray/blob/master/src/ray/core_worker/core_worker.cc#L4236
@codingl2k1 you are absolutely right, and I have a PR to fix it: #44214
What happened + What you expected to happen
Expected: once a job is stopped, no idle worker should remain for it. Actual: workers belonging to the stopped job are leaked.
Versions / Dependencies
2.10.0
Reproduction script
No reliable script to reproduce this issue.
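Although there is no reliable reproduction, a check along these lines might help confirm the leak once it occurs. This is only a minimal sketch: `WorkerRecord`, its fields, and `find_leaked_workers` are hypothetical names for illustration and are not part of Ray's API. The records would have to be filled in by hand from a state dump or from whatever worker listing is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerRecord:
    """Hypothetical record for one worker process (not a Ray type)."""
    worker_id: str
    job_id: str
    is_alive: bool


def find_leaked_workers(workers, stopped_job_ids):
    """Return workers that are still alive although their job has stopped."""
    stopped = set(stopped_job_ids)
    return [w for w in workers if w.is_alive and w.job_id in stopped]


# Made-up sample data: jobs 01000000 and 02000000 have been stopped.
workers = [
    WorkerRecord("w1", "01000000", True),   # alive, job stopped -> leaked
    WorkerRecord("w2", "01000000", False),  # already exited
    WorkerRecord("w3", "02000000", True),   # alive, job stopped -> leaked
    WorkerRecord("w4", "03000000", True),   # job still running
]

leaked = find_leaked_workers(workers, {"01000000", "02000000"})
print([w.worker_id for w in leaked])  # prints ['w1', 'w3']
```

An empty result corresponds to the expected behavior described above; any entry is a worker that outlived its stopped job.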
Issue Severity
High