-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Object Spilling] Clean up FS storage upon sigint for ray.init(). #13649
[Object Spilling] Clean up FS storage upon sigint for ray.init(). #13649
Conversation
@@ -243,6 +256,43 @@ def delete_spilled_objects(self, urls: List[str]): | |||
filename = parse_url_with_offset(url.decode()).base_url | |||
os.remove(os.path.join(self.directory_path, filename)) | |||
|
|||
def destroy_external_storage(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just delete the entire directory with one call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree this is a simpler logic; The only concern I have is the UX is probably bad (since we cannot guarantee users will give us a directory that is solely for object spilling). What about this? I will always create a directory appended to a given spilling directory and delete that directory directly. For example,
file_system_object_spilling_config = {
"type": "filesystem",
"params": {
"directory_path": "/tmp/something"
}
}
This will create a new directory /tmp/something/ray_spilled_objects/[files]
and I will delete ray_spilled_objects
so that we can avoid deleting files that are not related to object spilling.
Yeah it sounds good. When we turn it on by default, this can be the
directory inside the session dir.
…On Sat, Jan 23, 2021, 11:36 PM SangBin Cho ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In python/ray/external_storage.py
<#13649 (comment)>:
> @@ -243,6 +256,43 @@ def delete_spilled_objects(self, urls: List[str]):
filename = parse_url_with_offset(url.decode()).base_url
os.remove(os.path.join(self.directory_path, filename))
+ def destroy_external_storage(self):
Agree this is a simpler logic; The only concern I have is the UX is
probably bad (since we cannot guarantee users will give us a directory that
is solely for object spilling). What about this? I will always create a
directory appended to a given spilling directory and delete that directory
directly. For example,
file_system_object_spilling_config = {
"type": "filesystem",
"params": {
"directory_path": "/tmp/something"
}
}
This will create a new directory
/tmp/something/ray_spilled_objects/[files] and I will delete
ray_spilled_objects so that we can avoid deleting files that are not
related to object spilling.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#13649 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAADUSVOQHJQQPRAWRRYFZTS3PEWHANCNFSM4WPRBKBA>
.
|
It is simplified. |
@ericl Found another race condition when we go to the deleting directory solution. When the directory deletion is performed, if there are still IO workers that are deleting files, it throws an exception and fails. To avoid it I added an additional logic. If you prefer just the previous solution since it also has complexity lmk. |
python/ray/external_storage.py
Outdated
# deleting the file at the same time. | ||
pass | ||
except Exception: | ||
print("There were unexpected errors while deleting " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print("There were unexpected errors while deleting " | |
logger.exception("Error cleaning up spill files") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still want to display the traceback right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lmk if you'd like me to delete a traceback msgs.
python/ray/external_storage.py
Outdated
except Exception: | ||
logger.exception("Error cleaning up spill files\n" | ||
f"Directory path: {self.directory_path}\n" | ||
f"Traceback: {traceback.format_exc()}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to include the traceback; logger.exception already does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Didn't know about that. I will fix this and merge the PR.
…y-project#13649) * Initial iteration done. * Remove unnecessary messages. * Addressed code review. * Addressed code review. * fix issues. * addressed code review. * Addressed the last code review.
…t(). (ray-project#13649)" This reverts commit b0fc13f.
…t(). (ray-project#13649)" This reverts commit b0fc13f.
Why are these changes needed?
This cleans up FS storage upon sigint. It also prints the progress when it is shutting down (so that users won't be confused why their driver is not terminated quickly).
The current output looks like...
Note
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.