If you copy checkpoints from HOME to gcs they can get deleted #394

Open
sshleifer opened this issue Jul 8, 2023 · 1 comment

Comments

@sshleifer

sshleifer commented Jul 8, 2023

This happens because of this line:

if is_gcs_path(path) and not (path / _COMMIT_SUCCESS_FILE).exists():

The copied checkpoints don't have a commit success file but are in GCS, so Orbax thinks they are temporary and cleans them up.

I would suggest either always or never saving the COMMIT_SUCCESS file.

This is not blocking me (it was easy to just write the extra commit success files once I found this), but I felt I should report it because the behavior was very unexpected and moving checkpoints around is super common.
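
For anyone hitting the same thing, here is a minimal sketch of that workaround. It assumes the marker file is named `commit_success.txt` (check `_COMMIT_SUCCESS_FILE` in your installed Orbax version) and that each checkpoint step is a direct subdirectory of the checkpoint root. `mark_checkpoints_committed` is a hypothetical helper, not part of Orbax.

```python
from pathlib import Path

# Assumption: the marker file name Orbax looks for on GCS.
# Verify against _COMMIT_SUCCESS_FILE in your installed version.
COMMIT_SUCCESS_FILE = "commit_success.txt"


def mark_checkpoints_committed(root, marker_name=COMMIT_SUCCESS_FILE):
    """Add a marker file to every step directory under `root` that lacks one.

    Hypothetical helper. Returns the directories that were newly marked.
    """
    marked = []
    for step_dir in sorted(Path(root).iterdir()):
        if not step_dir.is_dir():
            continue  # skip loose files next to the step directories
        marker = step_dir / marker_name
        if not marker.exists():
            marker.write_text("Checkpoint commit was successful")
            marked.append(step_dir)
    return marked
```

Since this uses `pathlib`, it only works on local paths. One option is to run it on the local checkpoint directory before copying, so the markers get uploaded along with everything else.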

@cpgaffney1
Collaborator

Thanks for the report. We currently have different behavior for ensuring atomicity on GCS vs. other filesystems, which is a practice we inherited from earlier code. I will look into standardizing this.
