Prevent parallel jobs from overwriting the same s3 object when saving the cache #63
Just to follow up on this issue: the problem is multiple re-writes of a cache that is already there. Ideally the plugin should check that a cache under the key already exists before saving. In the comparable case, the GHA cache implementation saves the cache for the fastest job and then bails out on the others with this error.
I'm looking into possible solutions for this. I haven't started on the code yet, but I'm considering a few options.
Note: the s3 key is not available until the object has been completely uploaded and the request has finished.
We could potentially reserve it through a dummy file. But to be honest, even a double check before upload would already cut out a lot of the re-uploads, especially for long polls in pipelines, and might not even require tracking the current upload.
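As a rough sketch of that double check (the helper name is hypothetical, and the real plugin isn't Python — this just assumes a boto3-style client whose `head_object` raises when the key is missing):

```python
def save_cache_if_absent(s3, bucket, key, upload):
    """Upload the cache only if no object exists under `key` yet.

    `s3` is any client with a boto3-style head_object(Bucket=, Key=)
    that raises on a missing object; `upload` performs the actual
    (possibly long) cache upload.
    """
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return False  # another job already saved this cache key; skip
    except Exception:
        pass  # no object yet (head_object raises, e.g. a 404 ClientError)
    upload()
    return True
```

This is best-effort, not atomic: two jobs can still pass the check at the same moment, but as noted above it would cut most of the duplicate uploads.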
Multipart uploads solve this issue if versioning is enabled on the bucket. Writing a manifest file (or "dummy file") is usually done *after* a big upload to confirm you're done (e.g. when uploading thousands of CSV or Parquet files). If you use the manifest file to signal "I am uploading" and then die, others will see the manifest and think, "gosh, guess he's still uploading!" Multipart uploads handle this situation gracefully: everyone races to write the current version, but no one can write an incomplete one.
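Another option worth mentioning (my suggestion, not something the plugin does today): if the bucket supports S3 conditional writes, an `If-None-Match: *` precondition makes the PUT itself fail when the key already exists, which closes the check-then-upload race atomically. A boto3-style sketch, assuming the client accepts an `IfNoneMatch` parameter on `put_object`:

```python
def save_cache_once(s3, bucket, key, body):
    """Create the cache object only if it does not already exist.

    Relies on an S3 conditional write: If-None-Match: * makes the PUT
    fail with 412 Precondition Failed when the key is already present,
    so exactly one of the racing jobs wins.
    """
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
        return True  # this job created the cache
    except Exception as e:
        if "412" in str(e) or "PreconditionFailed" in str(e):
            return False  # a faster job already saved it; bail out
        raise  # some other failure: surface it
```

Unlike a head-then-put check, the "already exists" decision is made by S3 itself, so the last job to finish can never silently overwrite an earlier save.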
Suppose we have N jobs running at the same time using the same cache key and that there is no cache saved yet. The job to finish last will overwrite the cache saved by the ones before.
It seems like the plugin only checks for the s3 object in `restore()` but not in `cache()`.