
Prevent parallel jobs from overwriting the same s3 object when saving the cache #63

Open
ghost opened this issue Jul 17, 2023 · 5 comments
Labels
enhancement New feature or request help wanted Extra attention is needed s3

Comments

ghost commented Jul 17, 2023

Suppose we have N jobs running at the same time with the same cache key, and no cache has been saved yet. The job that finishes last will overwrite the cache saved by the jobs before it.

job_a [0% -----------------------100%] -> cache (will overwrite cache saved from b and c)
job_b       [0% ------------100%] -> cache (will overwrite cache saved from c)
job_c    [0% -------- 100%] -> cache

It seems like the plugin only checks for the s3 object in restore() but not in cache().
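Such a guard could look roughly like the following. This is a minimal sketch of a "check before save" step for cache(), mirroring the existence check restore() already performs; `head` and `upload` are illustrative stand-ins (e.g. an S3 HeadObject call and the archive upload), not the plugin's real API.

```python
from typing import Callable

def save_cache_if_absent(head: Callable[[str], bool],
                         upload: Callable[[str], None],
                         key: str) -> bool:
    """Upload only when no object exists under `key` yet.

    head(key)   -> True if the S3 object already exists (HeadObject 200)
    upload(key) -> performs the actual PutObject / multipart upload
    Returns True if this job performed the upload.
    """
    if head(key):
        return False  # another job already saved this cache; skip upload
    upload(key)
    return True
```

Note this check-then-upload is not atomic (two jobs can both see "absent" and both upload), but it already removes the common case where a finished cache is needlessly overwritten.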

@gencer gencer added invalid This doesn't seem right help wanted Extra attention is needed labels Aug 6, 2023
kliakhovskii-brex commented Aug 24, 2023

Just to follow up on this issue: the problem is repeated re-writes of a cache that is already there.

A simple use case is a node_modules cache for a large web project. These caches are usually heavy (e.g. 500 MB+) and required for every check in the project (e.g. test / lint / prettier). When the key changes, every job misses the old cache, regenerates the dependencies, and uploads a new cache. Since the upload step is usually much slower than the download, this adds 3-5 minutes to every job in a PR, even though the cache was already saved by the fastest one.

Ideally, it should check whether a cache under the key already exists before saving. In the same situation, the GHA implementation of caches saves the cache from the fastest job and then bails out on the others with this error:

Unable to reserve cache with key ${key}, another job may be creating this cache
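That first-one-wins behavior can be modeled with an in-memory sketch. `CacheReservations` below is a hypothetical stand-in for GHA's reserve-cache endpoint, not its real implementation: the first job to reserve a key wins, and every other job fails fast instead of uploading.

```python
import threading

class CacheReservations:
    """In-memory stand-in for a 'reserve cache' service: the first job
    to reserve a key wins; everyone else gets an error and skips the
    (expensive) upload step."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._reserved = set()

    def reserve(self, key: str) -> None:
        # The check and the insert happen under one lock, so exactly
        # one concurrent caller can succeed for a given key.
        with self._lock:
            if key in self._reserved:
                raise RuntimeError(
                    f"Unable to reserve cache with key {key}, "
                    "another job may be creating this cache")
            self._reserved.add(key)
```

In a real plugin the reservation would have to live in shared storage (or use an atomic S3 conditional write) rather than process memory, since the jobs run on separate agents.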

gencer (Collaborator) commented Aug 24, 2023

I'm looking into possible solutions for this. I haven't started on the code yet, but I'm considering a few options.

@gencer gencer added enhancement New feature or request s3 and removed invalid This doesn't seem right labels Aug 24, 2023
gencer (Collaborator) commented Aug 24, 2023

Note: the s3 key is not available until the object has been completely uploaded and the request has finished.

@kliakhovskii-brex

> Note: the s3 key is not available until the object has been completely uploaded and the request has finished.

Potentially we could reserve it through a dummy file. But honestly, even a double check before upload would already eliminate a lot of the re-uploads, especially for the long jobs in pipelines. We might not even need to track the in-progress upload for that.
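One way to make the dummy-file idea robust against a crashed uploader (the failure mode discussed below) is to give the marker a TTL. A sketch follows: the dict stands in for small S3 marker objects keyed by cache key, and the 15-minute TTL is an arbitrary assumption, not a recommendation.

```python
import time
from typing import Optional

def try_reserve(markers: dict, key: str,
                ttl_seconds: float = 900.0,
                now: Optional[float] = None) -> bool:
    """Reserve `key` by writing a timestamped marker.

    `markers` maps cache key -> reservation timestamp (standing in for
    a tiny S3 object whose LastModified plays the same role). A marker
    older than `ttl_seconds` is treated as abandoned, so a job that
    died mid-upload cannot block the key forever.
    Returns True if this caller holds the reservation.
    """
    t = time.time() if now is None else now
    ts = markers.get(key)
    if ts is not None and t - ts < ttl_seconds:
        return False  # someone else is (probably) still uploading
    markers[key] = t  # fresh or stale marker: take it over
    return True
```

The remaining race (two jobs reading "no marker" at once) is the same one the double-check has, so this narrows the window rather than closing it.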


nuzayets commented Nov 2, 2023

> Note: the s3 key is not available until the object has been completely uploaded and the request has finished.
>
> Potentially we could reserve it through a dummy file. But honestly, even a double check before upload would already eliminate a lot of the re-uploads, especially for the long jobs in pipelines. We might not even need to track the in-progress upload for that.

Multipart uploads solve this issue if versioning is enabled in the bucket.

Writing a manifest file (or "dummy file") is usually done after a big upload to confirm you're done (e.g. when uploading thousands of CSV or Parquet files). If you instead use the manifest file to signal "I am uploading" and then die, others will see the marker and think, "gosh, guess he's still uploading"!

Multipart uploads handle this situation gracefully: everyone races to write the current version, but no one can write an incomplete version.
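A toy model of that property: the method names below mirror the S3 multipart API, but this is an in-memory illustration of "incomplete uploads are never visible", not real S3 calls.

```python
class VersionedKey:
    """Toy model of one S3 key in a versioned bucket.

    Parts staged through a multipart upload are invisible to readers
    until the upload completes, at which point the assembled object
    becomes the newest version atomically. Readers therefore never see
    a half-written archive, no matter how writers interleave.
    """

    def __init__(self) -> None:
        self.versions = []       # completed objects, newest last
        self._staged = {}        # upload id -> list of staged parts
        self._next_id = 0

    def create_multipart_upload(self) -> int:
        uid = self._next_id
        self._next_id += 1
        self._staged[uid] = []
        return uid

    def upload_part(self, uid: int, data: bytes) -> None:
        self._staged[uid].append(data)   # staged, not yet visible

    def complete_multipart_upload(self, uid: int) -> None:
        # Assembly is the only step that publishes a version.
        self.versions.append(b"".join(self._staged.pop(uid)))

    def get_object(self) -> bytes:
        return self.versions[-1]         # last *completed* write wins
```

Under this model the "last writer wins" overwrite from the original report still happens, but it is harmless: every visible version is a complete archive, and versioning keeps the earlier ones around.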

3 participants