Add cloud cache plugin #4097
Conversation
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
…th scheme
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
modules/nextflow/src/main/groovy/nextflow/cache/PathCacheStore.groovy
Using the underlying virtual file system is a good idea. I'm concerned though about 503 slow-down errors that can be reported when running large pipelines, because the object storage API will be under pressure.
Also, it would make sense to move this into its own plugin that must be activated to enable this feature.
Me too. The AWS SDK should be able to handle it with its built-in adaptive retry, but I don't remember if it's enabled by default. If that's not enough, we could maintain a local cache and sync with S3 at a controlled rate, at the cost of losing some progress if the workflow is terminated abruptly.
I made it so it can be enabled with an environment variable.
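For concreteness, here is a hedged sketch of enabling the store that way. The variable names `NXF_CACHE_STORE` and `NXF_CACHE_PATH` and the `s3` value come from this PR's description; the bucket path is hypothetical.

```shell
# Hedged sketch: enable the S3-backed cache store via environment
# variables (names from this PR); the bucket path is hypothetical.
export NXF_CACHE_STORE=s3
export NXF_CACHE_PATH=s3://my-bucket/pipeline-cache
nextflow run rnaseq-nf -resume
```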
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
I moved the path cache to a plugin and added the index file to behave like the default. One problem with the plugin is that there is no way to enable it with the …
Likely the only way to enable it in those commands is via NXF_PLUGINS_DEFAULT, which is not very handy. However, the primary need for this plugin is to use it with Tower, therefore I would not care too much.
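As a sketch of that workaround: `NXF_PLUGINS_DEFAULT` is named in this thread and the plugin id `nf-cloudcache` comes from the PR description, but the exact value format accepted by the variable is my assumption.

```shell
# Hedged sketch: force-load the cloud cache plugin for commands that
# don't activate plugins from the run config. NXF_PLUGINS_DEFAULT is
# named in this thread; passing a plugin id this way is an assumption.
export NXF_PLUGINS_DEFAULT=nf-cloudcache
nextflow log
nextflow clean -f
```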
Okay, then this PR is ready for review. I tested with rnaseq-nf by running and then resuming the pipeline. Looks like there is still a problem with …
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Ok, I've removed the docs from the AWS page because it should work for any cloud storage, and added a few basic integration tests.
Ok, those tests work, but some checks should be added that it's effectively resuming the tasks and using the cloud cache, e.g. by adding some grep on the log file. @bentsherman can you please give it a try? Then it can be merged.
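A minimal sketch of such a grep check. The exact wording of the cached-task log line is an assumption, so a fake log fragment is generated here to keep the example self-contained; in the real integration test the grep would run against the `.nextflow.log` produced by a resumed run.

```shell
# Simulate a fragment of .nextflow.log from a resumed run; the exact
# log line format is an assumption made for illustration.
printf '%s\n' \
  'INFO  nextflow.Session - Cached process > FASTQC (1)' \
  'INFO  nextflow.Session - Cached process > MULTIQC' \
  > .nextflow.log

# The check an integration test could run after `nextflow run ... -resume`:
cached=$(grep -c 'Cached process' .nextflow.log)
echo "cached tasks: $cached"
```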
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
I've removed sonatype-lift, just annoying.
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
This is b-e-a-utiful, thanks guys! 🎉 Looking forward to taking this for a spin!
@pditommaso is there a way to trigger the cloudcache plugin when the …
That kind of trick happens here nextflow/modules/nf-commons/src/main/nextflow/plugin/PluginsFacade.groovy Lines 351 to 372 in ace32d0
The nf-cloudcache plugin allows persisting the Nextflow cache metadata into cloud object storage instead of using the embedded LevelDB engine.

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Close https://github.com/seqeralabs/nf-tower-cloud/issues/4822
Adds a new cache store backed by S3. The cache entry for each task is saved to the following S3 path:
It can be enabled by setting `NXF_CACHE_STORE` to `s3` and `NXF_CACHE_PATH` to the desired S3 path.

Notes:
- Currently included in the nf-amazon plugin, but could be generalized to any object storage since it uses the Path API. Maybe better to make it part of the nextflow module and call it something like `ObjectCacheStore`.
- `${workDir}/../cache`
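To illustrate the Path API point: because the store only goes through `java.nio.file` operations, the same code can target a local directory or an object-storage filesystem provider without changes. A minimal sketch in Java (the class and method names here are hypothetical, not the actual plugin code):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a Path-API-backed cache store: every operation
// uses java.nio.file.Path/Files, so the base path may be a local directory
// or an object-storage Path (e.g. S3) provided by an NIO filesystem plugin.
public class PathCacheSketch {
    private final Path basePath;

    PathCacheSketch(Path basePath) {
        this.basePath = basePath;
    }

    // Save one task's cache entry under <base>/<sessionId>/<taskHash>.
    void put(String sessionId, String taskHash, byte[] entry) throws Exception {
        Path target = basePath.resolve(sessionId).resolve(taskHash);
        Files.createDirectories(target.getParent());
        Files.write(target, entry);
    }

    // Read a task's cache entry back from the same layout.
    byte[] get(String sessionId, String taskHash) throws Exception {
        return Files.readAllBytes(basePath.resolve(sessionId).resolve(taskHash));
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempDirectory("cache-demo");
        PathCacheSketch store = new PathCacheSketch(tmp);
        store.put("session-1", "abc123", "task-entry".getBytes(StandardCharsets.UTF_8));
        // prints "task-entry"
        System.out.println(new String(store.get("session-1", "abc123"), StandardCharsets.UTF_8));
    }
}
```

Swapping the local temp directory for an S3 `Path` is the whole generalization argument: no store code changes, only the base path.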