Add cloud cache plugin #4097

bentsherman · 2023-07-14T23:00:06Z

Close https://github.com/seqeralabs/nf-tower-cloud/issues/4822

Adds a new cache store backed by S3. The cache entry for each task is saved to the following S3 path:

s3://<bucket>/<path>/<session-id>/<task-hash>

It can be enabled by setting NXF_CACHE_STORE to s3 and NXF_CACHE_PATH to the desired S3 path.
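
As a rough illustration of the layout described above, the following Python sketch builds the location of a task's cache entry from the two environment variables. It is not the plugin's code (which is written in Groovy); the function name `cache_entry_path` and the sample bucket, session, and hash values are hypothetical.

```python
import os


def cache_entry_path(session_id, task_hash, env=os.environ):
    """Return <NXF_CACHE_PATH>/<session-id>/<task-hash> if the S3 store is enabled."""
    # Assumption: the store is selected only when NXF_CACHE_STORE is exactly "s3".
    if env.get("NXF_CACHE_STORE") != "s3":
        raise RuntimeError("S3 cache store is not enabled")
    base = env.get("NXF_CACHE_PATH")
    if not base:
        raise RuntimeError("NXF_CACHE_PATH is not set")
    # Drop a trailing slash so the result never contains a double slash.
    return f"{base.rstrip('/')}/{session_id}/{task_hash}"


# Hypothetical values, for illustration only.
sample_env = {"NXF_CACHE_STORE": "s3", "NXF_CACHE_PATH": "s3://my-bucket/cache/"}
print(cache_entry_path("session-1", "0a1b2c", sample_env))
```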

Notes:

- Currently included in the nf-amazon plugin, but it could be generalized to any object storage since it uses the Path API. It may be better to make it part of the nextflow module and call it something like ObjectCacheStore.
- Consider exposing config options in addition to environment variables.
- Consider using buckets for the task hash (like the work directory) instead of storing all task entries under the same S3 prefix. I don't know if this will actually help, though: usually Nextflow just needs to try a particular key, not iterate through all keys.
- Consider setting the default cache path based on the work directory (e.g. ${workDir}/../cache).
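
To make the bucketing idea in the notes concrete, here is a minimal Python sketch that splits a task hash into a short prefix and the remainder, in the same spirit as the work directory layout. The function name `bucketed_key` and the two-character prefix length are assumptions for illustration; nothing in this PR implements it.

```python
def bucketed_key(task_hash, prefix_len=2):
    """Split a task hash into '<prefix>/<rest>' so entries spread over many prefixes."""
    if len(task_hash) <= prefix_len:
        raise ValueError("task hash is too short to bucket")
    return f"{task_hash[:prefix_len]}/{task_hash[prefix_len:]}"


# Hypothetical hash value, for illustration only.
print(bucketed_key("ab12cd34"))
```

Since a lookup normally targets one known key, the split mainly changes how keys are distributed across prefixes rather than how many requests are made.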

Signed-off-by: Ben Sherman <bentshermann@gmail.com>



modules/nextflow/src/main/groovy/nextflow/cache/PathCacheStore.groovy

pditommaso

Using the underlying virtual file system is a good idea. I'm concerned about the 503 Slow Down errors that can be reported when running large pipelines, because the object storage API will be under pressure.

Also, it would make sense to move this into its own plugin, so that the feature is enabled by activating the plugin.

bentsherman · 2023-07-17T13:08:37Z

> I'm concerned about the 503 Slow Down errors that can be reported when running large pipelines, because the object storage API will be under pressure.

Me too. The AWS SDK should be able to handle it with its built-in adaptive retry, but I don't remember if it's enabled by default.

If that's not enough, we could maintain a local cache and sync with S3 at a controlled rate, at the cost of losing some progress if the workflow is terminated abruptly.
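
For readers unfamiliar with the retry idea discussed here, the sketch below shows a generic exponential backoff with jitter in Python. It is not the AWS SDK's adaptive retry mode and not code from this PR; `SlowDownError` and `with_backoff` are hypothetical names standing in for a 503 Slow Down response and a retry wrapper.

```python
import random
import time


class SlowDownError(Exception):
    """Hypothetical stand-in for a 503 Slow Down response from the object store."""


def with_backoff(op, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call op(), retrying on SlowDownError with exponentially growing random delays."""
    for attempt in range(max_attempts):
        try:
            return op()
        except SlowDownError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Full jitter: wait a random time between 0 and base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

The `sleep` parameter is injectable only so that the wrapper can be exercised without real delays.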

> Also, it would make sense to move this into its own plugin, so that the feature is enabled by activating the plugin.

I made it so that it is enabled with the environment variable NXF_CACHE_STORE='path', so it doesn't have to reside in a separate plugin. It still uses the default cache store by default.


bentsherman · 2023-07-18T18:02:37Z

I moved the path cache to a plugin and added the index file so that it behaves like the default cache.

One problem with the plugin is that there is no way to enable it with the clean and log commands that use the cache. How do you recommend we instrument those commands?

pditommaso · 2023-07-18T19:07:44Z

Likely the only way to enable it in those commands is via NXF_PLUGINS_DEFAULT, which is not very handy. However, the primary need for this plugin is to use it with Tower, so I would not worry too much about it.

bentsherman · 2023-07-18T19:19:48Z

Okay, then this PR is ready for review. I tested with rnaseq-nf by running and then resuming the pipeline. Looks like there is still a problem with packing.gradle, but I'm not sure how to fix it.


pditommaso · 2023-07-19T09:28:24Z

Ok, I've removed the docs from the AWS page because it should work for any cloud storage, and I've added a few basic integration tests.


pditommaso · 2023-07-19T10:44:39Z

Ok, those tests work, but some checks should be added to verify that it is actually resuming the tasks and using the cloud cache, likely by adding some grep on the log file. @bentsherman can you please give it a try? Then it can be merged.


pditommaso · 2023-07-19T16:44:01Z

I've removed sonatype-lift; it was just annoying.


abhi18av · 2023-07-20T10:30:07Z

This is b-e-a-utiful, thanks guys! 🎉

Looking forward to taking this for a spin!

bentsherman · 2023-08-07T16:14:03Z

@pditommaso is there a way to trigger the cloudcache plugin when the NXF_CLOUDCACHE_PATH variable is set? Currently you have to enable the plugin explicitly, which seems unnecessary since it's a core plugin. I'm not sure how to do this through the priority extensions loader.

pditommaso · 2023-08-07T16:16:37Z

That kind of trick happens here

nextflow/modules/nf-commons/src/main/nextflow/plugin/PluginsFacade.groovy

Lines 351 to 372 in ace32d0

    
    def specs = parseConf(config)
    if( isSelfContained() && specs ) {
        // custom plugins are not allowed for nextflow self-contained package
        log.warn "Nextflow self-contained distribution allows only core plugins -- User config plugins will be ignored: ${specs.join(',')}"
        return Collections.emptyList()
    }
    if( specs ) {
        log.debug "Plugins declared=$specs"
    }
    if( getPluginsDefault() ){
        final defSpecs = defaultPluginsConf(config)
        specs = mergePluginSpecs(specs, defSpecs)
        log.debug "Plugins default=$defSpecs"
    }
    // add tower plugin when config contains tower options
    if( (Bolts.navigate(config,'tower.enabled') || env.TOWER_ACCESS_TOKEN ) && !specs.find {it.id == 'nf-tower' } ) {
        specs << defaultPlugins.getPlugin('nf-tower')
    }
    if( (Bolts.navigate(config,'wave.enabled') || Bolts.navigate(config,'fusion.enabled')) && !specs.find {it.id == 'nf-wave' } ) {
        specs << defaultPlugins.getPlugin('nf-wave')
    }
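
One hypothetical way the same pattern could be extended to the cloud cache plugin is sketched below, in Python rather than Groovy so that it runs standalone. The function `resolve_plugins`, the `id@version` spec strings, and the version numbers are illustrative assumptions; the thread does not confirm that this is what was eventually implemented.

```python
def resolve_plugins(specs, env, default_plugins):
    """Return the plugin specs, adding nf-cloudcache when NXF_CLOUDCACHE_PATH is set."""
    resolved = list(specs)  # do not mutate the caller's list
    # Assumption: each spec is an 'id@version' string, so the id is the part before '@'.
    declared_ids = {spec.split("@")[0] for spec in resolved}
    if env.get("NXF_CLOUDCACHE_PATH") and "nf-cloudcache" not in declared_ids:
        resolved.append(default_plugins["nf-cloudcache"])
    return resolved


# Hypothetical default plugin table, for illustration only.
defaults = {"nf-cloudcache": "nf-cloudcache@0.1.0"}
print(resolve_plugins(["nf-amazon@2.0.0"], {"NXF_CLOUDCACHE_PATH": "s3://my-bucket/cache"}, defaults))
```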

The nf-cloudcache plugin allows persisting the nextflow cache metadata into a cloud object storage instead of using the embedded leveldb engine

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

Add S3 cache store

3a72d88

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

bentsherman requested a review from pditommaso July 14, 2023 23:00

bentsherman and others added 2 commits July 14, 2023 18:17

Rename S3 cache store to path-based cache store, generalize to any path scheme

2601fcf

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

Merge branch 'master' into s3-cache-store

498d91d

pditommaso reviewed Jul 16, 2023

modules/nextflow/src/main/groovy/nextflow/cache/PathCacheStore.groovy (outdated)

pditommaso requested changes Jul 16, 2023


bentsherman added 2 commits July 18, 2023 12:17

Move path cache to nf-path-cache plugin

ae90744

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

Add index file

f42b34c

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

pditommaso requested changes Jul 18, 2023

packing.gradle (outdated)

plugins/nf-path-cache/src/main/nextflow/PathCachePlugin.groovy (outdated)

docs/plugins.md (outdated)

Apply suggestions from review

4e6413a

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

bentsherman requested a review from pditommaso July 18, 2023 21:11

pditommaso added 3 commits July 19, 2023 10:58

Merge branch 's3-cache-store' of github.com:nextflow-io/nextflow into s3-cache-store

f43b12e

Minor changes [ci fast]

808e41b

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

Add integration tests

24f52ac

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>


Add check to resumed run with cloud cache

f748bb4

Signed-off-by: Ben Sherman <bentshermann@gmail.com>


Add check that cloud cache is being used

78dd823

Signed-off-by: Ben Sherman <bentshermann@gmail.com>


Add checks to azure and google tests

193a5ec

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

pditommaso approved these changes Jul 20, 2023


pditommaso merged commit ac90cc2 into master Jul 20, 2023
20 checks passed

pditommaso deleted the s3-cache-store branch July 20, 2023 07:47

pditommaso changed the title ~~Add S3 cache store~~ Add cloud cache plugin Jul 27, 2023

bentsherman mentioned this pull request Aug 13, 2023

Embedded cache DB alternative #2774

Open
