Possibility to specify plugins within test profiles or non-top scopes #1964

Closed

skrakau opened this issue Mar 11, 2021 · 21 comments

@skrakau (Contributor) commented Mar 11, 2021

New feature

Hi again, I need to come back to the question already mentioned in #1963 about specifying plugins within test.conf config files. Currently, with the Nextflow 21.03.0-edge release, when adding the nf-amazon plugin definition within a test.conf file (or I guess in any non-top scope) I get:

N E X T F L O W  ~  version 21.03.0-edge
Launching `/home-link/qeakr01/development/mag/main.nf` [mad_waddington] - revision: f377b362c1
Plugins definition is only allowed in config top-most scope
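
(For context, a minimal sketch of the failing setup; the profile and file names are assumptions, following the usual nf-core layout where the test profile includes a separate config file:)

// nextflow.config
profiles {
    test { includeConfig 'conf/test.config' }
}

// conf/test.config: a plugins block here ends up inside the profile
// scope, which is what triggers the error above
plugins {
    id 'nf-amazon'
}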

Usage scenario

For some nf-core pipelines the full-size test data is stored on the Amazon S3 filesystem, and it would be very helpful if the corresponding test profile could still be run on non-AWS instances in the future, without the user needing to supply extra custom config files for this. I do this very often for testing purposes, and I think others from the nf-core team would also need this functionality.

@skrakau (Contributor, Author) commented Mar 12, 2021

Maybe it's worth an explanation: the reason why I wanted to load the nf-amazon plugin within test.conf and not within nextflow.config is that I assumed the latter might cause problems if the user additionally provides data from a different S3 filesystem (by specifying another plugin within a custom config file). But maybe that wouldn't be a problem.

A related question would be whether it's possible to specify nf-amazon for igenomes.config or nextflow.config, and another S3 within the custom config file for other input files.
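
(For what it's worth, a minimal sketch of how a second S3-compatible service could be pointed at from a custom config file; the aws.client.endpoint option is used here under the assumption that all S3 paths in the run share that endpoint, and the URL is hypothetical:)

// custom.config: route S3 access to another S3-compatible endpoint
aws {
    client {
        endpoint = 'https://s3.example.org'   // hypothetical endpoint URL
    }
}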

@ewels (Member) commented Mar 26, 2021

To add to this - I was just caught out by the same thing when adding the following to a minimal example main.nf instead of nextflow.config:

plugins {
  id 'nf-amazon'
}

It triggered the following error:

N E X T F L O W  ~  version 21.03.0-edge
Launching `./main.nf` [jovial_kalam] - revision: b9692aedd4
No signature of method: Script_35bd8815.plugins() is applicable for argument types: (Script_35bd8815$_runScript_closure1) values: [Script_35bd8815$_runScript_closure1@4fcc0416]
Possible solutions: print(java.lang.Object), print(java.lang.Object), print(java.lang.Object), print(java.io.PrintWriter)
 -- Check script 'main.nf' at line: 1 or see '.nextflow.log' file for more details
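
(For contrast, a minimal sketch of the placement that does work: the same block at the top-most scope of nextflow.config, rather than in main.nf or a profile:)

// nextflow.config: plugin declarations are only valid at this top level
plugins {
    id 'nf-amazon'
}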

@pditommaso (Member) commented

The plugins cannot go in the pipeline script. One workaround would be to add it on the CLI when specifying the genome profile, e.g.

nextflow run <pipeline> -plugins nf-amazon -profile igenomes

I know, a bit boring.

@ewels (Member) commented Mar 26, 2021

Or even better if it doesn't have to be defined at all 🙄 😉

I'm pretty worried that these are going to cause chaos for all @nf-core pipelines if I'm honest, with our heavy usage of reference genomes on AWS...

@pditommaso (Member) commented

Yeah, but then the problem is that, when offline, the plugin cannot be downloaded :/

@ewels (Member) commented Mar 26, 2021

When offline, the S3 paths can't be downloaded either...

@pditommaso (Member) commented Mar 26, 2021

I think we can agree on this 😄

@ewels (Member) commented Mar 26, 2021

So is the idea that we define the plugin name in all nf-core pipelines in order to be able to use the AWS-iGenomes references in them? Does that break stuff for anyone wanting to use a different object storage system for their data? (Apologies if this is the wrong place to have this discussion..)

@pditommaso (Member) commented

Putting the plugin name in all nf-core pipelines would break the execution when running in an offline environment, because NF would try to download the plugin.

What I'm not understanding: are the AWS-iGenomes references used by all pipelines?

@ewels (Member) commented Mar 28, 2021

Most @nf-core pipelines have the igenomes config (it comes with the pipeline), yeah. Users can configure the base path to use a local directory if all of iGenomes is downloaded, and of course use their own references. But a lot of people just use the AWS-iGenomes directly (based on the download stats anyway).

@pditommaso (Member) commented

Need to check if it's possible to have the plugins nested within a profile

@ewels (Member) commented Mar 29, 2021

What I don't really understand is why it needs to be in the pipeline code at all. Ignoring the AWS-iGenomes thing for a minute, surely most of the time this will be something that a user needs to manage rather than a pipeline developer?

If it's possible to have all of these plugins installed at once and Nextflow knows how to deal with the s3 paths, can it not just be part of the -self-update command, so that Nextflow fetches and updates all core plugins? That way you can still do regular small updates of the plugins, but we don't have to worry about them at pipeline level...

@pditommaso (Member) commented

> What I don't really understand is why it needs to be in the pipeline code at all. Ignoring the AWS-iGenomes thing for a minute, surely most of the time this will be something that a user needs to manage rather than a pipeline developer?

Because the plan is to have application plugins, for example to handle a SQL db or to access a dataset that requires some special library. This is why the requirement is that the pipeline should declare them in the pipeline config.

> If it's possible to have all of these plugins installed at once and Nextflow

Actually, there's a nextflow plugins install command for that (tho hidden, because I still need to refine it), which copies the plugin files into the $HOME/.nextflow/plugins path. However, it's still necessary to declare the plugin either in a config file (pipeline or $HOME/.nextflow/config), via the cli option -plugins, or via the env variable NXF_PLUGINS_DEFAULT.
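
(In practice this means a user can opt in once without touching the pipeline; a minimal sketch, assuming the user-level config file mentioned above:)

// $HOME/.nextflow/config: requests nf-amazon for every run by this user,
// independently of the pipeline's own nextflow.config
plugins {
    id 'nf-amazon'
}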

@ewels (Member) commented Mar 30, 2021

Right, I'm not against plugins per se - quite the opposite. I think that it can and will be a super powerful feature. Your examples of pipelines which are fixed to specific data sources are super nice.

My objection is for things where the pipeline developer can't know about data sources, primarily file access. Almost all Nextflow pipelines take files as inputs and the beauty of Nextflow being so portable is that they can come from anywhere - local, https, ftp, buckets etc etc. But by mandating that the pipeline developer needs to add nf-amazon, nf-google and nf-azure to access files on those systems, you lose that portability.

ok, so questions:

  1. Is the only way forward with cross-cloud data access for pipeline developers to declare all of these plugins in every pipeline, just in case a user wants to access files on those systems?

  2. Does declaring all at once in one pipeline work? e.g.:

    plugins {
        id 'nf-amazon'
        id 'nf-azure'
        id 'nf-google'
    }
    ch_from_amazon = Channel.fromPath( 's3://aws-bucket/data/sequences.fa' )
    ch_from_azure  = Channel.fromPath( 'az://azure-bucket/data/sequences.fa' )
    ch_from_google = Channel.fromPath( 'gs://google-bucket/data/sequences.fa' )

  3. If we need to do that for all @nf-core pipelines, do you not agree that it makes more sense to have this functionality as a core Nextflow feature? Or at least not need to define it at pipeline level?

@pditommaso (Member) commented

> But by mandating that the pipeline developer needs to add nf-amazon, nf-google and nf-azure to access files on those systems, you lose that portability

Kind of disagree, because the pipeline is portable irrespective of the platform; but yes, the user pulling the data from a cloud should add the required plugin.

> 1. Is the only way forward with cross-cloud data access for pipeline

Well, either command-line option, config file, or env variable.

> 2. Does declaring all at once in one pipeline work?

Yup.

> 3. If we need to do that for all @nf-core pipelines, do you not agree that it makes more sense to have this functionality as a core Nextflow feature

Not really, because we already have Amazon, Azure, and Google Cloud. Today we added DNAnexus, and surely more will come. Putting all this stuff in as a core dependency would result in a huge bloated runtime, which is especially bad when pulling this stuff in the cloud.

However, I understand your concern that the user should not have to care about configuring the needed plugin when launching an nf-core pipeline. This is why I've also added a check that automatically adds the required plugins when a cloud executor is specified, e.g. when the executor is awsbatch the nf-amazon plugin is automatically added.

I think the only problem that remains to address is when a pipeline launched locally needs to access cloud-stored files.

@ewels (Member) commented Mar 31, 2021

> This is why I've also added a check that automatically adds the required plugins when a cloud executor is specified, e.g. when the executor is awsbatch the nf-amazon plugin is automatically added.

Ok great, this puts my mind at ease quite a lot.. 😄

> Well, either command-line option, config file, or env variable.

Could you clarify this a bit? My (limited) understanding was that they had to be declared in the pipeline nextflow.config file and that was the only way to do it. But you're saying that if the user installs the plugin via the command line (nextflow plugins install), then accessing e.g. AWS S3 paths will work without any mention of the plugin in the pipeline code?

@pditommaso (Member) commented

> Ok great, this puts my mind at ease quite a lot

Actually, it also checks the pipeline work dir:

// infer from app config
final plugins = new ArrayList<PluginSpec>()
final workDir = config.workDir as String
final executor = Bolts.navigate(config, 'process.executor')
if( executor == 'awsbatch' || workDir?.startsWith('s3://') )
    plugins << defaultPlugins.getPlugin('nf-amazon')
if( executor == 'google-lifesciences' || workDir?.startsWith('gs://') )
    plugins << defaultPlugins.getPlugin('nf-google')
if( executor == 'azurebatch' || workDir?.startsWith('az://') )
    plugins << defaultPlugins.getPlugin('nf-azure')
if( executor == 'ignite' || System.getProperty('nxf.node.daemon')=='true' ) {
    plugins << defaultPlugins.getPlugin('nf-ignite')
    plugins << defaultPlugins.getPlugin('nf-amazon')
}

> Could you clarify this a bit?

Installation != plugin requirement/activation. Installation means only downloading, unzipping, and copying the plugin content into the $HOME/.nextflow/plugins folder.

The installation is done via nextflow plugins install (undocumented) or automatically on NF startup when one or more plugins have been specified via:

  1. the -plugins cli option
  2. a nextflow config file
  3. the NXF_PLUGINS_DEFAULT env variable
  4. implicit plugins inferred from the pipeline config (i.e. the snippet above)

Worth mentioning, they are listed in order of priority, i.e. if 1 is provided then 2 is ignored, and so on.
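
(A minimal sketch of what that priority means in practice; both mechanisms are set below, and only the higher-priority one takes effect:)

# the -plugins option (1) wins over NXF_PLUGINS_DEFAULT (3):
# only nf-amazon is requested, the env variable is ignored
NXF_PLUGINS_DEFAULT=nf-google nextflow run main.nf -plugins nf-amazon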

> AWS S3 paths will work without any mention of the plugin in the pipeline code?

S3 paths work only if the nf-amazon plugin is requested via one of the above mechanisms.

Possible solutions:

  1. Add all plugins in the config. Cons: this will break the pipeline when running offline, because NF will try to download the plugins.
  2. Make a failed plugin download report a warning instead of an error. Cons: it could be difficult to debug and could result in faulty behavior.
  3. Allow the definition of plugins in nested profiles, so that if you add igenomes, the plugin is also added. Cons: it can be tricky to resolve the plugin version if different profiles require different versions.
  4. Lazily configure the plugin requirement, i.e. if at runtime NF detects the use of an S3 file, install and activate the plugin (see the sketch after this list). Technically it should be possible, but it can be tricky to implement in practice.
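
(A rough Groovy sketch of the idea behind point 4; the names and structure here are illustrative, not Nextflow's actual internals:)

// returns the plugin id that handles the given path's URI scheme,
// or null when no plugin is needed (e.g. a plain local path)
String pluginFor(String path) {
    def schemes = [ s3: 'nf-amazon', gs: 'nf-google', az: 'nf-azure' ]
    if( !path.contains('://') ) return null   // local path, nothing to do
    return schemes[ path.tokenize(':')[0] ]   // unknown scheme => null
}

assert pluginFor('s3://bucket/data/sequences.fa') == 'nf-amazon'
assert pluginFor('/local/data/sequences.fa') == null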

pditommaso added a commit that referenced this issue Apr 6, 2021
@pditommaso (Member) commented

I've managed to implement the solution at point 4. Therefore cloud plugins are inferred and started automatically in all cases.

You may want to give it a try using

NXF_VER=21.04.0-SNAPSHOT nextflow run .. etc

Make sure you are using this version:

» NXF_VER=21.04.0-SNAPSHOT nextflow info
  Version: 21.04.0-SNAPSHOT build 5537
  Created: 06-04-2021 15:30 UTC (17:30 CEST)

@ewels (Member) commented Apr 6, 2021

Amazing! And to uninstall plugins I can just do rm -rf $HOME/.nextflow/plugins?

@pditommaso (Member) commented Apr 6, 2021

yep

pditommaso added this to the v21.04.0 milestone Apr 8, 2021
@pditommaso (Member) commented

This is available as of version 21.04.0-edge. Closing this issue; feel free to comment/reopen if needed.
