Use eval output for tool versions #1115
base: dev
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@drpatelh @ewels now that Nextflow has channel topics, it occurred to me that we could actually simplify a lot by just using

    // (1) current nf-core convention
    process FOOBAR {
        output:
        path 'versions.yml', topic: versions

        """
        # ...
        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            foo: \$(foo --version)
            bar: \$(bar --version)
        END_VERSIONS
        """
    }

    // (2) env output
    process FOOBAR {
        output:
        tuple val("${task.process}"), val('foo'), env(FOO_VERSION), topic: versions
        tuple val("${task.process}"), val('bar'), env(BAR_VERSION), topic: versions

        """
        # ...
        FOO_VERSION=\$(foo --version)
        BAR_VERSION=\$(bar --version)
        """
    }

    // (3) cmd output
    process FOOBAR {
        output:
        tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
        tuple val("${task.process}"), val('bar'), cmd('bar --version'), topic: versions

        """
        # ...
        """
    }

I would love to hear what you guys think about (2) vs (3). Keep in mind that in all three cases, the tool version commands are executed in the task script in more or less the same way.
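For readers who do not have the nf-core convention in their head, here is a rough plain-shell sketch of what the heredoc in option (1) writes. It runs outside Nextflow, so `foo` and `bar` are hypothetical stand-in functions, `TASK_PROCESS` stands in for the `${task.process}` value that Nextflow would substitute, and the `\$` escapes are dropped because there is no Groovy string around the script.

```shell
# Hypothetical stand-ins for real tools, so the sketch runs anywhere.
foo() { echo "foo v1.2.3"; }
bar() { echo "bar 0.9.1"; }

# Nextflow would fill in ${task.process}; here it is an ordinary shell variable.
TASK_PROCESS="FOOBAR"

# Write the versions file the way the option (1) heredoc does.
write_versions() {
    cat <<END_VERSIONS > "$1"
"${TASK_PROCESS}":
    foo: $(foo --version)
    bar: $(bar --version)
END_VERSIONS
}

out=$(mktemp)
write_versions "$out"
cat "$out"
```

Options (2) and (3) carry the same three pieces of information (process name, tool name, version string) as a tuple instead of a file, which is why the YAML no longer has to be written inside every task.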
My preference is for option 3 - the new NB: The
My preference is option 2, which I find more explicit and easier to read, but I do love how option 3 removes the version generation from the script itself completely.
I like three, my only concern is some of the commands to get the version get pretty long. In theory, we could do something like:

    def foo_version = 'foo --version'

    output:
    tuple val("${task.process}"), val('foo'), cmd("${foo_version}"), topic: versions
I like option 3!
I would really like the inputs/outputs section to remain as concise as possible, and I like the separation of concerns where the command to produce the output happens in more or less the same place. I'd do a bit of a WTF if people suddenly started embedding extensive process stuff where I expect the I/O. So I have a fairly strong dislike for option 3), I think some fairly horrific stuff could happen there and make the processes hard to understand. So option 2) for me please!
Agreeing with @pinin4fjords there, option 3 looks beautiful as long as everything works, but when it starts to break, it's a mess to debug.
@pinin4fjords - note that one of the limitations of That will hopefully prevent people from doing anything too horrendous 😆 We could have an nf-core modules linting rule that checks the string length and fails if it's too long, suggesting that people use
There's plenty of evil to be done with pipes!
@mirpedrol - No it won't work. Suggestion would be to use
rnaseq has a couple of R modules actually, they're just not obvious because they're local, and we will hopefully fix that at some point, and they will then need templates etc.
Thank you all for your feedback. I still prefer
Note that the
@emiller88 I don't think you can reference local variables in an output as in your example, but you could reference a global variable, for example:

    foo_version = 'really | long | version | command'

    process foo {
        output:
        cmd("${foo_version}")
    }
@mirpedrol In this PR I changed all the processes to only emit the metadata and then the YAML is constructed at the end of the pipeline. If you usually generate the tool version from within a Python or R script, the
You could have a multi-line command by using semi-colons for newlines 😅 Regarding multi-line outputs, we found a way to support them for both
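To make the semi-colon suggestion concrete, here is a small shell sketch of the same version lookup written once across several lines and once as a single string. `foo` is a hypothetical stand-in tool, and `eval` is only used here to run the string; how the experimental `cmd` output actually executes its argument is not shown in this thread.

```shell
# Hypothetical tool whose version is buried in multi-line output.
foo() { echo "foo-suite"; echo "release v2.4.1"; echo "built with gcc"; }

# Multi-line form, as it might be written in a script block.
multi_line() {
    raw=$(foo --version)
    line=$(printf '%s\n' "$raw" | grep release)
    echo "${line##* v}"   # strip everything up to and including " v"
}

# The same three steps on one line, with semicolons in place of newlines.
one_liner='raw=$(foo --version); line=$(printf "%s\n" "$raw" | grep release); echo "${line##* v}"'

multi_line
eval "$one_liner"
```

Both forms print the same version string, so the trade-off is purely readability, which is where a length-based lint rule like the one floated above would come in.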
So I think everyone agrees that options 2 + 3 are both improvements ✅ For any processes with script blocks written in languages other than bash, we will have to use the Option 1:
I think the real thing this could open up is parsing the version string in Groovy as another option:

    // cmd + variable output
    foo_version = getVersionFromString('foo --version')
    bar_version = 'bar --version'

    process FOOBAR {
        output:
        tuple val("${task.process}"), val('foo'), cmd(foo_version), topic: versions
    }

in a lib far far away:

    def getVersionFromString(String text) {
        def matcher = text =~ /v(\d+\.\d+\.\d+)/
        return matcher ? matcher[0][1] : null
    }

Just a thought.
The thing is that the command must be executed in the task environment, because Nextflow might not have access to the tool from outside the task. You could just emit the raw output of the tool version command, remove the duplicates, and then parse the string in Groovy:

    process FOOBAR {
        output:
        tuple val("${task.process}"), val('foo'), cmd('foo --version'), topic: versions
    }

    Channel.topic('versions')
        .map { process, tool, raw_version ->
            [ process, tool, getVersionFromString(tool, raw_version) ]
        }

That comes down to whether you would rather parse the version with a Bash one-liner or Groovy code. Note that you have to write a custom parser for every tool, so putting it all in a lib far far away would break the modularity of your modules. Unless you have a way to "register" a parser from the module script.
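For comparison, the Bash side of that trade-off might look roughly like the sketch below. It reuses the `v` plus `X.Y.Z` pattern from the Groovy helper earlier in the thread; `foo` is a hypothetical stand-in tool, and `parse_version` and `dedupe` are illustrative names, not part of Nextflow or nf-core.

```shell
# Hypothetical tool with a noisy version banner.
foo() { echo "foo (suite) v1.2.3 compiled 2023"; }

# One-liner parser: print only the X.Y.Z that follows a "v".
# With -n and the p flag, lines without a match print nothing.
parse_version() {
    sed -nE 's/.*v([0-9]+\.[0-9]+\.[0-9]+).*/\1/p'
}

# Many tasks of one process report the same tuple; keep a single copy.
dedupe() { sort -u; }

foo --version | parse_version

# Two identical reports, as two tasks of FOOBAR would produce, collapse to one line.
v=$(foo --version | parse_version)
printf 'FOOBAR\tfoo\t%s\n' "$v" "$v" | dedupe
```

A pipeline like `foo --version | parse_version` is short enough to live in the module next to the tool it describes, which keeps the parser with the module rather than in a shared lib.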
This PR uses the experimental cmd output type in nextflow-io/nextflow#4493 to simplify the collection of tool versions.
Once the topic channel support is merged into Nextflow, we can merge this PR with #1109 to simplify things further. Instead of emitting versions1, versions2, etc. for processes with multiple tools, we can simply send them all to the 'versions' topic.
PR checklist
nf-core lint
nextflow run . -profile test,docker --outdir <OUTDIR>
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).