-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nf-core download <pipeline name> #53
Conversation
Codecov Report
@@ Coverage Diff @@
## master #53 +/- ##
===========================================
- Coverage 94.8% 72.63% -22.17%
===========================================
Files 4 6 +2
Lines 500 665 +165
===========================================
+ Hits 474 483 +9
- Misses 26 182 +156
Continue to review full report at Codecov.
|
* Validates pipeline input, fetches workflow details and release hash * Download workflow files * If -s specified, automatically finds container names from workflow config * Tries to pull singularity images using first singularity, then docker
scripts/nf-core
Outdated
help = "Pipeline release" | ||
) | ||
@click.option( | ||
'-s/-n', '--singularity/--no-singularity', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to have something like -s yes
/ -s no
instead?
I find it disturbing to have two options for two opposites things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess - the above is the standard way to handle boolean flags with click (see docs), so this is the way I've done it in the past. But I don't really mind either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I didn't knew the standards.
I'm fine with it then ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I did it as a single flag first - just -s
to get singularity images and nothing to not get them. But then I thought it was better to be explicit and require this input from the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, after playing a bit more I agree. As it's required the help message says that it's missing -s
if not supplied, which is kind of counter-intuitive.
Just need to find a nice way to handle y|Y|yes|Yes|YES
for both options now. I bet click
has a nice tidy way to handle this for you, if I can just find it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too would prefer just one parameter -s
or --singularity-images
with yes/no as only accepted values, that can be done by using choices or giving a call_back
.
Or just --singularity/--no-singularity
(without -s/-n
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I half implemented the Choice
thing this morning, but if we want to handle case insensitivity and short/long form then the list of allowed options gets messy... 🤓 (y|Y|yes|Yes|YES|n|N|no|No|NO
).
We could write a custom callback or string handling behaviour, but that feels like overkill and the help text is not particularly pretty then. click
already has this kind of handling for the --yes
behaviour. click.BOOL
can also handle string values, so I figure that click probably has built-in support for this somehow...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah!! looks like setting type=click.BOOL
does the trick right ? for y/n values
Option:
@click.option('-s', '--singularity', type=click.BOOL, default=None, required=True)
Test:
$ python cli_test.py test -s y
True
$ python cli_test.py test -s yes
True
$ python cli_test.py test -s YEs
True
$ python cli_test.py test -s YeS
True
$ python cli_test.py test -s nO
False
$ python cli_test.py test -s n
False
$ python cli_test.py test -s NO
False
$ python cli_test.py test -s NOo
Usage: cli_test.py test [OPTIONS]
Error: Invalid value for "-s" / "--singularity": noo is not a valid boolean
$ python cli_test.py test -s 0
False
$ python cli_test.py test -s 1
True
$ python cli_test.py test -s 2
Usage: cli_test.py test [OPTIONS]
Error: Invalid value for "-s" / "--singularity": 2 is not a valid boolean
scripts/nf-core
Outdated
help = "Pipeline release" | ||
) | ||
@click.option( | ||
'-s/-n', '--singularity/--no-singularity', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too would prefer just one parameter -s
or --singularity-images
with yes/no as only accepted values, that can be done by using choices or giving a call_back
.
Or just --singularity/--no-singularity
(without -s/-n
)
nf_core/download.py
Outdated
|
||
# If we got this far, must not be a nf-core pipeline | ||
if self.pipeline.count('/') == 1: | ||
# Looks like a GitHub address - try working with this repo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not check this upstream ? i.e. validate the param value at very beginning. That could improve the success rate of downloading something even higher 😉
Also could be good to log that it is not a nf-core
pipeline (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, should do both 👍 Was rushing to push the code so that I wasn't late into work for Remi's talk 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as early as I can check I think. In order to validate the pipeline name, I have to fetch the pipeline names from the website API and see if the supplied name matches any of them (the above for wf in wfs.remote_workflows
loop). So this point in the code is the first time that I know it's not an nf-core pipeline.
Agree that logging is a good idea 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging added in fd90e46
nf_core/download.py
Outdated
# Check that the outdir doesn't already exist | ||
if os.path.exists(wf.outdir): | ||
logging.error("Output directory '{}' already exists".format(wf.outdir)) | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really wanna exit ? If so why not have this check at the beginning (like as soon as receiving the parameter or in the class) that will save us all the trouble of doing everything 😇
Or just warn and save in a custom name so they can rename the folder later ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, should probably handle this in a nicer way 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I remember..
If so why not have this check at the beginning
I can't check any earlier, as if --outdir
is not specified, the default output directory name is generated based on the workflow name and version number. The version number is the last release if not specified. So basically, I don't know what the output directory name will be before I get the workflow details.
Or just warn and save in a custom name so they can rename the folder later?
Nah - this command isn't going to be super common to run (like multiqc
or nextflow
or something), so it's not such a big deal to just bail and exit I think. Less code and potentially less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see the point now 👍
nf_core/download.py
Outdated
|
||
import nf_core.list, nf_core.lint | ||
|
||
def download_workflow(pipeline, release=None, singularity=False, outdir=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, Why make it as independent function rather than not including it in the Class below ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't remember, was copying my earlier code structure for the other modules.
* Moved the nextflow config parsing into new common utils.py module * Made singularity download option a single command line boolean flag * Improved logging for download script
Now waiting for singularityhub/docker2singularity#30 to be merged so that we can create newer better singularity images using docker ✨ |
nf_core/download.py
Outdated
logging.info("Pipeline is in development. Using current code on master branch.") | ||
|
||
# Set outdir name if not defined | ||
if self.outdir is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this block can be moved up to L40 (keeping all outdir
related stuff together), so this function will be clean and does only what the name suggests ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're getting at, but I respectfully disagree 😆 I'm just setting all of the different variables in this function that will be used later on, including the outdir
and the download link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, it is a matter of personnel choice/prefference, so no right or wrong here 😇It makes sense in this context to update necessary variables (including outdir
) in this function as this function was called by download_workflow
(which uses outdir
later on).
I suggested this considering more general context i.e. when (or if) fetch_workflow_details
is called by any other function or script that don't have or need outdir
variable, then updating outdir
would be "out of context" 😜(Not an error but its just something not necessary then). That is why I would have put this part up directly in download_workflow
😉(Of course, since the class itself is more "download" related this might not happen, but like I said it is matter of personnel preference)
nf_core/download.py
Outdated
self.download_singularity_image(container) | ||
|
||
|
||
def get_workflow(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to be picky 😉 -> is it really a "get" function ? I mean it returns only True/False
. So maybe naming set_workflow
or collect_workflow
or something else be more appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I'll give you this one ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stuff
nf_core/download.py
Outdated
logging.warn("Pipeline name doesn't match any nf-core workflows") | ||
logging.info("Pipeline name looks like a GitHub address - attempting to download anyway") | ||
self.wf_name = self.pipeline | ||
if self.release is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better if not self.release
nf_core/download.py
Outdated
if self.release is None: | ||
self.release = 'master' | ||
self.wf_sha = self.release | ||
if self.outdir is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better if not self.outdir
nf_core/download.py
Outdated
self.outdir += '-{}'.format(self.release) | ||
# Set the download URL and return | ||
self.wf_download_url = 'https://github.com/{}/archive/{}.zip'.format(self.pipeline, self.release) | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of boolean return values in such functions, that are not a question, like is_finished
or so. Better raise a meaningful expection like raise LookupError("Pipeline {} not found on whatever".format(pipeline))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Yes, I felt a bit dirty writing this but it worked. Now you've suggested the raise
it seems obvious 😆
nf_core/download.py
Outdated
logging.error("Not able to find pipeline '{}'".format(self.pipeline)) | ||
logging.info("Available pipelines: {}".format(', '.join([w.name for w in wfs.remote_workflows]))) | ||
return False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise Exception here
* if not x: instead of if x is None: * Raise and catch exceptions instead of returning booleans
Hacky in-progress WIP code pushed sorry. Trying to download directly from singularity hub - I feel like it should be easy ( |
With "hang" you mean timeout? Or just forever? |
Just forever. Timeout param has no effect. A very fast minimal example in a live python environment seemed fine, so I’m not sure what’s going on. Ran out of time and had to run into work for a meeting though.. |
nf_core/download.py
Outdated
with open(out_path, 'wb') as f: | ||
for chunk in dl_request.iter_content(chunk_size=1024): | ||
if chunk: | ||
f.write(chunk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do f.flush()
too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intentional, due to this stack overflow comment
Missing tests, but done for now unless anyone has any more suggestions I think! |
|
Next question: if the |
This is what I did at the start of the pull request, but then I changed it at the request of the reviewers 😅 (sort of, anyway. See discussion) Also note that I found having it as a flag with default
That's what the flag is for :) Not everyone uses singularity, but anyone running offline will need to download the pipeline files. So these should always be downloaded, but the singularity image should be optional. One could think of other optional downloads in the future too - reference genome files for example. Downloading from singularity hub / docker hub shouldn't really matter. shub is preferable as it doesn't require singularity to be installed. There is no method to affect this through the command line, and I'm not really sure that there needs to be. Basically, this was the logic I used to get to this interface:
Happy to change this if anyone has a better solution though! |
I am sorry, I really missed that.
Don't want to be picky, but we should stick to common Unix-like CLI. A required option is not an option. If you want to download the singularity image, provide the
With the rest I totally agree, and I like the feature very much :) So, now beat me :P |
This can be basically seen from two points of view: Do we want people to be able to just run the command and then everything they need is there or do we want them to use this tool for specific use cases and then require them to specify exactly what they want to have? I'd rather say keep it optional, as the download might take quite a while and then people can choose if they want to do that on a per-user basis. I understand having the problem to forget it, but also downloading things that you don't want everytime is quite similarly annoying ;-) |
Yup, I didn't really have a strong opinion on this and agree with all of the points made. I'll switch the current logic with a single |
Would it be good/helpful to be able to just download the container images for given pipeline ? |
@senthil10 I think this is a useful feature! 👍 probably with a flag |
I'm a little confused by this last request - in discussion I think it was about rerunning the command if we forget to specify We have to download the pipeline files to get the container, as the pipeline config is parsed to get the container name dynamically. So we could do a special mode which does this and then deletes those files (or just saves them to |
@ewels Ah ok, I see. Ok, then let's not stress this pull request to much, there are still tests to write. @senthil10 We can keep this in our mind, if there is a future request for such a feature coming up, we can still brainstorm about and implement it. @ewels: I am happy to write the tests if you like :) |
@sven1103 yes, I agree too 👍 @ewels have already done a great job with this PR :) As you mentioned we can add this in the future depending upon need (maybe something to debate during In general I think it can be a good feature to have, though its not a crucial feature it would still give little more flexibility for advance users. For instance if a user downloads just the pipeline first, then tweaks the pipeline a bit for fun or internal necessity and later if they ran into a situation where they needed the containers this feature might come in handy. It can be either done by |
@sven1103 - if you could write the tests that would of course be amazing 🎉 😉 @senthil10 - I still don't really think that this would be particularly useful, since our pipelines have only one container (Sarek has loads, which is why a script makes sense). So it really is just The point of this download script is to specifically to:
|
...and if someone could leave a positive review and merge, that'd be fab 😉 |
@ewels I did say "it is not crucial, it would just give more flexibility" and I said that thinking of packages with multiple containers, I dint know all the packages in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ewels I think you did a great implementation with this feature 🎉 ! I really like it! I will merge and write the tests over the weekend. |
@senthil10 @ewels The buildContainers script is IMHO only useful when having to build several containers. It was also a good opportunity to fun trying some Nextflow stuff. |
Thanks for the discussion everyone! |
New subcommand to download a pipeline and its singularity containers.
-s
is specified, downloads singularity imagesIf singularity is not installed, tries using docker (withdocker2singularity
)To do:
Example usage:
Example output: