Install SARS-CoV-2 workflow dependencies in Native runtime #115
I'm fully on board with this short-term step and the long-term ideal described. 🙌 I think it's better for users than more cross-linking (like #115 adds) but would still say that merge of this should wait until we have broadly concurring feedback from more of the team.
src/install.rst
@@ -106,7 +106,7 @@ These instructions will install the Nextstrain CLI and tools to run and view you

 mamba create -n nextstrain \
   -c conda-forge -c bioconda \
-  nextstrain-cli augur auspice nextalign snakemake git \
+  nextstrain-cli augur auspice nextalign snakemake git epiweeks pangolin pangolearn \
non-blocking
Did you look into whether adding these three packages significantly increases the installation time or size on disk?
I hadn't, but that would be good to know! Here are some benchmarks (tl;dr: no significant increase):
Installation time
Note that each time is taken after cleaning all caches, so these are worst-case scenarios given my processor and network speed. Not much difference here (93.63s vs. 98.21s user+sys time).
conda clean --all --force-pkgs-dirs
time mamba create -n tmp1 \
-c conda-forge -c bioconda \
nextstrain-cli augur auspice nextalign snakemake git \
--yes
# 52.33s user 41.30s system 92% cpu 1:41.63 total
conda clean --all --force-pkgs-dirs
time mamba create -n tmp2 \
-c conda-forge -c bioconda \
nextstrain-cli augur auspice nextalign snakemake git epiweeks pangolin pangolearn \
--yes
# 55.48s user 42.73s system 90% cpu 1:48.14 total
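The user+sys totals quoted above can be reproduced from the two `time` outputs with a little arithmetic. This is only a sketch: the helper names `cpu_seconds` and `percent_increase` are made up for illustration and are not part of any Nextstrain tooling.

```shell
# Sum the user and system CPU seconds reported by `time`.
# (Hypothetical helper, for illustration only.)
cpu_seconds() {
  awk -v user="$1" -v sys="$2" 'BEGIN { printf "%.2f\n", user + sys }'
}

# Relative increase of the second total over the first, in percent.
# (Hypothetical helper, for illustration only.)
percent_increase() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

base=$(cpu_seconds 52.33 41.30)       # tmp1: without the three extra packages
extended=$(cpu_seconds 55.48 42.73)   # tmp2: with epiweeks, pangolin, pangolearn
echo "base=${base}s extended=${extended}s increase=$(percent_increase "$base" "$extended")%"
```

With the numbers from this single run, the extra packages add roughly 5% CPU time, which is within what one would expect from run-to-run noise.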
Disk usage
1.5G -> 1.6G (~0.1G increase).
du -hs /opt/homebrew/Caskroom/miniconda/base/envs/tmp1
# 1.5G /opt/homebrew/Caskroom/miniconda/base/envs/tmp1
du -hs /opt/homebrew/Caskroom/miniconda/base/envs/tmp2
# 1.6G /opt/homebrew/Caskroom/miniconda/base/envs/tmp2
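Because `du -h` rounds to one decimal place, the "~0.1G" figure is only approximate. If a more exact number were wanted, something like the following sketch could compare the two environments in KiB. The function name `env_size_diff_kib` is an assumption made for this example.

```shell
# Print how many KiB larger the second directory is than the first.
# `du -sk` reports one summarized total in 1024-byte units.
# (Hypothetical helper, for illustration only.)
env_size_diff_kib() {
  before=$(du -sk "$1" | awk '{ print $1 }')
  after=$(du -sk "$2" | awk '{ print $1 }')
  echo $(( after - before ))
}

# Example call, using the paths from the benchmark above (not run here):
#   env_size_diff_kib /opt/homebrew/Caskroom/miniconda/base/envs/tmp1 \
#                     /opt/homebrew/Caskroom/miniconda/base/envs/tmp2
```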
Thanks!
I like this solution. We've started to use all of these tools for seasonal flu, now, too, so this is going to make everyone's life easier in the future.
My only note is that eventually this list will get so long that we'll be tempted to put it into a Conda environment YAML or even its own Bioconda recipe, and we will have come full circle. Which I guess is a +1 for working on buildpacks (nextstrain/docker-base#71) sooner rather than later as a way to better manage our dependencies.
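For reference, the "Conda environment YAML" option might look roughly like the sketch below. The file name `nextstrain-environment.yml` is an assumption made for this example; the channels and package list are copied from the `mamba create` command in this PR, with no version pins.

```shell
# Write a Conda environment file equivalent to the `mamba create` command
# in this PR. The file name is hypothetical.
cat > nextstrain-environment.yml <<'EOF'
name: nextstrain
channels:
  - conda-forge
  - bioconda
dependencies:
  - nextstrain-cli
  - augur
  - auspice
  - nextalign
  - snakemake
  - git
  - epiweeks
  - pangolin
  - pangolearn
EOF

# To build the environment from the file (not run here):
#   mamba env create -f nextstrain-environment.yml
```

This keeps the install instructions to a single command, at the cost of maintaining one more file alongside the docs.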
IIRC, the only reason we're not using the One solution to that which I think we briefly considered but didn't pursue would be pinning versions of our packages in the meta-package and automating the update of the pins + making new meta-package releases. This is very much akin to the way the Docker runtime image is managed currently, and there are lots of benefits to it… (A reliable meta-package would also make it much easier to implement the vision I have for
Buildpacks definitely help here for containerized runtimes (Docker, AWS Batch, Terra, (in-some-future) Singularity).
Description of proposed changes
The Nextstrain Docker image used by the Docker runtime comes with many
other software tools, so that the runtime can work with all core
pathogen workflows. In contrast, the current Native runtime installs a
"bare minimum" set of software, requiring individual workflows to
specify additional installation commands just for the Native runtime.
This commit effectively syncs the two runtimes to provide the same
tools.
In the long term, it would be best to abandon this approach of
supporting n pathogen workflows by a single setup (for both Docker and
Native runtime). While this might seem like taking a step backwards by
putting more potentially unused tools in the Native runtime, the
difference of available tools between Docker/Native currently causes
confusion for users and unnecessary instructions specific to the Native
runtime for each pathogen workflow.
Related issue(s)
Testing
N/A
TODO