Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install SARS-CoV-2 workflow dependencies in Native runtime #115

Merged
merged 1 commit into from Jul 7, 2022

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Jun 23, 2022

preview

Description of proposed changes

The Nextstrain Docker image used by the Docker runtime comes with many
other software tools, so that the runtime can work with all core
pathogen workflows. In contrast, the current Native runtime installs a
"bare minimum" set of software, requiring individual workflows to
specify additional installation commands just for the Native runtime.
This commit effectively syncs the two runtimes to provide the same
tools.

In the long term, it would be best to abandon this approach of
supporting n pathogen workflows by a single setup (for both Docker and
Native runtime). While this might seem like taking a step backwards by
putting more potentially unused tools in the Native runtime, the
difference of available tools between Docker/Native currently causes
confusion for users and unnecessary instructions specific to the Native
runtime for each pathogen workflow.

Related issue(s)

Testing

N/A

TODO

Copy link
Member

@tsibley tsibley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fully on board with this short-term step and the long-term ideal described. 🙌 I think it's better for users than more cross-linking (like #115 adds) but would still say that merge of this should wait until we have broadly concurring feedback from more of the team.

src/install.rst Outdated
@@ -106,7 +106,7 @@ These instructions will install the Nextstrain CLI and tools to run and view you

mamba create -n nextstrain \
-c conda-forge -c bioconda \
nextstrain-cli augur auspice nextalign snakemake git \
nextstrain-cli augur auspice nextalign snakemake git epiweeks pangolin pangolearn \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

Did you look into if adding these three packages significantly increases the installation time or size on disk?

Copy link
Member Author

@victorlin victorlin Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't, but that would be good to know! Here's some benchmarks (tl;dr: no significant increase):

Installation time

Note that each time is taken after cleaning all caches so these are worst-case scenarios given my processor and network speed. Not much difference here (93.63s vs. 98.21s user+sys time).

conda clean --all --force-pkgs-dirs
time mamba create -n tmp1 \
    -c conda-forge -c bioconda \
    nextstrain-cli augur auspice nextalign snakemake git \
    --yes
# 52.33s user 41.30s system 92% cpu 1:41.63 total

conda clean --all --force-pkgs-dirs
time mamba create -n tmp2 \
    -c conda-forge -c bioconda \
    nextstrain-cli augur auspice nextalign snakemake git epiweeks pangolin pangolearn \
    --yes
# 55.48s user 42.73s system 90% cpu 1:48.14 total

Disk usage

1.5G -> 1.6G (~0.1G increase).

du -hs /opt/homebrew/Caskroom/miniconda/base/envs/tmp1
# 1.5G	/opt/homebrew/Caskroom/miniconda/base/envs/tmp1
du -hs /opt/homebrew/Caskroom/miniconda/base/envs/tmp2
# 1.6G	/opt/homebrew/Caskroom/miniconda/base/envs/tmp2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

The Nextstrain Docker image used by the Docker runtime comes with many
other software tools, so that the runtime can work with all core
pathogen workflows. In contrast, the current Native runtime installs a
"bare minimum" set of software, requiring individual workflows to
specify additional installation commands just for the Native runtime.
This commit effectively syncs the two runtimes to provide the same
tools.

In the long term, it would be best to abandon this approach of
supporting *n* pathogen workflows by a single setup (for both Docker and
Native runtime). While this might seem like taking a step backwards by
putting more potentially unused tools in the Native runtime, the
difference of available tools between Docker/Native currently causes
confusion for users and unnecessary instructions specific to the Native
runtime for each pathogen workflow.
@victorlin victorlin force-pushed the victorlin/install-ncov-dependencies-in-native branch from 55fa033 to b954874 Compare June 28, 2022 23:12
@victorlin victorlin requested a review from a team July 7, 2022 19:21
Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this solution. We've started to use all of these tools for seasonal flu, now, too, so this is going to make everyone's life easier in the future.

My only note is that eventually this list will get so long that we'll be tempted to put it into a Conda environment YAML or even its own Bioconda recipe and we will have come full circle. Which I guess is a +1 for working on buildpacks (nextstrain/docker-base#71) sooner than later as a way to better manage our dependencies.

@victorlin victorlin merged commit ed237c0 into master Jul 7, 2022
@victorlin victorlin deleted the victorlin/install-ncov-dependencies-in-native branch July 7, 2022 22:40
@tsibley
Copy link
Member

tsibley commented Jul 8, 2022

My only note is that eventually this list will get so long that we'll be tempted to put it into a Conda environment YAML or even its own Bioconda recipe and we will have come full circle.

IIRC, the only reason we're not using the nextstrain meta-package in Bioconda for Conda-based installation (the original intent) is because the update path is unclear/difficult given limitations of conda/mamba?

One solution to that which I think we briefly considered but didn't pursue would be pinning versions of our packages in the meta-package and automating the update of the pins + making new meta-package releases. This is very much akin to the way to the Docker runtime image is managed currently, and there's lots of benefits to it…

(A reliable meta-package would also make it much easier to implement the vision I have for nextstrain setup, which would be a counterpart to check-setup that would do the work of getting a runtime (Docker, Conda, etc) setup. This is particularly compelling combined with the standalone installation archives for Nextstrain CLI to avoid bootstrapping issues.)

Which I guess is a +1 for working on buildpacks (nextstrain/docker-base#71) sooner than later as a way to better manage our dependencies.

Buildpacks definitely help here for containerized runtimes (Docker, AWS Batch, Terra, (in-some-future) Singularity).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Streamline native installation steps for subproject references
3 participants