Initial release review [DO NOT MERGE] #29
Draft
scwatts wants to merge 541 commits into master from dev
+ Put read_alignment and read_processing subworkflows into targeted workflow.
+ Uncomment the whole wgts workflow and integrate this with the read_alignment and read_processing subworkflows.
Otherwise generates a java.nio.channels.OverlappingFileLockException error
The Debian slim container didn't have ps, which is required by Nextflow
Change admonition style to fix usage docs rendering
Add known issue section to README.md
Fix display of the ORANGE report screencap on the nf-core website
Make README.md admonition compatible with website
…ation Fix conditional creation of bwa-mem2 index
Use all available threads for fastp process
mirpedrol reviewed May 22, 2024
All local modules are good candidates to be added to nf-core/modules; this would make the modules easier to maintain and the pipeline PR reviews easier.
- update Conda environments to use fixed Sambamba Bioconda package
- new container build also uses fixed Sambamba package
Apply the first set of reviewer recommendations
Bump SvPrep to 1.2.4
Bump Sambamba to 1.0.1
Bump MarkDups 1.1.5 to Bioconda build hdfd78af_1
Bump bwa-mem2 container build
Background
nf-core/oncoanalyser is a cancer DNA/RNA analysis and reporting pipeline that implements the hmftools workflow, which is developed by the Hartwig Medical Foundation.
The hmftools workflow provides comprehensive analysis of cancer DNA/RNA data where each component is fine-tuned and optimised to operate in an integrated manner to improve genomic characterisation.
The hmftools workflow comes with certain expectations around reference data, input data, and how it should be used. These expectations have guided several design decisions in oncoanalyser, some of which are described below to give further context for the community review.
Overarching design choices
Separation of run modes into workflows. The hmftools workflow is flexible and can be used to analyse different types of data (e.g. WGTS, targeted sequencing), which require specific configuration and reference data. There are also additional modes planned for hmftools that will eventually be implemented in oncoanalyser.
Hence, I have structured oncoanalyser so that the majority of the logic corresponding to different run modes is lifted up into workflows rather than living at the subworkflow or module level. While this introduces some code duplication, dividing the mode-specific logic into separate workflows simplifies much of it and gives a better maintenance and development experience. Retaining this layout will ease the addition of new run modes in the future.
Arbitrarily run individual tools or enter at any point. A key feature of oncoanalyser is the ability to begin analysis from any point in the pipeline or run just a specific set of tools. The primary use-case for this is to skip variant calling and run new/supplementary/modified downstream analyses. For example, a user can run oncoanalyser from PURPLE so long as they provide the required inputs.
Accordingly, outputs from each tool can be provided in the samplesheet and are recognised by oncoanalyser. When a user provides input for a given tool, it is retrieved in the respective subworkflow and inserted into the appropriate input channel. The feature to arbitrarily select tools to run is implemented by wrapping each tool subworkflow call in a conditional statement that checks whether the corresponding tool should be run.
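The selection logic described above can be modelled in a few lines. The following is a minimal Python sketch, not oncoanalyser's Nextflow code: the function name `plan_tools`, the action labels, the example paths, and the rule that a samplesheet-provided output takes precedence over running the tool are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# (action, path) pair; path is only set when an existing output is reused.
Plan = Dict[str, Tuple[str, Optional[str]]]


def plan_tools(tools: Iterable[str], selected: Set[str], provided: Dict[str, str]) -> Plan:
    """Decide, per tool, whether to reuse a provided output, run the tool, or skip it."""
    plan: Plan = {}
    for tool in tools:
        if tool in provided:
            # Assumption: an output listed in the samplesheet is reused as-is.
            plan[tool] = ("use_existing", provided[tool])
        elif tool in selected:
            # Mirrors the conditional wrapped around each tool subworkflow call.
            plan[tool] = ("run", None)
        else:
            plan[tool] = ("skip", None)
    return plan


# Hypothetical example: start from PURPLE using existing AMBER and COBALT outputs.
example = plan_tools(
    tools=["amber", "cobalt", "purple", "orange"],
    selected={"purple", "orange"},
    provided={"amber": "inputs/amber/", "cobalt": "inputs/cobalt/"},
)
```

In this model, each tool's decision is independent of the others, which is what allows a run to enter at any point as long as the required upstream outputs are supplied.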
Strong reference genome recommendations. Hartwig support only the reference genomes they distribute for analysis with the hmftools workflows. Hence, oncoanalyser is configured to provide users with a choice of the Hartwig GRCh37 or GRCh38 genome. While not recommended, custom reference genomes can be used (including those from iGenomes) and oncoanalyser will build all necessary indexes for the target analysis. There is additionally an option to write/publish prepared reference data to the output directory.
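As a rough illustration of the index-building behaviour for custom genomes, the sketch below computes which indexes would still need to be built. It is Python rather than Nextflow, and the function name, index names, and per-analysis requirements are hypothetical placeholders, not oncoanalyser's actual configuration.

```python
from typing import Dict, List, Set

# Hypothetical mapping of analysis type to the indexes it needs.
REQUIRED_INDEXES: Dict[str, Set[str]] = {
    "dna": {"fai", "dict", "bwa-mem2"},
    "rna": {"fai", "dict", "star"},
}


def indexes_to_build(analyses: Set[str], available: Set[str]) -> List[str]:
    """Return the sorted indexes required by the target analyses that are not yet available."""
    required: Set[str] = set()
    for analysis in analyses:
        required |= REQUIRED_INDEXES[analysis]
    return sorted(required - available)


# A custom genome shipped with only a FASTA index needs the rest built.
missing = indexes_to_build({"dna"}, available={"fai"})
```

Restricting the build to what the target analysis requires avoids preparing indexes that a given run would never use.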
No egress fee reference data hosting. Given the use of Hartwig-distributed genomes and the size of the reference data required to run oncoanalyser, a cloud hosting provider is needed to deliver this data to users. I chose to host all default reference data (genomes, hmftools/panel resource files, and other databases) on Cloudflare R2, an object storage service that doesn't charge for egress. I've used Cloudflare R2 for this purpose since Jan 2023, with the nominal hosting costs covered by our organisation (UMCCR).
Other notes