Initial release review [DO NOT MERGE] #29

scwatts · 2024-05-13T10:02:57Z

Background

nf-core/oncoanalyser is a cancer DNA/RNA analysis and reporting pipeline that implements the hmftools workflow, which is developed by the Hartwig Medical Foundation.

The hmftools workflow provides comprehensive analysis of cancer DNA/RNA data where each component is fine-tuned and optimised to operate in an integrated manner to improve genomic characterisation.

The hmftools workflow carries certain expectations around reference data, input data, and how it should be used. These expectations have guided several design decisions in oncoanalyser, some of which are described below to provide further context for the community review.

Overarching design choices

Separation of run modes into workflows. The hmftools workflow is flexible and can be used to analyse different types of data (e.g. WGTS, targeted sequencing), which require specific configuration and reference data. There are also additional modes planned for hmftools that will eventually be implemented in oncoanalyser.

Hence, I have structured oncoanalyser so that the majority of the logic corresponding to different run modes is lifted up into workflows rather than kept at the subworkflow or module level. While this introduces some code duplication, dividing the mode-specific logic across separate workflows simplifies much of it, which provides a better maintenance and development experience. Retaining this layout will ease the addition of new run modes in the future.

Arbitrarily run individual tools or enter at any point. A key feature of oncoanalyser is the ability to begin analysis from any point in the pipeline or run just a specific set of tools. The primary use-case for this is to skip variant calling and run new/supplementary/modified downstream analyses. For example, a user can run oncoanalyser from PURPLE so long as they provide the required inputs.

Accordingly, outputs from each tool can be provided in the samplesheet and are recognised by oncoanalyser. When a user provides input for a given tool, it is retrieved in the respective subworkflow and inserted into the appropriate input channel. The feature to arbitrarily select tools to run is implemented by wrapping each tool subworkflow call in a conditional statement that checks whether the corresponding tool should be run.
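As a rough illustration of the behaviour described above, the following Python sketch models how a tool's output could either be produced by running the tool or taken from the samplesheet. It is not the Nextflow implementation used in oncoanalyser: `Sample`, `run_tool`, `resolve_outputs`, and the paths are hypothetical names chosen for this example, and the rule that a user-provided output is only consulted when the tool is not selected to run is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    # One samplesheet entry. `provided` maps a tool name to an output path
    # supplied by the user (hypothetical structure, for illustration only).
    name: str
    provided: dict = field(default_factory=dict)


def run_tool(tool, sample):
    # Placeholder for executing a tool; returns a made-up output path.
    return f"{sample.name}/{tool}.out"


def resolve_outputs(sample, tool_order, tools_to_run):
    # Walk the tools in pipeline order and decide where each output comes from.
    outputs = {}
    for tool in tool_order:
        if tool in tools_to_run:
            # Conditional wrapper: only selected tools are executed.
            outputs[tool] = run_tool(tool, sample)
        elif tool in sample.provided:
            # Tool is skipped, but its output was supplied in the samplesheet.
            outputs[tool] = sample.provided[tool]
    return outputs


# Start from PURPLE, supplying the upstream AMBER and COBALT outputs.
sample = Sample("tumor1", provided={"amber": "inputs/amber", "cobalt": "inputs/cobalt"})
outputs = resolve_outputs(sample, ["amber", "cobalt", "purple"], {"purple"})
print(outputs)
```

In this sketch only PURPLE is executed, while the AMBER and COBALT entries are passed through from the samplesheet so that the downstream step still receives its inputs.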

Strong reference genome recommendations. Hartwig support only the reference genomes they distribute for analysis with the hmftools workflows. Hence, oncoanalyser is configured to provide users with a choice of the Hartwig GRCh37 or GRCh38 genome. While not recommended, custom reference genomes can be used (including those from iGenomes) and oncoanalyser will build all necessary indexes for the target analysis. There is additionally an option to write/publish prepared reference data to the output directory.

No egress fee reference data hosting. Given the use of Hartwig-distributed genomes and the size of the reference data required to run oncoanalyser, a cloud hosting provider is needed to deliver this data to users. I chose to host all default reference data (genomes, hmftools/panel resource files, and other databases) on Cloudflare R2, which is an object storage service that doesn't charge for egress. I've used Cloudflare R2 for this purpose since Jan 2023, with the nominal hosting costs being covered by our organisation (UMCCR).

Other notes

I will make PRs outside this review to address each comment, request, etc. that is made.

mirpedrol

All local modules are good candidates to be added to nf-core/modules; this would make the modules easier to maintain and the pipeline PR reviews easier.

conf/test_stub.config

.nf-core.yml

CITATIONS.md

assets/samplesheet.csv

docs/output.md

modules/local/amber/meta.yml

modules/local/custom/extract_tarball/main.nf

modules/local/fastp/main.nf

nextflow.config

modules/local/star/align/main.nf

modules/local/bwa-mem2/mem/main.nf

modules/local/cobalt/main.nf

mkcmkc and others added 30 commits February 13, 2024 14:32

Reassigning TODO.

c3f9050

Add tests for remaining subworkflows

f16be0d

Emitting versions.

fe49cf0

Updating TODOs.

1d13b56

Fix tab character indent

b31d3cc

Fixing tags for new processes.

8ce78ff

Setting up targeted and wgts workflows for testing.

ee059ad

+ Put read_alignment and read_processing subworkflows into targeted workflow.
+ Uncomment the whole wgts workflow and integrate this with the read_alignment and read_processing subworkflows.

Adjust genomes location within local test_data dir

f120d59

Adjust genome stub placeholder paths in tests

7b703c2

Update built-in test profile

8776fe2

Remove unneeded test config and comment

94b35b5

Update test samplesheet name

2ce99f4

Minor fixes and style improvements.

e3f8d45

Bump BamTools to 1.2.1

8c79f19

Correct process Conda environment packages

779882e

Relocate Conda env files to module directory

72b6466

Otherwise generates a java.nio.channels.OverlappingFileLockException error

Set unique names for process Conda environments

f5c5d72

Fix Conda environment name typo

61c2197

Fix GRIPSS Conda package name

f0a3619

Use mulled multicontainers for custom processes

08736be

Fix Sigs Conda package name

9b8dd27

Adding a TODO.

58ed71c

Correct linxreport conda environment path

e4b27c2

Fix/switch extract process container

7e82c4e

The Debian slim container didn't have ps, which is required by Nextflow

Add has_umis switch to markdups.

de77de8

Force symlink overwrite so process does not fail on resume.

01ad68d

Change name of output bam from markdups.

df6e65a

Fix read group arg to bwa mem2.

a4a95b1

Add TODO.

61aa1b1

Fix markdups umi flags for TSO500 panel samples.

a1dde8e

scwatts and others added 11 commits May 19, 2024 08:02

Improve wording for known issue entry in README.md

52610fe

Merge pull request #39 from nf-core/fix-usage-docs-rendering

7be3dda

Change admonition style to fix usage docs rendering

Merge pull request #38 from nf-core/add-known-issues

67f8e65

Add known issue section to README.md

Fix display of ORANGE report screencap on website

b411125

Merge pull request #40 from nf-core/fix-website-image-rendering

8e65844

Fix display of the ORANGE report screencap on the nf-core website

Make README.md admonitions compatible with website

6dc8459

Merge pull request #41 from nf-core/adjust-readme-admonition

d5d64c4

Make README.md admonition compatible with website

Only create bwa-mem2 index if FASTQ inputs present

437e72e

Merge pull request #42 from nf-core/fix-conditional-bwamem2-index-creation

a5b9846

Fix conditional creation of bwa-mem2 index

Use all available threads for fastp process

d1bc96a

Merge pull request #43 from nf-core/set-fastp-threads

b0cf4ce

Use all available threads for fastp process

mirpedrol reviewed May 22, 2024

LaurenceKuhl reviewed May 27, 2024

modules/local/bwa-mem2/mem/main.nf (outdated, resolved)

modules/local/cobalt/main.nf (outdated, resolved)

scwatts and others added 17 commits May 27, 2024 17:46

Adjust test_stub.config usage comment

69f4d3c

Update .nf-core.yml lint rule skip list

56f9058

Correct typo in docs/output.md

138d900

Replace System.exit with Nextflow.exit

d925a44

Further standardise variable naming

644927c

Make output tarball channel name more descriptive

448a988

Bump MarkDups 1.1.5 to Bioconda build hdfd78af_1

84afa4d

Bump SvPrep to 1.2.4

8295f1e

Bump bwa-mem2 container build and update Conda env

610ea8a

- update Conda environments to use fixed Sambamba Bioconda package
- new container build also uses fixed Sambamba package

Bump Sambamba to 1.0.1

ab9518b

Bump version to 1.0.0 for initial release

3a6ebfc

Fix usage.md MarkDups link

be33799

Merge pull request #49 from nf-core/initial-release-review-changes

c843c3b

Apply the first set of reviewer recommendations

Merge pull request #47 from nf-core/bump-svprep-version

2f33d9e

Bump SvPrep to 1.2.4

Merge pull request #46 from nf-core/bump-sambamba-version

31be68d

Bump Sambamba to 1.0.1

Merge pull request #45 from nf-core/bump-markdups-package-build

ba366c9

Bump MarkDups 1.1.5 to Bioconda build hdfd78af_1

Merge pull request #48 from nf-core/bump-bwa_mem2-container-build

c477a07

Bump bwa-mem2 container build
