Commit

Merge pull request #310 from nf-core/fix-umidedup-fastq
Changelog++, umicollapse update
apeltzer committed Jan 30, 2024
2 parents 6409c7c + 9cdb1c8 commit cebd369
Showing 24 changed files with 137 additions and 422 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -3,13 +3,15 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## v2.3.0 - 2024-01-25 - Gray Zinc Dalmatian
+## v2.3.0 - 2024-01-31 - Gray Zinc Dalmatian

- [[#307]](https://github.com/nf-core/smrnaseq/pull/307) - Clean up config file and improve output folder structure
- [[#299]](https://github.com/nf-core/smrnaseq/issues/299) - Bugfix for missing inputs in BAM stats (`genome_quant.r`)
- [[#164]](https://github.com/nf-core/smrnaseq/pull/164) - UMI Handling Feature implemented in the pipeline
- [[#302]](https://github.com/nf-core/smrnaseq/pull/302) - Merged in nf-core template v2.11.1
- [[#294]](https://github.com/nf-core/smrnaseq/pull/294) - Fixed contamination screening issues
+- [[#309]](https://github.com/nf-core/smrnaseq/pull/309) - Merged in nf-core template v2.12.0
+- [[#310]](https://github.com/nf-core/smrnaseq/pull/310) - Removed unnecessarily separate mirtrace subworkflow, now using module instead

### Parameters

64 changes: 3 additions & 61 deletions conf/modules.config
@@ -126,77 +126,19 @@ process {
//
// UMI deduplication
//
-withName: '.*:DEDUPLICATE_UMIS:UMI_MAP_GENOME' {
-publishDir = [
-path: { "${params.outdir}/umi_dedup/bam_mapped" },
-mode: params.publish_dir_mode,
-pattern: '*.bam',
-enabled: (
-params.save_umi_intermeds
-)
-]
-}
-withName: '.*:DEDUPLICATE_UMIS:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT' {
-ext.prefix = { "${meta.id}.sorted" }
-publishDir = [
-path: { "${params.outdir}/umi_dedup/bam_mapped_sorted" },
-mode: params.publish_dir_mode,
-pattern: '*.{bam}',
-enabled: (
-params.save_umi_intermeds
-)
-]
-}
-withName: '.*:DEDUPLICATE_UMIS:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX' {
-ext.prefix = { "${meta.id}.sorted" }
-publishDir = [
-path: { "${params.outdir}/umi_dedup/bam_mapped_sorted" },
-mode: params.publish_dir_mode,
-pattern: '*.{bai,csi}',
-enabled: (
-params.save_umi_intermeds
-)
-]
-}
-withName: '.*:DEDUPLICATE_UMIS:BAM_SORT_STATS_SAMTOOLS:.*' {
-publishDir = [
-path: { "${params.outdir}/umi_dedup/bam_mapped_sorted" },
-mode: params.publish_dir_mode,
-pattern: '*.{stats,flagstat,idxstats}'
-]
-}
-withName: '.*:DEDUPLICATE_UMIS:UMICOLLAPSE' {
+
+withName: '.*:UMICOLLAPSE_FASTQ' {
ext.args = { meta.single_end ? "--algo ${params.umitools_method} --two-pass" : "--method ${params.umitools_method} --two-pass --paired --remove-unpaired --remove-chimeric" }
ext.prefix = { "${meta.id}.umi_dedup.sorted" }
publishDir = [
path: { "${params.outdir}/umi_dedup/bam_deduplicated" },
mode: params.publish_dir_mode,
-pattern: '*.bam',
+pattern: '*.{bam,fastq.gz}',
enabled: (
params.save_umi_intermeds
)
]
}
-withName: '.*:DEDUPLICATE_UMIS:SAMTOOLS_BAM2FQ' {
-publishDir = [
-path: { "${params.outdir}/umi_dedup/fastq_deduplicated" },
-mode: params.publish_dir_mode,
-saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
-enabled: (
-params.save_umi_intermeds
-)
-]
-}
-withName: '.*:DEDUPLICATE_UMIS:FASTQC_DEDUPLICATED' {
-//the prefix is required for multiqc to pickup the files separately from the other fastqc instances
-ext.prefix = { "${meta.id}.deduplicated" }
-ext.args = '--quiet'
-publishDir = [
-path: { "${params.outdir}/fastqc/deduplicated" },
-mode: params.publish_dir_mode,
-saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
-]
-}

//
// MIRTRACE QC
13 changes: 6 additions & 7 deletions docs/output.md
@@ -14,7 +14,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- [FastQC](#fastqc) - read quality control
- [UMI-tools extract](#umi-tools-extract) - UMI barcode extraction
-- [UMI-tools deduplicate](#umi-tools-deduplicate) - read deduplication
+- [UMI-collapse deduplicate](#umicollapse-deduplicate) - read deduplication
- [FastP](#fastp) - adapter trimming
- [Bowtie2](#bowtie2) - contamination filtering
- [Bowtie](#bowtie) - alignment against mature miRNAs and miRNA precursors (hairpins)
@@ -53,7 +53,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

</details>

-[UMI-tools](https://github.com/CGATOxford/UMI-tools) deduplicates reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI-tools deduplicate](#umi-tools-deduplicate) section.
+[UMI-tools](https://github.com/CGATOxford/UMI-tools) extracts unique molecular identifiers (UMIs) from reads so that PCR bias can be addressed. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on their UMI, as described in the [UMI-collapse deduplicate](#umicollapse-deduplicate) section.

To facilitate processing of input data which has the UMI barcode already embedded in the read name from the start, `--skip_umi_extract` can be specified in conjunction with `--with_umi`.

@@ -72,18 +72,17 @@ Contains FastQ files with quality and adapter trimmed reads for each sample, alo

FastP can automatically detect adapter sequences when the user does not specify them. The pipeline also supplies a file of known miRNA adapters to make the auto-detected adapters more accurate. If you need to add more known miRNA adapters to this list, please open a pull request.

-## UMI-tools deduplicate
+## UMI-collapse deduplicate

<details markdown="1">
<summary>Output files</summary>

- `umi_dedup/`
-- `*.tsv`: Results statistics files detailing the UMI deduplication results.
-- `*.bam`: If `--save_umi_intermeds` is specified, the deduplicated bam files **after** UMI deduplication will be placed in this directory. In addition the sorted and indexed files will be placed there as well.
-- `samtools_stats/` - `*.{stats,flagstat,idxstats}:` Statistics on the mappings underlying the UMI deduplication.
+- `*.log`: Results statistics files detailing the UMI deduplication results.
+- `*.fastq.gz`: If `--save_umi_intermeds` is specified, the deduplicated fastq.gz files **after** UMI deduplication will be placed in this directory.
</details>

-[UMI-tools](https://github.com/CGATOxford/UMI-tools) deduplicates reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name as highlighted in the [UMI-tools extract](#umi-tools-extract) section. The reads are deduplicated based on an alignment against the full genome of the species. The deduplicated reads are then converted into fastq format. The resulting fastq files are used in the remaining steps of the pipeline.
+Reads are deduplicated based on unique molecular identifiers (UMIs) to address PCR bias. Firstly, the [UMI-tools](https://github.com/CGATOxford/UMI-tools) `extract` command removes the UMI barcode information from the read sequence and adds it to the read name as highlighted in the [UMI-tools extract](#umi-tools-extract) section. Umicollapse then deduplicates the fastq files directly, instead of first mapping the reads, deduplicating the alignments, and converting the result back to fastq.
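
To illustrate what deduplicating "directly on the fastq files" means, here is a minimal Python sketch. It is not Umicollapse's algorithm: it only drops reads whose UMI and sequence are both identical, whereas the real tool also groups UMIs that differ by a few bases. The function name `dedup_by_umi` and the example reads are hypothetical; the only assumption taken from the pipeline is that UMI-tools `extract` has already appended the UMI to the read name after an underscore.

```python
def dedup_by_umi(records):
    """Keep the first read seen for each (UMI, sequence) pair.

    `records` is an iterable of (read_name, sequence) tuples. The UMI is
    taken to be the text after the last underscore in the read name.
    No alignment is involved: the read sequence itself is part of the key.
    """
    seen = set()
    kept = []
    for name, sequence in records:
        umi = name.rsplit("_", 1)[-1]
        key = (umi, sequence)
        if key not in seen:
            seen.add(key)
            kept.append((name, sequence))
    return kept


# Hypothetical reads: r1 and r2 share both UMI and sequence (PCR duplicates).
reads = [
    ("r1_AAAC", "TGAGGTAGTAGG"),
    ("r2_AAAC", "TGAGGTAGTAGG"),
    ("r3_GGGT", "TGAGGTAGTAGG"),
    ("r4_AAAC", "CTTAAGGCACGC"),
]
for name, sequence in dedup_by_umi(reads):
    print(name, sequence)
```

Because the key is built from the read itself, no genome alignment or BAM-to-fastq conversion is needed, which is the simplification this commit makes to the pipeline.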

## Bowtie2

12 changes: 12 additions & 0 deletions docs/usage.md
@@ -54,6 +54,18 @@ Contamination filtering of the sequencing reads is optional and can be invoked u
- `pirna`: Used to supply a FASTA file containing piRNA contamination sequences. The FASTA file is first compared to the available miRNA sequences and overlaps are removed.
- `other_contamination`: Used to supply an additional filtering set. The FASTA file is first compared to the available miRNA sequences and overlaps are removed.

+### UMI handling
+
+The pipeline handles UMIs with two tools: UMI-tools `extract` pulls the UMI out of each read, and `Umicollapse` then deduplicates the reads using that UMI information. Enable this with the UMI handling parameters:
+
+```bash
+--with_umi --umitools_bc_pattern '.+AACTGTAGGCACCATCAAT{s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)'
+```
+
+:::note
+You will have to specify a custom `umitools_bc_pattern` if your UMI layout is different. Check the manual for your UMI protocol to find the pattern it requires.
+:::
+
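
The barcode pattern in this section can be unpacked with a short Python sketch. It is a simplification under stated assumptions: UMI-tools evaluates the pattern with the third-party `regex` module, whose fuzzy quantifier `{s<=2}` (tolerating substitutions) is not available in the standard-library `re`, so the version below matches the adapter exactly. The function name `extract_umi` and the example read are hypothetical. The sketch follows the UMI-tools regex convention that `umi_` groups are moved out of the read, `discard_` groups are dropped, and everything else is kept.

```python
import re

# Exact-match simplification of the documented pattern (no `{s<=2}`).
BC_PATTERN = re.compile(
    r".+AACTGTAGGCACCATCAAT(?P<umi_1>.{12})(?P<discard_2>.*)"
)


def extract_umi(sequence):
    """Return (kept_sequence, umi), or None if the pattern does not match."""
    match = BC_PATTERN.match(sequence)
    if match is None:
        return None
    umi = match.group("umi_1")
    # `discard_2` runs to the end of the read, so everything that is kept
    # lies before the start of the UMI group.
    kept = sequence[: match.start("umi_1")]
    return kept, umi


insert = "TGAGGTAGTAGGTTGTATAGTT"   # hypothetical small RNA insert
adapter = "AACTGTAGGCACCATCAAT"      # adapter from the documented pattern
read = insert + adapter + "ACGTACGTACGT" + "AGATCGGAAG"
print(extract_umi(read))
```

The 12 bases directly after the adapter become the UMI, the trailing bases are discarded, and the insert plus adapter stay in the read for the later trimming step.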
## Samplesheet input

You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 2 columns ("sample" and "fastq_1"), and a header row as shown in the examples below.
2 changes: 0 additions & 2 deletions lib/WorkflowSmrnaseq.groovy
@@ -48,8 +48,6 @@ class WorkflowSmrnaseq {

public static String toolCitationText(params) {

-// TODO nf-core: Optionally add in-text citation tools to this list.
-
// Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "Tool (Foo et al. 2023)" : "",
// Uncomment function in methodsDescriptionText to render in MultiQC report
def citation_text = [
7 changes: 1 addition & 6 deletions modules.json
@@ -35,11 +35,6 @@
"git_sha": "8ec825f465b9c17f9d83000022995b4f7de6fe93",
"installed_by": ["modules"]
},
-"samtools/bam2fq": {
-"branch": "master",
-"git_sha": "a64788f5ad388f1d2ac5bd5f1f3f8fc81476148c",
-"installed_by": ["modules"]
-},
"samtools/flagstat": {
"branch": "master",
"git_sha": "a64788f5ad388f1d2ac5bd5f1f3f8fc81476148c",
@@ -67,7 +62,7 @@
},
"umicollapse": {
"branch": "master",
-"git_sha": "b573d74ce7eced7963d00d4a7f99db9caed61d79",
+"git_sha": "6971511e34fb6563a48f1bf583238a7c49654910",
"installed_by": ["modules"]
},
"umitools/extract": {
7 changes: 0 additions & 7 deletions modules/nf-core/samtools/bam2fq/environment.yml

This file was deleted.

56 changes: 0 additions & 56 deletions modules/nf-core/samtools/bam2fq/main.nf

This file was deleted.

51 changes: 0 additions & 51 deletions modules/nf-core/samtools/bam2fq/meta.yml

This file was deleted.

71 changes: 0 additions & 71 deletions modules/nf-core/samtools/bam2fq/tests/main.nf.test

This file was deleted.
