Overhaul strandedness detection / comparison #1306

Merged
merged 43 commits into from
Jun 19, 2024
Changes from 32 commits
Commits
0cfb662
Check RSeQC strandedness without reference to undetermined
pinin4fjords May 29, 2024
5ce89ae
Update subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
pinin4fjords May 29, 2024
26a329b
fix strand message
pinin4fjords Jun 12, 2024
7a20206
Merge branch 'dev' into improve_rseqc_strandedness
pinin4fjords Jun 13, 2024
4c4afd0
Update Salmon
pinin4fjords Jun 14, 2024
2a5414c
Add strandedness detection threshold parameter
pinin4fjords Jun 14, 2024
8ccffc9
Add a consistent library type check between Salmon and RSeQC. Make th…
pinin4fjords Jun 14, 2024
c6b91a9
Amend for undetermined
pinin4fjords Jun 14, 2024
ab0c62b
Constraint strand detection threshold
pinin4fjords Jun 14, 2024
38bdb9e
Consistent defaults
pinin4fjords Jun 14, 2024
cb54dc2
Merge branch 'dev' into improve_rseqc_strandedness
pinin4fjords Jun 17, 2024
9aadd4a
Fix up after merge
pinin4fjords Jun 17, 2024
82724b8
Fix for linting
pinin4fjords Jun 17, 2024
c4d3535
update CHANGELOG
pinin4fjords Jun 17, 2024
9dd6d6d
Fix the fix
pinin4fjords Jun 17, 2024
451cb64
fix typos
pinin4fjords Jun 17, 2024
3df6ec8
[automated] Fix linting with Prettier
nf-core-bot Jun 17, 2024
007bde2
Fix module lint error
pinin4fjords Jun 17, 2024
002ebbd
revert config changed in error
pinin4fjords Jun 17, 2024
465be1f
Merge branch 'improve_rseqc_strandedness' of github.com:nf-core/rnase…
pinin4fjords Jun 17, 2024
1977f6d
Fix conditionality
pinin4fjords Jun 17, 2024
bc6189f
Clarify stranded/ unstraded at library level, include other library t…
pinin4fjords Jun 18, 2024
62ed423
Allow for missing keys in salmon library counts for testing
pinin4fjords Jun 18, 2024
b4760ce
Fix function tests
pinin4fjords Jun 18, 2024
2ded8c0
Auto updating snapshot broke things for unaffected tests. So manually…
pinin4fjords Jun 18, 2024
3733031
Merge branch 'dev' into improve_rseqc_strandedness
pinin4fjords Jun 18, 2024
231fba6
Merge branch 'dev' into improve_rseqc_strandedness
pinin4fjords Jun 18, 2024
29265f0
Merge branch 'dev' into improve_rseqc_strandedness
pinin4fjords Jun 18, 2024
ac9abda
Return percentages for strandedness
pinin4fjords Jun 18, 2024
3ad69a7
Explicitly publish lib_format_counts
pinin4fjords Jun 18, 2024
4ce478d
update docs
pinin4fjords Jun 18, 2024
3583d2c
Fix typo
pinin4fjords Jun 18, 2024
eef388e
Prettier
pinin4fjords Jun 18, 2024
c95c489
Fix typo
pinin4fjords Jun 18, 2024
035e85f
Set maximum on bars per column
pinin4fjords Jun 19, 2024
cdbc971
Pass/fail in separate mqc column
pinin4fjords Jun 19, 2024
0b16071
Merge branch 'dev' into improve_rseqc_strandedness
pinin4fjords Jun 19, 2024
6eb5b06
Merge branch 'improve_rseqc_strandedness' of github.com:nf-core/rnase…
pinin4fjords Jun 19, 2024
dc22231
Fix multiqc config for cell colors to highlight strandedness
pinin4fjords Jun 19, 2024
bd48aa7
Fix strand status conditional
pinin4fjords Jun 19, 2024
36c6658
update strand check image
pinin4fjords Jun 19, 2024
e45e521
Fix multiqc yaml
pinin4fjords Jun 19, 2024
4dd03ca
fix up params + usage
pinin4fjords Jun 19, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -76,6 +76,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
- [PR #1310](https://github.com/nf-core/rnaseq/pull/1310) - Reinstate pseudoalignment subworkflow config
- [PR #1309](https://github.com/nf-core/rnaseq/pull/1309) - Document FASTP sampling
- [PR #1312](https://github.com/nf-core/rnaseq/pull/1312) - Fix issues with unzipping of GTF/ GFF files without absolute paths
- [PR #1306](https://github.com/nf-core/rnaseq/pull/1306) - Overhaul strandedness detection / comparison

### Parameters

Binary file modified docs/images/mqc_strand_check.png
100755 → 100644
pinin4fjords marked this conversation as resolved.
2 changes: 1 addition & 1 deletion docs/output.md
@@ -367,7 +367,7 @@ The majority of RSeQC scripts generate output files which can be plotted and sum

</details>

- This script predicts the "strandedness" of the protocol (i.e. unstranded, sense or antisense) that was used to prepare the sample for sequencing by assessing the orientation in which aligned reads overlay gene features in the reference genome. The strandedness of each sample has to be provided to the pipeline in the input samplesheet (see [usage docs](https://nf-co.re/rnaseq/usage#samplesheet-input)). However, this information is not always available, especially for public datasets. As a result, additional features have been incorporated into this pipeline to auto-detect whether you have provided the correct information in the samplesheet, and if this is not the case then a warning table will be placed at the top of the MultiQC report highlighting the offending samples (see image below). If required, this will allow you to correct the input samplesheet and rerun the pipeline with the accurate strand information. Note, it is important to get this information right because it can affect the final results.
+ This script predicts the "strandedness" of the protocol (i.e. unstranded, sense or antisense) that was used to prepare the sample for sequencing by assessing the orientation in which aligned reads overlay gene features in the reference genome. The strandedness of each sample has to be provided to the pipeline in the input samplesheet (see [usage docs](https://nf-co.re/rnaseq/usage#samplesheet-input)). However, this information is not always available, especially for public datasets. As a result, additional features have been incorporated into this pipeline to auto-detect whether you have provided the correct information in the samplesheet, and if this is not the case then the affected libraries will be flagged in the table under 'Strandedness Checks' elsewhere in the report. If required, this will allow you to correct the input samplesheet and rerun the pipeline with the accurate strand information. Note, it is important to get this information right because it can affect the final results.
pinin4fjords marked this conversation as resolved.

RSeQC documentation: [infer_experiment.py](http://rseqc.sourceforge.net/#infer-experiment-py)
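
The check described in the paragraph above amounts to comparing the strandedness declared in the samplesheet with the value inferred from the data. A minimal sketch in Python follows. It is not the pipeline's implementation: the function name `strand_check`, the record keys, and the rule that `auto` never counts as a mismatch are all assumptions made for illustration.

```python
def strand_check(samples):
    """Return one row per sample with a pass/fail status.

    `samples` is a list of dicts with hypothetical keys:
    'sample', 'provided' (samplesheet value) and 'inferred' (detected value).
    """
    rows = []
    for s in samples:
        provided, inferred = s["provided"], s["inferred"]
        # Assumption: 'auto' declares nothing, so it cannot contradict the inference.
        status = "pass" if provided in ("auto", inferred) else "fail"
        rows.append({"sample": s["sample"], "provided": provided,
                     "inferred": inferred, "status": status})
    return rows


# Hypothetical samples, for illustration only.
report = strand_check([
    {"sample": "A", "provided": "reverse", "inferred": "reverse"},
    {"sample": "B", "provided": "forward", "inferred": "unstranded"},
    {"sample": "C", "provided": "auto", "inferred": "reverse"},
])
flagged = [row["sample"] for row in report if row["status"] == "fail"]
```

Here only sample `B` would be flagged, because its declared value disagrees with the inferred one.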

6 changes: 5 additions & 1 deletion docs/usage.md
@@ -18,7 +18,7 @@ You will need to create a samplesheet with information about the samples you wou

### Multiple runs of the same sample

- The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes. If you set the strandedness value to `auto` the pipeline will sub-sample the input FastQ files to 1 million reads, use Salmon Quant to infer the strandedness automatically and then propagate this information to the remainder of the pipeline. If the strandedness has been inferred or provided incorrectly a warning will be present at the top of the MultiQC report so please be sure to check when looking at the QC for your samples.
+ The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,strandedness
@@ -27,6 +27,10 @@ CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,a
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
```
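
The grouping rule above (rows sharing a `sample` identifier are treated as runs of one sample) can be sketched in Python. This is an illustration only, not pipeline code: the helper name `group_runs` and the file names in the example samplesheet are hypothetical, while the column names are taken from the samplesheet header shown above.

```python
import csv
import io
from collections import defaultdict


def group_runs(samplesheet_text):
    """Map each sample ID to the list of (fastq_1, fastq_2) pairs that share it."""
    runs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        runs[row["sample"]].append((row["fastq_1"], row["fastq_2"]))
    return dict(runs)


# Hypothetical samplesheet: file names are placeholders, not real data.
EXAMPLE = """sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,run1_R1.fastq.gz,run1_R2.fastq.gz,auto
CONTROL_REP1,run2_R1.fastq.gz,run2_R2.fastq.gz,auto
TREATMENT_REP1,run3_R1.fastq.gz,,auto
"""

grouped = group_runs(EXAMPLE)
```

In this sketch `CONTROL_REP1` collects two read pairs, which are the files that would be concatenated before downstream analysis, and `TREATMENT_REP1` has a single run with an empty `fastq_2`.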

### Strandedness prediction

If you set the strandedness value to `auto` the pipeline will sub-sample the input FastQ files to 1 million reads, use Salmon Quant to infer the strandedness automatically and then propagate this information to the remainder of the pipeline. If the strandedness has been inferred or provided incorrectly the sample will be flagged in the 'Strandedness Checks' section of the MultiQC report, so please be sure to check when looking at the QC for your samples.
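
To make the inference step more concrete, here is a rough Python sketch of how a strandedness call could be derived from per-library-type fragment counts with a threshold. Everything specific in it is an assumption: the function name, the two threshold values, and the choice of which count keys to sum are illustrative, and the actual parameter names and defaults are not shown in this diff. The keys use Salmon's library-type abbreviations (for example `ISR`, `SF`), and missing keys are treated as zero.

```python
# Assumed grouping of Salmon library-type abbreviations by strand.
FORWARD_KEYS = ("SF", "ISF", "MSF", "OSF")
REVERSE_KEYS = ("SR", "ISR", "MSR", "OSR")


def classify_strandedness(counts, stranded_threshold=0.8, unstranded_threshold=0.1):
    """Classify a library as forward, reverse, unstranded or undetermined.

    `counts` maps library-type keys to fragment counts; absent keys count as 0.
    Both thresholds are illustrative values, not the pipeline's defaults.
    """
    forward = sum(counts.get(key, 0) for key in FORWARD_KEYS)
    reverse = sum(counts.get(key, 0) for key in REVERSE_KEYS)
    total = forward + reverse
    if total == 0:
        return {"call": "undetermined", "forward_pct": 0.0, "reverse_pct": 0.0}

    forward_frac = forward / total
    reverse_frac = reverse / total
    if forward_frac >= stranded_threshold:
        call = "forward"
    elif reverse_frac >= stranded_threshold:
        call = "reverse"
    elif abs(forward_frac - reverse_frac) <= unstranded_threshold:
        call = "unstranded"
    else:
        # Neither clearly stranded nor clearly balanced.
        call = "undetermined"
    return {
        "call": call,
        "forward_pct": round(100 * forward_frac, 2),
        "reverse_pct": round(100 * reverse_frac, 2),
    }


# Hypothetical counts, for illustration only.
result = classify_strandedness({"ISR": 90, "ISF": 10})
```

Reporting percentages alongside the call makes it easier to see how close a borderline library is to a threshold.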

### Full samplesheet

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 4 columns to match those defined in the table below.
6 changes: 3 additions & 3 deletions modules.json
@@ -162,12 +162,12 @@
},
"salmon/index": {
"branch": "master",
- "git_sha": "ffc101e1b84ef3df2e4e4a966e84b3c513ae5693",
+ "git_sha": "cb6b2b94fc40dea58f0b1e3dd095f3dd24f2ac8a",
"installed_by": ["fastq_subsample_fq_salmon"]
},
"salmon/quant": {
"branch": "master",
- "git_sha": "cb6b2b94fc40dea58f0b1e3dd095f3dd24f2ac8a",
+ "git_sha": "727232afb8294b53dd9d05bfe469b70cce1675bb",
"installed_by": ["fastq_subsample_fq_salmon", "modules", "quantify_pseudo_alignment"]
},
"samtools/flagstat": {
@@ -324,7 +324,7 @@
},
"fastq_subsample_fq_salmon": {
"branch": "master",
- "git_sha": "003920c7f9a8ae19b69a97171922880220bedf56",
+ "git_sha": "727232afb8294b53dd9d05bfe469b70cce1675bb",
"installed_by": ["subworkflows"]
},
"quantify_pseudo_alignment": {
25 changes: 25 additions & 0 deletions modules/nf-core/salmon/index/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

28 changes: 28 additions & 0 deletions modules/nf-core/salmon/index/tests/main.nf.test

12 changes: 12 additions & 0 deletions modules/nf-core/salmon/index/tests/main.nf.test.snap

11 changes: 8 additions & 3 deletions modules/nf-core/salmon/quant/main.nf

40 changes: 32 additions & 8 deletions modules/nf-core/salmon/quant/tests/main.nf.test