Filter BAM output of UMI-tools dedup before passing to Salmon quant #828

lars-work-sund · 2022-05-20T09:40:43Z

Description of the bug

When using UMIs salmon reports millions of:
WARNING: Detected suspicious pair --- The names are different:

This is because UMI-tools includes unpaired and chimeric reads. This leads to orphan reads.

I believe setting the flags:
--unpaired-reads=discard
and
--chimeric-pairs=discard

Should solve the issue

Command used and terminal output

No response

Relevant files

No response

System information

No response

The text was updated successfully, but these errors were encountered:

lars-work-sund · 2022-05-20T11:28:56Z

The problem seems more complicated than that. It seems to be the issue discussed here

drpatelh · 2022-05-20T11:41:26Z

Running this command as mentioned in the linked issue above seems to fix the problem:

umi_tools prepare-for-rsem -I WT_REP1.umi_dedup.transcriptome.bam --stdout=test.bam

The log files for Salmon no longer chucks out warnings for unpaired reads so I assume the reads are being handled correctly:

[2022-05-20 11:34:37.622] [jointLog] [info] setting maxHashResizeThreads to 2
[2022-05-20 11:34:37.622] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2022-05-20 11:34:37.622] [jointLog] [info] numQuantThreads = 2
[2022-05-20 11:34:38.114] [fileLog] [info] quantification processed 0 fragments so far

[2022-05-20 11:34:38.114] [fileLog] [info] quantification processed 53169 fragments so far

[2022-05-20 11:34:37.990] [jointLog] [warning] Transcript Gfp_transgene_gene appears in the reference but did not appear in the BAM
[2022-05-20 11:34:37.990] [jointLog] [info] replaced 0 non-ACGT nucleotides with random nucleotides
[2022-05-20 11:34:38.714] [jointLog] [info] Thread saw mini-batch with a maximum of 2.20% zero probability fragments
[2022-05-20 11:34:38.716] [jointLog] [info] Thread saw mini-batch with a maximum of 2.00% zero probability fragments
[2022-05-20 11:34:38.716] [jointLog] [info] 


Completed first pass through the alignment file.
Total # of mapped reads : 53169
# of uniquely mapped reads : 47590
# ambiguously mapped reads : 5579



[2022-05-20 11:34:38.722] [jointLog] [info] Computed 123 rich equivalence classes for further processing
[2022-05-20 11:34:38.722] [jointLog] [info] Counted 52389 total reads in the equivalence classes 
[2022-05-20 11:34:38.723] [jointLog] [warning] Only 53169 fragments were mapped, but the number of burn-in fragments was set to 5000000.
The effective lengths have been computed using the observed mappings.

[2022-05-20 11:34:38.723] [jointLog] [info] starting optimizer
[2022-05-20 11:34:38.724] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2022-05-20 11:34:38.724] [jointLog] [info] iteration = 0 | max rel diff. = 444.782
[2022-05-20 11:34:38.728] [jointLog] [info] iteration = 100 | max rel diff. = 1.47645e-16
[2022-05-20 11:34:38.728] [jointLog] [info] finished optimizer
[2022-05-20 11:34:38.728] [jointLog] [info] writing output
[2022-05-20 11:34:39.127] [jointLog] [info] Computing gene-level abundance estimates
[2022-05-20 11:34:39.135] [jointLog] [warning] Feature has no GFF ID
[2022-05-20 11:34:39.135] [jointLog] [info] There were 124 transcripts mapping to 124 genes
[2022-05-20 11:34:39.135] [jointLog] [info] NOTE: We recommend using tximport (https://bioconductor.org/packages/release/bioc/html/tximport.html) for aggregating transcript-level salmon abundance estimates to the gene level.  It is more versatile, exposes more features, and allows considering multi-sample information during aggregation.
[2022-05-20 11:34:39.135] [jointLog] [info] Aggregating expressions to gene level
[2022-05-20 11:34:39.136] [jointLog] [info] done

drpatelh · 2022-05-24T09:46:14Z

As a temporary solution, I have copied the prepare-for-rsem.py script directly from the UMI-tools repo into the bin/ directory of the pipeline. I have attempted to provide the appropriate credits at the top of the script. This will allow us to get a release of the pipeline out without waiting for UMI-tools and containers to be released. When this does happen we can remove the script from the pipeline and use it directly from the official package.

I have created a separate issue to track this for future releases of the pipeline: #831

Fixed in drpatelh@2e3e668 drpatelh@d5034fa

lars-work-sund added the bug Something isn't working label May 20, 2022

lars-work-sund changed the title ~~UMI-tools renaming breaks name sorting~~ UMI-tools breaks salmon quant May 20, 2022

drpatelh changed the title ~~UMI-tools breaks salmon quant~~ Filter BAM output of UMI-tools dedup before passing to Salmon quant May 23, 2022

drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue May 23, 2022

Fix nf-core#828

2e3e668

drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue May 23, 2022

Follow up fix for nf-core#828 - name sort SE BAM too

d5034fa

drpatelh added this to the 3.8 milestone May 23, 2022

drpatelh mentioned this issue May 24, 2022

Fixes for v3.8 release #830

Merged

drpatelh modified the milestones: 3.8, 3.9 May 24, 2022

drpatelh mentioned this issue May 24, 2022

Use prepare-for-rsem.py script directly from UMI-tools package #831

Closed

drpatelh modified the milestones: 3.9, 3.8 May 24, 2022

drpatelh closed this as completed May 24, 2022

MatthiasZepper mentioned this issue Jan 3, 2024

Add umicollapse and benchmark it against umitools dedup #1087

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filter BAM output of UMI-tools dedup before passing to Salmon quant #828

Filter BAM output of UMI-tools dedup before passing to Salmon quant #828

lars-work-sund commented May 20, 2022 •

edited

Loading

lars-work-sund commented May 20, 2022

drpatelh commented May 20, 2022

drpatelh commented May 24, 2022 •

edited

Loading

Filter BAM output of UMI-tools dedup before passing to Salmon quant #828

Filter BAM output of UMI-tools dedup before passing to Salmon quant #828

Comments

lars-work-sund commented May 20, 2022 • edited Loading

Description of the bug

Command used and terminal output

Relevant files

System information

lars-work-sund commented May 20, 2022

drpatelh commented May 20, 2022

drpatelh commented May 24, 2022 • edited Loading

lars-work-sund commented May 20, 2022 •

edited

Loading

drpatelh commented May 24, 2022 •

edited

Loading