umi_tools dedup : Run before salmon to dedup counts #576

jryge · 2021-03-05T09:10:18Z

No description provided.

jryge · 2021-03-05T12:47:03Z

Additional description of issue
I have single-end reads where the UMIs are part of the index. With bcl2fastq I get one FASTQ file with the reads and one with the UMIs. To make this compatible with umi-tools in the rnaseq pipeline, I added the UMI sequences to the beginning of the reads. This seems to work: the pipeline completes, and umi_tools extracts and dedups the reads (though "umi_tools dedup ... *.bam" takes a VERY long time, ~24h). The issue for me is that the quantification of the reads with salmon seems to be done on the original BAM files (from the STAR alignment) and not the de-duplicated ones. I compared the gene counts to a run on the same data without activating the UMI part, and they are practically identical (apart from some occasional minor rounding errors)...
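For readers wanting to reproduce the "UMI at the start of the read" workaround described here, the following is a minimal sketch and not the reporter's actual script. It assumes plain 4-line FASTQ records, that the read file and the UMI file list records in the same order, and that the first whitespace-delimited token of each header is the shared read ID. The names `parse_fastq` and `prepend_umis` are hypothetical helpers introduced for illustration.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality); the '+' separator line is dropped on parse.
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def prepend_umis(
    reads: Iterable[FastqRecord], umis: Iterable[FastqRecord]
) -> Iterator[FastqRecord]:
    """Prefix each read with the UMI bases and qualities of its mate record."""
    for (r_head, r_seq, r_qual), (u_head, u_seq, u_qual) in zip(reads, umis):
        # Assumption: the first header token is the read ID in both files.
        if r_head.split()[0] != u_head.split()[0]:
            raise ValueError(f"read/UMI ID mismatch: {r_head!r} vs {u_head!r}")
        yield r_head, u_seq + r_seq, u_qual + r_qual


def format_fastq(record: FastqRecord) -> str:
    """Render one record back to 4-line FASTQ text."""
    header, seq, qual = record
    return f"{header}\n{seq}\n+\n{qual}\n"
```

With reads rewritten this way, a fixed-length UMI at the 5' end can be described to `umi_tools extract` through its `--bc-pattern` option using one `N` per UMI base; the right pattern length depends on the library and is not stated in this thread.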

Solution
It seems like the salmon read count quantification is done on the STAR alignments prior to deduplication, while it should be done on the de-duplicated BAM file (from umi_tools dedup). A "simple" reordering of the workflow should do the trick.

drpatelh · 2021-04-14T11:59:19Z

This turned into quite a big job 😅 Salmon wants a BAM sorted by read name and umitools needs a BAM sorted by co-ordinate with an index. STAR produces the former which was handy to plug directly into Salmon. If we want to use umitools I am going to have to co-ordinate sort the transcriptome BAM from STAR, index, run umitools and then sort it again by name before running Salmon! Will mean more intermediate BAM files when using UMIs but no way around it I'm afraid 😏

Salmon takes the transcriptome BAM to perform the quantification so ideally we need to umi dedup that BAM file before the counting. However, the genome BAM is used by most downstream steps for the QC so we have to UMI dedup both BAMs separately.
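The order of operations described in this comment can be sketched as a list of commands. This is an illustrative outline only, not the code merged in #593: the function name `dedup_then_quant_commands`, the intermediate file names, and the choice of `-l A` (automatic library type detection) for Salmon are assumptions, and the pipeline may pass different or additional options to each tool.

```python
import shlex
from typing import List


def dedup_then_quant_commands(
    transcriptome_bam: str, transcripts_fasta: str, prefix: str
) -> List[List[str]]:
    """Build the sort -> index -> dedup -> name-sort -> quant command chain."""
    coord_bam = f"{prefix}.coord.bam"            # hypothetical file names
    dedup_bam = f"{prefix}.dedup.bam"
    name_bam = f"{prefix}.dedup.namesorted.bam"
    return [
        # umi_tools dedup needs a coordinate-sorted, indexed BAM.
        ["samtools", "sort", "-o", coord_bam, transcriptome_bam],
        ["samtools", "index", coord_bam],
        ["umi_tools", "dedup", "-I", coord_bam, "-S", dedup_bam],
        # Salmon's alignment mode wants reads grouped by name again.
        ["samtools", "sort", "-n", "-o", name_bam, dedup_bam],
        ["salmon", "quant", "-t", transcripts_fasta, "-l", "A",
         "-a", name_bam, "-o", f"{prefix}_salmon"],
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch is safe to try.
    for cmd in dedup_then_quant_commands(
        "sample.toTranscriptome.bam", "transcripts.fa", "sample"
    ):
        print(shlex.join(cmd))
```

The genome BAM used for downstream QC would go through its own sort, index, and dedup steps separately, as noted above; it does not need the final name sort because it is not passed to Salmon.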

drpatelh · 2021-04-14T20:09:10Z

Fixed in #593

Tricky to test this because I don't have any UMI data @jryge, but it would be great if you could make sure that the UMIs are being de-duplicated as expected when passed through the various steps in the pipeline.

jryge · 2021-04-15T13:13:05Z

Great, I'll give it a spin to see if it works. These seemingly trivial issues often turn out to be more complicated due to all the dependencies of the different tools...

Thanks for taking the time!

drpatelh · 2021-04-15T15:44:50Z

Awesome. Thanks!

jryge added the bug label Mar 5, 2021

drpatelh added this to the 3.1 milestone Apr 11, 2021

drpatelh mentioned this issue Apr 14, 2021

Deduplicate UMI before read counting #593

Merged

drpatelh closed this as completed Apr 14, 2021

hendrikweisser mentioned this issue Dec 13, 2022

Add UMI deduplication before quantification in tube map #906

Closed
