Serghei Mangul edited this page Jan 30, 2018 · 21 revisions

Quick Start

UMI-Reducer is a method for distinguishing PCR duplicates from biological duplicates and for bias-free estimation of mRNA abundance in the sample.

Download UMI-Reducer using

git clone

Install UMI-Reducer from the base directory

cd UMI-Reducer

Run the UMI-Reducer analysis with a single command on a BAM file of mapped reads. The BAM file must be indexed (accompanied by a .bai file). The format of read names is described here

./ example/toy.example.bam 

Find the results of the UMI-Reducer analysis in the toyExample directory. Learn more here

We also provide scripts and instructions on how to prepare mapped reads. Learn more here

UMI-Reducer Tutorial

Use the sidebar to the right to navigate the UMI-Reducer tutorial. Get started with a toy example of 568 mapped RNA-Seq reads (example/toyExample.bam) distributed with the UMI-Reducer package.

What is UMI-Reducer

UMI-Reducer is a method for distinguishing PCR duplicates from biological duplicates. UMI-Reducer uses UMIs and the mapping position of each read to identify and collapse reads that are technical duplicates. The remaining true biological reads are then used for a bias-free estimate of mRNA abundance in the original lysate. This strategy is of particular use for libraries made from low amounts of starting material, which typically require additional cycles of PCR and are therefore most prone to PCR duplicate bias.
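The collapsing idea described above can be illustrated with a minimal Python sketch. It is not UMI-Reducer's actual implementation: the `Read` record, the `collapse_pcr_duplicates` function, and the choice to key on chromosome, leftmost position, and UMI are assumptions made for illustration, and the UMI is assumed to have been extracted from the read name already.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not UMI-Reducer's own data structure)."""
    name: str
    chrom: str
    pos: int   # leftmost mapping position
    umi: str   # UMI sequence, assumed already extracted from the read name


def collapse_pcr_duplicates(reads):
    """Keep the first read of every (chrom, pos, UMI) group.

    Reads sharing both mapping position and UMI are treated as PCR duplicates;
    reads sharing a position but carrying different UMIs are kept as
    biological duplicates.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.umi)].append(read)
    kept = [members[0] for members in groups.values()]
    sizes = {key: len(members) for key, members in groups.items()}
    return kept, sizes


reads = [
    Read("r1", "chr1", 100, "ACGTACG"),
    Read("r2", "chr1", 100, "ACGTACG"),  # same position and UMI as r1: PCR duplicate
    Read("r3", "chr1", 100, "TTGCAAC"),  # same position, different UMI: biological duplicate
    Read("r4", "chr1", 250, "ACGTACG"),  # different position: separate fragment
]
kept, sizes = collapse_pcr_duplicates(reads)
print([r.name for r in kept])  # ['r1', 'r3', 'r4']
```

In this toy input, r2 is collapsed into r1, while r3 survives because its UMI differs even though it maps to the same position.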

Why do I need UMI-Reducer?

Every sequencing library contains duplicate reads. While many duplicates arise during PCR, some derive from multiple identical fragments of mRNA present in the original lysate (termed "biological duplicates"). Because PCR duplication is biased, the best estimate of mRNA abundance in the original lysate is obtained by collapsing PCR duplicates while keeping biological duplicates. In order to differentiate PCR duplicates from biological duplicates, primers contain 7-mer degenerate sequences called Unique Molecular Identifiers (UMIs), such that each primer is in fact 4^7 = 16,384 uniquely identifiable primers. (Note that the UMI is not the same as the barcode, which is a fixed 4-letter sequence adjacent to the UMI that identifies the parent primer.) Thus, any two mRNA fragments present in the original lysate are highly likely to be associated with different UMIs.
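The figure of 16,384 primers follows from four possible bases at each of the seven degenerate positions. The short Python sketch below reproduces that count and estimates how likely it is that several fragments receive pairwise different UMIs. The function `prob_all_distinct` is hypothetical, and the calculation assumes every UMI is drawn uniformly and independently, which is an idealization rather than a property of real primer pools.

```python
UMI_LENGTH = 7                 # length of the degenerate sequence, as stated above
n_umis = 4 ** UMI_LENGTH       # four bases (A, C, G, T) per position
print(n_umis)  # 16384


def prob_all_distinct(n_fragments, n_umis=n_umis):
    """Probability that n fragments all receive different UMIs.

    Assumes each fragment's UMI is drawn uniformly and independently
    from the n_umis possible sequences (an idealization).
    """
    p = 1.0
    for i in range(n_fragments):
        p *= (n_umis - i) / n_umis
    return p


print(round(prob_all_distinct(2), 6))  # 0.999939
```

Under this idealized model, two identical fragments share a UMI with probability 1/16,384, which is why reads with the same mapping position and the same UMI are treated as PCR duplicates.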


Mangul, Serghei, et al. UMI-Reducer: Collapsing duplicate sequencing reads via Unique Molecular Identifiers. bioRxiv (2017): 103267.



This software was developed by Serghei Mangul. Please do not hesitate to contact me if you have any comments, suggestions, or clarification requests regarding the tutorial, or if you would like to contribute to this resource.
