UMI-Reducer is a method for differentiating PCR duplicates from biological duplicates, enabling bias-free estimation of mRNA abundance in the sample.
Download UMI-Reducer using
git clone https://github.com/smangul1/UMI-Reducer.git
Install UMI-Reducer from the base directory
cd UMI-Reducer
./install.sh
Run the UMI-Reducer analysis with a single command on a BAM file of mapped reads. The BAM file must be indexed (.bai file). The format of read names is described here
Find the UMI-Reducer analysis in the toyExample directory. Learn more here
We also provide scripts and instructions for preparing mapped reads. Learn more here
Use the sidebar to the right to navigate the UMI-Reducer tutorial. Get started with a toy example of 568 mapped RNA-Seq reads (example/toyExample.bam) distributed with the UMI-Reducer package.
What is UMI-Reducer?
UMI-Reducer is a method for processing and differentiating PCR duplicates from biological duplicates. UMI-Reducer uses Unique Molecular Identifiers (UMIs) and the mapping position of each read to identify and collapse reads that are technical duplicates. The remaining true biological reads are then used for a bias-free estimate of mRNA abundance in the original lysate. This strategy is particularly useful for libraries made from low amounts of starting material, which typically require additional cycles of PCR and are therefore most prone to PCR duplicate bias.
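The collapsing idea can be illustrated with a minimal Python sketch. This is not UMI-Reducer's actual implementation: the `Read` record, its field names, and the `collapse_duplicates` function are hypothetical, and the sketch assumes that two reads are PCR duplicates exactly when they share both mapping position and UMI.

```python
from collections import namedtuple

# Hypothetical record type for illustration only; not UMI-Reducer's data model.
Read = namedtuple("Read", ["name", "chrom", "pos", "umi"])


def collapse_duplicates(reads):
    """Keep the first read seen for each (chrom, pos, umi) combination.

    Assumption: reads sharing mapping position AND UMI are PCR duplicates;
    reads sharing only the position are biological duplicates and are kept.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    Read("r1", "chr1", 100, "ACGTACG"),
    Read("r2", "chr1", 100, "ACGTACG"),  # same position, same UMI: PCR duplicate
    Read("r3", "chr1", 100, "TTGCAAC"),  # same position, different UMI: biological duplicate
    Read("r4", "chr1", 250, "ACGTACG"),  # different position: distinct fragment
]

kept = collapse_duplicates(reads)
print([r.name for r in kept])  # ['r1', 'r3', 'r4']
```

Only `r2` is dropped: it matches `r1` in both position and UMI, whereas `r3` shares the position but carries a different UMI and is therefore retained.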
Why do I need UMI-Reducer?
Every sequencing library contains duplicate reads. While many duplicates arise during PCR, some derive from multiple identical fragments of mRNA present in the original lysate (termed "biological duplicates"). Because PCR duplication is biased, the best estimate of mRNA abundance in the original lysate is obtained by collapsing PCR duplicates while keeping biological duplicates. To differentiate PCR duplicates from biological duplicates, primers contain 7-mer degenerate sequences called Unique Molecular Identifiers (UMIs), such that each primer is in fact 4^7 = 16,384 uniquely identifiable primers. (Note that the UMI is not the same as the barcode, which is a fixed 4-letter sequence adjacent to the UMI that identifies the parent primer.) Thus, any two mRNA fragments present in the original lysate are highly likely to be associated with different UMIs.
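The arithmetic behind the 16,384 figure, and a rough sense of how likely identical fragments are to receive different UMIs, can be checked with a short Python sketch. It assumes UMIs are assigned uniformly and independently at random, which is an idealization and not a claim made by the text above; the function names are hypothetical.

```python
def umi_space(length, alphabet_size=4):
    """Number of distinct UMIs of the given length over A, C, G, T."""
    return alphabet_size ** length


def prob_all_distinct(n_fragments, n_umis):
    """Probability that n_fragments identical fragments all get different UMIs.

    Assumption: each UMI is drawn uniformly and independently at random.
    """
    if n_fragments > n_umis:
        return 0.0  # more fragments than UMIs: a repeat is unavoidable
    p = 1.0
    for i in range(n_fragments):
        p *= (n_umis - i) / n_umis
    return p


n_umis = umi_space(7)
print(n_umis)  # 16384

# Probability that 2, 10, and 50 identical fragments carry pairwise different UMIs.
for n in (2, 10, 50):
    print(n, prob_all_distinct(n, n_umis))
```

Under this assumption, two identical fragments share a UMI with probability 1/16,384, and the chance of at least one shared UMI grows as more identical fragments map to the same position.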
Mangul, Serghei, et al. "UMI-Reducer: Collapsing duplicate sequencing reads via Unique Molecular Identifiers." bioRxiv (2017): 103267.
This software was developed by Serghei Mangul. Please do not hesitate to contact me (email@example.com) if you have any comments, suggestions, or clarification requests regarding the tutorial or if you would like to contribute to this resource.