zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs


Welcome to zUMIs 🔧 🚗💨 🔧

zUMIs is a fast and flexible pipeline to process RNA-seq data with (or without) UMIs.

The pipeline takes fastq files as input. In the most common case, you will have one read containing the cDNA sequence and one or more other reads containing the UMI and cell barcode information. You will also need a STAR index for your genome and a GTF annotation file.

You can read more about zUMIs in our paper!

zUMIs2.0 released!

We have completely rewritten zUMIs with a boatload of improvements! Today we finally release this version for general use. We would really appreciate it if all existing and new zUMIs users got in touch with us and gave us some feedback! Here are some of the new features:

  • Setup of all parameters in a convenient YAML config file. This allows better reproducibility and parameter tracking. You can create the YAML config file using an easy-to-use Rshiny application.
  • User-definable memory limit: zUMIs calculates expression matrices for cell barcodes within a given amount of RAM. For this, cell barcodes are grouped according to the maximum number of reads that may be processed without exceeding the memory limit.
  • Much increased processing speed! For our published test data set of 96 HEK cells, zUMIs2.0 is more than 2x faster. To achieve this, we have parallelized the filtering step and rewritten the UMI collapsing scripts.
  • More convenient & flexible handling of barcodes, UMIs and cDNA sequences that eliminates protocol-specific settings or preprocessing scripts. You can now use zUMIs with up to 4 fastq input files, i.e. paired-end dual-index Illumina data!
  • Compatibility with non-UMI protocols, such as Smart-seq2. You can simply run zUMIs with multiplexed Smart-seq2 data and will obtain per-cell read counts.
  • Compatibility with paired-end cDNA reads in combination with cell barcodes and UMIs.
  • Possibility to integrate transgenes or external references like ERCC spike-ins on the fly. Simply add the path to an additional fasta file and zUMIs will add it to the reference genome and report summary stats for these sequences separately from endogenous mRNA.
  • Pattern recognition: zUMIs can find a sequence pattern in the input reads and retain only those with their matched barcodes & UMIs for further analysis.
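
The memory-limit feature above groups cell barcodes so that each group stays within a read budget. The following is a minimal Python sketch of one way such grouping could work, not the zUMIs implementation (which is written in R and may differ). The function name `group_barcodes`, the `max_reads` parameter and the greedy largest-first strategy are assumptions made for illustration.

```python
from typing import Dict, List


def group_barcodes(read_counts: Dict[str, int], max_reads: int) -> List[List[str]]:
    """Greedily pack barcodes into groups whose summed read count stays <= max_reads.

    Hypothetical helper for illustration only. A single barcode with more
    reads than max_reads still gets a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_reads = 0
    # Visit barcodes from most to fewest reads; ties are broken alphabetically
    # so the result is deterministic.
    for barcode, n in sorted(read_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Close the current group if adding this barcode would exceed the budget.
        if current and current_reads + n > max_reads:
            groups.append(current)
            current, current_reads = [], 0
        current.append(barcode)
        current_reads += n
    if current:
        groups.append(current)
    return groups


counts = {"AAAC": 500, "AACG": 300, "ACGT": 250, "CGTA": 100}
print(group_barcodes(counts, max_reads=600))
# [['AAAC'], ['AACG', 'ACGT'], ['CGTA']]
```

Each group can then be processed independently, so peak memory depends on the largest group rather than on the whole data set.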

The previous implementation of zUMIs has moved to an archive branch in GitHub and is no longer being updated. You can also find other older versions of zUMIs here.


14 Oct 2018: zUMIs2.2 released. Since the big update of zUMIs2, we have continuously improved performance and fixed bugs. Additionally, we have implemented a barcode-frameshift correction, which helps with ddSEQ/SureCell data.

09 Aug 2018: zUMIs2.0 released. For a detailed list of changes, see above and the updated wiki.

12 Apr 2018: zUMIs.0.0.6 released. Improved support for combinatorial indexing methods.

30 Mar 2018: zUMIs.0.0.5 released. Rewrote hamming distance binning of UMIs and barcodes. In addition to faster running times, this removes the dependency on the stringdist package, which may have led to issues with parallel computing on some systems. Also fixed a possible bug when resuming a run with the -w switch in combination with plate barcode usage.

23 Feb 2018: zUMIs.0.0.4 released. Added support for plate barcodes with input of an additional barcode fastq file (e.g. Illumina i7 index read). Added the version number to zUMIs-master. Parameters are printed to a .zUMIs_run.txt file for each call.

18 Feb 2018: zUMIs.0.0.3 released. Switched support to the new Rsubread version and data format. Furthermore, to compensate for sequencing/PCR errors, zUMIs now features UMI correction using Hamming distance and binning of adjacent cell barcodes.
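
To illustrate the idea behind UMI correction by Hamming distance, here is a minimal Python sketch. It is not the zUMIs code (which is written in R and may use a different strategy). The function names `hamming` and `collapse_umis`, the `max_dist` parameter and the "merge into the most abundant neighbour" rule are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis: Iterable[str], max_dist: int = 1) -> Dict[str, int]:
    """Merge each UMI into an already accepted, more abundant UMI within max_dist.

    Hypothetical helper for illustration only. Returns a mapping from
    retained UMI to its summed read count.
    """
    counts = Counter(umis)
    collapsed: Dict[str, int] = {}
    # Process UMIs from most to least abundant; ties are broken alphabetically.
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for parent in collapsed:
            if hamming(umi, parent) <= max_dist:
                # Treat this UMI as a sequencing/PCR error of the parent.
                collapsed[parent] += n
                break
        else:
            # No close parent found: keep as a distinct molecule.
            collapsed[umi] = n
    return collapsed


reads = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 2
print(collapse_umis(reads))
# {'ACGT': 6, 'TTTT': 2}
```

In this toy input, `ACGA` differs from `ACGT` at one position and is merged into it, so two distinct molecules are counted instead of three.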

Installation and Usage

Please find information on installation and usage in the zUMIs wiki.

Please make sure your dependencies are at the versions mentioned or newer.


zUMIs is compatible with nearly all (single-cell) RNA-seq protocols! Compatibility includes these single-cell UMI protocols:

  • CEL-seq with UMI (Grün et al., 2014)
  • SCRB-seq (Soumillon et al., 2014)
  • MARS-seq (Jaitin et al., 2014)
  • STRT-C1 (Islam et al., 2014)
  • Drop-seq (Macosko et al., 2015)
  • CEL-seq2 (Hashimshony et al., 2016)
  • SORT-seq (Muraro et al., 2016)
  • DroNc-seq (Habib et al., 2017)
  • Seq-Well (Gierahn et al., 2017)
  • SPLiT-seq (Rosenberg et al., 2018)
  • sci-RNA-seq (Cao et al., 2017)
  • STRT-2i (Hochgerner et al., 2018)
  • Quartz-seq2 (Sasagawa et al., 2017)
  • 10x Genomics Chromium (Zheng et al., 2017)
  • Wafergen ICELL8 (Gao et al., 2017)
  • Illumina ddSEQ SureCell
  • inDrops (Zilionis et al., 2017; Klein et al., 2015)
  • mcSCRB-seq (Bagnoli et al., 2018)

zUMIs is now also compatible with non-UMI single-cell protocols:

  • CEL-seq (Hashimshony et al., 2012)
  • Smart-seq (Ramsköld et al., 2012)
  • Smart-seq2 (Picelli et al., 2013)

If you do not find your (favorite) scRNA-seq protocol on the list, get in touch with us!

Getting help

Refer to the zUMIs GitHub wiki for help.

Feel free to contact us on Twitter @swatidparekh and @chris_zie with comments or questions!

Please report bugs 🐞🐛 to the zUMIs GitHub issue page.

If you encounter issues when using zUMIs for the first time, please try to run the example data set included in this repository.