Cell-specific methylation pattern reconstruction. The current implementation uses a linear programming (LP) formulation and solver. The LEMON graph library and the GLPK LP solver are included in this repository, as is the ezOptionParser.hpp header.

The algorithm is described and tested in this publication:


This project is built with CMake and requires at least version 2.6. It also uses C++11, so use a compiler that supports it (e.g., g++ >= 4.7 or clang >= 3.4).

$ git clone
$ cd methylFlow
$ git submodule init
$ git submodule update
$ mkdir build && cd build
$ cmake ..
$ make
$ make install

To compile with debug flags, use:

$ mkdir build_devel && cd build_devel
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make


MethylFlow: methylation pattern reconstruction

USAGE: methylFlow -sam -i reads.sam -o mfoutput [OPTIONS]


-chr, -Chr ARG                    chr name for tsv files, not required for sam
                                  input file

-cpgloss, -p, -P, --cpgloss       Use cpg-loss instead of region-loss.

-e, -eps, -E, --eps ARG           Regularization parameter search threshold.

-end, -End, --end ARG             Only process reads aligning before given
                                  position.

-h, -help, --help, --usage        Display usage instructions.

-i, -in, --in, --input ARG        Read input file. Default: tab-separated format:
                                  start length strand

-l, -lam, -lambda, --lambda ARG   Regularization parameter value.

-o, -out, --out, --output ARG     Output directory. Directory must exist before
                                  running. Files written: cpgs.tsv, components.tsv,
                                  patterns.tsv, regions.tsv

-s, -scale, -S, --scale ARG       Scale parameter value.

-sam, -SAM, --sam                 Input file is in SAM format instead of default
                                  tab-separated format.

-start, -Start, --start ARG       Only process reads aligning after given
                                  position.

-v, -verbose, -V, --verbose       Verbose option.


methylFlow -sam -i reads.sam -o mfoutput -l 10.0 -s 30.0 -e 0.1
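
Because the output directory must exist before methylFlow runs, it can be convenient to wrap the call in a small script. The following is a minimal sketch, not part of this repository: the helper name `build_methylflow_command` is hypothetical, and it only uses flags listed in the usage text above.

```python
from pathlib import Path


def build_methylflow_command(sam_path, out_dir, lam=None, scale=None, eps=None):
    """Create the output directory and return the argv list for a SAM run."""
    # methylFlow expects the output directory to exist already.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = ["methylFlow", "-sam", "-i", str(sam_path), "-o", str(out_dir)]
    # Optional parameters are only passed when a value is given.
    for flag, value in (("-l", lam), ("-s", scale), ("-e", eps)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_methylflow_command("reads.sam", "mfoutput", lam=10.0, scale=30.0, eps=0.1)
# To actually run it (requires methylFlow on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

The resulting list reproduces the example command above and can be passed to `subprocess.run` once methylFlow is installed.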


Upon running, the output directory (mfoutput in the example above) will contain four files (cpgs.tsv, components.tsv, patterns.tsv and regions.tsv) with the following formats:


cpgs.tsv: tab-separated file of coverage and methylation calls per CpG. Columns:

  • chr: chromosome name
  • pos: CpG position
  • Cov: number of reads overlapping CpG
  • Meth: number of reads indicating CpG is methylated
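
As an illustration of how these columns can be used, the sketch below computes the fraction of methylated reads per CpG (Meth divided by Cov). It assumes the columns appear in the order listed above and tolerates an optional header row; whether the real file has a header is an assumption to check against your own output. The sample rows are made up.

```python
import csv
import io


def methylation_levels(lines):
    """Yield (chr, pos, fraction of reads methylated) for each CpG row."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4 or not row[1].isdigit():
            continue  # skip blank lines and a header row, if one is present
        chrom, pos, cov, meth = row[0], int(row[1]), int(row[2]), int(row[3])
        if cov > 0:  # avoid dividing by zero for uncovered CpGs
            yield chrom, pos, meth / cov


# Made-up rows for illustration only, not real methylFlow output.
sample = io.StringIO("chr\tpos\tCov\tMeth\nchr1\t100\t10\t4\nchr1\t120\t8\t8\n")
levels = list(methylation_levels(sample))
```

For a real run, pass an open file handle for `mfoutput/cpgs.tsv` instead of the in-memory sample.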


components.tsv: tab-separated file of components found by the algorithm. A component is a connected region graph built from overlapping reads. Genomic regions are covered by a single component; thus, cell-specific patterns estimated in a given genomic region are obtained from the (one or more, non-overlapping) components that overlap that region.


  • chr: chromosome name of genomic region covered by connected component
  • start: starting position of genomic region covered by connected component
  • end: ending position of genomic region covered by connected component
  • cid: component id, identifier given to component, used to connect to regions and patterns in other output files
  • npatterns: number of cell-specific methylation patterns estimated from this connected component.
  • total_coverage: total number of reads overlapping this component's genomic region
  • total_flow: the sum of all estimated abundances (flows) for patterns in this region
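
Since patterns for a genomic region come from the components that overlap it, a common first step is to look up those components by interval. The sketch below shows one way to do this on (chr, start, end, cid) tuples taken from the columns above. It treats start and end as inclusive coordinates, which is an assumption; the function name and the sample tuples are made up for illustration.

```python
def components_overlapping(components, chrom, start, end):
    """Return the cids of components whose span intersects [start, end]."""
    return [
        cid
        for c_chr, c_start, c_end, cid in components
        # Two closed intervals overlap when each starts before the other ends.
        if c_chr == chrom and c_start <= end and start <= c_end
    ]


# Made-up (chr, start, end, cid) tuples for illustration only.
components = [
    ("chr1", 100, 500, 1),
    ("chr1", 800, 1200, 2),
    ("chr2", 100, 500, 3),
]
hits = components_overlapping(components, "chr1", 450, 900)
```

The returned cids can then be used to select the matching rows in the patterns and regions files.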


patterns.tsv: tab-separated file of cell-specific methylation patterns estimated by methylFlow.


  • chr: chromosome name
  • start: start position of pattern
  • end: end position of pattern
  • cid: component id, corresponds to id of a component in file components.tsv
  • pid: pattern id, identifier given to pattern (unique across patterns within the same component)
  • abundance: abundance estimated for this pattern
  • methylpat: comma-separated list of methylation status entries of CpGs within the pattern. Entries have the form pos:[M|U], where pos is the location of the CpG relative to the start of the pattern, and M or U indicates whether the CpG is methylated or unmethylated, respectively
  • regions: comma-separated list of regions included in pattern (see file regions.tsv)
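
The methylpat column can be decoded with a few lines of code. The sketch below turns a string of pos:[M|U] entries into genomic positions with a methylation flag. It assumes pos is an integer offset that can be added directly to the pattern start; the function name and the example string are hypothetical.

```python
def parse_methylpat(methylpat, pattern_start=0):
    """Parse 'pos:M,pos:U,...' into a list of (position, is_methylated)."""
    calls = []
    for entry in methylpat.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty strings and trailing commas
        offset, status = entry.split(":")
        if status not in ("M", "U"):
            raise ValueError(f"unexpected methylation status: {status!r}")
        # Assumption: offsets are relative to the pattern's start column.
        calls.append((pattern_start + int(offset), status == "M"))
    return calls


# Made-up methylpat string for illustration only.
calls = parse_methylpat("0:M,12:U,30:M", pattern_start=1000)
```

The same parser applies to the methylpat column of the regions file, which uses the same format.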


regions.tsv: tab-separated file of regions that make up the region graph used in the estimation algorithm. Reads are assigned to the same region if they do not disagree on their methylation pattern. That is, regions contain the longest stretches of overlapping reads with unambiguous methylation patterns.


  • chr: chromosome name
  • start: start position of region
  • end: end position of region
  • cid: component id, corresponds to id of a component in file components.tsv
  • rid: region id, identifier given to region (unique across regions within the same component)
  • raw_coverage: number of reads assigned to the region
  • norm_coverage: normalized region coverage
  • exp_coverage: the sum of abundances of all patterns that include this region
  • methylpat: methylation pattern of region, given in same format as patterns.tsv
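
The exp_coverage column can be recomputed from the patterns file as a consistency check: for each region, add up the abundance of every pattern whose regions list includes it. The sketch below does this on (cid, abundance, regions) tuples. It assumes the regions column holds integer region ids, and it keys results by (cid, rid) because region ids are only unique within a component. The sample values are made up.

```python
from collections import defaultdict


def expected_coverage(patterns):
    """Sum pattern abundances over every region each pattern includes."""
    coverage = defaultdict(float)
    for cid, abundance, regions in patterns:
        for rid in regions.split(","):
            if rid.strip():
                # rid is unique only within a component, so key on both ids.
                coverage[(cid, int(rid))] += abundance
    return dict(coverage)


# Made-up (cid, abundance, regions) tuples for illustration only.
patterns = [(1, 6.0, "0,1,3"), (1, 4.0, "0,2,3"), (2, 3.5, "0")]
exp_cov = expected_coverage(patterns)
```

Comparing these sums against exp_coverage, and against raw_coverage or norm_coverage, gives a quick view of how well the estimated patterns explain the observed reads in each region.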


Hector Corrada Bravo
Faezeh Dorri

Center for Bioinformatics and Computational Biology
University of Maryland


