A collection of programs for identifying, manipulating, and separating phase sets.

Phase Tools

This is a collection of programs for identifying and manipulating phase sets, and for separating phased mappings.


The prerequisites for installing this collection are:

  • A C++11 toolchain. Recent versions of gcc and clang can be used.
  • cmake, version 2.8.12 or later, for configuration.
  • zlib.
  • boost header-only libraries, version 1.57.0 or later (optional). If boost is installed in a nonstandard location, specify it with the -DBOOST_ROOT=/some/path argument to cmake. If it is not available, the libraries can be downloaded and built with -DBUILD_BOOST=1 (note: this requires about 700 MB of disk space).
  • htslib library (optional). If htslib is installed in a nonstandard location, specify it with -DHTSLIB_ROOT=/some/path. If it is not available, the library can be downloaded and built with -DBUILD_HTSLIB=1.

To configure, build, and install the programs in this suite, use:

# get the git repository along with submodules
cd /some/path
git clone --recursive
cd phase-tools
mkdir build && cd build
cmake ../src [ -DHTSLIB_ROOT=/path/to/htslib | -DBUILD_HTSLIB=1 ] [ -DBOOST_ROOT=/path/to/boost | -DBUILD_BOOST=1 ]
make install
/some/path/phase-tools/build/bin/bam-phase-split --version

By default, the install prefix is the build directory, so the programs are placed in the bin subdirectory of the build directory. To specify a custom install directory, use -DCMAKE_INSTALL_PREFIX=/some/prefix. In this case, the programs will be installed in /some/prefix/bin.


The following programs are part of this collection. For detailed command-line parameters, use --help.


Given a VCF file containing genotype calls from a trio of individuals, phase the calls in the child using the calls in the parents. Concretely, for every heterozygous call in the child, if at least one of the two alleles can be unambiguously assigned to one of the parents, the call in the child is phased as a:b, meaning allele a came from parent 1 and allele b came from parent 2. The calls are assigned to the unnamed phase set in the VCF file (i.e., no PS format field).
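The assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the program's actual code: the function name phase_child_call, the representation of parental genotypes as sets of allele indices, and the choice to leave a call unphased when both alleles point to the same parent are all assumptions.

```python
from typing import Optional, Set, Tuple


def phase_child_call(
    child: Tuple[int, int], parent1: Set[int], parent2: Set[int]
) -> Optional[Tuple[int, int]]:
    """Return (allele from parent 1, allele from parent 2), or None if unphaseable.

    Hypothetical helper: alleles are VCF allele indices (0 = REF, 1 = first ALT, ...).
    """
    a, b = child
    if a == b:
        return None  # homozygous call: nothing to phase

    def origin(allele: int) -> Optional[int]:
        # An allele is unambiguous only if exactly one parent carries it.
        in1, in2 = allele in parent1, allele in parent2
        if in1 and not in2:
            return 1
        if in2 and not in1:
            return 2
        return None

    origin_a, origin_b = origin(a), origin(b)
    if origin_a is not None and origin_a == origin_b:
        # Assumption: both alleles point to the same parent, so leave unphased.
        return None
    if origin_a == 1 or origin_b == 2:
        return (a, b)
    if origin_a == 2 or origin_b == 1:
        return (b, a)
    return None  # neither allele can be assigned unambiguously
```

For example, a child call 0/1 with parent 1 homozygous 0/0 and parent 2 heterozygous 0/1 is phased with allele 0 from parent 1 and allele 1 from parent 2, because allele 1 can only have come from parent 2.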


Given a VCF file of variants from one sample, and one or more BAM files of read mappings, compute phased genotypes directly supported by NGS reads.


Given a BAM file of read mappings and a (partly) phased VCF file of variants, split the mappings in the BAM file into several other BAM files. Specifically, for every phase set (PS) defined in the VCF file, and for every autosome (1-22), there will be 2 BAM output files (1 per haplotype) containing the mappings of reads assigned to that phase. In addition, there will be 2 more overflow BAM output files for reads not mapped to autosomes. Mappings which cannot be phased (e.g., not spanning any heterozygous sites) will be assigned to one of the phases at random.
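As a rough illustration of how a mapping might be assigned to a haplotype, including the random fallback mentioned above, consider the following Python sketch. The majority-vote rule, the function name assign_haplotype, and the dictionary-based inputs are assumptions made for the example; the actual program may decide differently.

```python
import random
from typing import Dict, Tuple


def assign_haplotype(
    read_alleles: Dict[int, str],
    phased_sites: Dict[int, Tuple[str, str]],
    rng=random,
) -> int:
    """Return 0 or 1, the haplotype a read mapping is assigned to.

    read_alleles: position -> base observed in the read (hypothetical input).
    phased_sites: position -> (haplotype 0 allele, haplotype 1 allele)
                  for heterozygous sites of one phase set.
    """
    votes = [0, 0]
    for pos, allele in read_alleles.items():
        haps = phased_sites.get(pos)
        if haps is None:
            continue  # the read covers a position that is not a phased site
        for h in (0, 1):
            if allele == haps[h]:
                votes[h] += 1
    if votes[0] != votes[1]:
        return 0 if votes[0] > votes[1] else 1
    # No informative sites, or a tie: pick a phase at random.
    return rng.randrange(2)
```

Passing a seeded random.Random instance as rng makes the random fallback reproducible, which is convenient when comparing runs.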


Given a VCF file containing 2 phase set collections (stored as different sample columns), extend the phase sets of the “base” collection using the phase sets of the “extra” collection.


Given a VCF file, collapse all phase sets into one.
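One plausible reading of "collapse" is that the PS format field is dropped, so that every phased call falls into the unnamed phase set. The Python sketch below shows that transformation on a single tab-separated VCF data line. It is an assumption about the behaviour, not a description of the program, and the function name drop_ps is hypothetical.

```python
def drop_ps(vcf_line: str) -> str:
    """Remove the PS key from FORMAT and the matching value from every sample."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return "\t".join(fields)  # no FORMAT/sample columns: nothing to do
    keys = fields[8].split(":")
    if "PS" not in keys:
        return "\t".join(fields)
    idx = keys.index("PS")
    del keys[idx]
    fields[8] = ":".join(keys)
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # VCF allows trailing sample fields to be omitted, so check the length.
        if idx < len(values):
            del values[idx]
        fields[i] = ":".join(values)
    return "\t".join(fields)
```

Header lines (those starting with #) would be passed through unchanged by the caller.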


Given a VCF file, randomly flip each phase set.
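Flipping a phase set means swapping the two haplotypes of every phased call in that set, so the decision has to be made once per phase set rather than once per call. A minimal Python sketch follows, assuming calls are given as (phase set id, genotype string) pairs and that phased genotypes use the VCF separator |; the names flip_gt and flip_phase_sets are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Tuple


def flip_gt(gt: str) -> str:
    """Swap the two alleles of a phased diploid genotype, e.g. 0|1 -> 1|0."""
    a, b = gt.split("|")
    return f"{b}|{a}"


def flip_phase_sets(
    calls: List[Tuple[Hashable, str]], rng: random.Random
) -> List[Tuple[Hashable, str]]:
    """Flip each phase set as a whole with probability 1/2."""
    flip: Dict[Hashable, bool] = {}
    out = []
    for ps, gt in calls:
        if "|" not in gt:
            out.append((ps, gt))  # unphased call: leave untouched
            continue
        if ps not in flip:
            flip[ps] = rng.random() < 0.5  # one coin toss per phase set
        out.append((ps, flip_gt(gt) if flip[ps] else gt))
    return out
```

Because the coin toss is cached per phase set, the relative phase of the calls inside a set is preserved; only the arbitrary labelling of the two haplotypes changes.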


Sample input and output files are available in the samples directory.
