Ngmlr is a long-read mapper designed to align PacBio or Oxford Nanopore reads to a reference genome, with a focus on reads that span structural variations

Quick start

Download the binary from GitHub and unzip it, install with bioconda, or pull the Docker image from Quay. For updates, follow the project on Twitter.

For PacBio data run:

ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam

For Oxford Nanopore run:

ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam -x ont
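
When ngmlr is called from a pipeline, it can help to assemble the command line in one place. The following is a minimal sketch that uses only the options shown above (`-t`, `-r`, `-q`, `-o`, `-x`). The function name `build_ngmlr_command` is hypothetical and not part of ngmlr.

```python
import shlex


def build_ngmlr_command(reference, reads, output=None, threads=1, preset="pacbio"):
    """Return the ngmlr argument list for the given inputs.

    Hypothetical helper: it only arranges flags documented in this README.
    """
    if preset not in ("pacbio", "ont"):
        raise ValueError("preset must be 'pacbio' or 'ont'")
    cmd = ["ngmlr", "-t", str(threads), "-r", reference, "-q", reads, "-x", preset]
    if output is not None:
        # Without -o, ngmlr writes to stdout.
        cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_ngmlr_command(
        "reference.fasta", "reads.fastq", "test.sam", threads=4, preset="ont"
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the mapper, assuming `ngmlr` is on the `PATH`.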

Introduction

The coNvex Gap-cost alignMent for Long Reads (ngmlr) is a long-read mapper designed to sensitively align PacBio or Oxford Nanopore reads to (large) reference genomes. It was designed to correctly align reads spanning (complex) structural variations. Ngmlr uses an SV-aware k-mer search to find approximate mapping locations for a read. It then computes precise alignments with a banded Smith-Waterman algorithm and a non-affine gap model that penalizes gap extensions less for longer gaps than for shorter ones. The gap model allows ngmlr to account for both sequencing error and real genomic variation at the same time, and makes it especially effective at precisely identifying the position of breakpoints stemming from (complex) structural variations. The k-mer search helps to detect and split reads that cannot be aligned linearly, enabling ngmlr to reliably align reads to a wide range of structural variations, including nested SVs (e.g. inversions flanked by deletions). Currently ngmlr takes about 60 minutes (on an AMD Opteron 6348) and 10 GB RAM to align 1 Gbp of PacBio reads when using 10 threads.
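
To make the gap model more concrete, here is a small sketch of one possible convex gap cost. The functional form is an assumption made for illustration, not ngmlr's confirmed formula: the per-base extension score starts at `--gap-extend-max`, relaxes by `--gap-decay` for every additional gap base, and is capped at `--gap-extend-min`. The function names are hypothetical; the default values are the ones listed under Parameters.

```python
def gap_extend_score(position, extend_max=-5.0, extend_min=-1.0, decay=0.15):
    """Score for the gap base at 0-based `position` inside a gap.

    Assumed form: start at extend_max, add `decay` per base,
    and never become milder than extend_min.
    """
    return min(extend_min, extend_max + decay * position)


def gap_score(length, gap_open=-5.0, extend_max=-5.0, extend_min=-1.0, decay=0.15):
    """Total score of a single gap of `length` bases under the assumed model."""
    if length <= 0:
        return 0.0
    extension = sum(
        gap_extend_score(i, extend_max, extend_min, decay) for i in range(length)
    )
    return gap_open + extension


if __name__ == "__main__":
    # One long gap versus the same number of gap bases spread over short gaps.
    one_long_gap = gap_score(100)
    many_short_gaps = 50 * gap_score(2)
    print(one_long_gap, many_short_gaps)
```

Under this sketch, a single 100-base gap is penalized far less than fifty 2-base gaps. This matches the intent described above: short gaps typical of sequencing error stay expensive, while long gaps caused by real insertions or deletions remain affordable.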

Poster & Talks:

Accurate and fast detection of complex and nested structural variations using long read technologies Biological Data Science, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 26 - 29.10.2016

NGMLR: Highly accurate read mapping of third generation sequencing reads for improved structural variation analysis Genome Informatics 2016, Wellcome Genome Campus Conference Centre, Hinxton, Cambridge, UK, 19.09.-21.09.2016

Parameters

Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]

Input/Output:
    -r <file>,  --reference <file>
        (required)  Path to the reference genome (FASTA/Q, can be gzipped)
    -q <file>,  --query <file>
        (required)  Path to the read file (FASTA/Q)
    -o <file>,  --output <file>
        Path to output file [stdout]

General:
    -t <int>,  --threads <int>
        Number of threads [1]
    -x <pacbio, ont>,  --presets <pacbio, ont>
        Parameter presets for different sequencing technologies [pacbio]
    -i <0-1>,  --min-identity <0-1>
        Alignments with an identity lower than this threshold will be discarded [0.65]
    -R <int/float>,  --min-residues <int/float>
        Alignments containing less than <int> or (<float> * read length) residues will be discarded [50]
    --no-smallinv
        Do not detect small inversions [false]
    --no-lowqualitysplit
        Do not split alignments with poor quality [false]
    --verbose
        Debug output [false]
    --no-progress
        Do not print progress info while mapping [false]

Advanced:
    --match <float>
        Match score [2]
    --mismatch <float>
        Mismatch score [-5]
    --gap-open <float>
        Gap open score [-5]
    --gap-extend-max <float>
        Gap open extend max [-5]
    --gap-extend-min <float>
        Gap open extend min [-1]
    --gap-decay <float>
        Gap extend decay [0.15]
    -k <10-15>,  --kmer-length <10-15>
        K-mer length in bases [13]
    --kmer-skip <int>
        Number of k-mers to skip when building the lookup table from the reference [2]
    --bin-size <int>
        Sets the size of the grid used during candidate search [4]
    --subread-length <int>
        Length of fragments reads are split into [256]
    --subread-corridor <int>
        Length of corridor sub-reads are aligned with [40]
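
The two filtering options, `--min-identity` and `--min-residues`, can be read as a simple predicate on each alignment. The sketch below is an interpretation of the help text above, not ngmlr's source code. In particular, it assumes an integer `-R` value is an absolute residue count and a float value is a fraction of the read length. The names `min_residues_threshold` and `keep_alignment` are hypothetical.

```python
def min_residues_threshold(min_residues, read_length):
    """Translate the -R value into an absolute residue count.

    Assumption: a float is a fraction of the read length, an int is a count.
    """
    if isinstance(min_residues, float):
        return min_residues * read_length
    return min_residues


def keep_alignment(identity, aligned_residues, read_length,
                   min_identity=0.65, min_residues=50):
    """Return True if an alignment passes both thresholds (defaults from the help text)."""
    if identity < min_identity:
        return False
    return aligned_residues >= min_residues_threshold(min_residues, read_length)


if __name__ == "__main__":
    # A 1000 bp read with 200 aligned residues at 80% identity.
    print(keep_alignment(0.8, 200, 1000))                     # count-based default
    print(keep_alignment(0.8, 200, 1000, min_residues=0.25))  # fraction-based
```

With the defaults, the example alignment passes. With `min_residues=0.25` it needs at least 250 aligned residues on a 1000 bp read and is rejected.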

Running with docker

docker run -ti -v /home/user/data/:/home/user/data/ quay.io/philres/ngmlr ngmlr -r /home/user/data/ref.fa -q /home/user/data/reads.fasta -o /home/user/data/output.sam

Building ngmlr from source

OS: Linux and Mac OSX. Requirements: zlib-dev, cmake, gcc/g++ (>=4.8.2)

git clone https://github.com/philres/ngmlr.git
cd ngmlr/
mkdir -p build
cd build/
cmake ..
make

cd ../bin/ngmlr-*/
./ngmlr

Building ngmlr for Linux with docker

git clone https://github.com/philres/ngmlr.git
mkdir -p ngmlr/build
docker run -v `pwd`/ngmlr:/ngmlr philres/nextgenmaplr-buildenv bash -c "cd /ngmlr/build && cmake .. &&  make"
`pwd`/ngmlr/bin/ngmlr-*/ngmlr