Ngila is a global alignment program that can align pairs of sequences using logarithmic and affine gap penalties.
NGILA VERSION 1.3 - Logarithmic and Affine Sequence Alignments

Copyright (C) 2005-2010 Reed A. Cartwright - All rights reserved.

DESCRIPTION
  Ngila is a global alignment program that can align pairs of sequences
  using logarithmic and affine gap penalties.

REFERENCE
  Cartwright RA (2007) Ngila: global pairwise alignments with logarithmic
  and affine gap costs. Bioinformatics. 23(11):1427-1428

CONTACT
  firstname.lastname@example.org or email@example.com

LICENSE
  GPL ver 3. See copying.txt.

INSTALLATION
  Installation from source requires CMake (http://cmake.org/) and the
  Boost Libraries (http://boost.org/). Binary packages are available.

  To install on unix-like systems, run

    cmake . && make && make install

  in the extracted source code directory. On Windows you can use the
  CMake GUI to create project files for Visual Studio and install from
  there. More detailed directions and help can be found on the website.

DOWNLOAD
  Ngila can be downloaded from its development website,
  <http://scit.us/projects/ngila/>.

COMMAND LINE USAGE
  ngila -m zeta -t 0.1 -k 2.0 -r 0.05 -z 1.65 sequences.fasta

  See 'ngila --help' for complete command line usage.
  See <http://scit.us/projects/ngila/> for more details on running ngila.

MODEL DESCRIPTIONS
  Ngila includes models of alignment based on evolutionary models. For a
  basic description of the evolutionary models see Cartwright RA (2009)
  "Problems and solutions for estimating indel rates and length
  distributions." Molecular Biology and Evolution, 26:473-480.

  The models are as follows:
    zeta:   DNA model with indel lengths following a power-law distribution
    geo:    DNA model with a geometric distribution
    aazeta: protein model (LG 2008) with a power-law distribution
    aageo:  protein model with a geometric distribution
    cost:   specify substitution and gap costs explicitly

INPUT FILES
  The input file has to be in FASTA, PHYLIP, or CLUSTAL format. If more
  than two sequences are given, then Ngila will align based on the 'pairs'
  option.
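The zeta and geo models listed under MODEL DESCRIPTIONS differ in how quickly the probability of an indel falls off with its length. The Python sketch below is not taken from Ngila's source; it only compares a truncated power-law distribution with a truncated geometric one. The function names, the truncation at max_len, and the parameter values (z = 1.65, q = 0.5) are assumptions chosen for illustration.

```python
def power_law_pmf(z, max_len):
    """Truncated power law: P(k) proportional to k**-z for k = 1..max_len."""
    weights = [k ** -z for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def geometric_pmf(q, max_len):
    """Truncated geometric: P(k) proportional to q**(k-1) for k = 1..max_len."""
    weights = [q ** (k - 1) for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Illustrative parameters only; they are not Ngila defaults.
    zeta_like = power_law_pmf(1.65, 100)
    geo_like = geometric_pmf(0.5, 100)
    for k in (1, 5, 20, 50):
        print(k, zeta_like[k - 1], geo_like[k - 1])
```

With these illustrative parameters the power-law distribution assigns far more probability to long indels than the geometric one, which is the practical difference between the two model families.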
OUTPUT FILES
  Ngila has two types of output: sequence alignments and distance
  matrices. Supported sequence alignment formats are Clustal, Fasta, and
  Phylip. Clustal is the default. The format is read from the output
  file's extension or specified directly; "ngila -o seqs.fas" and
  "ngila -o fas:seqs.txt" both produce Fasta output. "ngila -o aln:-"
  sends Clustal-formatted sequences to stdout.

  The following extensions are supported for distance matrices:
    dist-c: likelihood-based cost scores
    dist-i: sequence identities
    dist-d: sequence distances
    dist:   lower triangle like dist-d, upper triangle like dist-i

SUBSTITUTION MATRIX
  Used by the "cost" model. An example of the format can be seen in
  matrix/dna.

NGILARC
  Command-line options can be specified using an ngilarc file. By default
  the program looks for $HOME/.ngilarc (unix) or %HOME%/ngilarc.txt
  (windows). This file can contain long-form command line options, as in
  the ngilarc.txt example file.

ALGORITHM
  Ngila implements a Miller and Myers (1988) candidate list method of
  sequence alignment with the gap cost being of the form
  g(x) = a + b*x + c*ln x. Ngila will return the alignment with the
  minimum cost and has rules for breaking ties.

  Ngila's main alignment algorithm is divide-and-conquer (DnC), which
  requires O(M) memory but is slower than a holistic, O(MN) memory
  algorithm. Ngila also implements a secondary, holistic algorithm for
  alignment, which is faster. The options -M and -N (-M is for the larger
  sequence) allow users to specify thresholds for when the holistic
  algorithm is used instead of the DnC algorithm. For example, the command
  'ngila -M 5000 -N 5000 seqs.aln' will align the sequences in 'seqs.aln'
  via the divide-and-conquer algorithm, but whenever the subsequences
  being aligned are no larger than 5000 by 5000, the holistic algorithm
  will be used.
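The gap cost g(x) = a + b*x + c*ln x from the ALGORITHM section can be evaluated directly. The following Python sketch is illustrative and is not Ngila's implementation; the function name gap_cost and the coefficient values are assumptions. Setting c = 0 gives a purely affine cost, while a nonzero c adds the logarithmic term.

```python
import math


def gap_cost(x, a, b, c):
    """Cost of a gap of length x under g(x) = a + b*x + c*ln(x).

    x must be at least 1 because ln(x) is undefined for x <= 0.
    """
    if x < 1:
        raise ValueError("gap length must be at least 1")
    return a + b * x + c * math.log(x)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the shape of each cost.
    for length in (1, 2, 10, 100):
        affine = gap_cost(length, a=2.0, b=0.5, c=0.0)
        logarithmic = gap_cost(length, a=2.0, b=0.0, c=1.0)
        print(length, affine, logarithmic)
```

Because ln x grows much more slowly than x, the logarithmic cost penalizes one long gap far less than the affine cost does for the same coefficients.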