Pairwise SNP distance matrix from a FASTA sequence alignment

schultzm/snp-dists

snp-dists

Convert a FASTA alignment to SNP distance matrix

Quick Start

% cat test/good.aln

>seq1
AGTCAGTC
>seq2
AGGCAGTC
>seq3
AGTGAGTA
>seq4
TGTTAGAC

% snp-dists test/good.aln > distances.tab

Read 4 sequences of length 8

% cat distances.tab

snp-dists 0.2   seq1    seq2    seq3    seq4
seq1            0       1       2       3
seq2            1       0       3       4
seq3            2       3       0       4
seq4            3       4       4       0

Installation

snp-dists is written in C to the C99 standard and only depends on zlib.

Homebrew

brew install brewsci/bio/snp-dists

Bioconda

conda install -c bioconda -c conda-forge snp-dists

Source

git clone https://github.com/tseemann/snp-dists.git
cd snp-dists
make

# run tests
make check

# optionally install to a specific location (default: /usr/local)
make PREFIX=/usr/local install

Options

snp-dists -h (help)

SYNOPSIS
  Pairwise SNP distance matrix from a FASTA alignment
USAGE
  snp-dists [options] alignment.fasta[.gz] > matrix.tsv
OPTIONS
  -h    Show this help
  -v    Print version and exit
  -q    Quiet mode; do not print progress information
  -a    Count all differences not just [AGTC]
  -k    Keep case, don't uppercase all letters
  -c    Output CSV instead of TSV
  -b    Blank top left corner cell instead of 'snp-dists 0.3'
URL
  https://github.com/tseemann/snp-dists (Torsten Seemann)

snp-dists -v (version)

Prints the name and version separated by a space in standard Unix fashion.

snp-dists 0.5

snp-dists -q (quiet mode)

Don't print informational messages, only errors.

snp-dists -c (CSV instead of TSV)

snp-dists 0.5,seq1,seq2,seq3,seq4
seq1,0,1,2,3
seq2,1,0,3,4
seq3,2,3,0,4
seq4,3,4,4,0

snp-dists -b (omit the tool name/version)

        seq1    seq2    seq3    seq4
seq1    0       1       2       3
seq2    1       0       3       4
seq3    2       3       0       4
seq4    3       4       4       0

Advanced options

By default, all letters are (1) uppercased and (2) ignored if they are not A, G, T or C.

snp-dists -a (don't just count AGTC)

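Normally you would not want ambiguous letters such as N, or gaps (-), to count as a difference, but if you do, enable this option. For example, with -a the N and - characters in the alignment below are compared like any other letter.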

>seq1
NGTCAGTC
>seq2
AG-CAGTC
>seq3
AGTGNGTA

snp-dists -k (don't uppercase any letters)

You may wish to preserve case so that lower-case characters are masked in the comparison. Lower-case letters are not A, G, T or C, so they are ignored, unless -a is also given.

>seq1
AgTCAgTC
>seq2
AggCAgTC
>seq3
AgTgAgTA

Issues

Report bugs and give suggestions on the Issues page.

Related software

Licence

GPL Version 3

Authors
