Distance Types

Martin Vickers edited this page Oct 11, 2018 · 33 revisions

This page describes each alignment-free distance measure implemented in KAST, along with the flag that selects it and the reference it is based on.

Euclid (AKA Euler or Euclidean Distance)

To select Euclid as a distance measurement within KAST, use the -t euclid flag.
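For two k-mer count profiles, the Euclidean distance is the square root of the summed squared differences between the counts of each word. The Python below is a minimal sketch of that textbook definition, not KAST's own code. The helper names `kmer_counts` and `euclidean` are hypothetical, and this page does not state whether KAST works on raw counts or on normalised frequencies, so raw counts are an assumption here.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean(a, b):
    """Euclidean distance between two k-mer count profiles.

    A Counter returns 0 for a missing word, so words seen in only one
    sequence still contribute to the sum.
    """
    words = set(a) | set(b)
    return sqrt(sum((a[w] - b[w]) ** 2 for w in words))


x = kmer_counts("ACGTACGT", 2)
y = kmer_counts("ACGTTTTT", 2)
print(euclidean(x, y))
```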

Manhattan

To select Manhattan as a distance measurement within KAST, use the -t manhattan flag.
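The Manhattan (city-block) distance sums the absolute differences between corresponding k-mer counts. As an illustration only, assuming both profiles are already laid out as equal-length count vectors over the same word order, it can be sketched as follows; the function name `manhattan` is hypothetical and this is not KAST's implementation.

```python
def manhattan(p, q):
    """Sum of absolute differences between two equal-length count vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of k-mers")
    return sum(abs(a - b) for a, b in zip(p, q))


# Counts for the same three k-mers in two sequences.
print(manhattan([3, 0, 1], [1, 2, 1]))
```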

d2

To select d2 [5] as a distance measurement within KAST, use the -t d2 flag.
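In the alignment-free literature, the d2 measure attributed to Blaisdell [5] is usually given as the sum of squared differences between k-word counts, that is, the squared Euclidean distance. The sketch below follows that common formulation under the assumption that both profiles are equal-length count vectors in the same word order. It is illustrative only; the function name `d2` is hypothetical, and KAST's exact normalisation is not described on this page.

```python
def d2(p, q):
    """Sum of squared differences between two equal-length count vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of k-mers")
    return sum((a - b) ** 2 for a, b in zip(p, q))


print(d2([3, 0, 1], [1, 2, 1]))
```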

Chebyshev

To select Chebyshev as a distance measurement within KAST, use the -t chebyshev flag.
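The Chebyshev distance is the largest absolute difference between any pair of corresponding k-mer counts, so a single strongly over-represented word determines the result. A minimal sketch, assuming equal-length count vectors in the same word order (the function name `chebyshev` is hypothetical and this is not KAST's code):

```python
def chebyshev(p, q):
    """Largest absolute difference between two equal-length count vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of k-mers")
    return max(abs(a - b) for a, b in zip(p, q))


print(chebyshev([3, 0, 1], [1, 2, 5]))
```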

Normalised Google Distance

To select Normalised Google Distance [6] as a distance measurement within KAST, use the -t ngd flag.
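For orientation, the Normalised Google Distance as defined in [6] for two search terms x and y is shown below, where f(x) and f(y) are the numbers of pages containing each term, f(x, y) is the number containing both, and N is the total number of pages indexed. How KAST maps these quantities onto k-mer counts is not described on this page, so treat the formula as the general definition rather than KAST's exact computation.

```latex
\mathrm{NGD}(x, y) =
  \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}
       {\log N - \min\{\log f(x), \log f(y)\}}
```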

Bray-Curtis

To select Bray-Curtis [4] as a distance measurement within KAST, use the -t bc flag.
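The Bray-Curtis dissimilarity divides the summed absolute differences between counts by the summed total of all counts, giving a value between 0 (identical profiles) and 1 (no shared words) for non-negative counts. The following is a sketch of that standard definition, assuming equal-length count vectors in the same word order; the function name `bray_curtis` and the choice to return 0.0 for two empty profiles are assumptions, not KAST behaviour.

```python
def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of k-mers")
    total = sum(a + b for a, b in zip(p, q))
    if total == 0:
        # Both profiles are empty; treating them as identical is an assumption.
        return 0.0
    return sum(abs(a - b) for a, b in zip(p, q)) / total


print(bray_curtis([3, 0, 1], [1, 2, 1]))
```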

Markov Based Measures

D2S

To select D2S [1] as a distance measurement within KAST, use the -t D2S flag.

D2Star (AKA D2*)

To select D2Star [1] as a distance measurement within KAST, use the -t D2Star flag.
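For reference, D2S and D2Star are defined in [1] on centred k-word counts. With X_w and Y_w the counts of word w in two sequences of lengths n and m, and p_w the probability of w under the background model, the statistics can be written as below. The notation follows the paper rather than KAST's source, and how KAST estimates p_w is not described on this page.

```latex
\tilde{X}_w = X_w - \bar{n}\,p_w, \qquad
\tilde{Y}_w = Y_w - \bar{m}\,p_w, \qquad
\bar{n} = n - k + 1, \quad \bar{m} = m - k + 1

D_2^{S} = \sum_{w} \frac{\tilde{X}_w \tilde{Y}_w}
                        {\sqrt{\tilde{X}_w^{2} + \tilde{Y}_w^{2}}}
\qquad
D_2^{*} = \sum_{w} \frac{\tilde{X}_w \tilde{Y}_w}
                        {\sqrt{\bar{n}\,\bar{m}}\; p_w}
```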

S2 (AKA dAI)

To select S2 [3] as a distance measurement within KAST, use the -t dai flag.

d2s

To select d2s [2] as a distance measurement within KAST, use the -t d2s flag.

d2star (AKA d2*)

To select d2star [2] as a distance measurement within KAST, use the -t d2star flag.
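The lower-case d2s and d2star are commonly presented in the literature as dissimilarities between 0 and 1, obtained by normalising the D2S and D2Star statistics. In the same notation as [1] (centred counts \tilde{X}_w and \tilde{Y}_w, word probability p_w, and \bar{n}, \bar{m} the numbers of k-words in each sequence), the usual forms are given below. This is a sketch of the published definitions; whether KAST follows them term for term is an assumption.

```latex
d_2^{S} = \frac{1}{2}\left(1 -
  \frac{D_2^{S}}
       {\sqrt{\sum_{w} \frac{\tilde{X}_w^{2}}{\sqrt{\tilde{X}_w^{2} + \tilde{Y}_w^{2}}}}\;
        \sqrt{\sum_{w} \frac{\tilde{Y}_w^{2}}{\sqrt{\tilde{X}_w^{2} + \tilde{Y}_w^{2}}}}}
\right)

d_2^{*} = \frac{1}{2}\left(1 -
  \frac{D_2^{*}}
       {\sqrt{\sum_{w} \frac{\tilde{X}_w^{2}}{\bar{n}\,p_w}}\;
        \sqrt{\sum_{w} \frac{\tilde{Y}_w^{2}}{\bar{m}\,p_w}}}
\right)
```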

References

[1]: Reinert G, Chew D, Sun F, Waterman MS. Alignment-Free Sequence Comparison (I): Statistics and Power. Journal of Computational Biology. 2009;16(12):1615-1634. doi:10.1089/cmb.2009.0198.

[2]: Wang Y, Liu L, Chen L, Chen T, Sun F. Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies. Parkinson J, ed. PLoS ONE. 2014;9(1):e84348. doi:10.1371/journal.pone.0084348.

[3]: Dai Q, Yang Y, Wang T. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison. Bioinformatics. 2008;24:2296-2302.

[4]: Bray JR, Curtis JT. An Ordination of the Upland Forest Communities of Southern Wisconsin. Ecological Monographs. 1957;27:325-349. doi:10.2307/1942268.

[5]: Blaisdell B. A measure of the similarity of sets of sequences not requiring sequence alignment. Proc Natl Acad Sci USA. 1986;83:5155-5159.

[6]: Cilibrasi RL, Vitanyi PMB. The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering. 2007;19(3):370-383. doi:10.1109/TKDE.2007.48.