Calculate allele sharing distances
README

asd is a multi-threaded program that quickly computes pairwise allele sharing distances between diploid individuals, with an arbitrary number of alleles per locus. It reads TPED/TFAM and STRU formatted files and can load gzipped files without decompressing them first. If your data are strictly biallelic, use the --biallelic flag for better efficiency.
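
As a rough illustration of the quantity being computed, the sketch below scores one pair of diploid individuals under the commonly used definition of allele sharing: at each locus the pair shares 0, 1, or 2 alleles, a locus is skipped if either genotype carries the missing code, and the distance is 1 minus the mean proportion shared. This is an assumption about the definition, not a transcription of asd's source. The function names and the toy genotypes are hypothetical.

```python
from collections import Counter

MISSING = "-9"  # default STRU missing code according to the help text

def shared_alleles(g1, g2):
    """Number of alleles (0, 1, or 2) shared by two diploid genotypes."""
    # Multiset intersection, so ("A", "A") vs ("A", "B") shares one allele.
    return sum((Counter(g1) & Counter(g2)).values())

def allele_sharing_distance(ind1, ind2, missing=MISSING):
    """Return (distance, loci_used) for two lists of (allele, allele) tuples."""
    total = 0.0
    used = 0
    for g1, g2 in zip(ind1, ind2):
        if missing in g1 or missing in g2:
            continue  # skip loci with missing data in either individual
        total += shared_alleles(g1, g2) / 2.0
        used += 1
    if used == 0:
        return float("nan"), 0
    return 1.0 - total / used, used

# Hypothetical genotypes at four loci; the last locus is missing in b.
a = [("1", "1"), ("1", "2"), ("3", "4"), ("2", "2")]
b = [("1", "1"), ("1", "3"), ("5", "6"), ("-9", "-9")]
dist, n = allele_sharing_distance(a, b)
print(dist, n)
```

Only three loci are usable here, with 2, 1, and 0 alleles shared, so the proportion shared is 1.5/3 and the distance is 0.5.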

usage

./asd --stru <filename> --id-col <column where individual names are>

or

./asd --tped <filename>.tped --tfam <filename>.tfam

optional flags --partial, --log, --ibs-all, --out, --missing-stru, --missing-tped, --biallelic, --combine, --long, --long-ibs

asd v1.0.0
----------Command Line Arguments----------

--biallelic <bool>: Set for more efficient computations when all loci are biallelic.
	Default: false

--combine <string1> ... <stringN>: Combine several files generated with --partial.
	Default: __none

--help <bool>: Prints this help dialog.
	Default: false

--ibs <bool>: Set to output all IBS sharing matrices too.
	Default: false

--id <int>: Column where individual IDs are. (stru files only)
	Default: 1

--log <bool>: Transforms allele sharing distance by -ln(ps) instead of 1-(ps).
	Default: false

--long <bool>: Instead of printing a matrix, print allele sharing distances one
per row. Formatted <ID1> <ID2> <allele sharing distance>.
Not compatible with --partial.
	Default: false

--long-ibs <bool>: Instead of printing a matrix, print IBS calculations one
per row. Formatted <ID1> <ID2> <IBS0/1/2>.
Not compatible with --partial.
	Default: false

--missing-stru <string>: For stru files, set the missing data value.
	Default: -9

--missing-tped <string>: For tped files, set the missing data value.
	Default: 0

--out <string>: Basename for output files.
	Default: outfile

--partial <bool>: If set, outputs two nind x nind matrices. The first is the number of loci used
for each pairwise comparison, and the second is the untransformed dist/IBS
matrix. Useful for splitting up very large jobs. Can combine with --combine.
	Default: false

--stru <string>: The input data filename (stru format). Change missing data code with
--missing-stru.
	Default: __none

--tfam <string>: The input data filename (tfam format), also requires a .tped file (--tped).
	Default: __none

--threads <int>: Number of threads to spawn for faster calculation.
	Default: 1

--tped <string>: The input data filename (tped format). Requires a .tfam file (--tfam).
Change missing data code with --missing-tped.
	Default: __none
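
The two transformations named under --log can be written out as follows, taking ps to be the proportion of alleles shared between a pair. The help text does not define ps, so that reading is an assumption. This is a minimal sketch in Python; the function name is hypothetical, and how asd itself handles ps = 0 under --log is not stated in this README.

```python
import math

def transform(ps, log=False):
    """Turn a proportion of shared alleles into a distance."""
    if log:
        # -ln(ps) is undefined when no alleles are shared; this sketch
        # reports infinity in that case (an assumption, not asd's behavior).
        return math.inf if ps == 0 else -math.log(ps)
    return 1.0 - ps

print(transform(0.5))            # default: 1 - ps
print(transform(0.5, log=True))  # --log: -ln(ps), about 0.693
```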


NOTE: Multi-dimensional scaling plots can be created easily in R with the following commands (assuming --partial was not used):

      data<-read.table("<outfile>.asd.dist",header=TRUE)
      data.mds<-cmdscale(data)
      plot(data.mds)

A toy example with no missing data:

./asd --stru test_data_1.stru --id-col 1

output:

ind0 ind1 ind2 ind3 ind4 ind5 
ind0 0 0.5 0.3 0.2 0.3 0.5 
ind1 0.5 0 0.2 0.3 0.4 0.8 
ind2 0.3 0.2 0 0.1 0.2 0.6 
ind3 0.2 0.3 0.1 0 0.3 0.5 
ind4 0.3 0.4 0.2 0.3 0 0.6 
ind5 0.5 0.8 0.6 0.5 0.6 0 


./asd --stru test_data_1.stru --id-col 1 --partial

output:

6
5 5 5 5 5 5 
5 5 5 5 5 5 
5 5 5 5 5 5 
5 5 5 5 5 5 
5 5 5 5 5 5 
5 5 5 5 5 5 

ind0 ind1 ind2 ind3 ind4 ind5 
ind0 5 2.5 3.5 4 3.5 2.5 
ind1 2.5 5 4 3.5 3 1 
ind2 3.5 4 5 4.5 4 2 
ind3 4 3.5 4.5 5 3.5 2.5 
ind4 3.5 3 4 3.5 5 2 
ind5 2.5 1 2 2.5 2 5 
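
The --partial output above relates to the distance matrix from the first run in a simple way: dividing each entry of the second matrix by the matching locus count and subtracting the result from 1 reproduces the distances (for ind0 and ind1, 1 - 2.5/5 = 0.5). The sketch below parses the text as printed in this README and applies that rule. The parser and its names are hypothetical, and it assumes the layout shown here: a line holding the number of individuals, the count matrix, a header line of IDs, then the labelled sharing matrix.

```python
# Text copied from the --partial output shown above.
PARTIAL = """\
6
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5

ind0 ind1 ind2 ind3 ind4 ind5
ind0 5 2.5 3.5 4 3.5 2.5
ind1 2.5 5 4 3.5 3 1
ind2 3.5 4 5 4.5 4 2
ind3 4 3.5 4.5 5 3.5 2.5
ind4 3.5 3 4 3.5 5 2
ind5 2.5 1 2 2.5 2 5
"""

def parse_partial(text):
    """Split the text into IDs, locus counts, and untransformed sharing sums."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n = int(rows[0][0])
    counts = [[float(x) for x in row] for row in rows[1:1 + n]]
    ids = rows[1 + n]
    # Each sharing row starts with an ID label, which is dropped here.
    shared = [[float(x) for x in row[1:]] for row in rows[2 + n:2 + 2 * n]]
    return ids, counts, shared

def to_distance(counts, shared):
    """Apply 1 - shared/count element-wise; NaN where no loci were usable."""
    return [
        [1.0 - s / c if c else float("nan") for s, c in zip(srow, crow)]
        for srow, crow in zip(shared, counts)
    ]

ids, counts, shared = parse_partial(PARTIAL)
dist = to_distance(counts, shared)
for name, row in zip(ids, dist):
    print(name, " ".join(f"{d:g}" for d in row))
```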


A toy example with missing data:

./asd --stru test_data_2.stru --id-col 1

output:

ind0 ind1 ind2 ind3 ind4 ind5 
ind0 0 0.5 0.25 0.125 0.333333 0.166667 
ind1 0.5 0 0.125 0.25 0.333333 0.666667 
ind2 0.25 0.125 0 0.1 0.125 0.5 
ind3 0.125 0.25 0.1 0 0.25 0.375 
ind4 0.333333 0.333333 0.125 0.25 0 0.625 
ind5 0.166667 0.666667 0.5 0.375 0.625 0 


./asd --stru test_data_2.stru --id-col 1 --partial

output:

6
4 3 4 4 3 3 
3 4 4 4 3 3 
4 4 5 5 4 4 
4 4 5 5 4 4 
3 3 4 4 4 4 
3 3 4 4 4 4 

ind0 ind1 ind2 ind3 ind4 ind5 
ind0 4 1.5 3 3.5 2 2.5 
ind1 1.5 4 3.5 3 2 1 
ind2 3 3.5 5 4.5 3.5 2 
ind3 3.5 3 4.5 5 3 2.5 
ind4 2 2 3.5 3 4 1.5 
ind5 2.5 1 2 2.5 1.5 4
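
Because --partial reports raw sums rather than ratios, chunks of loci can be merged by adding the count matrices and the sharing matrices element-wise and transforming only at the end. The sketch below assumes that is what --combine does, which this README does not spell out. It uses only ind0 and ind1 from the missing-data example and splits their totals (3 loci used, 1.5 shared) into two made-up chunks; the chunk values and function names are hypothetical.

```python
def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of lists)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def combine(chunks):
    """Sum (counts, shared) pairs from several partial runs."""
    counts, shared = chunks[0]
    for c, s in chunks[1:]:
        counts = add_matrices(counts, c)
        shared = add_matrices(shared, s)
    return counts, shared

def to_distance(counts, shared):
    """Apply 1 - shared/count element-wise; NaN where no loci were usable."""
    return [
        [1.0 - s / c if c else float("nan") for s, c in zip(srow, crow)]
        for srow, crow in zip(shared, counts)
    ]

# Hypothetical chunks: (locus counts, untransformed sharing sums).
chunk_a = ([[2, 2], [2, 2]], [[2.0, 0.5], [0.5, 2.0]])
chunk_b = ([[2, 1], [1, 2]], [[2.0, 1.0], [1.0, 2.0]])

counts, shared = combine([chunk_a, chunk_b])
combined = to_distance(counts, shared)

# Averaging per-chunk distances instead gives a different, wrong answer,
# because the chunks contribute different numbers of usable loci.
naive = (to_distance(*chunk_a)[0][1] + to_distance(*chunk_b)[0][1]) / 2

print(combined[0][1], naive)
```

The combined distance for the pair is 1 - 1.5/3 = 0.5, matching the ind0/ind1 entry in the full run above, while the naive average of the per-chunk distances is 0.375.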