####Downloading and compiling
git clone git://github.com/lh3/alktk.git
(cd alktk; gcc -g -O2 -Wall -lz -o alktk alktk.c)
####Generating .alk files
ALK files are generated with a special version of samtools, which has not been pushed to the official github as a lot of development is happening there. The command line is:
bcftools view -cuI -K out.alk in.bcf > /dev/null 2>&1
Here the reference allele in out.alk
is the allele used for generating the
.bcf file.
####Viewing .alk files
alktk cat -t in.alk # output entire file
alktk cat -tp in.alk # only print positions and the reference allele frequency
In the full output, each line consists of 1) chromosome index, 2) position, 3) ML estimate of the reference allele frequency and 4) L(k) = Pr{d | there are k reference alleles}, for k=0,...,M, where M is the number of chromosomes in the samples.
####Subsetting .alk files
alktk sites in.alk in.srt.list > out.alk
where in.srt.list
is a TAB delimited file with each line consists of 1) index
of the chromosome, 2) position and 3) whether the reference allele in the
original .bcf is the ancestral allele. This file must be position sorted. An
example of in.srt.list
is:
1 123456 0
1 234567 1
2 12345 0
This command flips the reference allele if it is not ancestral. In the output, the reference allele is the ancestral allele.
####Estimating AFS with EM
alktk afs in.alk > iter-1.txt # starting from a flat prior
alktk afs in.alk iter-1.txt > iter-2.txt
alktk afs in.alk iter-2.txt > iter-3.txt
Here in.alk
is typically the .alk on a list of sites, generated by alktk sites
. Each afs
command does one round of EM. The output gives the reference
allele frequency.