Program to count k-mers in FASTA references to estimate pairwise phylogenetic distances

guilhermesena1/phylogenetic-by-genome

k-mer based phylogenetic reconstruction

A simple proof of concept that reconstructs evolutionary history from publicly available reference genomes by comparing their k-mer counts.

Cloning the repository

git clone https://github.com/guilhermesena1/phylogenetic-by-genome.git
cd phylogenetic-by-genome

Compiling the code

make

Downloading genomes

Save the following URLs, one per line, in genome_urls.txt:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
https://hgdownload.soe.ucsc.edu/goldenPath/gorGor6/bigZips/gorGor6.fa.gz
https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/mm39.fa.gz
https://hgdownload.soe.ucsc.edu/goldenPath/criGriChoV2/bigZips/criGriChoV2.fa.gz
https://hgdownload.soe.ucsc.edu/goldenPath/felCat9/bigZips/felCat9.fa.gz
https://hgdownload.soe.ucsc.edu/goldenPath/vicPac2/bigZips/vicPac2.fa.gz
https://hgdownload.soe.ucsc.edu/goldenPath/balAcu1/bigZips/balAcu1.fa.gz

Then download each genome, decompress it, and move the FASTA files into a genomes directory:

for i in $(cat genome_urls.txt); do echo "$i"; wget "$i"; done
gunzip *.fa.gz
mkdir -p genomes
mv *.fa genomes

Running the reconstruction in C++

./phylo genome_inputs.txt >kmer-counts.tsv
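
The source of phylo is not reproduced here, so the following is only a minimal sketch of what counting k-mers in a FASTA file can look like, not the repository's implementation. It assumes a rolling 2-bit encoding of A, C, G, and T, restarts the window at header lines and at ambiguous bases such as N, and ignores reverse complements. The names encode_base, count_kmers, and count_kmers_in_text are hypothetical, and k is a free parameter; the word length used by phylo is not stated in this README.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Map a nucleotide to a 2-bit code; return -1 for anything else (N, gaps, ...).
inline int encode_base(char c) {
  switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default: return -1;
  }
}

// Hypothetical helper (an assumption, not the repository's code): counts every
// k-mer in a FASTA stream. counts[i] is the number of occurrences of the k-mer
// whose 2-bit encoding is i, so the returned table has 4^k entries.
std::vector<std::uint64_t> count_kmers(std::istream &fasta, unsigned k) {
  assert(k >= 1 && k <= 12);  // keeps the 4^k table small for this sketch
  const std::uint64_t mask = (std::uint64_t{1} << (2 * k)) - 1;
  std::vector<std::uint64_t> counts(std::size_t{1} << (2 * k), 0);

  std::uint64_t window = 0;  // 2-bit encoding of the most recent bases
  unsigned valid = 0;        // consecutive valid bases seen, capped at k
  std::string line;
  while (std::getline(fasta, line)) {
    if (!line.empty() && line[0] == '>') {  // header line: new sequence
      valid = 0;
      continue;
    }
    for (char c : line) {
      if (c == '\r') continue;  // tolerate Windows line endings
      const int code = encode_base(c);
      if (code < 0) {  // an ambiguous base breaks the current window
        valid = 0;
        continue;
      }
      window = ((window << 2) | static_cast<std::uint64_t>(code)) & mask;
      if (valid < k) ++valid;
      if (valid == k) ++counts[window];
    }
  }
  return counts;
}

// Convenience wrapper for FASTA text held in memory, handy for small checks.
std::vector<std::uint64_t> count_kmers_in_text(const std::string &text,
                                               unsigned k) {
  std::istringstream in(text);
  return count_kmers(in, k);
}
```

For example, count_kmers_in_text(">s\nACGTAC\n", 2) returns a 16-entry table in which the entry for AC (index 1) is 2. A dense table is only practical for small k; larger word lengths would need a hash map or a dedicated k-mer counter.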

Creating the hierarchical clustering in R

> x <- read.table('kmer-counts.tsv', header = TRUE, row.names = NULL)
> plot(hclust(dist(t(x))), hang = -1, xlab = "species", ylab = "k-mer Euclidean distance")
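
In the R call above, dist() uses the Euclidean metric by default and hclust() uses complete linkage by default. As a rough illustration of the distance step only, the sketch below computes the same kind of pairwise Euclidean distances in C++ from per-species k-mer count vectors. The function names and the profiles layout (one vector per species, all of equal length) are assumptions made for this example and are not taken from the repository.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two equally long count vectors:
// sqrt(sum_i (a[i] - b[i])^2).
double euclidean_distance(const std::vector<double> &a,
                          const std::vector<double> &b) {
  assert(a.size() == b.size());
  double sum_sq = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum_sq += d * d;
  }
  return std::sqrt(sum_sq);
}

// Hypothetical helper: symmetric matrix of pairwise distances, where
// profiles[s] holds the k-mer counts of species s.
std::vector<std::vector<double>> distance_matrix(
    const std::vector<std::vector<double>> &profiles) {
  const std::size_t n = profiles.size();
  std::vector<std::vector<double>> dist(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      dist[i][j] = dist[j][i] = euclidean_distance(profiles[i], profiles[j]);
    }
  }
  return dist;
}
```

Raw counts grow with genome size, so larger genomes tend to sit farther from everything else; dividing each profile by its total before calling distance_matrix is one common way to compare frequencies instead.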
