Programs to compute the NCCD (Normalized Conditional Compression Distance) and perform whole-genome phylogenomics on 48 bird species. The distance is computed with a state-of-the-art genomic compressor based on a mixture of finite-context models.
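As a rough guide to what is being computed, a conditional compression distance of this kind is usually written by analogy with the normalized information distance. The form below is a sketch under that assumption, not a statement of what the scripts implement internally. Here C(x) denotes the number of bits the compressor needs for sequence x alone, and C(x|y) the number of bits it needs for x when y is available as a reference.

```latex
\mathrm{NCCD}(x, y) = \frac{\max\{\, C(x \mid y),\; C(y \mid x) \,\}}{\max\{\, C(x),\; C(y) \,\}}
```

Under this reading, values near 0 suggest that each sequence is almost fully described by the other, while values near 1 suggest that the two share little compressible information.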
Simply run:
wget https://github.com/pratas/NCCD/archive/master.zip
unzip master.zip
cd NCCD-master
Make sure you have at least 200 GB of free disk space. Then, simply run:
. run.sh
It will download and install GeCo (https://github.com/pratas/geco/); you may need to install cmake first. Then it will download all 48 bird sequences and run the NCCD.
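Because the full run is long and needs a lot of disk space, a quick preflight check before launching run.sh may save time. The sketch below is not part of the repository: the helper names have_tool and free_gb are hypothetical, and the list of tools is only an assumption drawn from the commands mentioned in this README (wget, unzip, cmake).

```shell
#!/bin/sh
# Preflight sketch; helper names are hypothetical, not part of the NCCD repository.

# Succeeds if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the free space, in whole GB, on the filesystem holding the given directory.
# df -P forces one line per filesystem; -k reports 1024-byte blocks; field 4 is "Available".
free_gb() {
    df -Pk "${1:-.}" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

REQUIRED_GB=200   # figure stated in this README

# Report any tool that appears to be missing (assumed list, see lead-in).
for tool in wget unzip cmake; do
    have_tool "$tool" || echo "missing: $tool" >&2
done

# Compare the free space in the current directory against the requirement.
AVAILABLE_GB=$(free_gb .)
if [ "$AVAILABLE_GB" -lt "$REQUIRED_GB" ]; then
    echo "only ${AVAILABLE_GB} GB free; about ${REQUIRED_GB} GB are needed" >&2
else
    echo "disk space looks sufficient (${AVAILABLE_GB} GB free)"
fi
```

Run it from the NCCD-master directory so that the free space is measured on the filesystem where the sequences will be downloaded.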
For other purposes, such as computing a simple information distance between two sequences (fileA and fileB), go to the scripts directory:
cd scripts
and run
. NCCD.sh ../examples/fileA ../examples/fileB
It will calculate the NCCD for the two synthetic example sequences included in the repository.
If you encounter any issue, please report it on the repository's issues page.
GPL v2.
For more information:
http://www.gnu.org/licenses/gpl-2.0.html