This repository hosts curated data sets (alignments) from measurably evolving populations.
Most data sets are from fast-evolving RNA viruses.
These are 90 DENV-2 sequences (either full genomes or envelope genes only) from isolates collected in the Americas between 1987 and 2007. They were drawn from a larger data set of 2382 full genomes from the Broad Institute's Dengue Virus Portal and then subsampled to at most five sequences per year. The full database can be found here. Note that the sequences were downloaded pre-aligned.
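The year-capped subsampling described above can be sketched in Python as follows. This is a minimal illustration, not the script used to build the data set: the README does not say how the five sequences per year were chosen, so random selection is an assumption, and the `(sequence_id, year)` record layout and the `subsample_by_year` name are hypothetical.

```python
import random
from collections import defaultdict


def subsample_by_year(records, max_per_year=5, seed=None):
    """Keep at most `max_per_year` records for each sampling year.

    `records` is a list of (sequence_id, year) pairs. This layout is a
    hypothetical stand-in for the real metadata.
    """
    # Assumption: records within an over-represented year are chosen
    # uniformly at random; the README does not state the selection rule.
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for seq_id, year in records:
        by_year[year].append(seq_id)

    kept = []
    for year in sorted(by_year):
        ids = by_year[year]
        if len(ids) > max_per_year:
            ids = rng.sample(ids, max_per_year)
        kept.extend((seq_id, year) for seq_id in ids)
    return kept


# Usage: 20 records spread over three years (7, 7 and 6 per year).
records = [(f"s{i}", 1990 + i % 3) for i in range(20)]
kept = subsample_by_year(records, max_per_year=5, seed=1)
print(len(kept))  # 15
```

Passing a fixed `seed` makes the selection reproducible, which matters if the subsampled set has to be regenerated later.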
This data set contains 226 hemagglutinin (HA) sequences. All human H3N2 sequences longer than 1700 base pairs were downloaded from the Influenza Research Database, totalling 8455 sequences. These were downsampled to at most five sequences per year after 1968. The sequences were then codon-aligned with MAFFT, run from within the Geneious software package.
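The two selection criteria applied before downsampling (a length above 1700 bp and a sampling year after 1968) can be expressed as a simple filter. This is a hypothetical sketch: the record layout and `filter_candidates` are illustrative names, and the strict comparisons (`>`) follow the wording "longer than" and "after"; the README does not say whether the boundary values themselves were included.

```python
def filter_candidates(records, min_length=1700, after_year=1968):
    """Return IDs of records longer than `min_length` bp and sampled
    strictly after `after_year`.

    Each record is a hypothetical (sequence_id, year, sequence) tuple
    holding the raw, unaligned nucleotide sequence.
    """
    return [
        seq_id
        for seq_id, year, sequence in records
        if len(sequence) > min_length and year > after_year
    ]


records = [
    ("a", 1970, "A" * 1701),    # kept
    ("b", 1968, "A" * 1800),    # dropped: not after 1968
    ("c", 1990, "A" * 1700),    # dropped: not longer than 1700 bp
    ("d", 2000, "ACGT" * 500),  # kept: 2000 bp
]
print(filter_candidates(records))  # ['a', 'd']
```

The surviving records would then go through the at-most-five-per-year downsampling before codon alignment.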
This data set consists of 187 gag-pol sequences from HIV subtype B. All reasonably complete subtype B gag-pol sequences were downloaded from the Los Alamos Database, along with the relevant metadata, and then subsampled to at most five sequences per year after 1983. This data set can be found here and was created using this. Note that the sequences were downloaded pre-aligned.
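Because these sequences were downloaded pre-aligned, a quick sanity check is that every sequence has the same length. The sketch below assumes FASTA input, which the README does not specify; `read_fasta` and `has_uniform_length` are hypothetical helpers, and equal lengths are a necessary but not sufficient condition for a valid alignment.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}.

    The full header line (without '>') is used as the key and sequence
    lines are concatenated. Assumes headers are unique.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def has_uniform_length(seqs):
    """Return True if all sequences (gaps included) share one length."""
    return len({len(s) for s in seqs.values()}) == 1


fasta = ">seq1 subtype B\nATG-CA\nTT\n>seq2\nATGGCATT\n"
seqs = read_fasta(fasta)
print(seqs)  # {'seq1 subtype B': 'ATG-CATT', 'seq2': 'ATGGCATT'}
print(has_uniform_length(seqs))  # True
```

Gap characters are deliberately counted, since in an alignment every row, gaps included, spans the same columns.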