maxbiostat/MEP_data

Measurably evolving populations

This repository hosts curated data sets (alignments) from measurably evolving populations.

Most data sets are from fast-evolving RNA viruses.

Dengue virus serotype 2

This data set contains 90 full-genome or envelope-only sequences from DENV-2 isolates collected in the Americas between 1987 and 2007. They were drawn from a larger data set, the Broad Institute's Dengue Virus Portal (2382 full genomes), and then subsampled to at most five sequences from each year. The full database can be found here. Note that sequences were downloaded pre-aligned.
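The per-year subsampling described here can be sketched as follows. This is a minimal illustration and not the script used to build the data set: the function name `subsample_by_year`, the `(name, year)` record format, and the use of uniform random sampling with a fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def subsample_by_year(records, max_per_year=5, seed=0):
    """Keep at most `max_per_year` records from each year.

    `records` is an iterable of (name, year) pairs. How the year is
    obtained (header parsing, metadata table) is left to the caller.
    Sampling is uniform at random, which is an assumption.
    """
    rng = random.Random(seed)

    # Group sequence names by sampling year.
    by_year = defaultdict(list)
    for name, year in records:
        by_year[year].append(name)

    kept = []
    for year in sorted(by_year):
        names = by_year[year]
        # Only years with more than `max_per_year` sequences are thinned.
        if len(names) > max_per_year:
            names = rng.sample(names, max_per_year)
        kept.extend((name, year) for name in names)
    return kept
```

Years with five or fewer sequences are kept whole, so the output size depends on how sequences are distributed across years.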

Human H3N2 Influenza A

This data set contains 226 hemagglutinin (HA) sequences. All human H3N2 sequences longer than 1700 base pairs were downloaded from the Influenza Research Database, totalling 8455 sequences. These were downsampled to at most five sequences from each year after 1968. Sequences were then aligned by codon using MAFFT, called from within the Geneious software package.
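A length filter like the one described here could look roughly like the sketch below. It assumes the sequences are available as FASTA text. The helper names `read_fasta` and `longer_than` are hypothetical, and ignoring gap characters when measuring length is an assumption rather than something the description states.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longer_than(records, min_length=1700):
    """Keep records whose ungapped sequence is longer than `min_length`."""
    return [(h, s) for h, s in records if len(s.replace("-", "")) > min_length]
```

For a file on disk, `longer_than(read_fasta(open(path)))` would return the records that pass the threshold, where `path` stands for whichever FASTA file is being filtered.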

HIV subtype B

This data set consists of 187 gag-pol sequences from HIV subtype B. All reasonably complete gag-pol sequences from HIV subtype B were downloaded from the Los Alamos Database along with the relevant metadata, and then subsampled to at most five sequences from each year after 1983. This data set can be found here and was created using this. Note that sequences were downloaded pre-aligned.
