@author Harald Ringbauer, March 20th 2020
This project analyzes publicly available COVID-19 viral sequence data.
The goals are: 1) to use genetic data to learn about key parameters and whether they vary across strains (e.g. virality), 2) to learn about the history of the outbreak, and 3) to develop a real-time analysis tool.
To align sequences:
- Download the FASTA file from GISAID. It sometimes has blank lines at the beginning; remove these.
- Download the metadata from the Nextstrain git repository.
- Copy these two files into ./data
- Run notebooks/process_data/align_fasta and follow the instructions there, top to bottom.
- Run notebooks/create_h5.ipynb and follow the instructions there. This creates the h5 file, as well as tables and .csvs of interesting loci and their MAFs (minor allele frequencies).
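Two of the data-handling steps above, cleaning the GISAID FASTA and computing per-site MAFs, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in the project's notebooks: `strip_leading_blank_lines`, `clean_fasta_file`, `parse_fasta` and `minor_allele_frequencies` are hypothetical helpers, the sequences are assumed to be already aligned to equal length, and the MAF is assumed to be the frequency of the second most common base at each site, ignoring `N` and gap (`-`) characters.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def strip_leading_blank_lines(text: str) -> str:
    """Drop blank or whitespace-only lines before the first FASTA record."""
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines) and lines[i].strip() == "":
        i += 1
    return "".join(lines[i:])


def clean_fasta_file(path: str) -> None:
    """Hypothetical helper: rewrite a FASTA file in place without leading blank lines."""
    p = Path(path)
    p.write_text(strip_leading_blank_lines(p.read_text()))


def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA parser returning {header: sequence} (hypothetical helper)."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def minor_allele_frequencies(seqs: list[str], ignore: str = "N-") -> list[float]:
    """Per-site MAF for aligned sequences (assumed definition: 2nd most common base)."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    mafs = []
    for pos in range(length):
        # Count bases at this site, skipping ambiguous bases and gaps.
        counts = Counter(s[pos].upper() for s in seqs if s[pos].upper() not in ignore)
        total = sum(counts.values())
        top_two = counts.most_common(2)
        if total == 0 or len(top_two) < 2:
            mafs.append(0.0)  # no data or monomorphic site
        else:
            mafs.append(top_two[1][1] / total)
    return mafs
```

For example, `clean_fasta_file` could be run on the downloaded file in ./data before the alignment notebook, and `minor_allele_frequencies(list(parse_fasta(text).values()))` on the aligned output to flag polymorphic loci.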