University of Dallas Bioinformatics Club
Group 1 (Michael and Kaitlyn). Obtain all SARS coronavirus 2 (SARS-CoV-2) sequences in GenBank as a DNA FASTA file and as a protein FASTA file.
Group 2 (Joseph and Gretta). Decide which software will be used to analyze the protein-coding sequences (CD-HIT) or the functional characteristics of the proteins in the protein FASTA file.
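One possible way to script the GenBank download for Group 1 is through NCBI's E-utilities. The sketch below only builds the request URLs and does not contact NCBI. The search term, the `retmax` value, and the helper names `esearch_url` and `efetch_fasta_url` are assumptions for illustration, not something the group has agreed on.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Assumed search term; swap in whatever query the group settles on.
DEFAULT_TERM = '"Severe acute respiratory syndrome coronavirus 2"[Organism]'


def esearch_url(db, term=DEFAULT_TERM, retmax=100):
    """URL that asks NCBI for record IDs matching `term` in database `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(db, ids):
    """URL that returns the given IDs or accessions as plain-text FASTA."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("nuccore"))  # search the nucleotide (DNA) database
    print(esearch_url("protein"))  # search the protein database
```

The IDs returned by the search would then be passed to `efetch_fasta_url` with the same `db` value to get the DNA or protein FASTA text. For large result sets the requests would need to be batched.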
Gameplan
- Make a phylogenetic tree of all COVID-19 sequences plus the non-COVID-19 sequences found by BLAST (Kaitlyn)
- Perform CD-HIT analysis on protein-coding (or non-coding) sequences (Christian)
- Find motif differences between clusters (Joseph)
- Kaitlyn - filter for the final strains to be used in the analysis (strains will be filtered by geographic location) and add extra strains from Dr. Toby's initial tree, plus Alphacoronavirus, murine, pigeon, mouse, and bat strains from a "Coronavirus" search in GenBank
- Kaitlyn - mass phylogenetic assessment
- Everyone - take a look at clades and look for interesting patterns
- Joseph/Christian - Take the accession numbers of the strains in Kaitlyn's tree, download the protein FASTA for each accession number, and merge all the files into one.
- Christian - Perform CD-HIT on the mother-of-all FASTA and vary the identity cutoff to see how the number of clusters changes.
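The merge and CD-HIT cutoff steps in the list above could be sketched roughly as follows. This is a minimal sketch, not the protocol the club actually ran: the file names, the output naming scheme, and the helper names are hypothetical, and the word sizes follow the ranges suggested in the CD-HIT user guide for protein sequences. The script only builds the `cd-hit` commands; running them requires CD-HIT to be installed.

```python
from pathlib import Path


def merge_fastas(paths, out_path):
    """Concatenate FASTA files, keeping the first record seen for each ID."""
    seen = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            keep = False  # ignore any stray lines before the first header
            for line in Path(path).read_text().splitlines():
                if line.startswith(">"):
                    parts = line[1:].split()
                    record_id = parts[0] if parts else ""
                    keep = record_id not in seen
                    if keep:
                        seen.add(record_id)
                        kept += 1
                if keep and line.strip():
                    out.write(line + "\n")
    return kept


def word_size(cutoff):
    """Word length (-n) suggested by the CD-HIT guide for a protein cutoff."""
    if cutoff >= 0.7:
        return 5
    if cutoff >= 0.6:
        return 4
    if cutoff >= 0.5:
        return 3
    return 2


def cd_hit_command(fasta, cutoff):
    """Argument list for one CD-HIT run; pass it to subprocess.run to execute."""
    out = f"clusters_{int(round(cutoff * 100))}"  # hypothetical output name
    return ["cd-hit", "-i", str(fasta), "-o", out,
            "-c", str(cutoff), "-n", str(word_size(cutoff))]


def count_clusters(clstr_path):
    """Count clusters in a CD-HIT .clstr file (one '>Cluster N' line each)."""
    text = Path(clstr_path).read_text()
    return sum(1 for line in text.splitlines() if line.startswith(">Cluster"))


if __name__ == "__main__":
    # Print the commands for a few cutoffs; "moa.fasta" is a placeholder name.
    for cutoff in (0.9, 0.8, 0.7, 0.6):
        print(" ".join(cd_hit_command("moa.fasta", cutoff)))
```

After each run, CD-HIT writes a `.clstr` file next to the output FASTA, so calling `count_clusters` on each of those files gives the cluster count per cutoff.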
Redo analyses for April 10th with corrected protocols.
- Add your individual methods/figures to Google Doc
- Kaitlyn - Add 2 extra MERS and 2 extra SARS strains (human infections) to the pool
- Christian - Add Kaitlyn's new accessions to the mother-of-all (MOA) FASTAs and parse them using Jupyter
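For the parsing step, a Jupyter cell along these lines could check which of the new accessions are already in the MOA FASTA. It is only a sketch: the function names and the toy records are made up, and it assumes the IDs in the FASTA headers are written the same way as the accessions on Kaitlyn's list (including any version suffix).

```python
def parse_fasta(text):
    """Return {record ID: sequence} from FASTA-formatted text.

    The ID is the first whitespace-delimited token after '>'.
    If an ID appears twice, the later record replaces the earlier one.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(chunks) for rid, chunks in records.items()}


def missing_accessions(records, wanted):
    """Accessions from `wanted` that have no record in the parsed FASTA."""
    return [acc for acc in wanted if acc not in records]


if __name__ == "__main__":
    # Toy placeholder records, not real sequences or accessions.
    toy = ">ACC_1 toy record\nMFVFL\nVLLPL\n>ACC_2 toy record\nMKII\n"
    recs = parse_fasta(toy)
    for rid, seq in recs.items():
        print(rid, len(seq))
    print(missing_accessions(recs, ["ACC_1", "ACC_3"]))
```

Any accessions reported as missing are the ones that still need to be downloaded and appended to the MOA file.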