# **Molecular Phylogenetics**
## **Visualization Pro**

### **0) Introduction**

> For this work, we will use sequences and metadata from the viral hemorrhagic septicemia virus (VHSV). This virus is a fish novirhabdovirus (a negative-strand RNA virus) with an unusually broad host spectrum: it has been isolated from more than 80 fish species at locations across the Northern Hemisphere.<br>
>
>This guide was inspired by https://github.com/acarafat/tutorials, with several adjustments.

### **1) Programs used**

|Program|
|-------|
|mafft|
|iqtree|
|pandas|

- OS: Windows 11, WSL2 (Ubuntu 22.04)
- CPU: Intel Xeon E5 2670v3 (12 cores/24 threads)
- RAM: 32GB (16GB for WSL2)

### **2) Multiple Sequence Alignment**

Run the multiple sequence alignment with `mafft`

In [None]:
! mafft data/vhsv.fasta > data/vhsv_mafft.fa
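Before moving on, it can be worth confirming that the output really is an alignment, i.e. that every record has the same length once gaps are included. The sketch below is one possible check using only the Python standard library. The helper names `read_fasta` and `alignment_lengths` and the toy sequences are made up for illustration and are not part of `mafft`.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict mapping record name -> sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_lengths(records):
    """Return the set of distinct sequence lengths (a single value for a valid alignment)."""
    return {len(seq) for seq in records.values()}


# Toy aligned input (hypothetical, not the VHSV data); sequences wrap over two lines.
toy = StringIO(">seqA\nATG-CA\nTT\n>seqB\nATGGCA\nT-\n")
aligned = read_fasta(toy)
lengths = alignment_lengths(aligned)
print(len(aligned), "sequences; distinct lengths:", lengths)
```

To apply the same check to the real output, pass an open file instead of the toy input, for example `read_fasta(open("data/vhsv_mafft.fa"))`. More than one distinct length would suggest the file is not a finished alignment.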

### **3) Selecting a model in ModelFinder (IQ-TREE)**

Create a directory for `ModelFinder` output

In [3]:
! mkdir -p data/modelfinder

Run `ModelFinder`

In [None]:
! iqtree2 -m MFP -s data/vhsv_mafft.fa --prefix data/modelfinder/vhsv_MF2

Check which substitution model fits our alignment best

In [5]:
! head -42 data/modelfinder/vhsv_MF2.iqtree | tail -6

Best-fit model according to BIC: TVMe+I+R2

List of models sorted by BIC scores: 

Model                  LogL         AIC      w-AIC        AICc     w-AICc         BIC      w-BIC
TVMe+I+R2         -6984.748   14221.497 -  0.00151   14244.406 -   0.0031   14892.963 +    0.719


`ModelFinder` selects `TVMe+I+R2` as the best-fit model under BIC (w-BIC = 0.719).
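The `head`/`tail` call above depends on the report having a fixed number of header lines. As a hedged alternative, the model name can be pulled from the `Best-fit model according to BIC:` line itself. The sketch below assumes the report keeps the wording shown in the output above. `best_fit_model` is a hypothetical helper, not an IQ-TREE function.

```python
import re


def best_fit_model(report_text, criterion="BIC"):
    """Return the model named on the 'Best-fit model according to <criterion>:' line, or None."""
    pattern = rf"^Best-fit model according to {re.escape(criterion)}:\s*(\S+)"
    match = re.search(pattern, report_text, flags=re.MULTILINE)
    return match.group(1) if match else None


# Excerpt copied from the report output shown above.
excerpt = """Best-fit model according to BIC: TVMe+I+R2

List of models sorted by BIC scores:
"""
model = best_fit_model(excerpt)
print(model)
```

On the real report this would be called as `best_fit_model(open("data/modelfinder/vhsv_MF2.iqtree").read())`, and the result could then be passed to the `-m` option in the next step.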

### **4) Build an ML-tree in IQ-TREE using the selected model**

Create a directory for `IQ-TREE` output

In [7]:
! mkdir -p data/iqtree

Run `IQ-TREE` with 1000 ultrafast bootstrap replicates (`-bb`) and 1000 SH-aLRT replicates (`-alrt`)

In [None]:
! iqtree2 -s data/vhsv_mafft.fa -m TVMe+I+R2 -pre data/iqtree/vhsv -bb 1000 -alrt 1000

Examine the `IQ-TREE` output folder

In [1]:
! ls data/iqtree

vhsv.bionj   vhsv.contree  vhsv.log	vhsv.splits.nex
vhsv.ckp.gz  vhsv.iqtree   vhsv.mldist	vhsv.treefile


Examine the `vhsv.treefile` file

In [4]:
! cat data/iqtree/vhsv.treefile

(AU-8-95:0.0110908144,(CH-FI262BFH:0.0131626927,(DK-200098:0.0013245216,DK-9995144:0.0000021124)99.9/100:0.0095461686)9/64:0.0007239176,((((((((DK-1p40:0.0019819713,DK-1p86:0.0019797009)76.6/97:0.0006555511,((DK-1p8:0.0000009918,(((((DK-4p37:0.0000009918,SE-SVA14:0.0006553525)86.8/99:0.0006552139,DK-5e59:0.0013137944)0/59:0.0000009918,SE-SVA-1033:0.0000009918)78.3/95:0.0006575436,((DK-6p403:0.0006555940,SE-SVA31:0.0006553577)0/84:0.0000009918,UK-MLA98-6HE1:0.0000009918)85.8/98:0.0019746439)74.9/96:0.0006544639,KRRV9601:0.0006553263)0/5:0.0000009918)0/35:0.0000009918,UK-9643:0.0033018138)77.4/95:0.0006559696)99.4/100:0.0054050544,DK-M.rhabdo:0.0026557033)97.1/99:0.0043768013,((((((DK-1p53:0.0006554948,DK-1p55:0.0000009918)100/100:0.0816239127,(US-Makah:0.0285256070,US-Goby1-5:0.0140617136)100/100:0.1210196539)45.4/78:0.0178122396,(((DK-4p101:0.0098266846,((DK-4p168:0.0013124829,(UK-H17-2-95:0.0026484120,(UK-H17-5-93:0.0006595661,UK-MLA98-6PT11:0.0013182714)76.2/99:0.0006379007)76.8/99:0

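This is the tree in `Newick` format (output truncated above) that we will use for visualization. Each internal node is labelled with its SH-aLRT/ultrafast bootstrap support, for example `99.9/100`.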
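To get a quick feel for the tree before plotting it, the tip names and support labels can be read straight from the Newick text. The sketch below uses regular expressions on a short fragment copied from the output above, so it is not a full Newick parser. The helper names are hypothetical, and the thresholds (SH-aLRT >= 80 and ultrafast bootstrap >= 95) follow the guidance commonly quoted from the IQ-TREE documentation, so treat them as an assumption rather than a fixed rule.

```python
import re

# Fragment copied from the vhsv.treefile output above (one nested clade).
fragment = (
    "(CH-FI262BFH:0.0131626927,(DK-200098:0.0013245216,"
    "DK-9995144:0.0000021124)99.9/100:0.0095461686)9/64:0.0007239176"
)


def tip_names(newick):
    """Return tip labels: names that directly follow '(' or ',' and precede ':'."""
    return re.findall(r"[(,]([^(),:]+):", newick)


def support_pairs(newick):
    """Return (SH-aLRT, UFBoot) pairs read from ')a/b:' node labels."""
    pairs = re.findall(r"\)(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?):", newick)
    return [(float(alrt), float(ufboot)) for alrt, ufboot in pairs]


def is_well_supported(alrt, ufboot, min_alrt=80.0, min_ufboot=95.0):
    """Assumed thresholds: SH-aLRT >= 80 and UFBoot >= 95."""
    return alrt >= min_alrt and ufboot >= min_ufboot


tips = tip_names(fragment)
supports = support_pairs(fragment)
flags = [is_well_supported(a, b) for a, b in supports]
print(tips)
print(list(zip(supports, flags)))
```

In this fragment the inner clade (`99.9/100`) passes both thresholds while the outer one (`9/64`) does not. For the full tree, a dedicated parser (or the R packages used in the next journal) is a safer choice than regular expressions.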

### **5) Examine metadata**

Import `pandas`

In [1]:
import pandas as pd

Read the metadata file

In [3]:
pd.read_csv('data/metadata.csv')  

Unnamed: 0,Strain,Host,Water,Country,ACCNo,Year
0,AU-8-95,Rainbow trout,Fresh water,AU,AY546570.1,1995
1,CH-FI262BFH,Rainbow trout,Fresh water,CH,AY546571.1,1999
2,DK-1p40,Rockling,Sea water,DK,AY546575.1,1996
3,DK-1p53,Atlantic Herring,Sea water,DK,AY546577.1,1996
4,DK-1p55,Sprat,Sea water,DK,AY546578.1,1996
...,...,...,...,...,...,...
56,UK-H17-5-93,Cod,Sea water,UK,AY546630.1,1993
57,UK-MLA98-6HE1,Herring,Sea water,UK,AY546631.1,1998
58,UK-MLA98-6PT11,Norway prout,Sea water,UK,AY546632.1,1998
59,US-Makah,Coho salmon,Fresh water,USA,U28747.1,1988
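The `Unnamed: 0` column in the output above suggests that the file stores the row index as its first, unnamed column. Under that assumption, passing `index_col=0` to `pd.read_csv` keeps it out of the data columns. The sketch below also shows one way to check that the tip names in the tree have a matching `Strain` entry, which matters once metadata are mapped onto the tree. The CSV text is a three-row excerpt rebuilt from the displayed table, so the raw file layout is assumed, and the `tips` set is a hand-picked illustration.

```python
from io import StringIO

import pandas as pd

# Three rows rebuilt from the displayed table; the raw CSV layout is an assumption.
csv_text = """,Strain,Host,Water,Country,ACCNo,Year
0,AU-8-95,Rainbow trout,Fresh water,AU,AY546570.1,1995
1,CH-FI262BFH,Rainbow trout,Fresh water,CH,AY546571.1,1999
2,DK-1p40,Rockling,Sea water,DK,AY546575.1,1996
"""

# index_col=0 uses the unnamed first column as the row index.
meta = pd.read_csv(StringIO(csv_text), index_col=0)

# Tip names taken from the tree output above (illustrative subset).
tips = {"AU-8-95", "CH-FI262BFH", "DK-200098"}

# Tips without a metadata row; here only because the excerpt is partial.
missing = sorted(tips - set(meta["Strain"]))

print(list(meta.columns))
print(missing)
```

With the real file, the same call would be `pd.read_csv('data/metadata.csv', index_col=0)`. An empty `missing` list would mean every tip in the tree can be annotated.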


### **6) Visualization**

For visualization we will use `R` and `RStudio`.<br>
Please open the next journal, `04_07_Lab_journal_2.Rmd`, and follow the guide.