# Phylogeny and Mapping

## Introduction

Phylogenetics infers the evolutionary relationships between taxa from patterns of similarity among them. It has important applications in many fields of genome biology. For example, when annotating a gene in a new genome, it is useful for identifying previously annotated genes in other genomes that share a common ancestry. It is also becoming increasingly common to use phylogeny to trace the evolution and spread of bacterial diseases, and even as an epidemiological tool to help identify disease outbreaks in a clinical setting. Further analyses of genome sequences, such as examining recombination, molecular adaptation and the evolution of gene function, all benefit from phylogeny.

## Learning outcomes
On completion of the tutorial, you can expect to be able to:

* Identify different approaches to constructing a phylogeny from whole genome sequence data
* Map sequence data to a reference genome and identify variants
* Use SNP data to construct a phylogenetic tree
* Identify and remove recombination with Gubbins
* Visualise phylogenies in the context of the sample metadata

## Tutorial sections
This tutorial comprises the following sections:   
 1. [Introduction to phylogenetics](intro_to_phylogeny.ipynb)  
 2. [Phylogeny from gene sequences](gene.ipynb)   
 3. [Phylogeny from whole genome sequence data](snp_phylogeny.ipynb)   
 4. [Phylogeny and metadata](metadata.ipynb)  
  
## Authors and License
This tutorial was written by [Jacqui Keane](https://github.com/jacquikeane).

Some of the material has been adapted from the Wellcome Connecting Science Courses [AMR-ASIA23](https://github.com/WCSCourses/AMR-Asia-23), [GenEpiLAC2023](https://github.com/WCSCourses/GenEpiLAC2023) and [WWPG21](https://github.com/WCSCourses/WWPG_2021).

The content is licensed under a [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

## Running the commands in this tutorial
You can follow this tutorial by typing the commands you see into a terminal window on your computer. Remember, the terminal window is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, open a terminal window, type the command below and then press the `Enter` key:

In [None]:
cd ~/course_data/snp-phylogeny/data

## Prerequisites
This tutorial assumes that you have the following software and its dependencies installed on your computer. The software may be updated from time to time, so we have also listed the version that was used when writing this tutorial.


| Package name | Link for download/installation instructions                          | Version |
| :----------: | :------------------------------------------------------------------: |:------: |
| seaview      | https://doua.prabi.fr/software/seaview                               | 5.0.5   |
| fastqc       | https://www.bioinformatics.babraham.ac.uk/projects/fastqc            | 0.12.1  |
| fastp        | https://github.com/OpenGene/fastp                                    | 0.23.4  |
| bwa          | https://github.com/lh3/bwa                                           | 0.7.17  |
| samtools     | https://github.com/samtools/samtools                                 | 1.17    |
| bcftools     | https://github.com/samtools/bcftools                                 | 1.17    |
| snp-sites    | https://github.com/sanger-pathogens/snp-sites                        | 2.5.1   |
| gubbins      | https://github.com/nickjcroucher/gubbins                             | 3.3.0   |
| iqtree       | http://www.iqtree.org/                                               | 2.2.3   |
| FigTree      | http://tree.bio.ed.ac.uk/software/figtree/                           | 1.4.4   |
| Microreact   | https://microreact.org                                               | 240     |

The easiest way to install the required software is using `conda`, a software package manager. The software has already been installed on the computer for you. To activate it, type:

In [None]:
conda activate snp-phylogeny

After the software is activated, type the following commands to check that each tool is available:

In [None]:
seaview &

In [None]:
fastqc -h

In [None]:
fastp -h

In [None]:
bwa

In [None]:
snp-sites -h

In [None]:
run_gubbins.py -h

In [None]:
iqtree -h

In [None]:
figtree &

Each command should either print the tool's help or usage message, or launch the GUI software in the background.
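The checks above do not cover `samtools` or `bcftools`, which are also listed in the prerequisites table. As a minimal sketch, the snippet below reports whether each command-line tool can be found on your `PATH`. The `check_tools` function is a hypothetical helper written for this illustration and is not part of the course material. The executable names are taken from the commands above and are assumed to match your installation.

```shell
# check_tools: hypothetical helper, not part of the course material.
# Prints "<name>: found" or "<name>: MISSING" for each command name given,
# and returns a non-zero status if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: MISSING"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumed from the commands used in this tutorial.
check_tools fastqc fastp bwa samtools bcftools snp-sites run_gubbins.py iqtree \
    || echo "Some tools were not found. Is the snp-phylogeny environment active?"
```

If a tool is reported as missing, check that the `snp-phylogeny` environment is active. For `samtools` and `bcftools`, running the tool with `--version` prints the installed version, which you can compare against the table.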

To get started with the tutorial, go to the first section: [Introduction to phylogenetics](intro_to_phylogeny.ipynb)