# Submodule #1: Understanding the Basics of Phylogenetic Trees

### Introduction to Phylogenetics

Phylogenetics is the study of evolutionary relationships among organisms. It draws on genetic data (such as DNA sequences) or phenotypic data (traits such as size or color). By comparing similarities and differences, scientists build evolutionary trees, called phylogenetic trees, that show how species descend from common ancestors and have diverged over time.

To illustrate its importance, consider the COVID-19 pandemic. Researchers around the world have used phylogenetic trees to:
 - Track the spread and mutation of SARS-CoV-2.
 - Identify emerging variants and their ancestral relationships.
 - Understand how the virus evolves over time and across populations.

This module builds foundational skills to analyze and visualize phylogenetic trees using Python. By the end, these skills can be applied to real-world biological datasets, including COVID-19 genomic data.






# Learning Objectives:

Phylogenetics explores the evolutionary connections and ancestral relationships among organisms, providing insights into their genetic and evolutionary history. This submodule introduces the foundational concepts of phylogenetic trees, helping learners understand their structure, purpose, and applications in biological research. By the end of this module, learners will be able to define and interpret phylogenetic trees, recognize their importance in mapping genetic changes and understanding biodiversity, and apply practical skills to construct and analyze phylogenetic trees using Python for real-world biological data.

- **What You'll Learn:**
    - Basics of phylogenetic trees and their significance.
    - Steps to create and interpret different tree types (Rooted, Unrooted, Cladograms, Phylograms, Dendrograms).
    - Hands-on examples using Python for visualizations.
- **Tools and Libraries:** 
    - `Biopython` for phylogenetic analysis.
    - `Matplotlib` for visualization.
    - Real-world biological data in Newick format.
- **Why It Matters:**
    - Phylogenetic trees provide insights into evolutionary history, biodiversity, and genetic variation.
    - Understanding these trees is critical for applications in genomics, disease research, and conservation biology.



----------------------------------------------------------------------------------------------------------------
# Training Plan 

<font color="green"> **Submodule #1: Understanding the Basics of Phylogenetic Trees** </font>

 
Submodule #2: Collect and Prepare Sequence Data and Analysis


Submodule #3: Construct Phylogenetic Tree

 
Submodule #4: Analyze Phylogenetic Tree

-------------------------------------------------------------------------------------------------------------------

## What is a Phylogenetic Tree?
Imagine tracing your family tree to uncover your ancestry and relationships with relatives. A phylogenetic tree works similarly, but instead of human relatives, it maps the evolutionary history of organisms. It illustrates how species are connected, showing shared ancestry and how closely related they are based on genetic or physical traits.

For example, consider a phylogenetic tree of primates. It reveals how humans, chimpanzees, gorillas, and orangutans share common ancestors and highlights their evolutionary divergence. A phylogenetic tree is essentially a hypothesis that depicts evolutionary pathways and relationships.
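
As a quick text-only illustration of this idea, the sketch below parses a small primate tree and prints it as ASCII art. The Newick string is a toy topology without branch lengths, and the variable names (`primate_newick`, `primate_tree`, `tip_names`) are illustrative assumptions rather than part of any dataset used later in this module.

```python
from io import StringIO

from Bio import Phylo

# Toy primate tree in Newick format (topology only, no branch lengths)
primate_newick = "(((Human, Chimpanzee), Gorilla), Orangutan);"
primate_tree = Phylo.read(StringIO(primate_newick), "newick")

# Tips (terminals) are the sampled species; internal nodes are inferred ancestors
tip_names = [tip.name for tip in primate_tree.get_terminals()]
print("Species in the tree:", tip_names)
print("Inferred ancestral nodes:", len(primate_tree.get_nonterminals()))

# Render the tree as plain text so the nesting of shared ancestors is visible
Phylo.draw_ascii(primate_tree)
```

Each internal node in the printed tree stands for a hypothesized common ancestor of the tips that branch from it.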

Example: COVID-19 Variants
During the COVID-19 pandemic, phylogenetic trees were essential in understanding the virus's evolution. For instance, comparing genetic sequences of SARS-CoV-2 samples from different regions helped scientists:
 - Identify new variants like Alpha, Delta, and Omicron.
 - Trace how these variants spread geographically.
 - Anticipate how the virus might continue to change and design targeted public health responses.

## Why Are Phylogenetic Trees Important?

Phylogenetic trees are powerful tools for visualizing and understanding the relationships among organisms, genes, or pathogens. They play a central role in many fields, including evolutionary biology, disease research, and conservation. Here’s a detailed breakdown of their importance, with real-world examples like COVID-19:

1. **Tracing Evolutionary Pathways:** Phylogenetic trees help scientists understand the origins and evolutionary history of species, populations, or pathogens. They reveal how organisms are related to one another through a shared common ancestor and show the order of divergence. Example (COVID-19): When SARS-CoV-2 first emerged, scientists constructed phylogenetic trees to trace its origins. By comparing the genetic sequence of SARS-CoV-2 to those of other coronaviruses, researchers found that it likely originated from a bat coronavirus and later jumped to humans. Phylogenetic analysis continues to track how the virus evolves into new variants such as Delta, Omicron, and their sublineages.
2. **Mapping Genetic Changes:** Phylogenetic trees allow us to identify and track mutations and genetic divergence over time. This is particularly important for studying pathogens, as mutations can impact their transmissibility, severity, or resistance to treatment.
Example: Tracking SARS-CoV-2 Mutations
As SARS-CoV-2 spreads, it accumulates mutations in its genome. Scientists use phylogenetic trees to map these changes and understand their significance. For instance:
   - **The Delta variant**: Known for increased transmissibility.
   - **The Omicron variant**: Has numerous mutations in its spike protein, which can affect vaccine efficacy.
By visualizing these changes on a phylogenetic tree, researchers can better anticipate how future variants might evolve and adapt vaccines or treatments accordingly.
           
3. **Understanding Biodiversity:** Phylogenetic trees help us explore how species diversify, adapt, and evolve over time. They show relationships among organisms and help identify patterns of adaptation to different environments.
Example: Bird Species Adaptation
In evolutionary biology, phylogenetic trees have been used to study Darwin’s finches on the Galápagos Islands. The trees revealed how the finches adapted to different niches, leading to the evolution of distinct beak shapes that suited their food sources. This example highlights how evolutionary pressures drive biodiversity.
4. **Disease Research:** Phylogenetic trees are essential for tracking the spread and evolution of infectious diseases. They help identify the origin of outbreaks, monitor how pathogens evolve, and guide public health responses.
   Example: Global Spread of COVID-19
   During the COVID-19 pandemic, phylogenetic trees were used to:
   - **Trace Spread:** Show how SARS-CoV-2 traveled between countries. For example, variants first found in the UK and South Africa spread globally.
   - **Monitor Evolution:** Identify how and when new variants emerged. For instance, the Alpha variant was first identified in the UK, Beta in South Africa, and Delta in India.
   - **Inform Decisions:** Governments and health agencies used this data to implement travel restrictions, update vaccines, and develop targeted interventions.


## How Are Phylogenetic Trees Created?
To construct accurate phylogenetic trees, researchers rely on several data sources:

1. **Genetic Sequences:** The primary data used in phylogenetic analysis are DNA, RNA, or protein sequences obtained from different species or strains. These sequences are compared to identify similarities and differences.
2. **Public Databases:** Genetic sequence data can be accessed from public repositories like GenBank, the European Nucleotide Archive (ENA, hosted by EMBL-EBI), and DDBJ, which maintain comprehensive and annotated genetic information for numerous organisms.
3. **Genomic Projects:** Large-scale genomic projects, such as the Human Genome Project or the 1000 Genomes Project, provide extensive datasets that can be utilized for phylogenetic studies.
4. **Sequencing Technologies:** Advances in sequencing technologies, like next-generation sequencing (NGS), have made it easier and more cost-effective to obtain high-quality genetic data for a wide range of organisms.
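
To make "comparing sequences to identify similarities and differences" concrete, here is a minimal sketch that counts mismatches between short, already aligned DNA fragments. The sequences and sample names are hypothetical toy data, and the helper functions `differing_positions` and `p_distance` are illustrative names, not part of Biopython. Real analyses would first align the sequences and would usually apply a substitution model rather than a raw proportion.

```python
# Toy, pre-aligned DNA fragments (hypothetical; not real genome data)
aligned_sequences = {
    "sample_A": "ATGCTAGCTA",
    "sample_B": "ATGCTTGCTA",
    "sample_C": "ATGATTGCAA",
}


def differing_positions(seq1, seq2):
    """Return 0-based alignment positions where two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ between two sequences."""
    return len(differing_positions(seq1, seq2)) / len(seq1)


# Compare every pair of samples once
names = list(aligned_sequences)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        s1, s2 = aligned_sequences[first], aligned_sequences[second]
        print(first, second, differing_positions(s1, s2), round(p_distance(s1, s2), 2))
```

Pairs with fewer differences are candidates for being more closely related; distance-based tree-building methods start from a table of values like these.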


## Types of Phylogenetic Trees and Their Applications
#### 1. **Rooted Trees:**
A rooted tree includes a single common ancestor, represented by the root, from which all other branches diverge. The direction of the branches shows the passage of time and evolutionary divergence.
Example: COVID-19 Variant Tracking.
Below is a Python visualization of a rooted tree comparing COVID-19 variants:


In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

# Hypothetical rooted tree for COVID-19 variants in Newick format
covid_rooted_tree = "((Alpha:0.2, Delta:0.3):0.5, (Omicron:0.4, Beta:0.6):0.3);"
tree = Phylo.read(StringIO(covid_rooted_tree), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
Phylo.draw(tree, axes=ax)
ax.set_title("Rooted Tree: COVID-19 Variants Evolution", fontsize=14, weight='bold')
plt.show()

- **Interpretation**: This rooted tree, built from hypothetical branch lengths, shows how the variants trace back to a common ancestor at the root.
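
Beyond drawing the tree, the same object can be queried directly. The sketch below reuses the hypothetical Newick string from the cell above and, assuming the `Bio.Phylo` tree methods `distance`, `find_any`, and `common_ancestor`, reports the root-to-tip distance for each variant and the tips under the most recent common ancestor of Alpha and Delta. Variable names such as `rooted_example` are illustrative.

```python
from io import StringIO

from Bio import Phylo

# Same hypothetical rooted tree as above (branch lengths are made up)
newick = "((Alpha:0.2, Delta:0.3):0.5, (Omicron:0.4, Beta:0.6):0.3);"
rooted_example = Phylo.read(StringIO(newick), "newick")

# Root-to-tip distance: sum of branch lengths from the root to each variant
root_to_tip = {
    tip.name: rooted_example.distance(tip)
    for tip in rooted_example.get_terminals()
}
for name, dist in root_to_tip.items():
    print(f"{name}: {dist:.2f}")

# Most recent common ancestor (MRCA) of two variants, and the tips below it
alpha = rooted_example.find_any(name="Alpha")
delta = rooted_example.find_any(name="Delta")
mrca = rooted_example.common_ancestor(alpha, delta)
mrca_tips = sorted(tip.name for tip in mrca.get_terminals())
print("Tips under the Alpha/Delta MRCA:", mrca_tips)
```

Because the tree is rooted, "distance from the root" has a direction: larger values mean more accumulated change since the common ancestor.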

In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

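# Newick format representing a rooted tree for mammals (illustrative branch lengths).
# Mouse groups with the primates, and Dog and Cat (both carnivorans) group together.
rooted_tree = "(((Human:0.3, Chimpanzee:0.3):0.5, Mouse:0.8):0.2, (Dog:0.6, Cat:0.6):0.4);"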
tree = Phylo.read(StringIO(rooted_tree), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(12, 8))
Phylo.draw(tree, axes=ax)
ax.set_title("Rooted Tree: Evolutionary Relationships Among Mammals", fontsize=14, weight='bold')
plt.show()


### **Exercise 1**
##### **Create a Rooted Tree using the following dataset:**
##### Dataset 1: Avian Species Evolution
##### The dataset is provided in Newick format:
##### ((Sparrow:0.3, Crow:0.4):0.2, (Eagle:0.5, Hawk:0.6):0.3);
Note: The Newick format will be discussed in Submodule 3.

In [None]:
from IPython.display import IFrame
IFrame("Exercises/answers1.html", width=800, height=350)

#### 2. **Unrooted Trees**

Unrooted trees do not indicate a common ancestor. They depict relationships between species without implying evolutionary direction or time.
Example: Genetic Similarity Among COVID-19 Strains.

In [None]:
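from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

# Star-shaped Newick tree: four variants joined at one unresolved node, no branch lengths
covid_unrooted_tree = "(Alpha, Delta, Omicron, Beta);"
tree = Phylo.read(StringIO(covid_unrooted_tree), "newick")

# Visualization (Phylo.draw uses a rectangular layout; the leftmost node is a drawing
# convention here, not an inferred common ancestor)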
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)
Phylo.draw(tree, axes=ax)
ax.set_title("Unrooted Tree: COVID-19 Variants Relationship", fontsize=14, weight='bold')
plt.show()

- **When to Use**: Unrooted trees are helpful when ancestry is unknown or not relevant.
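
An unrooted tree can be given a root later by choosing an outgroup. The sketch below is a minimal illustration on a hypothetical four-taxon tree (taxa `A` to `D` and all branch lengths are made up), assuming Biopython's `root_with_outgroup` method. It checks that rooting changes where the root sits but not the set of taxa or the path length between them.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical unrooted tree, written with a three-way split at the base
unrooted_newick = "(A:1, B:1, (C:1, D:1):1);"
example_tree = Phylo.read(StringIO(unrooted_newick), "newick")

tips_before = sorted(tip.name for tip in example_tree.get_terminals())
a_to_d_before = example_tree.distance({"name": "A"}, {"name": "D"})

# Place the root on the branch leading to D (the chosen outgroup)
example_tree.root_with_outgroup({"name": "D"})

tips_after = sorted(tip.name for tip in example_tree.get_terminals())
a_to_d_after = example_tree.distance({"name": "A"}, {"name": "D"})
root_children = [child.name for child in example_tree.root.clades]

print("Taxa unchanged:", tips_before == tips_after)
print("A to D path length before and after:", a_to_d_before, a_to_d_after)
Phylo.draw_ascii(example_tree)
```

Choosing a different outgroup would move the root to a different branch, which is why the choice should rest on outside biological knowledge.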

In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

# Newick format for an unrooted tree of the three domains of life
unrooted_tree = "(Bacteria, Archaea, Eukaryota);"
tree = Phylo.read(StringIO(unrooted_tree), "newick")

# Visualization
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)
Phylo.draw(tree, axes=ax)
ax.set_title("Unrooted Tree: Three Domains of Life", fontsize=14, weight='bold')
plt.show()

### Exercise 2
### **Interactive Question: Rooting an Unrooted Tree**
##### Given the following unrooted tree for the three domains of life:
###### ***(Bacteria, Archaea, Eukaryota);***
##### Tasks:
- Visualize the unrooted tree using Biopython.
- Root the tree using one taxon (for example, "Bacteria") as the outgroup.
- Root the tree again, but this time use "Archaea" as the outgroup.
- Compare the two rooted trees:
    - How do the relationships between Bacteria, Archaea, and Eukaryota change?
    - Which rooting method provides a more meaningful representation, and why?



In [None]:
from IPython.display import IFrame
IFrame("Exercises/answers2.html", width=800, height=350)

#### 3. **Cladograms:**

Cladograms focus on branching order, showing relationships between species but not providing information about branch lengths or evolutionary distances.
Example: Evolutionary Relationship of Primates
   

In [None]:
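from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

# Newick format for a primate cladogram (branching order only, no branch lengths)
cladogram = "(((Human, Chimpanzee), Gorilla), Orangutan);"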
tree = Phylo.read(StringIO(cladogram), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
Phylo.draw(tree, axes=ax)
ax.set_title("Cladogram: Evolution of Primates", fontsize=14, weight='bold')
plt.show()

- **Why It Matters**: Cladograms focus on relationships rather than evolutionary time.
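
One practical way to tell whether a Newick tree can only be read as a cladogram is to check whether it carries any branch lengths. The helper below, `has_branch_lengths`, is a hypothetical name introduced for this sketch; it assumes that Biopython leaves `branch_length` as `None` when the Newick string omits it. The branch lengths in the second tree are made up for contrast.

```python
from io import StringIO

from Bio import Phylo


def has_branch_lengths(tree):
    """Return True if any clade in the tree carries a branch length."""
    return any(clade.branch_length is not None for clade in tree.find_clades())


# Topology only: this tree can only be read as a cladogram
cladogram_tree = Phylo.read(
    StringIO("(((Human, Chimpanzee), Gorilla), Orangutan);"), "newick"
)

# Same topology with made-up branch lengths added
scaled_tree = Phylo.read(
    StringIO("(((Human:0.1, Chimpanzee:0.1):0.05, Gorilla:0.15):0.1, Orangutan:0.25);"),
    "newick",
)

print("Cladogram has branch lengths:", has_branch_lengths(cladogram_tree))
print("Scaled tree has branch lengths:", has_branch_lengths(scaled_tree))
```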



In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

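# Newick format for a cladogram (branching order only).
# Birds are nested with reptiles, mammals are their sister group, and amphibians branch off earlier.
cladogram = "((((Lizard, Bird), Mammal), Frog), Fish);"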
tree = Phylo.read(StringIO(cladogram), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(10, 8))
Phylo.draw(tree, axes=ax)
ax.set_title("Cladogram: Vertebrate Evolution", fontsize=14, weight='bold')
plt.show()

### Exercise 3
#### Using the following Newick format:
(((Human, Chimpanzee), Gorilla), Orangutan);
Which species are most closely related in the cladogram?

In [None]:
from IPython.display import IFrame
IFrame("Exercises/answers3.html", width=800, height=350)

#### 4. **Phylograms**
Phylograms provide information on both branching order and branch lengths. Branch lengths are proportional to the amount of evolutionary change, such as the number of substitutions per site.
Example: Genetic Divergence in Fruit Flies

In [None]:
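from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

# Newick format for a phylogram (illustrative branch lengths)
phylogram = "((Drosophila_melanogaster:0.4, Drosophila_simulans:0.5):0.3, Drosophila_yakuba:0.6);"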
tree = Phylo.read(StringIO(phylogram), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
Phylo.draw(tree, axes=ax)
ax.set_title("Phylogram: Genetic Divergence in Fruit Flies", fontsize=14, weight='bold')
plt.show()

- **Why It Matters**: Useful for understanding evolutionary rates and genetic distances.
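
Because a phylogram carries branch lengths, genetic distances can be read off the tree numerically. The sketch below reuses the illustrative fruit fly Newick string from the cell above and, assuming the `Bio.Phylo` methods `total_branch_length` and `distance`, computes the total tree length and the path length between every pair of species. The names `fly_tree` and `pairwise` are illustrative.

```python
from io import StringIO
from itertools import combinations

from Bio import Phylo

# Same illustrative phylogram as above
phylogram_newick = (
    "((Drosophila_melanogaster:0.4, Drosophila_simulans:0.5):0.3, "
    "Drosophila_yakuba:0.6);"
)
fly_tree = Phylo.read(StringIO(phylogram_newick), "newick")

# Sum of all branch lengths in the tree
total_length = fly_tree.total_branch_length()
print(f"Total branch length: {total_length:.2f}")

# Path length between each pair of tips (sum of branches connecting them)
pairwise = {}
for tip1, tip2 in combinations(fly_tree.get_terminals(), 2):
    pairwise[(tip1.name, tip2.name)] = fly_tree.distance(tip1, tip2)

for (name1, name2), dist in pairwise.items():
    print(f"{name1} - {name2}: {dist:.2f}")
```

The pair with the shortest path is the most similar under these branch lengths, which is the kind of quantitative comparison a cladogram cannot provide.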



### Exercise 4
##### Visualize the Phylogram: Use the provided Newick format to visualize the genetic divergence among fruit flies:
##### ((A:0.2,B:0.3):0.4,C:0.5);



In [None]:
from IPython.display import IFrame
IFrame("Exercises/answers4.html", width=800, height=350)

#### 5. **Dendrograms**:
    
Dendrograms are tree diagrams produced by hierarchical clustering. They group items by overall similarity rather than by inferred ancestry, and are useful in fields like genomics and linguistics.
Example: Gene Expression Analysis

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial.distance import squareform

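# Sample data: pairwise distances between genes (symmetric, zeros on the diagonal)
genes = ['GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE']
distance_data = np.array([
    [0, 2, 4, 6, 8],
    [2, 0, 3, 5, 7],
    [4, 3, 0, 2, 4],
    [6, 5, 2, 0, 3],
    [8, 7, 4, 3, 0]
])

# Clustering: squareform converts the square matrix into the condensed form linkage expects.
# Note: Ward linkage formally assumes Euclidean distances; method='average' (UPGMA)
# is a common alternative for arbitrary distance matrices.
linkage_matrix = linkage(squareform(distance_data), method='ward')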

# Visualization
plt.figure(figsize=(10, 7))
dendrogram(linkage_matrix, labels=genes, leaf_rotation=45)
plt.title("Dendrogram: Gene Clustering", fontsize=14, weight='bold')
plt.xlabel("Genes")
plt.ylabel("Distance")
plt.show()
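
A dendrogram is often cut at some height to obtain discrete groups. The sketch below is one possible way to do that with SciPy's `fcluster`, using the same toy distance values as the cell above. It uses average linkage (UPGMA) instead of Ward, and the choice of two clusters is an assumption made purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

gene_names = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]

# Toy symmetric distance matrix (same values as the example above)
gene_distances = np.array([
    [0, 2, 4, 6, 8],
    [2, 0, 3, 5, 7],
    [4, 3, 0, 2, 4],
    [6, 5, 2, 0, 3],
    [8, 7, 4, 3, 0],
], dtype=float)

# Average linkage (UPGMA) works with arbitrary distance matrices
gene_linkage = linkage(squareform(gene_distances), method="average")

# Cut the hierarchy so that at most two clusters remain
cluster_labels = fcluster(gene_linkage, t=2, criterion="maxclust")

# Collect gene names by cluster label
clusters = {}
for gene, label in zip(gene_names, cluster_labels):
    clusters.setdefault(int(label), []).append(gene)

for label, members in sorted(clusters.items()):
    print(f"Cluster {label}: {members}")
```

With these values, GeneA and GeneB fall in one group and GeneC, GeneD, and GeneE in the other, matching the two main branches of an average-linkage dendrogram.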

## Why Choose a Specific Tree Type?

1. **Rooted Trees**: Ideal for understanding evolutionary direction and common ancestry.

2. **Unrooted Trees**: Best for visualizing relationships when ancestral information is unavailable.

3. **Cladograms**: Best when only the branching order matters and branch lengths are unavailable or irrelevant.

4. **Phylograms**: Combine evolutionary relationships with branch lengths for quantitative analysis.

5. **Dendrograms**: Suited to showing hierarchical clusters of data grouped by similarity.

## Summary ##

Phylogenetic trees are powerful tools for visualizing evolutionary relationships and understanding genetic changes. In the context of COVID-19, they help trace variants, study mutations, and inform public health strategies.

By mastering the tools and concepts in this module, you will:

- Interpret and analyze biological data.

- Construct phylogenetic trees using Python.

- Solve real-world problems like tracking disease evolution.


In the next submodule, we will study "Collect and Prepare Sequence Data", the preprocessing step for constructing a phylogenetic tree.



# Interactive Quiz
The following quiz will help reinforce your understanding of phylogenetics:

In [None]:
from IPython.display import IFrame
IFrame("Quiz/QS1.html", width=800, height=350)