# Step 1: Setup in Google Colab

In [2]:
## First, ensure that you have the required packages installed.
## In Google Colab, you can install the necessary bioinformatics tools and libraries using the following code:

!apt-get install -y muscle

## This command installs MUSCLE (Multiple Sequence Comparison by Log-Expectation),
## a tool for creating multiple sequence alignments of nucleotide or protein sequences.
## In this analysis, MUSCLE aligns the Exo70 sequences, which is a crucial step before
## constructing a phylogenetic tree. The alignment ensures that the sequences are organized
## to highlight conserved regions, helping reveal evolutionary relationships.


## !: This symbol runs a shell command directly from the Jupyter notebook.
## It allows us to execute terminal commands, such as installing software, directly within the notebook environment.

## apt-get: This is a package management command in Debian-based Linux systems.
## It helps install, update, or remove software packages from repositories.


## install: Tells apt-get that we want to install a package.
## -y: This option automatically answers "yes" to any prompts during installation, so the installation proceeds without manual confirmation.


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
muscle is already the newest version (1:3.8.1551-2build1).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.
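
After an install step like the one above, it can help to confirm that the tool is actually reachable before running the longer cells. The following is a minimal sketch using only the Python standard library; the helper name `tool_available` is introduced here for illustration and is not part of the original notebook.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


# Check for the aligner installed by the apt-get command above.
if tool_available("muscle"):
    print("muscle found at", shutil.which("muscle"))
else:
    print("muscle not found; re-run the apt-get cell")
```

The same helper works for any other command-line tool that is installed system-wide.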


In [3]:
## RAxML may not be available from Google Colab's default package repositories,
## so this cell downloads the source code from GitHub and compiles it directly
## in the Colab environment.


## Download RAxML Source Code:
!wget https://github.com/stamatak/standard-RAxML/archive/refs/heads/master.zip -O raxml.zip
!unzip raxml.zip

## Compile RAxML (Makefile.gcc builds the sequential, single-threaded binary raxmlHPC):
%cd standard-RAxML-master
!make -f Makefile.gcc
%cd ..

## Copy the Compiled RAxML to the Working Directory:
## This places the raxmlHPC executable in the current directory (not on your PATH), so you call it as ./raxmlHPC.

!cp standard-RAxML-master/raxmlHPC .




--2024-11-06 21:49:43--  https://github.com/stamatak/standard-RAxML/archive/refs/heads/master.zip
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/stamatak/standard-RAxML/zip/refs/heads/master [following]
--2024-11-06 21:49:44--  https://codeload.github.com/stamatak/standard-RAxML/zip/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 140.82.114.10
Connecting to codeload.github.com (codeload.github.com)|140.82.114.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘raxml.zip’

raxml.zip               [       <=>          ]   9.75M  6.94MB/s    in 1.4s    

2024-11-06 21:49:46 (6.94 MB/s) - ‘raxml.zip’ saved [10226862]

Archive:  raxml.zip
8ee5879027ce766a89283ecc0da449ae2d64c540
   creating: standard-RAxML-master/
   creating: standard-RAxML-

In [4]:
!pip install biopython

## Biopython is a Python library for computational biology, providing tools to read,
## write, and manipulate biological data files (like FASTA and alignment files).
## In this analysis, we will use Biopython to load your Exo70 sequences, manage sequence data,
## and interact with the output files from MUSCLE and RAxML. It’s especially helpful for handling sequence data within Python.


!pip install ete3 PyQt5

## ETE3 (Environment for Tree Exploration) is a Python library for constructing, visualizing, and manipulating phylogenetic trees.
## Its drawing classes (such as TreeStyle) depend on PyQt5, so both packages are installed together.
## After generating the tree with RAxML, we will use ETE3 to load and visualize the tree directly within Google Colab.
## This visualization helps interpret the phylogenetic relationships among the Exo70 sequences in graphical form.



# Step 2: Load and Organize Your Sequences

In [5]:
## Upload your Exo70 sequences file to Colab using the file picker that this cell opens.
## In the next cell, replace 'Exo70_HvTaOsOtZmSiSbBd_5_SH.fa' with the name of the file you uploaded.

from Bio import SeqIO
from google.colab import files

# Upload your input FASTA files
uploaded = files.upload()

Saving Exo70_HvTaOsOtZmSiSbBd_5_SH.fa to Exo70_HvTaOsOtZmSiSbBd_5_SH.fa


In [6]:

# Load sequences
sequences = list(SeqIO.parse('Exo70_HvTaOsOtZmSiSbBd_5_SH.fa', 'fasta'))
print(f"Loaded {len(sequences)} sequences.")

Loaded 386 sequences.
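
Part of organizing the sequences is making sure every sequence name is unique, because RAxML expects unique names in its input. The following is a minimal sketch of such a check on raw FASTA text, using only the standard library; the helper names `fasta_ids` and `duplicate_ids` and the three example records are made up for illustration.

```python
from collections import Counter


def fasta_ids(text: str) -> list[str]:
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def duplicate_ids(ids: list[str]) -> list[str]:
    """Return the IDs that occur more than once, in sorted order."""
    return sorted(name for name, count in Counter(ids).items() if count > 1)


# Tiny made-up example; in the notebook you would read the uploaded file instead.
example = ">seqA some description\nMKT\n>seqB\nMLS\n>seqA\nMKV\n"
print(duplicate_ids(fasta_ids(example)))
```

With the `sequences` list loaded above, `duplicate_ids([rec.id for rec in sequences])` performs the same check on the Biopython records.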


# Step 3: Perform Multiple Sequence Alignment

In [7]:
## Use MUSCLE to align the sequences. This writes the alignment to Exo70_aligned.aln.
## MUSCLE v3.8 writes FASTA format by default, regardless of the .aln file extension.

!muscle -in Exo70_HvTaOsOtZmSiSbBd_5_SH.fa -out Exo70_aligned.aln



MUSCLE v3.8.1551 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

Exo70_HvTaOsOtZmSiSbBd_5_SH 386 seqs, lengths min 76, max 1085, avg 548
00:00:01     18 MB(4%)  Iter   1  100.00%  K-mer dist pass 1
00:00:01     18 MB(4%)  Iter   1  100.00%  K-mer dist pass 2
00:00:06   177 MB(44%)  Iter   1  100.00%  Align node
00:00:06   178 MB(44%)  Iter   1  100.00%  Root alignment
00:00:11   179 MB(44%)  Iter   2  100.00%  Refine tree
00:00:11   179 MB(44%)  Iter   2  100.00%  Root alignment
00:00:11   179 MB(44%)  Iter   2  100.00%  Root alignment
00:02:11   179 MB(44%)  Iter   3  100.00%  Refine biparts
00:04:02   179 MB(44%)  Iter   4  100.00%  Refine biparts
00:05:50   179 MB(44%)  Iter   5  100.00%  Refine biparts
00:06:52   179 MB(44%)  Iter   6  100.00%  Refine biparts
00:06:52   179 MB(44%)  Iter   6  100.00%  Refine biparts
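
Once the alignment is written, a quick sanity check is to confirm that all rows have the same length and to see how much of the alignment consists of gaps. The following is a rough sketch under the assumption that the alignment is in FASTA format; `read_fasta` and `gap_fraction` are illustrative helper names, and the two-sequence example is made up.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}, joining wrapped sequence lines."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def gap_fraction(aligned: dict[str, str]) -> float:
    """Return the fraction of alignment cells that are gap characters ('-')."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    total = sum(len(seq) for seq in aligned.values())
    gaps = sum(seq.count("-") for seq in aligned.values())
    return gaps / total if total else 0.0


# Made-up two-row alignment: 3 gap cells out of 8.
example = ">a\nMK-T\n>b\nM--T\n"
print(gap_fraction(read_fasta(example)))
```

In the notebook, `gap_fraction(read_fasta(open("Exo70_aligned.aln").read()))` would apply the check to the MUSCLE output. A tool that also counts undetermined characters may report a slightly different figure, since this sketch only counts `-`.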


# Step 4: Phylogenetic Tree Construction with RAxML

In [9]:
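## Using the aligned sequences, construct a phylogenetic tree with RAxML.
## The command below uses the JTT protein substitution model with gamma-distributed rate variation (PROTGAMMAJTT)
## and 100 rapid bootstrap replicates to estimate branch support.
## Note: -T sets the thread count only for the Pthreads build (raxmlHPC-PTHREADS); the sequential raxmlHPC
## binary compiled above ignores it, as the warning in the output shows.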

!./raxmlHPC -f a -x 12345 -p 12345 -# 100 -m PROTGAMMAJTT -s Exo70_aligned.aln -n Exo70_tree -T 16



Option -T does not have any effect with the sequential or parallel MPI version.
It is used to specify the number of threads for the Pthreads-based parallelization

RAxML can't, parse the alignment file as phylip file 
it will now try to parse it as FASTA file



This is RAxML version 8.2.12 released by Alexandros Stamatakis on May 2018.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Sarah Lutteropp   (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)
Charlie Taylor    (UF)


Alignment has 1512 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 76.76%

RAxML rapid bootstrapping and subsequent ML search

Using 1 distinct models/data partitions with joint branch length optimization



Executing 100 rapid bootstrap inferences and thereafter a thorough ML search 
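
The RAxML command above packs several options into one line. As a sketch of what each flag does, the command can be assembled from Python as an argument list; the function name `raxml_command` and its parameter names are assumptions made for this example, while the flags themselves are the ones used in the cell above (without `-T`, which the sequential binary ignores).

```python
import shlex


def raxml_command(alignment, run_name, model="PROTGAMMAJTT",
                  bootstraps=100, seed=12345, binary="./raxmlHPC"):
    """Build the RAxML argument list for a rapid bootstrap plus ML search run."""
    return [
        binary,
        "-f", "a",              # rapid bootstrap analysis and best-scoring ML tree search in one run
        "-x", str(seed),        # random seed for rapid bootstrapping
        "-p", str(seed),        # random seed for the parsimony starting trees
        "-#", str(bootstraps),  # number of bootstrap replicates
        "-m", model,            # substitution model
        "-s", alignment,        # input alignment file
        "-n", run_name,         # suffix used in the names of all output files
    ]


cmd = raxml_command("Exo70_aligned.aln", "Exo70_tree")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the same analysis as the shell command, and changing `seed` or `bootstraps` in one place keeps the two seed options consistent.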

# Step 5: Visualize the Phylogenetic Tree

In [None]:
## When the run finishes, RAxML writes the file RAxML_bipartitions.Exo70_tree, which contains the best-scoring ML tree annotated with bootstrap support values.
## Use the ete3 package in Python to visualize the phylogenetic tree.


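import os

# Colab has no display, so ask Qt to draw off-screen before ete3 creates its Qt application.
# ete3's drawing classes (TreeStyle, render) require PyQt5 to be installed.
os.environ["QT_QPA_PLATFORM"] = "offscreen"

from ete3 import Tree, TreeStyle

# Load the tree; internal node labels are read as bootstrap support values
tree = Tree("RAxML_bipartitions.Exo70_tree")

# Optional: customize TreeStyle for better visualization
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_support = True  # print bootstrap values on the branches
ts.scale = 120                 # pixels per unit of branch length

# tree.show() opens an interactive GUI window, which is not available in Colab;
# render the tree inline in the notebook instead.
tree.render("%%inline", tree_style=ts)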
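
Besides looking at the picture, it can be useful to summarize the bootstrap values numerically. The following is a minimal sketch that pulls the support labels out of a Newick string with a regular expression, assuming supports are written as integers directly after each closing parenthesis, as in RAxML's bipartitions file. The helper names, the made-up example tree, and the threshold of 70 are assumptions for illustration, not values taken from this analysis.

```python
import re


def bootstrap_supports(newick: str) -> list[int]:
    """Extract integer support labels that directly follow a closing parenthesis."""
    return [int(value) for value in re.findall(r"\)(\d+)", newick)]


def fraction_supported(supports: list[int], threshold: int = 70) -> float:
    """Return the fraction of internal nodes with support at or above `threshold`."""
    if not supports:
        return 0.0
    return sum(s >= threshold for s in supports) / len(supports)


# Made-up five-taxon tree with two labeled internal nodes.
example = "((A:0.1,B:0.2)95:0.05,(C:0.3,D:0.1)40:0.02,E:0.4);"
supports = bootstrap_supports(example)
print(supports, fraction_supported(supports))
```

In the notebook, `bootstrap_supports(open("RAxML_bipartitions.Exo70_tree").read())` would give the support values for the real tree.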


# Step 6: Download Results

In [None]:

## You can download your results, including alignments, trees, or figures, directly from Google Colab.

from google.colab import files
files.download('Exo70_aligned.aln')
files.download('RAxML_bipartitions.Exo70_tree')
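
A RAxML run writes several output files that share the run name as a suffix, so it can be convenient to bundle them into one archive before downloading. The following is a sketch using only the standard library; the function name `bundle_results`, the default archive name, and the `RAxML_*.<run_name>` glob pattern are assumptions for this example.

```python
import glob
import os
import zipfile


def bundle_results(run_name, extra_files=(), archive="results.zip", directory="."):
    """Zip all RAxML_*.<run_name> files plus any existing extra files.

    Returns the archive path and the list of file names that were added.
    """
    pattern = os.path.join(directory, f"RAxML_*.{run_name}")
    paths = sorted(glob.glob(pattern))
    for name in extra_files:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):  # silently skip files that are missing
            paths.append(candidate)
    archive_path = os.path.join(directory, archive)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))
    return archive_path, [os.path.basename(p) for p in paths]
```

In Colab, calling `bundle_results("Exo70_tree", extra_files=["Exo70_aligned.aln"])` and passing the returned archive path to `files.download` would fetch everything in a single download.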
