# Day 4 Part 1
---

## Multiple sequence aligners + Phylogenetics

For more information about the theory behind multiple sequence alignment, please see these resources:

- https://academic.oup.com/bib/article/17/6/1009/2606431
- https://www.hindawi.com/journals/isrn/2013/615630/

Note: There is a user-friendly tool called MEGA (https://www.megasoftware.net/) that is likely sufficient for analyzing closely related species (within a phylum). However, it lacks some of the more sophisticated phylogenetic methods.

### Aligners

Here is a non-exhaustive list of recommended aligners. Unfortunately, most do not have Windows packages, but some of them do have web servers.  
- [MAFFT](https://mafft.cbrc.jp/alignment/software/) 
- [T-COFFEE](http://www.tcoffee.org/Projects/tcoffee/workshops/tcoffeetutorials/installation.html)
- [MUSCLE](https://www.drive5.com/muscle/) -- has a Windows distribution!
- [HMMER](http://hmmer.org/) *great for aligning distantly related proteins*
  - Profile alignment: guides the alignment of your sequences using a well-curated template file.

### Alignment viewers

The most popular viewer on the market is [Jalview](https://www.jalview.org/), but there are older alternatives such as [SeaView](http://doua.prabi.fr/software/seaview).

### Masking alignments

If you want to proceed to phylogenetic analysis, you can treat your alignment with a 'masking' method. This is a way to 'mask' or 'hide' regions of the alignment that might be misaligned (and therefore non-homologous). Historically, this was done by eye (yep...), but now we have tools to help us out! My favourites are:

- [BMGE](https://bmcecolevol.biomedcentral.com/articles/10.1186/1471-2148-10-210)
- [trimAl](http://trimal.cgenomics.org/) -- has a Windows distribution
- [Divvier](https://github.com/simonwhelan/Divvier) -- for advanced users
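
To make the idea of masking concrete, here is a minimal Python sketch that drops every alignment column whose gap fraction exceeds a threshold. It is a simplified illustration only: the function name `mask_gappy_columns`, the 0.5 threshold, and the toy sequences are assumptions made for this example, and the tools listed above use more refined criteria than a single fixed cutoff.

```python
def mask_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` maps sequence names to aligned sequences of equal length.
    This is an illustrative rule, not the algorithm used by BMGE or trimAl.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    n_rows = len(rows)
    # zip(*rows) walks the alignment column by column.
    kept = [
        col for col in zip(*rows)
        if sum(ch == "-" for ch in col) / n_rows <= max_gap_fraction
    ]
    # Rebuild each sequence from the columns that survived.
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Hypothetical toy alignment, invented for illustration.
toy = {
    "seq1": "MK-LV--A",
    "seq2": "MKTLV--A",
    "seq3": "MK-LI-GA",
}
masked = mask_gappy_columns(toy, max_gap_fraction=0.5)
for name, seq in masked.items():
    print(name, seq)
```

Columns 3, 6 and 7 of the toy alignment are mostly gaps, so they are removed and only the five well-populated columns remain.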

### Phylogenetic Analyses

Finally, for phylogenetic analyses, I recommend maximum-likelihood or Bayesian methods:

- [IQ-TREE](http://www.iqtree.org/)
- [RAxML](https://cme.h-its.org/exelixis/web/software/raxml/)
- [PhyloBayes](http://www.atgc-montpellier.fr/phylobayes/)

### Tree viewers

- [FigTree](http://tree.bio.ed.ac.uk/software/figtree/)
- [iTOL](https://itol.embl.de/)



Since MUSCLE has a Windows distribution, let's try to get that working. Mac/Unix/Colab users, head to the next section.

### Windows

Windows folks, head to these links and download the executables (tar.gz again):

- https://www.drive5.com/muscle/downloads.htm
- http://trimal.cgenomics.org/downloads
- http://www.iqtree.org/#download

Note the path where you have downloaded these, including the \bin\ folder.


In [None]:
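# Set the PATH variable for each package.
# A `!set PATH=...` line only affects its own temporary shell, so update the notebook's environment instead.
import os
os.environ["PATH"] += os.pathsep + r"C:\your\path\here"
!muscle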

In [1]:
### MAC/UNIX/COLAB

!conda install --yes -c bioconda muscle
!conda install --yes -c bioconda trimal
!conda install --yes -c bioconda iqtree


Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/julieb/anaconda3

  added / updated specs:
    - muscle


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.14.0               |   py39hf3d152e_0        1011 KB  conda-forge
    muscle-3.8.1551            |       h7d875b9_6         262 KB  bioconda
    ------------------------------------------------------------
                                           Total:         1.2 MB

The following NEW packages will be INSTALLED:

  muscle             bioconda/linux-64::muscle-3.8.1551-h7d875b9_6

The following packages will be UPDATED:

  conda                               4.13.0-py39hf3d152e_1 --> 4.14.0-py39hf3d152e_0



Downloading and Extracting Packages
conda-4.14.0         | 1011 KB   | ##################################### | 100% 
muscle-3.8.1551  

In [6]:
!muscle 


MUSCLE v3.8.1551 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.


Basic usage

    muscle -in <inputfile> -out <outputfile>

Common options (for a complete list please see the User Guide):

    -in <inputfile>    Input file in FASTA format (default stdin)
    -out <outputfile>  Output alignment in FASTA format (default stdout)
    -diags             Find diagonals (faster for similar sequences)
    -maxiters <n>      Maximum number of iterations (integer, default 16)
    -maxhours <h>      Maximum time to iterate in hours (default no limit)
    -html              Write output in HTML format (default FASTA)
    -msf               Write output in GCG MSF format (default FASTA)
    -clw               Write output in CLUSTALW format (default FASTA)
    -clwstrict         As -clw, with 'CLUSTAL W (1.81)' header
    -log[a] <logfile>  Log to file (append if -loga, overwrite if -log)
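
Before running the pipeline, it can help to confirm that all three programs are actually visible from the notebook. The sketch below uses Python's standard `shutil.which`; the helper name `find_tools` is made up for this example, and the executable names are assumed to be `muscle`, `trimal` and `iqtree` (some IQ-TREE releases name the binary `iqtree2`, so adjust as needed).

```python
import shutil


def find_tools(tools=("muscle", "trimal", "iqtree")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, re-check the install step or the path you added for your platform.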
 

In [None]:
## Run the aligner using MUSCLE (the -in/-out flags match the v3.8 usage shown above)
## It will save to the file SDHA_ncbi.mus.fasta 

!muscle -quiet -in SDHA_ncbi.fasta -out SDHA_ncbi.mus.fasta 

## Run the 'masking' or 'trimming' tool

!trimal -gappyout -in SDHA_ncbi.mus.fasta  -out SDHA_ncbi.mus.go.fasta 
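
A quick sanity check after these two steps is to compare the dimensions of the alignment before and after trimming. The following is a rough sketch with a small hand-written FASTA reader; the helper names `read_fasta` and `alignment_shape` are invented for this example, and it assumes the two output files named in the cell above sit in the current working directory.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def alignment_shape(records):
    """Return (n_sequences, n_columns); raise if sequence lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequence lengths differ: {sorted(lengths)}")
    return len(records), (lengths.pop() if lengths else 0)


# File names taken from the commands above; skipped if they do not exist yet.
for filename in ("SDHA_ncbi.mus.fasta", "SDHA_ncbi.mus.go.fasta"):
    path = Path(filename)
    if path.exists():
        n_seqs, n_cols = alignment_shape(read_fasta(path.read_text()))
        print(f"{filename}: {n_seqs} sequences x {n_cols} columns")
    else:
        print(f"{filename}: not found (run the alignment and trimming first)")
```

The trimmed file should contain the same number of sequences and no more columns than the untrimmed one.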


In [None]:
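## Run the tree-making tool
## --mset LG,WAG and --mrate G,I restrict model selection to those substitution models and rate types
## -bb 1000 runs 1000 ultrafast bootstrap replicates; -pre sets the prefix for the output files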

!iqtree -s SDHA_ncbi.mus.go.fasta --mset LG,WAG --mrate G,I -bb 1000 -pre SDHA_ncbi.mus.go.mfp -quiet
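
IQ-TREE writes the resulting tree in Newick format to a file ending in `.treefile`, using the prefix given with `-pre`. As a rough sketch, the tip names and branch support values can be pulled out with two regular expressions. This assumes a simple Newick string with unquoted names and a single numeric support label per internal node; the helper names and the example tree are invented for illustration, and a proper tree library or one of the viewers above is the better choice for real work.

```python
import re
from pathlib import Path


def leaf_names(newick):
    """Return tip labels: names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def support_values(newick):
    """Return numeric labels that directly follow ')', i.e. branch supports."""
    return [float(x) for x in re.findall(r"\)\s*(\d+(?:\.\d+)?)", newick)]


# Hypothetical tree, invented for illustration.
example = "((A:0.1,B:0.2)95:0.05,(C:0.3,D:0.1)72:0.02,E:0.4);"
print(leaf_names(example))
print(support_values(example))

# Assumed output name: the -pre prefix from the command above plus ".treefile".
treefile = Path("SDHA_ncbi.mus.go.mfp.treefile")
if treefile.exists():
    supports = support_values(treefile.read_text())
    weak = [s for s in supports if s < 95]
    print(f"{len(weak)} of {len(supports)} branches have support below 95")
```

The cutoff of 95 is only a commonly used rule of thumb for ultrafast bootstrap support; pick the threshold that suits your analysis.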
