# Application of Today's Learning

## PAM Matrices in R

To apply today's learning, let's load the PAM100 matrix and then run a pairwise protein alignment in R. Note that the alignment itself is scored with BLOSUM62, not PAM100:

 
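# Load PAM100 matrix (assumes seqinr ships it at this path; system.file() returns "" if it does not)
pam100_path <- system.file("matrices/pam/pam100", package = "seqinr")
stopifnot(nzchar(pam100_path))
pam100 <- read.table(pam100_path, as.is = TRUE)

# Print PAM100 matrix
print(pam100)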

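# Install and load necessary packages (Biostrings is distributed via Bioconductor, not CRAN)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Biostrings")
library(Biostrings)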

# Define sequences
seq1 <- "HEAGAWGHEE"
seq2 <- "PAWHEAE"

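# Perform pairwise alignment using the BLOSUM62 matrix (the default for protein BLAST).
# Passing the matrix by name avoids having to call data(BLOSUM62) first.
# In recent Bioconductor releases pairwiseAlignment() lives in the pwalign package; load it if Biostrings reports the function as moved.
alignment_blosum <- pairwiseAlignment(pattern = seq1, subject = seq2, substitutionMatrix = "BLOSUM62")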

# Print the BLOSUM62 alignment
print(alignment_blosum)



## BLOSUM62 Weighting
The BLOSUM62 matrix is built from ungapped blocks of conserved regions in related proteins. Before substitutions
are counted, sequences within a block that share 62% identity or more are clustered and counted as a single sequence,
so groups of near-identical sequences do not dominate the statistics. The resulting scores are therefore weighted
toward substitutions seen between sequences that are less than 62% identical, which suits the matrix to diverse proteins.
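
The clustering step described above can be sketched in base R. This is only an illustration under stated assumptions: the four segments in `block` are made-up toy data, the helper names `percent_identity` and `cluster_at` are hypothetical, and single-linkage clustering via `hclust` stands in for the grouping rule (a segment joins a cluster if it is at least 62% identical to any member).

```r
# Toy ungapped block: one aligned segment per row (hypothetical sequences).
block <- c(s1 = "AVLKG", s2 = "AVLKA", s3 = "AVIRG", s4 = "GTIRS")

# Fraction of identical positions between two equal-length segments.
percent_identity <- function(a, b) {
  mean(strsplit(a, "")[[1]] == strsplit(b, "")[[1]])
}

# Pairwise identity matrix for the block.
n <- length(block)
identity <- outer(seq_len(n), seq_len(n),
                  Vectorize(function(i, j) percent_identity(block[i], block[j])))
dimnames(identity) <- list(names(block), names(block))

# Single-linkage clustering: segments linked by a chain of pairs at or above
# the identity threshold end up in the same cluster.
cluster_at <- function(identity, threshold = 0.62) {
  tree <- hclust(as.dist(1 - identity), method = "single")
  cutree(tree, h = 1 - threshold)
}

clusters <- cluster_at(identity)

# Each cluster counts as one sequence, so a member's weight is 1 / cluster size.
weights <- 1 / as.vector(table(clusters)[as.character(clusters)])
names(weights) <- names(block)

print(round(identity, 2))
print(clusters)
print(weights)
```

Lowering `threshold` merges more segments into each cluster, which is the sense in which a lower BLOSUM number reflects more divergent sequences.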


# Summary of Henikoffs' Paper

The Henikoffs' paper introduced the BLOSUM matrices and compared their performance and characteristics with those of PAM matrices in sequence alignments. Here's a summary:

## Key Findings:

1. **BLOSUM Outperformed PAM:**
   - In the Henikoffs' tests, BLOSUM matrices performed better than PAM matrices in alignments and database searches.

2. **Usefulness in Identifying Weak Alignments:**
   - BLOSUM matrices were particularly effective in identifying weakly scoring alignments.

3. **Optimal Performance of BLOSUM62:**
   - Among BLOSUM matrices, BLOSUM62 showed slightly better performance than BLOSUM60 or BLOSUM70.

4. **Commonly Used Scoring Matrices:**
   - BLOSUM50 and BLOSUM90 are also commonly employed scoring matrices in BLAST searches.
   - The FASTA family of sequence comparison programs defaults to using BLOSUM50.

5. **Evolutionary Models:**
   - PAM matrices follow an explicit evolutionary model, counting replacements on the branches of a phylogenetic tree.
   - In contrast, BLOSUM matrices do not rely on a phylogenetic tree.

6. **Alignment Scope:**
   - PAM matrices are derived from global alignments of closely related proteins, so they cover both highly conserved and highly mutable regions.
   - BLOSUM matrices are derived only from highly conserved regions, taken from blocks of aligned segments that contain no gaps.

7. **Relatedness Context:**
   - BLOSUM substitution counts are taken within groups of related sequences (blocks), and clustering at the chosen identity threshold controls how much closely related members contribute.

8. **Interpretation of Matrix Numbers:**
   - PAM matrices use higher numbers to denote larger evolutionary distances.
   - BLOSUM matrices, on the other hand, assign higher numbers to indicate higher sequence similarity and, therefore, smaller evolutionary distances.
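
The PAM numbering in item 8 can be illustrated with a small base-R sketch. Under the PAM model, the matrix for distance n is obtained by raising a one-step mutation probability matrix to the n-th power, so larger n means more accumulated change. Everything here is an assumption made for illustration: the three-letter alphabet, the values in `toy_pam1`, the uniform background frequencies, and the helper names `matrix_power` and `log_odds`. Real PAM matrices use 20 amino acids and observed frequencies.

```r
# Toy 3-state one-step mutation matrix (hypothetical values, rows sum to 1).
# Entry [i, j] is the probability that residue i is replaced by residue j
# over one unit of evolutionary distance.
states <- c("A", "B", "C")
toy_pam1 <- matrix(c(0.990, 0.007, 0.003,
                     0.007, 0.990, 0.003,
                     0.003, 0.003, 0.994),
                   nrow = 3, byrow = TRUE, dimnames = list(states, states))

# Raise a square matrix to a non-negative integer power by repeated multiplication.
matrix_power <- function(m, n) {
  result <- diag(nrow(m))
  for (k in seq_len(n)) result <- result %*% m
  dimnames(result) <- dimnames(m)
  result
}

# Scaled log-odds scores: 10 * log10(P[i, j] / f[j]) for background frequencies f.
log_odds <- function(p, background) {
  round(10 * log10(sweep(p, 2, background, "/")), 2)
}

# The toy matrix is symmetric, so its stationary (background) distribution is uniform.
background <- rep(1 / 3, 3)

toy_pam100 <- matrix_power(toy_pam1, 100)
toy_pam250 <- matrix_power(toy_pam1, 250)
scores100 <- log_odds(toy_pam100, background)
scores250 <- log_odds(toy_pam250, background)

# Probability that a residue is unchanged shrinks as the PAM number grows.
print(c(n1 = toy_pam1["A", "A"], n100 = toy_pam100["A", "A"], n250 = toy_pam250["A", "A"]))
print(scores100)
print(scores250)
```

As n increases, the diagonal probabilities fall toward the background frequencies and the identity scores shrink, which is why a high PAM number suits distant relationships while a high BLOSUM number suits close ones.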

## Salient Differences between PAM and BLOSUM:
- In the Henikoffs' comparison, BLOSUM matrices outperformed PAM matrices.
- BLOSUM62 is particularly effective.
- BLOSUM matrices are commonly used in BLAST searches.
- PAM matrices follow an explicit evolutionary model and are derived from global alignments.
- BLOSUM matrices are derived from highly conserved, ungapped regions and depend on the clustering threshold used.
- The interpretation of matrix numbers differs between PAM and BLOSUM.

*Henikoffs' paper sheds light on the nuances and performance disparities between PAM and BLOSUM matrices, aiding researchers in choosing appropriate scoring matrices for their bioinformatics analyses.*
