# 🧬 Pairwise Sequence Alignment in Biopython

Pairwise sequence alignment compares two biological sequences (DNA, RNA, or protein) to identify regions of similarity. These similarities can provide insights into functional, structural, or evolutionary relationships.

In this notebook, you'll:

- Learn the types of alignments (global vs local)
- Understand scoring systems (match, mismatch, gap penalties)
- Use Biopython’s modern `PairwiseAligner` class to perform alignments
- Visualize alignments in a human-readable format

## 🔄 Types of Pairwise Alignments

There are two main types of pairwise alignments:

- **Global alignment**: Aligns sequences across their entire length. Best when sequences are of similar size and expected to be homologous.
  - Example: Comparing two versions of a gene from different species.

- **Local alignment**: Aligns the most similar subsequence regions. Best for identifying conserved motifs or domains in divergent sequences.
  - Example: Finding a conserved transcription factor binding site.
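
To make the contrast concrete, the sketch below scores one pair of sequences in both modes. The sequences, the helper name `score_in_mode`, and the scoring values are assumptions chosen for illustration. The shared `GATTACA` motif is flanked by unrelated bases, so the local score should not fall below the global one.

```python
from Bio import Align


def score_in_mode(seq_a, seq_b, mode):
    """Return the best alignment score for the given mode ('global' or 'local')."""
    aligner = Align.PairwiseAligner()
    aligner.mode = mode
    # Illustrative scoring values (assumed for this sketch)
    aligner.match_score = 2
    aligner.mismatch_score = -1
    aligner.open_gap_score = -2
    aligner.extend_gap_score = -1
    return aligner.score(seq_a, seq_b)


# A short shared motif ("GATTACA") embedded in otherwise unrelated flanks
seq_a = "CCCCGATTACACCCC"
seq_b = "TTTTGATTACATTTT"

global_score = score_in_mode(seq_a, seq_b, "global")
local_score = score_in_mode(seq_a, seq_b, "local")
print("global:", global_score)
print("local: ", local_score)
```

`aligner.score()` returns only the optimal score, which is cheaper than building the alignments when the score is all you need.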

## 🧮 Alignment Scoring System

Alignment accuracy depends on the scoring system you choose. You can customize:

- **Match score**: Reward for identical bases (e.g., +2)
- **Mismatch penalty**: Penalty for substitutions (e.g., -1)
- **Gap penalties**:
  - **Gap open penalty**: Cost to start a new gap
  - **Gap extension penalty**: Cost to extend an existing gap

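These values set the trade-off between gaps and mismatches: harsher gap penalties push the aligner toward accepting mismatches, while harsher mismatch penalties push it toward opening gaps.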
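
As a worked example of how these four values combine, the sketch below scores an alignment that has already been written out with `-` for gaps. The function name `score_alignment` and the default values are assumptions for illustration. It treats the first position of a gap as an "open" and each further position in the same gap as an "extend".

```python
def score_alignment(row_a, row_b, match=2, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Score two equal-length gapped rows, using '-' for gap positions."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    prev_gap_row = None  # which row held the gap in the previous column, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            # Continuing a gap in the same row is an extension; otherwise a new gap opens
            total += gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            total += match if a == b else mismatch
            prev_gap_row = None
    return total


# 4 matches (+8), 3 gap opens (-6), 1 gap extension (-1)
print(score_alignment("AAGGTT--", "A-G-TTAG"))  # 1
# 1 match (+2), 5 mismatches (-5), no gaps
print(score_alignment("AAGGTT", "AGTTAG"))      # -3
```

With these values the gapped alignment outscores the ungapped one; making the gap penalties harsher would eventually reverse that ranking.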

## ⚙️ Biopython `PairwiseAligner`

In Biopython 1.78 and later, the recommended way to do pairwise alignment is the `PairwiseAligner` class in the `Bio.Align` module:

```python
from Bio import Align
```

`PairwiseAligner` replaces the older `pairwise2` module, which is deprecated in recent Biopython releases, and offers:
- Support for both global and local alignments
- Flexible scoring configuration
- Better performance and continued maintenance
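
For readers coming from `pairwise2`, where functions such as `globalms` took the match, mismatch, gap-open, and gap-extend scores as positional arguments, the same four numbers map onto aligner attributes. The helper `ms_aligner` below is hypothetical (it is not part of Biopython) and is only a sketch of that mapping.

```python
from Bio import Align


def ms_aligner(match, mismatch, gap_open, gap_extend, mode="global"):
    """Hypothetical helper: build a PairwiseAligner from pairwise2-style 'ms' scores."""
    aligner = Align.PairwiseAligner()
    aligner.mode = mode
    aligner.match_score = match
    aligner.mismatch_score = mismatch
    aligner.open_gap_score = gap_open
    aligner.extend_gap_score = gap_extend
    return aligner


# Roughly the settings of a globalms(..., 2, -1, -0.5, -0.1) call
ms = ms_aligner(2, -1, -0.5, -0.1)
print(ms.score("ACCGT", "ACG"))
```

Building a fresh aligner per scoring scheme also avoids carrying settings over from one notebook cell to the next.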


# 🔬 Examples using Biopython's `PairwiseAligner`


## 📦 Setup and Import

```python
from Bio import Align
aligner = Align.PairwiseAligner()
```

## 1️⃣ Global Alignment (Match = 1, Mismatch = 0, No Gap Penalties)
This example mimics the old `pairwise2.align.globalxx` function: only matching characters contribute to the score.

```python
# Matches score 1; mismatches and gaps are free
aligner.mode = 'global'
aligner.match_score = 1
aligner.mismatch_score = 0
aligner.open_gap_score = 0
aligner.extend_gap_score = 0

alignments = aligner.align("AAGGTT", "AGTTAG")
print(alignments[0])  # first of possibly several equally optimal alignments
print("score:", alignments.score)
```



## 2️⃣ Global Alignment (Match = 2, Mismatch = -1)
Here we assign a penalty for mismatches while rewarding matches more heavily.

```python
# Matches score 2, mismatches cost 1, gaps are still free
aligner.mode = 'global'
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = 0
aligner.extend_gap_score = 0

alignments = aligner.align("AAGGTT", "AGTTAG")
print(alignments[0])
print("score:", alignments.score)
```



## 3️⃣ Global Alignment with Gap Penalties (Open = -2, Extend = -1)
Gap opening and extension penalties discourage unnecessary gaps.

```python
# Affine gap penalty: opening a gap costs 2, each extra gap position costs 1
aligner.mode = 'global'
aligner.match_score = 1
aligner.mismatch_score = -1
aligner.open_gap_score = -2
aligner.extend_gap_score = -1

alignments = aligner.align("AAGGTT", "AGTTAG")
print(alignments[0])
print("score:", alignments.score)
```



## 4️⃣ Protein Alignment with BLOSUM62 Matrix
Use a substitution matrix to align protein sequences with biologically meaningful scores.

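```python
from Bio.Align import substitution_matrices

# Set the mode explicitly so this cell does not rely on state from earlier cells
aligner.mode = 'global'

# With a substitution matrix set, residue pairs are scored from the matrix
# instead of the match/mismatch scores
matrix = substitution_matrices.load("BLOSUM62")
aligner.substitution_matrix = matrix
aligner.open_gap_score = -10
aligner.extend_gap_score = -0.5

alignments = aligner.align("KEVLA", "EVL")
print(alignments[0])
print("score:", alignments.score)
```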
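
The reported score can be sanity-checked by hand. The sketch below is plain arithmetic rather than a Biopython call. It assumes global mode, that end gaps are charged the same as internal gaps, and that the alignment places `EVL` under the matching residues of `KEVLA`. The three diagonal values are taken from the standard BLOSUM62 table.

```python
# BLOSUM62 diagonal entries needed here (values from the standard BLOSUM62 table)
blosum62_diag = {"E": 5, "V": 4, "L": 4}

gap_open = -10  # same value as aligner.open_gap_score in the cell above

# Candidate alignment:  KEVLA
#                       -EVL-
core = sum(blosum62_diag[aa] for aa in "EVL")  # E/E + V/V + L/L
end_gaps = 2 * gap_open                        # one single-residue gap at each end
print("hand-computed score:", core + end_gaps)  # hand-computed score: -7
```

Both gaps are one residue long, so the extension penalty never applies. If the aligner reports a different score, one of the stated assumptions does not hold for your settings.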



## 5️⃣ Local Alignment (Match = 2, Mismatch = -1, Gaps Open = -2, Extend = -1)
Local alignment finds the best matching sub-region between two sequences.

```python
# Local mode: only the best-scoring sub-region of each sequence is aligned
aligner.mode = 'local'
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -2
aligner.extend_gap_score = -1

alignments = aligner.align("AAGGTT", "AGTTAG")
print(alignments[0])
print("score:", alignments.score)
```



# 🔁 Different Alignments from Scoring Schemes

This example shows how changing scoring parameters in Biopython's `PairwiseAligner` can lead to different alignments, even with the same two sequences.

## 🧬 Sequences to Align
We will align:
- `AAGGTT`
- `AGTTAG`

These sequences share some subsequences (such as `GTT`) but differ elsewhere, so the optimal alignment depends on the scoring scheme.

```python
from Bio import Align

# Start from a fresh aligner so settings from earlier cells do not carry over
aligner = Align.PairwiseAligner()
seq1 = "AAGGTT"
seq2 = "AGTTAG"
```

## ⚖️ Example 1: Match = 2, Mismatch = -1, No Gap Penalty
This scoring encourages matching without penalizing gaps.

```python
aligner.mode = 'global'
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = 0
aligner.extend_gap_score = 0

alignments = aligner.align(seq1, seq2)
print(alignments[0])
print("score:", alignments.score)
```



## ❌ Example 2: Penalize Gaps Heavily
Now we discourage gaps strongly, which can shift the alignment strategy.

```python
aligner.mode = 'global'
aligner.match_score = 1
aligner.mismatch_score = -1
aligner.open_gap_score = -5
aligner.extend_gap_score = -2

alignments = aligner.align(seq1, seq2)
print(alignments[0])
print("score:", alignments.score)
```



## 🔍 Observation
Notice how the alignment changes:
- In **Example 1**, the aligner may introduce gaps to maximize matching bases.
- In **Example 2**, the aligner avoids gaps, possibly accepting mismatches instead.

This illustrates how the choice of scoring scheme shapes the alignment, and therefore any biological interpretation drawn from it.
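
The effect described in the observation can be reproduced without Biopython. The following is a minimal Needleman-Wunsch sketch, not Biopython's implementation. It assumes a single linear per-position gap cost (so `-5` stands in for the open/extend pair used in Example 2) and breaks ties in favor of the diagonal, so it may report a different co-optimal alignment than `PairwiseAligner` does.

```python
def needleman_wunsch(a, b, match, mismatch, gap):
    """Return (score, row_a, row_b) for one optimal global alignment (linear gap cost)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner, preferring the diagonal on ties
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


# (match, mismatch, gap) for the two scoring schemes being compared
for label, params in [("free gaps", (2, -1, 0)), ("costly gaps", (1, -1, -5))]:
    s, row_a, row_b = needleman_wunsch("AAGGTT", "AGTTAG", *params)
    print(f"{label}: score={s}")
    print(row_a)
    print(row_b)
```

With free gaps the sketch inserts gaps to line up every possible match. With a cost of 5 per gap position it returns the two sequences ungapped and accepts the mismatches.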