# Pairwise alignment 

Pairwise alignment is used to identify regions of similarity that may indicate functional, structural, or evolutionary relationships between two biological sequences (protein or nucleic acid).

- Identifying these similarities makes it possible to infer sequences conserved between species, and to estimate genetic similarity or evolutionary divergence.

Pairwise alignment uses dynamic programming to find the optimal alignment between two sequences. The alignment is scored by similarity or distance, and the significance of that score is then assessed.
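
The dynamic-programming idea can be sketched in plain Python. The function below is a minimal, illustrative Needleman-Wunsch-style scorer with a single linear gap penalty. `global_alignment_score` and its parameters are hypothetical names for this sketch, not part of Biopython, and it returns only the score, not the alignment itself.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=0, gap=0):
    """Best global alignment score with a linear (per-character) gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # seq_a prefix aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # seq_b prefix aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two characters
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


print(global_alignment_score("ACCGGT", "ACGT"))                       # 4
print(global_alignment_score("ACCGGT", "ACGT", match=2, mismatch=-1))  # 8
```

Each cell depends only on its three neighbours (diagonal, above, left), which is why the optimal score can be filled in row by row.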

__Types of pairwise alignment__
1. _Global Alignment_: finds the best alignment across the entire length of both sequences
2. _Local Alignment_: finds the most similar subsequences within the two sequences
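
To illustrate how local alignment differs, here is a minimal Smith-Waterman-style sketch in plain Python. It assumes a linear gap penalty; `local_alignment_score` and its default scores are hypothetical choices for illustration only. The key difference from a global scorer is that a cell can never drop below 0, so the alignment is free to start and end anywhere.

```python
def local_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score with a linear (per-character) gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # First row and column stay 0: a local alignment may start anywhere
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a fresh alignment here
                score[i - 1][j - 1] + pair,  # extend along the diagonal
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
            best = max(best, score[i][j])
    return best


# Only the shared "ACG" core contributes to the score
print(local_alignment_score("TTTACGTTT", "GGACGGG"))  # 3
```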

__When running an alignment, you can specify:__
- Match score: how compatible two aligned characters are. Highly compatible characters get a positive score; dissimilar characters get a negative score
- Gap penalties: the cost of introducing gaps; these should be negative scores
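
As a rough illustration of how these two ingredients combine, the sketch below scores an already-gapped alignment column by column. `score_alignment` is a hypothetical helper, and it assumes the simplest possible scheme: one fixed penalty per gap character.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-1):
    """Sum column scores of two gapped strings of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            total += gap        # column contains a gap
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # different characters
    return total


# Four matches at 2 each plus two gap characters at -1 each
print(score_alignment("ACCGGT", "AC--GT", match=2, mismatch=-1, gap=-1))  # 6
```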


## Bio.pairwise2

Biopython includes two pairwise aligners:

1. The `Bio.pairwise2` module
2. The `PairwiseAligner` class within the `Bio.Align` module (since Biopython v1.72)

Both support global and local alignment. These notes focus on `pairwise2`.

The names of the alignment functions in `pairwise2` follow the convention `alignmenttypeXY`, where `alignmenttype` is either "global" or "local" and `XY` is a two-character code indicating the parameters the function takes.
- `X`: indicates the parameters for matches
- `Y`: indicates the parameters for gap penalties
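
The naming convention can be made concrete with a small parser. This is only a sketch: `split_function_name` is a hypothetical helper (not a Biopython function), and the accepted letter codes are the ones listed in the two subsections below.

```python
MATCH_CODES = set("xmdc")  # valid letters for X (match parameters)
GAP_CODES = set("xsdc")    # valid letters for Y (gap penalty parameters)


def split_function_name(name):
    """Split e.g. 'globalms' into ('global', 'm', 's')."""
    for kind in ("global", "local"):
        suffix = name[len(kind):]
        if name.startswith(kind) and len(suffix) == 2:
            match_code, gap_code = suffix
            if match_code in MATCH_CODES and gap_code in GAP_CODES:
                return kind, match_code, gap_code
    raise ValueError(f"not a pairwise2-style function name: {name!r}")


print(split_function_name("globalmx"))  # ('global', 'm', 'x')
print(split_function_name("localds"))   # ('local', 'd', 's')
```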

### Match Parameters

1. `x`: no parameters, identical characters score 1, otherwise 0
2. `m`: identical characters receive the match score, all other pairs receive the mismatch score. __Keywords: match, mismatch__
3. `d`: a dictionary supplies the score for each pair of characters. __Keyword: match_dict__
4. `c`: a callback function returns scores __Keyword: match_fn__
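
The four options differ only in where the score for a pair of characters comes from. The plain-Python sketch below mimics each one. The helper names and the toy dictionary values are hypothetical and chosen purely for illustration; they are not Biopython internals or a real substitution matrix.

```python
def score_x(a, b):
    """`x`: identical characters score 1, anything else 0."""
    return 1 if a == b else 0


def score_m(a, b, match, mismatch):
    """`m`: caller supplies the match and mismatch scores."""
    return match if a == b else mismatch


def score_d(a, b, match_dict):
    """`d`: look the pair up in a dictionary keyed by (char, char)."""
    return match_dict[(a, b)]


def score_c(a, b, match_fn):
    """`c`: delegate to a callback taking the two characters."""
    return match_fn(a, b)


# Toy scores (hypothetical values)
toy_dict = {("A", "A"): 2, ("G", "G"): 2, ("A", "G"): -1, ("G", "A"): -1}

print(score_x("A", "G"))                                  # 0
print(score_m("A", "A", match=2, mismatch=-1))            # 2
print(score_d("A", "G", toy_dict))                        # -1
print(score_c("A", "A", lambda a, b: 5 if a == b else -4))  # 5
```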

### Gap penalty parameters

1. `x`: No gap penalties
2. `s`: same open and extend gap penalties for both sequences __Keywords: open, extend__
3. `d`: sequences have different open and extend gap penalties __Keywords: openA, extendA, openB, extendB__
4. `c`: a callback function returns the gap penalties __Keywords: gap_A_fn, gap_B_fn__
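
To show how open and extend penalties combine, here is a small sketch of an affine gap penalty. It assumes the first gap position costs the open penalty and every further position costs the extend penalty, which is consistent with the `globalxs` example later in this section (four matches and one two-character gap give a score of 1). `affine_gap_penalty` is a hypothetical helper name.

```python
def affine_gap_penalty(length, open_penalty, extend_penalty):
    """Penalty for one gap run of `length` characters (0 means no gap)."""
    if length == 0:
        return 0
    # Assumed convention: pay `open` once, then `extend` per extra position
    return open_penalty + (length - 1) * extend_penalty


for n in (1, 2, 3):
    print(n, affine_gap_penalty(n, open_penalty=-2, extend_penalty=-1))
# 1 -2
# 2 -3
# 3 -4

# With the `d` option each sequence would simply get its own pair of values
gap_in_a = affine_gap_penalty(2, open_penalty=-2, extend_penalty=-1)
gap_in_b = affine_gap_penalty(2, open_penalty=-5, extend_penalty=-2)
print(gap_in_a, gap_in_b)  # -3 -7
```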

In [1]:
# Examples of global alignments 
from Bio import pairwise2

# globalxx: matches score 1, mismatches 0 and no gap penalties
alignments = pairwise2.align.globalxx("ACCGGT", "ACGT") 
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))


ACCGGT
| | ||
A-C-GT
  Score=4

ACCGGT
||  ||
AC--GT
  Score=4

ACCGGT
| || |
A-CG-T
  Score=4

ACCGGT
|| | |
AC-G-T
  Score=4





In [2]:
# globalmx -- matches score 2, mismatches -1, no gap penalties
alignments = pairwise2.align.globalmx("ACCGGT", "ACGT", match=2, mismatch=-1)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=8

ACCGGT
||  ||
AC--GT
  Score=8

ACCGGT
| || |
A-CG-T
  Score=8

ACCGGT
|| | |
AC-G-T
  Score=8



In [3]:
# globalxs -- matches score 1, mismatches 0, gap open penalty -2, gap extend penalty -1
alignments = pairwise2.align.globalxs("ACCGGT", "ACGT", open=-2, extend=-1)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
||  ||
AC--GT
  Score=1



In [4]:
# globaldx -- match/mismatch scores read from the BLOSUM62 matrix, no gap penalties
from Bio.Align import substitution_matrices
matrix = substitution_matrices.load("BLOSUM62") # BLOSUM62 scoring matrix for protein sequence alignment
alignments = pairwise2.align.globaldx("KEVLA", "EVL", match_dict=matrix)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

KEVLA
 ||| 
-EVL-
  Score=13
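
The score of 13 can be checked by hand. The sketch below does not load the real matrix; it uses three diagonal BLOSUM62 entries copied manually for this illustration (E/E = 5, V/V = 4, L/L = 4) and treats gap columns as free, as the `x` gap option does.

```python
# Hand-copied subset of BLOSUM62, just the pairs that occur in this alignment
blosum62_subset = {("E", "E"): 5, ("V", "V"): 4, ("L", "L"): 4}

aligned_a = "KEVLA"
aligned_b = "-EVL-"

score = sum(
    blosum62_subset[(a, b)]
    for a, b in zip(aligned_a, aligned_b)
    if a != "-" and b != "-"  # gap columns cost nothing with the `x` option
)
print(score)  # 13
```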



In [6]:
# globalmc -- matches score 5, mismatches -4, gap penalties defined through gap_function
from math import log 
def gap_function(x, y): # x is gap position in seq, y is gap length
    if y == 0: # no gap
        return 0
    elif y == 1: # gap open penalty
        return -2
    return - (2 + y/4.0 + log(y)/2.0)

alignments = pairwise2.align.globalmc("ACCCCCGT","ACG", match=5, mismatch=-4,
                                      gap_A_fn=gap_function, gap_B_fn=gap_function)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCCCCGT
|    || 
A----CG-
  Score=9.30685

ACCCCCGT
||    | 
AC----G-
  Score=9.30685
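
The reported score can be reproduced by hand: three matches at 5 each, one gap of length 4 and one gap of length 1. The sketch below repeats the same gap function with more descriptive argument names. The position values passed in are illustrative only, since this function ignores them.

```python
from math import log


def gap_function(position, length):
    """Same penalty as above: position of the gap, then its length."""
    if length == 0:  # no gap
        return 0
    elif length == 1:  # gap open penalty
        return -2
    return -(2 + length / 4.0 + log(length) / 2.0)


match_total = 3 * 5                # A, C and G align with a match score of 5
long_gap = gap_function(1, 4)      # the four-character gap
short_gap = gap_function(7, 1)     # the single trailing gap

score = match_total + long_gap + short_gap
print(round(score, 5))  # 9.30685
```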

