# Global Alignment 
Pairwise Sequence Alignment Using the Needleman-Wunsch Algorithm

by Candra Dewi Jodistiara

## Introduction

## Steps and Program Implementation

### Initializing the Scores and the DNA/Amino Acid Codes

The first step is to store the two amino acid sequences as strings and to load the scoring matrix into a pandas DataFrame. Here the Needleman-Wunsch algorithm is scored with BLOSUM62.

Each sequence is then converted to a Python list, with a leading `*` placeholder so that index 0 lines up with the first row and column of the scoring matrix, so that the computation can iterate over it character by character.

In [18]:
import numpy as np
import pandas as pd

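# residue codes in the order used by the BLOSUM62 file (kept for reference)
cols = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L',
        'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'B', 'Z', 'X', '*']
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated in recent pandas)
blosum = pd.read_csv('blosum62.txt', sep=r'\s+', header=0, index_col=0)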

blosum

Unnamed: 0,A,R,N,D,C,Q,E,G,H,I,...,P,S,T,W,Y,V,B,Z,X,*
A,4,-1,-2,-2,0,-1,-1,0,-2,-1,...,-1,1,0,-3,-2,0,-2,-1,0,-4
R,-1,5,0,-2,-3,1,0,-2,0,-3,...,-2,-1,-1,-3,-2,-3,-1,0,-1,-4
N,-2,0,6,1,-3,0,0,0,1,-3,...,-2,1,0,-4,-2,-3,3,0,-1,-4
D,-2,-2,1,6,-3,0,2,-1,-1,-3,...,-1,0,-1,-4,-3,-3,4,1,-1,-4
C,0,-3,-3,-3,9,-3,-4,-3,-3,-1,...,-3,-1,-1,-2,-2,-1,-3,-3,-2,-4
Q,-1,1,0,0,-3,5,2,-2,0,-3,...,-1,0,-1,-2,-1,-2,0,3,-1,-4
E,-1,0,0,2,-4,2,5,-2,0,-3,...,-1,0,-1,-3,-2,-2,1,4,-1,-4
G,0,-2,0,-1,-3,-2,-2,6,-2,-4,...,-2,0,-2,-2,-3,-3,-1,-2,-1,-4
H,-2,0,1,-1,-3,0,0,-2,8,-3,...,-2,-1,-2,-2,2,-3,0,0,-1,-4
I,-1,-3,-3,-3,-1,-3,-3,-4,-3,4,...,-3,-2,-1,-3,-1,3,-3,-3,-1,-4
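
To make the lookup step concrete, the sketch below parses a small excerpt of a substitution matrix into a plain dictionary and reads a few scores from it. The excerpt, the name `BLOSUM_EXCERPT`, and the helper `parse_matrix` are illustrative assumptions, not part of the notebook; only the scores themselves are taken from the table above.

```python
# Hypothetical excerpt in the usual whitespace-delimited layout:
# a header row of residue codes, then one labelled row per residue.
BLOSUM_EXCERPT = """
   A  R  N  D  *
A  4 -1 -2 -2 -4
R -1  5  0 -2 -4
N -2  0  6  1 -4
D -2 -2  1  6 -4
* -4 -4 -4 -4  1
"""

def parse_matrix(text):
    """Parse a whitespace-delimited substitution matrix into a dict of dicts."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    header = rows[0]
    matrix = {}
    for row in rows[1:]:
        label, values = row[0], row[1:]
        matrix[label] = {col: int(v) for col, v in zip(header, values)}
    return matrix

matrix = parse_matrix(BLOSUM_EXCERPT)
print(matrix["A"]["R"])   # substitution score for A vs R
print(matrix["N"]["D"])   # substitution score for N vs D
print(matrix["*"]["*"])   # the '*'/'*' entry
```

With the DataFrame loaded above, the equivalent lookup is `blosum['A']['R']` (column first, then row), which gives the same value either way because the matrix is symmetric.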


In [12]:
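aa1 = "ATGC"
aa2 = "TGC"

# Gap score taken from the '*'/'*' entry of the matrix.
# Note: in BLOSUM62 this entry is +1, so gaps are rewarded here;
# a conventional gap penalty is negative (the other '*' entries are -4).
gap = blosum['*']['*']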

In [13]:
code1 = list("*") + list(aa1.upper())
code2 = list("*") + list(aa2.upper())

print(code1)
print(code2)

['*', 'A', 'T', 'G', 'C']
['*', 'T', 'G', 'C']


### 1. Building the Scoring Matrix

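Next, build the scoring matrix. The first row and the first column are filled with cumulative gap scores. Every other cell takes the maximum of three candidates: the diagonal neighbor plus the substitution score of the two residues, the cell above plus the gap score, and the cell to the left plus the gap score.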
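
Written as a recurrence, the fill rule used by the code in this section looks as follows. The symbols are notation introduced here for illustration: $F$ is the scoring matrix, $a_i$ and $b_j$ are the residues of the first and second sequence, $s$ is the substitution score, and $g$ is the gap score.

$$
F(i, 0) = i \cdot g, \qquad F(0, j) = j \cdot g
$$

$$
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(a_i, b_j) \\
F(i-1, j) + g \\
F(i, j-1) + g
\end{cases}
\qquad i \ge 1,\; j \ge 1
$$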

In [14]:
scores = [[0 for j in range(len(code2))] for i in range(len(code1))]

# fill the first column and the first row with cumulative gap scores
for i in range(len(code1)):
    scores[i][0] = gap * i

for j in range(len(code2)):
    scores[0][j] = gap * j

# fill the remaining cells; both loops start at 1 so that the initialized
# first row and column are not overwritten through negative indices
for i in range(1, len(code1)):
    for j in range(1, len(code2)):
        match = scores[i-1][j-1] + blosum[code1[i]][code2[j]]
        delete = scores[i-1][j] + gap
        insert = scores[i][j-1] + gap

        scores[i][j] = max(match, delete, insert)

print("Scoring Matrix: \n")
for score in scores:
    print(score)

Scoring Matrix: 

[0, 1, 2, 3]
[1, 2, 3, 4]
[2, 6, 7, 8]
[3, 7, 12, 13]
[4, 8, 13, 21]
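
For comparison, here is a minimal, self-contained sketch of the same fill step with a negative gap score, which is the more common convention. The value `gap=-4`, the dictionary `SUBST`, and the function names are assumptions made for this example; the substitution scores are the BLOSUM62 entries for the four residues used above.

```python
# BLOSUM62 scores for the residues A, T, G, C (one entry per unordered pair).
SUBST = {
    ("A", "A"): 4, ("T", "T"): 5, ("G", "G"): 6, ("C", "C"): 9,
    ("A", "T"): 0, ("A", "G"): 0, ("A", "C"): 0,
    ("T", "G"): -2, ("T", "C"): -1, ("G", "C"): -3,
}

def subst(a, b):
    """Symmetric lookup of the substitution score."""
    return SUBST[(a, b)] if (a, b) in SUBST else SUBST[(b, a)]

def score_matrix(seq1, seq2, gap=-4):
    """Fill a Needleman-Wunsch scoring matrix (gap score is an assumption)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    scores = [[0] * cols for _ in range(rows)]
    # first column and first row: cumulative gap scores
    for i in range(rows):
        scores[i][0] = gap * i
    for j in range(cols):
        scores[0][j] = gap * j
    # remaining cells: best of diagonal, up, and left moves
    for i in range(1, rows):
        for j in range(1, cols):
            match = scores[i - 1][j - 1] + subst(seq1[i - 1], seq2[j - 1])
            delete = scores[i - 1][j] + gap
            insert = scores[i][j - 1] + gap
            scores[i][j] = max(match, delete, insert)
    return scores

for row in score_matrix("ATGC", "TGC"):
    print(row)
```

With a negative gap score, the bottom-right cell equals the three match scores plus one gap (5 + 6 + 9 - 4 = 16), so the single gap lowers the total instead of raising it.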


### 2. Traceback from the Highest Value

In [5]:
max_value = max(map(max, scores))
max_index = [x for x in scores if max_value in x][0]
max_index = [scores.index(max_index),max_index.index(max_value)]
print("\nMAX VALUE \nvalue: ", max_value, "\nindex: ", max_index)


MAX VALUE 
value:  21 
index:  [4, 3]
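
In this run the highest value sits in the bottom-right cell, which is also where a global alignment reads its final score. The two need not coincide in general, so the sketch below checks both. The matrix is a small hand-written example and `max_cell` is a hypothetical helper, neither taken from the notebook.

```python
def max_cell(scores):
    """Return (value, (row, col)) of the first largest entry in a 2D list."""
    best = None
    for i, row in enumerate(scores):
        for j, value in enumerate(row):
            if best is None or value > best[0]:
                best = (value, (i, j))
    return best

# Hypothetical scoring matrix whose largest entry is not in the last cell.
scores = [
    [0, -4, -8],
    [-4, 5, 1],
    [-8, 1, 3],
]

value, index = max_cell(scores)
print(value, index)      # largest entry and its position
print(scores[-1][-1])    # bottom-right cell, the global alignment score
```

Starting the traceback at the maximum cell is the convention for local alignment (Smith-Waterman); the traceback in the next step starts at the bottom-right cell.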


### 3. Amino Acid Alignment

In [6]:
AlignmentA = ""
AlignmentB = ""

# start from the bottom-right cell and walk back to the top-left cell
i = len(code1)-1
j = len(code2)-1

while (i > 0 or j > 0):
    # diagonal move: the two residues are aligned with each other
    if (i > 0) and (j > 0) and (scores[i][j] == scores[i-1][j-1] + blosum[code1[i]][code2[j]]):
        AlignmentA = code1[i] + AlignmentA
        AlignmentB = code2[j] + AlignmentB
        i = i - 1
        j = j - 1
    # upward move: residue of sequence 1 aligned with a gap
    elif (i > 0) and (scores[i][j] == scores[i-1][j] + gap):
        AlignmentA = code1[i] + AlignmentA
        AlignmentB = "-" + AlignmentB
        i = i - 1
    # leftward move: residue of sequence 2 aligned with a gap
    elif (j > 0) and (scores[i][j] == scores[i][j-1] + gap):
        AlignmentA = "-" + AlignmentA
        AlignmentB = code2[j] + AlignmentB
        j = j - 1


In [7]:
print("Amino Acid 1: ", AlignmentA)
print("Amino Acid 2: ", AlignmentB)

Amino Acid 1:  ATGC
Amino Acid 2:  -TGC
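
The traceback can also be wrapped in a function that takes the filled matrix as input. The sketch below does so for a matrix filled with an assumed gap score of -4. The function name `traceback`, the `SUBST` dictionary, and the literal matrix are assumptions for this example, not the notebook's own code.

```python
# BLOSUM62 scores for the residues A, T, G, C (one entry per unordered pair).
SUBST = {
    ("A", "A"): 4, ("T", "T"): 5, ("G", "G"): 6, ("C", "C"): 9,
    ("A", "T"): 0, ("A", "G"): 0, ("A", "C"): 0,
    ("T", "G"): -2, ("T", "C"): -1, ("G", "C"): -3,
}

def subst(a, b):
    """Symmetric lookup of the substitution score."""
    return SUBST[(a, b)] if (a, b) in SUBST else SUBST[(b, a)]

def traceback(seq1, seq2, scores, gap):
    """Walk from the bottom-right cell to the top-left cell of `scores`."""
    i, j = len(seq1), len(seq2)
    out1, out2 = [], []
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                scores[i][j] == scores[i - 1][j - 1] + subst(seq1[i - 1], seq2[j - 1])):
            out1.append(seq1[i - 1])   # diagonal: residue vs residue
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and scores[i][j] == scores[i - 1][j] + gap:
            out1.append(seq1[i - 1])   # up: residue of seq1 vs gap
            out2.append("-")
            i -= 1
        else:
            out1.append("-")           # left: gap vs residue of seq2
            out2.append(seq2[j - 1])
            j -= 1
    # residues were collected back to front, so reverse before joining
    return "".join(reversed(out1)), "".join(reversed(out2))

# Scoring matrix for "ATGC" vs "TGC", filled with the assumed gap score of -4.
scores = [
    [0, -4, -8, -12],
    [-4, 0, -4, -8],
    [-8, 1, -2, -5],
    [-12, -3, 7, 3],
    [-16, -7, 3, 16],
]

aligned1, aligned2 = traceback("ATGC", "TGC", scores, gap=-4)
print(aligned1)
print(aligned2)
```

Collecting the aligned characters in lists and reversing once at the end avoids repeatedly prepending to a string, which matters only for long sequences.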


In [17]:
gap

1