# Profile HMMs for Classifying Proteins

Author: Nhan TV <br>
Master Student, HUS - VNU

### Problem Statement

Given a family of proteins with similar functions, we want to decide whether a new protein belongs to that family. First, the sequences of the proteins in the family must be aligned; the alignment method itself is not covered in this report. We then use a Hidden Markov Model to compute the highest probability of aligning the new protein's sequence to the given family.

If the best alignment of the new protein to the family has a probability above a chosen threshold, we conclude that the new protein belongs to the family.
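The decision rule can be sketched as a comparison of scores. This is a minimal illustration, assuming the score is the probability of the best (Viterbi) path and that the threshold is chosen by the user; the helper `belongs_to_family` and the threshold value `1e-6` are hypothetical. Path probabilities shrink quickly with sequence length, so the sketch compares natural logarithms, which preserves the ordering of positive numbers while avoiding underflow.

```python
import math

def belongs_to_family(log_prob, log_threshold):
    """Hypothetical decision rule: assign the protein to the family when the
    log-probability of its best alignment exceeds the log threshold."""
    return log_prob > log_threshold

# Best-path probability reported later in this notebook (about 2.7e-06)
score = math.log(2.716692480000001e-06)
print(round(score, 2))                            # -12.82
print(belongs_to_family(score, math.log(1e-6)))   # True
print(belongs_to_family(score, math.log(1e-5)))   # False
```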

```python
import numpy as np
import pandas as pd
import string
```

### 1 Data Preprocessing

#### 1.1 Loading the Data

The data simulate a family of 5 proteins whose sequences have already been aligned.
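To experiment without the original `hmm_7.txt` file, the same five aligned sequences (the ones printed after loading) can be typed in as strings. This sketch assumes one sequence per line with `-` as the gap character; the names `rows` and `alignment_demo` are illustrative only.

```python
import numpy as np

# The five aligned sequences of the family, one per line ("-" = gap)
rows = """ACDEFACADF
AFDA---CCF
A--EFD-FDC
ACAEF--A-C
ADDEFAAADF"""

# One row per sequence, one single-character cell per alignment column
alignment_demo = np.array([list(line) for line in rows.splitlines() if line.strip()])
print(alignment_demo.shape)  # (5, 10)
```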

```python
# Read the aligned family: one sequence per line, one character per column ("-" = gap)
alignment = []
with open('D:/Data Science/Coursera/Bioinformatics/Find mutation of DNA_RNA_Protein/hmm_7.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            break  # stop at the first empty line
        alignment.append(list(line))
alignment = np.array(alignment)
```

```python
alignment
```

```
array([['A', 'C', 'D', 'E', 'F', 'A', 'C', 'A', 'D', 'F'],
       ['A', 'F', 'D', 'A', '-', '-', '-', 'C', 'C', 'F'],
       ['A', '-', '-', 'E', 'F', 'D', '-', 'F', 'D', 'C'],
       ['A', 'C', 'A', 'E', 'F', '-', '-', 'A', '-', 'C'],
       ['A', 'D', 'D', 'E', 'F', 'A', 'A', 'A', 'D', 'F']], dtype='<U1')
```

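```python
# Data dimensions: N sequences, D alignment columns
N, D = alignment.shape
# Threshold theta: columns whose gap fraction is >= theta do not become match states
theta = 0.4
```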

#### 1.2 Assigning the Alignment Columns to State Columns

```python
# Alphabet (uppercase letters)
alphabet = list(string.ascii_uppercase)
```

```python
def get_alignment_seed(alignment, theta):
    """Determine the match columns and the ignored (insert) columns of the alignment."""
    N, D = alignment.shape
    ignore = []
    for i in range(D):
        # Fraction of gaps ("-") in column i
        prop = alignment[:, i].tolist().count("-") / N
        # Columns whose gap fraction reaches theta are not match states
        if prop >= theta:
            ignore.append(i)
    # Indices of the match columns, kept in their original order
    true = [i for i in range(D) if i not in ignore]
    return (alignment[:, true], alignment[:, ignore], true, ignore)
```
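The column split can also be computed without an explicit loop. This sketch is an alternative formulation, not the notebook's function: it takes the mean of a boolean gap mask per column, then uses `np.flatnonzero` to list the column indices on each side of the threshold. The data are the five sequences shown above; `alignment_demo`, `ignored` and `matched` are illustrative names.

```python
import numpy as np

alignment_demo = np.array([list(s) for s in
    ["ACDEFACADF", "AFDA---CCF", "A--EFD-FDC", "ACAEF--A-C", "ADDEFAAADF"]])
theta = 0.4

# Fraction of gaps per column: mean of the boolean mask over the rows
gap_fraction = (alignment_demo == "-").mean(axis=0)
# Columns with gap fraction >= theta are set aside; the others become match columns
ignored = np.flatnonzero(gap_fraction >= theta)
matched = np.flatnonzero(gap_fraction < theta)
print(ignored)       # [5 6]
print(len(matched))  # 8
```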

```python
# Number of match states
N_seed = get_alignment_seed(alignment, theta)[0].shape[1]
N_seed
```

```
8
```

```python
# Columns excluded from the match states
ignore = get_alignment_seed(alignment, theta)[1]
ignore
```

```
array([['A', 'C'],
       ['-', '-'],
       ['D', '-'],
       ['-', '-'],
       ['A', 'A']], dtype='<U1')
```

```python
def get_states_alignment(alignment, theta):
    """Assign the columns of the aligned matrix to match, insert and delete state columns."""
    N, D = alignment.shape
    # Match matrix and the indices of the match / ignored columns
    alignment_seed, _, true, ignore = get_alignment_seed(alignment, theta)
    k = alignment_seed.shape[1]

    def merge_inserts(cols):
        # Concatenate each row's non-gap characters over the given ignored columns;
        # rows with only gaps there get "-"; zeros mark an insert state with no columns
        if not cols:
            return np.zeros(N)
        merged = []
        for j in range(N):
            letters = "".join(c for c in alignment[j, cols] if c != "-")
            merged.append(letters if letters else "-")
        return merged

    st_al_df = pd.DataFrame({'S': np.zeros(N)})
    # Ignored columns before the first match column belong to insert state I0
    first = true[0] if true else D
    st_al_df['I0'] = merge_inserts([c for c in ignore if c < first])
    for i in range(k):
        # Match state M(i+1) takes the i-th match column
        st_al_df['M' + str(i+1)] = alignment_seed[:, i]
        # Delete state D(i+1) is "-" wherever M(i+1) has a gap, otherwise 0
        st_al_df['D' + str(i+1)] = ['-' if c == '-' else 0 for c in alignment_seed[:, i]]
        # Ignored columns between M(i+1) and the next match column belong to I(i+1)
        upper = true[i+1] if i < k - 1 else D
        st_al_df['I' + str(i+1)] = merge_inserts([c for c in ignore if true[i] < c < upper])
    st_al_df['E'] = np.zeros(N)
    return st_al_df
```
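The insert-state rule can be seen in isolation: for each row, the non-gap characters in the ignored columns between two match columns are concatenated into one string, and a row with only gaps there is recorded as `-`. This sketch applies that rule to the two ignored columns printed earlier; `merge_insert_columns` and `ignored_block` are hypothetical names, not part of the notebook.

```python
import numpy as np

def merge_insert_columns(block):
    """Collapse the ignored columns of each row into one insert-state string.

    `block` is an (N, m) array of single characters. Non-gap characters are
    concatenated in column order; rows containing only gaps become "-".
    """
    merged = []
    for row in block:
        letters = "".join(c for c in row if c != "-")
        merged.append(letters if letters else "-")
    return merged

# Columns 5 and 6 of the family alignment (the two ignored columns)
ignored_block = np.array([["A", "C"], ["-", "-"], ["D", "-"], ["-", "-"], ["A", "A"]])
print(merge_insert_columns(ignored_block))  # ['AC', '-', 'D', '-', 'AA']
```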

```python
states_alignment_df = get_states_alignment(alignment, theta)
```



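```python
# Data after being assigned to the state columns
states_alignment_df
```

|   | S | I0 | M1 | D1 | I1 | M2 | D2 | I2 | M3 | D3 | ... | M6 | D6 | I6 | M7 | D7 | I7 | M8 | D8 | I8 | E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | A | 0.0 | 0.0 | C | 0 | 0.0 | D | 0 | ... | A | 0.0 | 0.0 | D | 0 | 0.0 | F | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | A | 0.0 | 0.0 | F | 0 | 0.0 | D | 0 | ... | C | 0.0 | 0.0 | C | 0 | 0.0 | F | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | A | 0.0 | 0.0 | - | - | 0.0 | - | - | ... | F | 0.0 | 0.0 | D | 0 | 0.0 | C | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | A | 0.0 | 0.0 | C | 0 | 0.0 | A | 0 | ... | A | 0.0 | 0.0 | - | - | 0.0 | C | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | A | 0.0 | 0.0 | D | 0 | 0.0 | D | 0 | ... | A | 0.0 | 0.0 | D | 0 | 0.0 | F | 0.0 | 0.0 | 0.0 |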


### 3 Determining the Components of the HMM

#### 3.1 Emission Probabilities of Characters from Each State

```python
# Characters occurring in the protein family
sigma = ["A", "B", "C", "D", "E", "F"]
```

```python
def get_emission(alignment, theta, sigma):
    """Estimate the probability that each state emits each character of sigma."""
    state_alignment_df = get_states_alignment(alignment, theta)
    columns = state_alignment_df.columns.values
    prop = []
    for x in columns:
        # Characters emitted in this state column (gaps "-" and placeholder zeros are skipped)
        emitted = [t for t in state_alignment_df[x] if isinstance(t, str) and t != "-"]
        total = sum(len(t) for t in emitted)
        # Relative frequency of each character; all zeros if the state emits nothing
        prop.append([sum(t.count(z) for t in emitted) / total if total else 0.0
                     for z in sigma])
    emission = pd.DataFrame(np.array(prop), index=columns, columns=sigma)
    return emission
```
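For a single state column, the emission probability of a character is its count divided by the total number of non-gap characters in that column. This sketch reproduces the `M2` row of the table below from the column `C, F, -, C, D`, and shows that insert cells holding several characters contribute every character. The helper `column_emission` is illustrative, not part of the notebook.

```python
from collections import Counter

def column_emission(column, sigma):
    """Relative frequency of each character of sigma among the non-gap entries
    of one state column (multi-character insert entries count every character)."""
    counts = Counter(ch for entry in column if entry != "-" for ch in entry)
    total = sum(counts.values())
    return {z: (counts[z] / total if total else 0.0) for z in sigma}

sigma = ["A", "B", "C", "D", "E", "F"]
print(column_emission(["C", "F", "-", "C", "D"], sigma))
# {'A': 0.0, 'B': 0.0, 'C': 0.5, 'D': 0.25, 'E': 0.0, 'F': 0.25}
print(column_emission(["AC", "-", "D", "-", "AA"], sigma)["A"])  # 0.6
```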

```python
emit_p = get_emission(alignment, theta, sigma)
```



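```python
emit_p
```

|    | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| S | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| I0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| M1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| D1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| I1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| M2 | 0.0 | 0.0 | 0.5 | 0.25 | 0.0 | 0.25 |
| D2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| I2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| M3 | 0.25 | 0.0 | 0.0 | 0.75 | 0.0 | 0.0 |
| D3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |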


#### 3.2 Transition Probabilities

```python
def trans_prop(alignment, theta):
    """Estimate the transition matrix of the profile HMM graph.

    Each aligned sequence is traced as a path S -> I0* -> (M1|D1) -> I1* -> ... -> E;
    transitions along all paths are counted and each row is normalized by the number
    of transitions leaving that state (insert self-loops included).
    """
    state_alignment_df = get_states_alignment(alignment, theta)
    cols = state_alignment_df.columns.values
    end = get_alignment_seed(alignment, theta)[0].shape[1]  # number of match states
    counts = pd.DataFrame(np.zeros((len(cols), len(cols))), index=cols, columns=cols)

    def n_inserted(value):
        # Number of characters held by an insert cell (0 for placeholders and gaps)
        return len(value) if isinstance(value, str) and value != "-" else 0

    for _, row in state_alignment_df.iterrows():
        # State path of this sequence
        path = ['S'] + ['I0'] * n_inserted(row['I0'])
        for i in range(1, end + 1):
            path.append(('D' if row['M' + str(i)] == '-' else 'M') + str(i))
            path += ['I' + str(i)] * n_inserted(row['I' + str(i)])
        path.append('E')
        # Count each consecutive pair of states
        for a, b in zip(path[:-1], path[1:]):
            counts.loc[a, b] += 1

    # Normalize rows; rows without outgoing transitions (e.g. E) stay zero
    totals = counts.sum(axis=1)
    return counts.div(totals.where(totals > 0, 1), axis=0)
```
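The transition probabilities are normalized counts of consecutive state pairs along each sequence's path through the model. This sketch traces a raw aligned sequence directly, assuming the match columns found above for theta = 0.4 (`[0, 1, 2, 3, 4, 7, 8, 9]`); `state_path` is a hypothetical helper. It recovers, for example, the 4 transitions M1 -> M2 and 1 transition M1 -> D2 behind the probabilities 0.8 and 0.2 in the table below.

```python
from collections import Counter

def state_path(sequence, match_cols):
    """Trace one aligned sequence through the profile HMM states.

    Match columns give M_i (or D_i for a gap); non-gap characters in other columns
    after match column i give I_i (I0 before the first match column).
    """
    path = ["S"]
    i = 0  # number of match columns passed so far
    for pos, ch in enumerate(sequence):
        if pos in match_cols:
            i += 1
            path.append(("D" if ch == "-" else "M") + str(i))
        elif ch != "-":
            path.append("I" + str(i))
    path.append("E")
    return path

match_cols = [0, 1, 2, 3, 4, 7, 8, 9]
print(state_path("A--EFD-FDC", match_cols))
# ['S', 'M1', 'D2', 'D3', 'M4', 'M5', 'I5', 'M6', 'M7', 'M8', 'E']

rows = ["ACDEFACADF", "AFDA---CCF", "A--EFD-FDC", "ACAEF--A-C", "ADDEFAAADF"]
paths = [state_path(r, match_cols) for r in rows]
# Count consecutive state pairs over all paths
pairs = Counter(pair for p in paths for pair in zip(p, p[1:]))
print(pairs[("M1", "M2")], pairs[("M1", "D2")])  # 4 1
```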

```python
trans_p = trans_prop(alignment, theta)
```



In [414]:
# Ma trận chuyển trạng thái 
trans_p                         

Unnamed: 0,S,I0,M1,D1,I1,M2,D2,I2,M3,D3,...,M6,D6,I6,M7,D7,I7,M8,D8,I8,E
S,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
M1,0.0,0.0,0.0,0.0,0.0,0.8,0.2,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
D1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
M2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
D2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
M3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
D3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
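Each row of a transition matrix should sum to 1, or be all zeros for a state that is never left, such as E. A quick sanity check like the following sketch can flag rows where the counting went wrong, and could be run on `trans_p`. The helper `check_rows_stochastic` and the small `demo` matrix are hypothetical; the values in `demo` are made up to trigger one failure.

```python
import numpy as np
import pandas as pd

def check_rows_stochastic(matrix, tol=1e-9):
    """Return the labels of rows that neither sum to 1 nor are all zero (within tol)."""
    sums = matrix.sum(axis=1)
    bad = ~(np.isclose(sums, 1.0, atol=tol) | np.isclose(sums, 0.0, atol=tol))
    return list(matrix.index[bad])

# Small illustrative matrix (hypothetical values): the I2 row only sums to 0.9
demo = pd.DataFrame([[0.0, 0.8, 0.2], [0.0, 0.0, 1.0], [0.0, 0.5, 0.4]],
                    index=["M1", "M2", "I2"], columns=["M1", "M2", "D2"])
print(check_rows_stochastic(demo))  # ['I2']
```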


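The emission matrix is then adjusted by assigning emission probabilities to the delete states and to the end state E. In a standard profile HMM these states are silent, but the Viterbi function below makes every state consume one character of the sequence, so they are given a roughly uniform probability of 0.17 (about 1/6) for each of the six characters.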

```python
# Roughly uniform emissions (about 1/6) for the delete states D1..D8 and the end state E
delete_states = ['D' + str(i) for i in range(1, N_seed + 1)]
emit_p.loc[delete_states, :] = 0.17
emit_p.loc["E", :] = 0.17
```

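```python
emit_p
```

|    | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| S | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| I0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| M1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| D1 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |
| I1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| M2 | 0.0 | 0.0 | 0.5 | 0.25 | 0.0 | 0.25 |
| D2 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |
| I2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| M3 | 0.25 | 0.0 | 0.0 | 0.75 | 0.0 | 0.0 |
| D3 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |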


### 4 Finding the Most Probable Alignment of the New Protein to the Protein Family

```python
states = trans_p.columns.values
# New protein sequence
obs = ["A", "E", "F", "D", "F", "D", "C"]
```

```python
# The states
states
```

```
array(['S', 'I0', 'M1', 'D1', 'I1', 'M2', 'D2', 'I2', 'M3', 'D3', 'I3',
       'M4', 'D4', 'I4', 'M5', 'D5', 'I5', 'M6', 'D6', 'I6', 'M7', 'D7',
       'I7', 'M8', 'D8', 'I8', 'E'], dtype=object)
```

```python
# One possible form of the new protein after sequence alignment
# (same length, 10, as the family alignment)
obs = ['A', 'A', 'A', 'E', 'F', 'D', 'F', 'D', 'C', 'C']
```

We use the Viterbi algorithm to find the path through the HMM, and hence the alignment of the new protein, with the highest probability under the model of the protein family.

```python
def viterbi(obs, states, trans_p, emit_p):
    """Viterbi algorithm: find the highest-probability (longest) path in the Viterbi graph."""
    V = [{}]
    path = {}

    # Initialization (t == 0): transitions out of the start state S
    for y in states:
        V[0][y] = trans_p.loc['S', y] * emit_p.loc[y, obs[0]]
        path[y] = [y]

    # Recursion for t > 0: one step per observed character
    for t in range(1, len(obs)):
        V.append({})
        newpath = {}
        for y in states:
            (prob, state) = max((V[t-1][y0] * trans_p.loc[y0, y] * emit_p.loc[y, obs[t]], y0)
                                for y0 in states)
            V[t][y] = prob
            newpath[y] = path[state] + [y]
        # Keep only the best path ending in each state
        path = newpath

    print_dptable(V)
    # Best final state after the last observation (also works for a single observation)
    n = len(obs) - 1
    (prob, state) = max((V[n][y], y) for y in states)
    return (prob, path[state])

def print_dptable(V):
    """Print the table of P(x, pi) values: one row per state, one column per position."""
    s = "    " + " ".join(("%7d" % i) for i in range(len(V))) + "\n"
    for y in V[0]:
        s += "%.5s: " % y
        s += " ".join("%.10s" % ("%f" % v[y]) for v in V)
        s += "\n"
    print(s)
```
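The recursion above multiplies raw probabilities, which underflows for long sequences (the best probability here is already about 2.7e-06 after ten steps). A common alternative is to run the same recursion on log-probabilities and keep back-pointers for the traceback. The sketch below is such a log-space variant, shown on a small hypothetical two-state HMM given as plain dictionaries rather than on the notebook's profile HMM; zero probabilities become `-inf`. The names `log_viterbi`, `toy_start`, `toy_trans` and `toy_emit` are illustrative.

```python
import math

def log_viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi recursion in log space; probabilities are given as nested dicts."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[y]: best log-probability of a path ending in state y at the current step
    score = {y: log(start_p[y]) + log(emit_p[y][obs[0]]) for y in states}
    back = []  # back[t][y]: best predecessor of state y at step t + 1
    for symbol in obs[1:]:
        prev = score
        step, score = {}, {}
        for y in states:
            best = max(states, key=lambda y0: prev[y0] + log(trans_p[y0][y]))
            step[y] = best
            score[y] = prev[best] + log(trans_p[best][y]) + log(emit_p[y][symbol])
        back.append(step)

    # Trace back from the best final state
    last = max(states, key=lambda y: score[y])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return math.exp(score[last]), path

# Hypothetical two-state example
toy_states = ["Healthy", "Fever"]
toy_start = {"Healthy": 0.6, "Fever": 0.4}
toy_trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
             "Fever": {"Healthy": 0.4, "Fever": 0.6}}
toy_emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
            "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
prob, path = log_viterbi(["normal", "cold", "dizzy"], toy_states, toy_start, toy_trans, toy_emit)
print(round(prob, 5), path)  # 0.01512 ['Healthy', 'Healthy', 'Fever']
```

Only the final score is converted back with `math.exp`; intermediate values stay as sums of logarithms, so they do not underflow.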

```python
example = viterbi(obs, states, trans_p, emit_p)
```

```
          0       1       2       3       4       5       6       7       8       9
S: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
I0: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
M1: 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
D1: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
I1: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
M2: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
D2: 0.000000 0.034000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
I2: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
M3: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
D3: 0.000000 0.000000 0.005780 0.000000 0.000000 0.000000 0.000000 0.00
```

```python
# States visited by the HMM
example
```

```
(2.716692480000001e-06,
 ['M1', 'D2', 'D3', 'M4', 'M5', 'I5', 'M6', 'M7', 'M8', 'E'])
```

The characters emitted by the states along the HMM path:

<a><img src="https://i.imgur.com/OkCQDOi.jpg" width= "800" align= "center"></a>
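Since the Viterbi function in this notebook pairs every state on the path, including delete states and E, with one character of `obs`, the pairing shown in the figure can be reproduced by printing the path above the observed characters. This is a sketch; `emission_table` is a hypothetical helper, and the path and sequence are copied from the output above.

```python
def emission_table(path, obs):
    """Return two aligned text lines: the states of the path and the character paired with each."""
    if len(path) != len(obs):
        raise ValueError("path and obs must have the same length")
    width = max(len(s) for s in path)
    top = " ".join(s.rjust(width) for s in path)
    bottom = " ".join(c.rjust(width) for c in obs)
    return top + "\n" + bottom

path = ['M1', 'D2', 'D3', 'M4', 'M5', 'I5', 'M6', 'M7', 'M8', 'E']
obs = ['A', 'A', 'A', 'E', 'F', 'D', 'F', 'D', 'C', 'C']
print(emission_table(path, obs))
```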

### 5 Conclusion

Profile HMMs combined with the Viterbi algorithm let us align a new protein to a multiple sequence alignment of a protein family, and thereby measure how similar the new protein is to that family. This similarity can then be used to classify proteins, or to find animal proteins that closely resemble human proteins, so that experiments that cannot yet be performed on humans can first be carried out on animals.

The code for estimating the HMM components is complete and can be used to build the transition matrix and the emission matrix for other aligned datasets. However, the Viterbi code is still incomplete in its alignment step: every state, including the delete states and E, consumes one character of the input sequence, so the procedure cannot yet be applied automatically to every dataset.