## HMM beyond Viterbi learning:

https://www.youtube.com/watch?v=yUZ8CBdeJRs&list=PLQ-85lQlPqFPnk31Uut2ajVkBvlFmMtdx&index=9

Also, because Viterbi learning depends on the initial guess for the parameters, it may become stuck in a local optimum. Like other heuristics, it is often run many times from different starting points, retaining the best choice of parameters.
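The restart strategy can be sketched generically. In the snippet below, `learn` and `score` are hypothetical stand-ins (not part of these notes): `learn(rng)` would run Viterbi learning from a random initial guess, and `score(params)` would return the quantity being maximized. The toy functions at the bottom only illustrate the calling pattern.

```python
import random

def best_of_restarts(learn, score, n_restarts=10, seed=0):
    """Run a learning heuristic from several random starts and keep the best result."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_restarts):
        params = learn(rng)      # one full run from a fresh random initial guess
        s = score(params)        # higher is better
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Toy stand-ins (assumptions for illustration only): "learning" just returns
# a random number, and the score peaks at 1.0.
def toy_learn(rng):
    return rng.uniform(-3.0, 3.0)

def toy_score(p):
    return -(p - 1.0) ** 2

params, value = best_of_restarts(toy_learn, toy_score, n_restarts=50)
print(params, value)
```

More restarts can only keep or improve the best score found, at the cost of proportionally more running time.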

**Exercise Break**: Apply Viterbi learning to learn parameters for an HMM modeling CG-islands as well as for the profile HMM for the gp120 HIV alignment.

---

<br>

#### Recall the outcome likelihood problem: 

Given an arbitrary HMM and an emitted string $x$, we calculate, for each node, the sum of the probabilities of all hidden paths entering that node. We progressively fill in the forward matrix over the Viterbi graph from left to right (hence *forward algorithm*), using a recurrence similar to the one in the Viterbi algorithm, except that the Forward algorithm takes the sum over all incoming edges instead of the max.
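As a sanity check on this description, the sketch below compares the forward recurrence against a brute-force sum over every hidden path. The two-state HMM (`T`, `E`) and the string `X` are made-up values, and a uniform initial distribution over states is assumed.

```python
from itertools import product
import numpy as np

def forward_likelihood(X, T, E):
    """Pr(x) via the forward recurrence (uniform start assumed)."""
    S = T.shape[0]
    F = np.zeros((S, len(X)))
    F[:, 0] = E[:, X[0]] / S
    for i in range(1, len(X)):
        # Sum over incoming edges; the Viterbi algorithm would take the max here.
        F[:, i] = (F[:, i - 1] @ T) * E[:, X[i]]
    return F[:, -1].sum()

def brute_force_likelihood(X, T, E):
    """Pr(x) by summing Pr(x, path) over all |states|^n hidden paths."""
    S = T.shape[0]
    total = 0.0
    for path in product(range(S), repeat=len(X)):
        p = E[path[0], X[0]] / S
        for i in range(1, len(X)):
            p *= T[path[i - 1], path[i]] * E[path[i], X[i]]
        total += p
    return total

# Hypothetical HMM: rows of T and E each sum to 1.
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.5, 0.5], [0.75, 0.25]])
X = [0, 0, 1, 0]  # emitted symbols as alphabet indices

print(forward_likelihood(X, T, E), brute_force_likelihood(X, T, E))
```

The brute-force version is exponential in the length of the string, so it is only useful for checking small cases.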



<br>

---

<br>

### Soft Decoding Problem:




In a previous chapter, we introduced a “soft” clustering algorithm, based on the more general expectation maximization algorithm, that relaxed the Lloyd algorithm’s rigid assignment of points to clusters. Analogously, by generating a single optimal hidden path, the Viterbi algorithm provides a rigid “yes” or “no” answer to the question of whether an HMM was in state k at time i. But how certain are we that this was the case?

Returning to the crooked casino analogy once more, say that the i-th coin flip is heads. If this flip occurs in the middle of ten consecutive heads, then you should be relatively confident that the biased coin was used. But what if, of the ten flips surrounding the i-th flip, six are heads and four are tails? In this case, you should be less certain that the biased coin was used.

In the case of an **arbitrary HMM, we would like to compute the conditional probability $Pr(π_i = k|x)$ that the HMM was in state $k$ at time $i$ given that it emitted string $x$**.



Find the **probability** that an HMM was in a **particular state at a particular moment** given its emitted string.

**Input**:
   -  A string $x = x_1 \ldots x_n$ emitted by an HMM.  
   
**Output**:
   - The **conditional probability** $Pr(π_i = k|x)$ that the **HMM was in state $k$ at step $i$** given that it emitted $x$.

<br>

### *another learning problem, this time with soft decisions.*

*In Viterbi learning, we assigned a single state to each position in the hidden path; here we instead assign a weight to each state at each position. This allows probabilistic decisions instead of hard assignments.*

*The idea is analogous to the Lloyd algorithm vs. soft k-means clustering from the gene expression chapter, except that here we ask for the probability of a specific outcome.*

---

<br>

The **unconditional probability that a hidden path will pass through state $k$ at time $i$ and emit $x$** can be written as the **sum**:

<br>

\begin{aligned} \mathrm{Pr}(\pi_i=k, x) = \displaystyle\sum_{\text{all paths }\pi\text{ with }\pi_i =k}\mathrm{Pr}(x, \pi) \end{aligned}


<br> 

##### The **conditional probability $Pr(π_i = k|x)$** is equal to the **total probability of the paths that pass through** state $k$ at time $i$ and emit $x$, as a fraction of the total probability of all paths emitting $x$:

\begin{aligned} \mathrm{Pr}(\pi_i =k|x) & = \dfrac{\mathrm{Pr}(\pi_i=k, x)}{\mathrm{Pr}(x)}\\ & = \dfrac{\sum_{\text{all paths }\pi\text{ with }\pi_i =k}\mathrm{Pr}(x, \pi)}{\sum_{\text{all paths } \pi} \mathrm{Pr}(x, \pi)}\,. \end{aligned} 
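For a short string, this ratio can be computed directly from its definition by enumerating every hidden path. The sketch below does that; the HMM values are hypothetical and a uniform initial distribution is assumed. The result is laid out with one row per position and one column per state.

```python
from itertools import product
import numpy as np

def path_probability(path, X, T, E):
    """Pr(x, path) for one hidden path, with a uniform start over states."""
    S = T.shape[0]
    p = E[path[0], X[0]] / S
    for i in range(1, len(X)):
        p *= T[path[i - 1], path[i]] * E[path[i], X[i]]
    return p

def posterior_by_enumeration(X, T, E):
    """Return an n x |states| array whose entry [i, k] is Pr(pi_i = k | x)."""
    S, n = T.shape[0], len(X)
    joint = np.zeros((n, S))  # joint[i, k] accumulates Pr(pi_i = k, x)
    for path in product(range(S), repeat=n):
        p = path_probability(path, X, T, E)
        for i, k in enumerate(path):
            joint[i, k] += p
    # Every row of joint sums to Pr(x), so dividing by one row sum normalizes all rows.
    return joint / joint[0].sum()

# Hypothetical HMM and emitted string (symbols given as alphabet indices).
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.5, 0.5], [0.75, 0.25]])
X = [0, 0, 1, 0]

posterior = posterior_by_enumeration(X, T, E)
print(np.round(posterior, 4))
```

Each row of `posterior` sums to 1, since the HMM must be in exactly one state at each step.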

<br>

**STOP and Think**: If the Viterbi algorithm for the crooked casino emits a path $π = π_1, π_2, ...π_n$ with $π_i = B$, is the dealer more likely to have used a biased coin at step $i$? 

*Not necessarily. The Viterbi algorithm finds the single most probable path through the entire graph; another state could be more likely at this particular time even though no individual path through that state is the most probable one.*

Is it possible that $π_i = B$ but that $Pr(π_i = B|x)$ is smaller than $Pr(π_i = F|x)$? (*yes, perhaps less likely*)
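A small made-up example shows that this can happen. The HMM below is not the crooked casino; its two states are simply labelled A and B, its values are chosen for illustration, and a uniform initial distribution is assumed. The most probable path starts in A, yet B is the more probable state at the first step.

```python
from itertools import product
import numpy as np

# Hypothetical two-state HMM over a two-symbol alphabet.
states = "AB"
T = np.array([[1.0, 0.0],   # A always stays in A
              [0.5, 0.5]])  # B moves to A or stays in B
E = np.array([[0.4, 0.6],
              [0.6, 0.4]])
X = [0, 1]                  # emitted symbols as alphabet indices
S, n = len(states), len(X)

def path_probability(path):
    """Pr(x, path) with a uniform start over states."""
    p = E[path[0], X[0]] / S
    for i in range(1, n):
        p *= T[path[i - 1], path[i]] * E[path[i], X[i]]
    return p

probs = {path: path_probability(path) for path in product(range(S), repeat=n)}
best_path = max(probs, key=probs.get)   # what the Viterbi algorithm would return
total = sum(probs.values())             # Pr(x)

# Posterior of each state at the first step: sum over all paths starting there.
posterior_step0 = [sum(p for path, p in probs.items() if path[0] == k) / total
                   for k in range(S)]

print("".join(states[k] for k in best_path))  # AA
print(np.round(posterior_step0, 4))           # [0.4444 0.5556]
```

Here the single path AA has probability 0.12, while the two paths starting in B have probabilities 0.09 and 0.06. Neither beats AA individually, but together they outweigh it.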

<br>

---

#### The forward-backward algorithm



We note that $Pr(π_i = k, x)$ is equal to the sum of product weights $Pr(π, x)$ of all paths $π$ through the Viterbi graph for $x$ that pass through the node $(k, i)$. As shown in the figure below, we can break each such path into a blue subpath from $source$ to $(k, i)$, which we denote $π_{blue}$, and a red subpath from $(k, i)$ to $sink$, which we denote $π_{red}$. Writing $Weight(π_{blue})$ and $Weight(π_{red})$ for the respective product weights of these subpaths yields the recurrence:


\begin{aligned} \mathrm{Pr}(\pi_i = k, x) & = \sum_{\text{all paths }\pi \text{ with }\pi_i =k} \mathrm{Pr}(x, \pi)\\ & = \sum_{\color{blue}{\text{all paths }\pi_\text{blue}}} \sum_{\color{red}{\text{all paths }\pi_\text{red}}} 
\textit{Weight}(\pi_\text{blue}) \cdot \textit{Weight}(\pi_\text{red})\\ & = \sum_{\color{blue}{\text{all paths }\pi_\text{blue}}} \textit{Weight}(\pi_\text{blue}) \cdot \sum_{\color{red}{\text{all paths }\pi_\text{red}}} \textit{Weight}(\pi_\text{red}) \end{aligned}
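The last step of this derivation is just the distributive law: a double sum of products factors into a product of two sums. A quick numeric check, with made-up subpath weights, might look like this:

```python
import math
import numpy as np

# Hypothetical product weights of three blue subpaths and two red subpaths.
blue = np.array([0.2, 0.05, 0.1])
red = np.array([0.3, 0.4])

# Sum over every (blue, red) pairing, i.e. over every full path through the node.
double_sum = sum(b * r for b in blue for r in red)

# Product of the two independent sums.
product_of_sums = blue.sum() * red.sum()

print(double_sum, product_of_sums)
```

This factorization is what lets the two sums be computed separately by dynamic programming instead of enumerating every full path.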

<br>

#### The Forward algorithm solves the Outcome Likelihood Problem & is similar to the recurrence in the Viterbi algorithm
We have already computed the **sum of product weights of all blue subpaths**; it is just $forward_{k,i}$, which we encountered when solving the **Outcome Likelihood Problem**. Now we would like to compute the sum of product weights of all red subpaths, which we denote as $backward_{k,i}$, so that the preceding equation becomes


\begin{aligned} \mathrm{Pr}(\pi_i=k, x) = {\color{blue}{\textit{forward}_{k,i}}} \cdot {\color{red}{\textit{backward}_{k,i}}} \end{aligned}

#### Backward algorithm
The name of $backward_{k,i}$ derives from the fact that to compute this value, we can **simply reverse the directions of all edges in the Viterbi graph** (see figure below) and apply the same dynamic programming algorithm used to compute $forward_{k,i}$. Since the reversed edge connecting $(l, i + 1)$ to $(k, i)$ has weight $Weight_i(k, l) = transition(k, l) \cdot emission_l(x_{i+1})$, we have that

<br>

\begin{align} {\color{red}{\textit{backward}_{k,i}}} = \displaystyle\sum_{\text{all states }l} {\color{red}{\textit{backward}_{l,i+1}}} \cdot \textit{Weight}_i(k,l) \end{align}

<br>

**STOP and Think**: How should this recurrence be initialized?



#### *Initializing the Backward algorithm* (I think...)

When we filled in the first column of the Forward algorithm (one node per state, for the first symbol in $emissions$), we gave every state the same initial transition probability: $T(source \rightarrow \pi_1) = \dfrac{1}{|states|}$

<br> 

For the Backward algorithm, we need to consider the edges from the nodes in the last column of the Viterbi graph, $(state, n)$, which always lead to the sink.

Each of these nodes has exactly one outgoing edge, to the sink, and nothing is emitted along it, so its weight is 1 rather than 1/|states|. The recurrence is therefore initialized with $backward_{k,n} = 1$ for every state $k$.
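One way to check this initialization: if the backward values in the last column are 1, then for every column $i$ the products $forward_{k,i} \cdot backward_{k,i}$ summed over all states $k$ should give the same number, $Pr(x)$. The sketch below tests that on a made-up three-state HMM, using vectorized versions of the two recurrences and assuming a uniform start.

```python
import numpy as np

def forward_matrix(X, T, E):
    """F[k, i] = sum of product weights of all subpaths from the source to node (k, i)."""
    S = T.shape[0]
    F = np.zeros((S, len(X)))
    F[:, 0] = E[:, X[0]] / S                       # uniform start
    for i in range(1, len(X)):
        F[:, i] = (F[:, i - 1] @ T) * E[:, X[i]]
    return F

def backward_matrix(X, T, E):
    """B[k, i] = sum of product weights of all subpaths from node (k, i) to the sink."""
    S = T.shape[0]
    B = np.zeros((S, len(X)))
    B[:, -1] = 1.0                                 # weight-1 edge from each last node to the sink
    for i in range(len(X) - 2, -1, -1):
        B[:, i] = T @ (E[:, X[i + 1]] * B[:, i + 1])
    return B

# Hypothetical three-state HMM over a two-symbol alphabet.
T = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
E = np.array([[0.7, 0.3],
              [0.4, 0.6],
              [0.1, 0.9]])
X = [0, 1, 1, 0, 1]

F = forward_matrix(X, T, E)
B = backward_matrix(X, T, E)
column_totals = (F * B).sum(axis=0)   # one total per position i
likelihood = F[:, -1].sum()           # Pr(x) from the forward algorithm alone

print(column_totals, likelihood)
```

If the last column were initialized with 1/|states| instead, every column total would be scaled by that factor and would no longer match `likelihood`.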






In [74]:

import numpy as np

def forward(X, T, E, S):
    """Return the forward matrix F, with F[k][i] = Pr(x[0..i], state k at position i)."""
    # F: |states| x |emitted string| matrix.
    # First column: emission probability times a uniform initial probability 1/S.
    F = np.zeros(shape=(S, len(X)))
    for state in range(S):
        F[state][0] = E[state][X[0]] / S
    # Fill forward matrix: p(node) = p(emission) * sum(p(transition) * p(previous node))
    for i in range(1, len(X)):
        for state in range(S):
            F[state][i] = sum(T[k][state] * F[k][i-1] for k in range(S))
            F[state][i] *= E[state][X[i]]
    return F

def backward(X, T, E, S):
    """Return the backward matrix B, with B[k][i] = Pr(x[i+1..] | state k at position i)."""
    B = np.zeros(shape=(S, len(X)))
    # Last column: each node has a single edge of weight 1 to the sink
    # (not 1/S, as for the edges leaving the source in the forward algorithm).
    for state in range(S):
        B[state][len(X)-1] = 1
    # Fill from right to left over the reversed edges.
    for i in range(len(X)-2, -1, -1):
        for state in range(S):
            B[state][i] = sum(T[state][k] * B[k][i+1] * E[k][X[i+1]] for k in range(S))
    return B

def forward_backward(X, T, E, S):
    """Return the responsibility matrix (smoothing): entry [k][i] is Pr(state k at position i | x)."""
    F = forward(X, T, E, S)
    B = backward(X, T, E, S)
    # Pr(x) is the sum of the last forward column (edges into the sink have weight 1).
    F_sink = sum(F[state][len(X)-1] for state in range(S))
    return np.multiply(F, B) / F_sink


In [92]:
import numpy as np
import pandas as pd
%cd /Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/data
with open("ba10j.sample.txt",'r') as file:
    inputs = file.readlines()
    alpha = inputs[2].strip().split()
    X = list(alpha.index(x) for x in inputs[0].strip())
    states = inputs[4].strip().split()
    S = len(states)
    # HMM Transition, emission matrices,
    T = np.array([line.split()[1:] for line in inputs[7: 7+S]], float)
    E = np.array([line.split()[1:] for line in inputs[7+S+2:]], float)
    del(inputs)

##
R = np.transpose(forward_backward(X,T,E,S))
R = np.around(R, 4)
R = pd.DataFrame(R, columns=states)
%cd /Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/results

R.to_csv("ba10j.sample.out.txt", index=False)
R

/Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/data
/Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/results


|   | A | B |
|---|---|---|
| 0 | 0.5438 | 0.4562 |
| 1 | 0.6492 | 0.3508 |
| 2 | 0.9647 | 0.0353 |
| 3 | 0.9936 | 0.0064 |
| 4 | 0.9957 | 0.0043 |
| 5 | 0.9891 | 0.0109 |
| 6 | 0.9154 | 0.0846 |
| 7 | 0.964 | 0.036 |
| 8 | 0.8737 | 0.1263 |
| 9 | 0.8167 | 0.1833 |


In [95]:
import numpy as np
import pandas as pd
%cd /Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/data
with open("dataset_26261_5.txt",'r') as file:
    inputs = file.readlines()
    alpha = inputs[2].strip().split()
    X = list(alpha.index(x) for x in inputs[0].strip())
    states = inputs[4].strip().split()
    S = len(states)
    # HMM Transition, emission matrices,
    T = np.array([line.split()[1:] for line in inputs[7: 7+S]], float)
    E = np.array([line.split()[1:] for line in inputs[7+S+2:]], float)
    del(inputs)

##


R = np.transpose(forward_backward(X,T,E,S))
R = np.around(R, 4)

%cd /Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/results
R = pd.DataFrame(R, columns=states)
R.to_csv("ba10j.problem.out.txt", index=False, sep='\t')
R

/Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/data
/Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/results


|   | A | B |
|---|---|---|
| 0 | 0.6283 | 0.3717 |
| 1 | 0.8035 | 0.1965 |
| 2 | 0.8047 | 0.1953 |
| 3 | 0.616 | 0.384 |
| 4 | 0.6458 | 0.3542 |
| 5 | 0.6136 | 0.3864 |
| 6 | 0.8228 | 0.1772 |
| 7 | 0.591 | 0.409 |
| 8 | 0.8075 | 0.1925 |
| 9 | 0.7976 | 0.2024 |


In [96]:
# rosalind data rosalind_ba10j

import numpy as np
import pandas as pd
%cd /Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/data
with open("rosalind_ba10j.txt",'r') as file:
    inputs = file.readlines()
    alpha = inputs[2].strip().split()
    X = list(alpha.index(x) for x in inputs[0].strip())
    states = inputs[4].strip().split()
    S = len(states)
    # HMM Transition, emission matrices,
    T = np.array([line.split()[1:] for line in inputs[7: 7+S]], float)
    E = np.array([line.split()[1:] for line in inputs[7+S+2:]], float)
    del(inputs)

##


R = np.transpose(forward_backward(X,T,E,S))
R = np.around(R, 4)

%cd /Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/results
R = pd.DataFrame(R, columns=states)
R.to_csv("ba10j.rosalind.out.txt", index=False, sep='\t')
R

/Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/data
/Users/jasonmoggridge/Dropbox/Rosalind/Coursera_textbook_track/Course6/results


|   | A | B | C |
|---|---|---|---|
| 0 | 0.4851 | 0.1737 | 0.3412 |
| 1 | 0.5028 | 0.1895 | 0.3077 |
| 2 | 0.3179 | 0.4505 | 0.2316 |
| 3 | 0.3489 | 0.4598 | 0.1913 |
| 4 | 0.3605 | 0.358 | 0.2815 |
| 5 | 0.4952 | 0.1946 | 0.3102 |
| 6 | 0.3631 | 0.4414 | 0.1955 |
| 7 | 0.5272 | 0.1792 | 0.2936 |
| 8 | 0.3471 | 0.3502 | 0.3027 |
| 9 | 0.5379 | 0.1915 | 0.2706 |


In [100]:
# Finished Forward-Backward algorithm for the soft decoding problem.
# Entry (k, i) of the returned matrix is Pr(state k at position i | x),
# so the hidden path is now probabilistic instead of a hard assignment.

def forward_backward(X, T, E, S):
    """Return the responsibility matrix (smoothing): entry [k][i] is Pr(state k at position i | x)."""

    def forward(X, T, E, S):
        """Return the forward matrix F, with F[k][i] = Pr(x[0..i], state k at position i)."""
        # F: |states| x |emitted string| matrix.
        # First column: emission probability times a uniform initial probability 1/S.
        F = np.zeros(shape=(S, len(X)))
        for state in range(S):
            F[state][0] = E[state][X[0]] / S
        # Fill forward matrix: p(node) = p(emission) * sum(p(transition) * p(previous node))
        for i in range(1, len(X)):
            for state in range(S):
                F[state][i] = sum(T[k][state] * F[k][i-1] for k in range(S))
                F[state][i] *= E[state][X[i]]
        return F

    def backward(X, T, E, S):
        """Return the backward matrix B, with B[k][i] = Pr(x[i+1..] | state k at position i)."""
        B = np.zeros(shape=(S, len(X)))
        # Last column: each node has a single edge of weight 1 to the sink
        # (not 1/S, as for the edges leaving the source in the forward algorithm).
        for state in range(S):
            B[state][len(X)-1] = 1
        for i in range(len(X)-2, -1, -1):
            for state in range(S):
                B[state][i] = sum(T[state][k] * B[k][i+1] * E[k][X[i+1]] for k in range(S))
        return B

    F = forward(X, T, E, S)
    B = backward(X, T, E, S)
    # Pr(x) is the sum of the last forward column (edges into the sink have weight 1).
    F_sink = sum(F[state][len(X)-1] for state in range(S))
    return np.multiply(F, B) / F_sink


---

