---
title: "Chapter BA10A: Probability of a Hidden Path"
format:
  html:
    code-fold: false
    toc: true
jupyter: python3
---

## Problem Statement and Biological Context

**Given:** A hidden path π followed by the states States and transition matrix Transition of an HMM (Σ, States, Transition, Emission).

**Return:** The probability of this path, $Pr(\pi)$. You may assume that initial probabilities are equal.


**Sample Dataset:**
```
AABBBAABABAAAABBBBAABBABABBBAABBAAAABABAABBABABBAB
--------
A   B
--------
    A   B
A   0.194   0.806
B   0.273   0.727
```

**Sample Output:**
```
5.01732865318e-19
```
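The dataset layout above (path, dashed separator, state labels, dashed separator, labelled transition table) can be read with a few lines of standard-library Python. This is only a sketch: the function name `parse_hidden_path_input` and the nested-dictionary representation are illustrative assumptions, and it presumes the input follows the sample layout exactly.

```python
def parse_hidden_path_input(text):
    """Split the dataset text into (path, states, transition).

    Assumes sections are separated by lines consisting only of dashes,
    as in the sample dataset.
    """
    sections, current = [], []
    for line in text.strip().splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            # A dashed line closes the current section.
            sections.append(current)
            current = []
        elif stripped:
            current.append(stripped.split())
    sections.append(current)

    path = sections[0][0][0]      # first section: the hidden path
    states = sections[1][0]       # second section: state labels
    header, *rows = sections[2]   # third section: column labels, then rows
    transition = {
        row[0]: {col: float(p) for col, p in zip(header, row[1:])}
        for row in rows
    }
    return path, states, transition


sample = """AABBBAABABAAAABBBBAABBABABBBAABBAAAABABAABBABABBAB
--------
A   B
--------
    A   B
A   0.194   0.806
B   0.273   0.727
"""
path, states, transition = parse_hidden_path_input(sample)
print(len(path), states, transition["A"]["B"])
```

Keying the table by state label rather than by integer index keeps the lookup `transition[prev][curr]` close to the notation used in the problem statement.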

This problem introduces us to Hidden Markov Models (HMMs), one of the most fundamental statistical frameworks in bioinformatics[1][2]. HMMs are probabilistic models that describe sequences where the underlying process generating the observations is hidden from direct view[2][8]. In biological contexts, HMMs are extensively used for gene finding, protein structure prediction, sequence alignment, and identifying functional domains in proteins[8][29].

Computing the probability of a hidden path is a building block for more complex HMM algorithms, such as the Viterbi algorithm for decoding and the Forward-Backward algorithm used in parameter estimation[5][6]. The computation quantifies how likely a particular sequence of hidden states is under the model's transition probabilities alone, before any emitted symbols are taken into account.

In genomics, this could model scenarios such as CG-island detection (where hidden states represent "inside" vs "outside" CG-rich regions), gene structure annotation (exons vs introns), or chromatin state segmentation (active vs inactive transcription regions)[1][42]. The transition probabilities capture the biological tendency for these states to persist or change over genomic coordinates.

## Mathematical Foundation

### Hidden Markov Model Definition

A Hidden Markov Model is formally defined as a tuple $$ M = (Q, \Sigma, A, E, \pi) $$ where[2][8]:

- $Q = \{q_1, q_2, \ldots, q_N\}$ is the set of hidden states
- $\Sigma$ is the observation alphabet
- $A$ is the $N \times N$ state transition matrix
- $E$ is the $N \times |\Sigma|$ emission matrix
- $\pi$ is the initial state probability distribution (in the remainder of this chapter, $\pi$ instead denotes the hidden path, following the problem statement)

For this problem, we focus on the transition matrix $A$ where $$ A_{ij} = P(q_{t+1} = j \mid q_t = i) $$ represents the probability of transitioning from state $i$ to state $j$.
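As a small illustration, the sample transition matrix can be held as a nested dictionary keyed by state label, so that `transition[i][j]` plays the role of $A_{ij}$. The sketch below also checks that every row is a probability distribution, which any valid transition matrix must satisfy. The helper name `is_row_stochastic` and its tolerance are assumptions made for this example.

```python
import math

# Transition matrix from the sample dataset: transition[i][j] = P(next = j | current = i)
transition = {
    "A": {"A": 0.194, "B": 0.806},
    "B": {"A": 0.273, "B": 0.727},
}


def is_row_stochastic(matrix, tol=1e-9):
    """Return True if every row has entries in [0, 1] that sum to 1 (within tol)."""
    return all(
        all(0.0 <= p <= 1.0 for p in row.values())
        and math.isclose(sum(row.values()), 1.0, abs_tol=tol)
        for row in matrix.values()
    )


print(is_row_stochastic(transition))
```

A tolerance is used because decimal probabilities such as 0.194 and 0.806 are not exactly representable in binary floating point.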

### Hidden Path Probability Calculation

Given a hidden path $\pi = \pi_1\pi_2\ldots\pi_L$ of length $L$, the probability of this path is computed using the Markov property[2][5]:

$$P(\pi) = P(\pi_1) \prod_{t=2}^{L} P(\pi_t \mid \pi_{t-1})$$

Under the equal initial probability assumption, $P(\pi_1) = \frac{1}{|Q|}$, so:

$$P(\pi) = \frac{1}{|Q|} \prod_{t=2}^{L} A_{\pi_{t-1}, \pi_t}$$
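The formula above translates almost directly into a loop over consecutive state pairs. The following is a minimal sketch, assuming the transition matrix is stored as a nested dictionary and that the number of states equals the number of rows in it; the function name `hidden_path_probability` is hypothetical.

```python
def hidden_path_probability(path, transition):
    """Probability of a hidden path under equal initial state probabilities.

    path:       string (or sequence) of state labels
    transition: nested dict, transition[i][j] = P(next = j | current = i)
    """
    if not path:
        raise ValueError("path must contain at least one state")

    # Equal initial probabilities: P(pi_1) = 1 / |Q|
    prob = 1.0 / len(transition)

    # Multiply in one transition probability per consecutive pair of states.
    for prev, curr in zip(path, path[1:]):
        prob *= transition[prev][curr]
    return prob


transition = {
    "A": {"A": 0.194, "B": 0.806},
    "B": {"A": 0.273, "B": 0.727},
}
path = "AABBBAABABAAAABBBBAABBABABBBAABBAAAABABAABBABABBAB"
print(hidden_path_probability(path, transition))
```

For the sample dataset the result should agree with the sample output up to floating-point rounding in the last digits.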

### Numerical Stability Considerations

For long sequences, the probability product can become extremely small, leading to numerical underflow[40][43]. The standard approach is to work in log space:

$$\log P(\pi) = \log\left(\frac{1}{|Q|}\right) + \sum_{t=2}^{L} \log A_{\pi_{t-1},\pi_t}$$

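This transforms the product into a sum, so the running value stays within floating-point range regardless of path length[43][45]. For the sample dataset ($L = 50$, result on the order of $10^{-19}$) the direct product is still well within double-precision range, so log space matters mainly for much longer paths.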
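To make the underflow issue concrete, the sketch below computes the same quantity in log space and compares it with a direct product on an artificially long path. The function names and the long test path are assumptions chosen for illustration; a zero transition probability is mapped to negative infinity rather than raising an error.

```python
import math


def log_hidden_path_probability(path, transition):
    """Natural log of the path probability, assuming equal initial probabilities."""
    log_prob = -math.log(len(transition))  # log(1 / |Q|)
    for prev, curr in zip(path, path[1:]):
        p = transition[prev][curr]
        if p == 0.0:
            return float("-inf")  # an impossible transition makes the whole path impossible
        log_prob += math.log(p)
    return log_prob


def direct_product(path, transition):
    """Plain product, shown only for comparison; underflows on long paths."""
    prob = 1.0 / len(transition)
    for prev, curr in zip(path, path[1:]):
        prob *= transition[prev][curr]
    return prob


transition = {
    "A": {"A": 0.194, "B": 0.806},
    "B": {"A": 0.273, "B": 0.727},
}

sample_path = "AABBBAABABAAAABBBBAABBABABBBAABBAAAABABAABBABABBAB"
sample_log = log_hidden_path_probability(sample_path, transition)
print(math.exp(sample_log))  # recovers the probability for a short path

long_path = "AB" * 5000  # 10,000 states
long_log = log_hidden_path_probability(long_path, transition)
print(direct_product(long_path, transition))  # underflows to 0.0
print(long_log)                               # still a finite negative number
```

Exponentiating the log-probability is only meaningful when the result is representable; for long paths it is usually better to report or compare the log value directly.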

### Complexity Analysis

The time complexity is $O(L)$ where $L$ is the path length, since we perform one transition probability lookup per consecutive pair of states ($L - 1$ lookups in total). Space complexity is $O(1)$ for the computation itself, plus $O(N^2)$ to store the transition matrix where $N$ is the number of states.

## Reference Implementation

In [1]:
import numpy as np
import numba
from typing import List, Dict, Tuple
from numba import njit