User's Guide
Hidden Markov models allow the state transitions of a model to be deduced solely from a series of observations. They are useful, for example, in studying genomes: a given nucleotide (A, C, G, or T) may be part of one of the various components of a gene, or may lie in the region between genes. One can use an HMM to annotate a genome, treated as a series of observations (a string of nucleotides), by calculating which nucleotides are most likely to lie within a gene and which are most likely to fall between genes.
The hmm package provides two classes used to construct HMMs:
- state: represents a single state in an HMM
- hmm: represents an HMM in its entirety; it is constructed with a list of state objects
The use of this module is demonstrated in a blog post. To summarize that post, the two-state model described there can be represented with the following code:
import hmm

s1 = hmm.state(
    'S1',           # name of the state
    0.5,            # probability of being the initial state
    {'1': 0.5,      # probability of emitting a '1' at each visit
     '2': 0.5},     # probability of emitting a '2' at each visit
    {'S1': 0.9,     # probability of transitioning to itself
     'S2': 0.1})    # probability of transitioning to state 'S2'

s2 = hmm.state('S2', 0.5,
               {'1': 0.25, '2': 0.75},
               {'S1': 0.8, 'S2': 0.2})

model = hmm.hmm(['1', '2'],  # all symbols that can be emitted
                [s1, s2])    # all of the states in this HMM
Once the model is created (in the object called model, in this case), the sequence of states that most likely generated an arbitrary sequence of symbols can be calculated with:
path, prob = model.viterbi_path('222')  # can also use ['2', '2', '2']
print(path)
print(prob)
which, in this case, would display
['S2', 'S1', 'S1']
-1.17069622717
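For readers curious how these numbers arise, here is a minimal sketch of the standard Viterbi algorithm in log10 space, hard-coding the same two-state model as the example above. This is only an illustration of the technique, not the hmm package's own implementation:

```python
import math

# The same initial, emission, and transition probabilities as the
# example model above, hard-coded for illustration.
initial = {'S1': 0.5, 'S2': 0.5}
emit = {'S1': {'1': 0.5, '2': 0.5},
        'S2': {'1': 0.25, '2': 0.75}}
trans = {'S1': {'S1': 0.9, 'S2': 0.1},
         'S2': {'S1': 0.8, 'S2': 0.2}}

def viterbi(obs):
    states = list(initial)
    # scores[s]: log10 probability of the best path ending in state s
    scores = {s: math.log10(initial[s]) + math.log10(emit[s][obs[0]])
              for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for symbol in obs[1:]:
        nxt, prev = {}, {}
        for s in states:
            score, best = max((scores[r] + math.log10(trans[r][s]), r)
                              for r in states)
            nxt[s] = score + math.log10(emit[s][symbol])
            prev[s] = best
        scores = nxt
        back.append(prev)
    # Trace the best path backwards from the highest-scoring final state
    prob, last = max((scores[s], s) for s in states)
    path = [last]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    path.reverse()
    return path, prob

path, prob = viterbi('222')
print(path)            # ['S2', 'S1', 'S1']
print(round(prob, 6))  # -1.170696
```

Working in log space keeps the running scores numerically stable: multiplying many small probabilities underflows quickly, while summing their logarithms does not.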
Note that probabilities are always returned as log (base 10) transformed values, since the raw probabilities can be vanishingly small.
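Because the returned value is a base-10 logarithm, the raw probability can be recovered by exponentiating it. For example, using the value from the output above:

```python
# Recover the raw probability from the log10-transformed value
# that viterbi_path returned in the example above.
log_prob = -1.17069622717
prob = 10 ** log_prob
print(round(prob, 4))  # 0.0675
```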