## Install Libraries

In [None]:
!pip install hidden_markov

In [None]:
import numpy as np
from hidden_markov import hmm

Group Members:
- Sezin Tekin
- Bahar Sevgin
- Erdem Kertmen
- İrem Yeşilbaş
- Eyşan Mutlu

In the human genome, the cytosine (C) of a CpG dinucleotide is typically methylated. There is a relatively high chance
that this methyl-C mutates into a thymine (T). As a result, CpG dinucleotides are in general rarer in the genome than
would be expected from the independent probabilities of C and G. [1]
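
The depletion described above can be quantified by comparing the observed number of CG dinucleotides with the number expected if C and G occurred independently. The following is a minimal sketch of that ratio; the function name `cpg_observed_expected` and the toy sequences are illustrative assumptions, not part of the assignment.

```python
def cpg_observed_expected(seq):
    """Observed CpG count divided by the count expected under independence.

    Expected count is approximated as (#C * #G) / sequence length.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    cpg_count = seq.count("CG")
    if c_count == 0 or g_count == 0:
        return 0.0
    expected = c_count * g_count / len(seq)
    return cpg_count / expected


# Hypothetical toy sequences: one CpG-rich, one without any CpG.
print(cpg_observed_expected("CGCGCGCG"))
print(cpg_observed_expected("ACTGACTG"))
```

A ratio well below 1 indicates CpG depletion, while CpG islands tend to show a ratio closer to 1 or above.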

<img src="https://i.ibb.co/9w4vG7t/Screen-Shot-2023-04-15-at-18-12-01.png" alt="Screen-Shot-2023-04-15-at-18-12-01" width="800" border="0">

# Task 1: Creating HMM Model

- Define states and observations
- Define the probabilities
- Create the model

In [None]:
# Define Possible Hidden States
states = ["non-island", "CpG islands"]

# Define Possible Observations
possible_observation = ["A", "C", "T", "G"]

# Define Start Probabilities as numpy matrix (ith element corresponds to ith element in states)
start_probability = np.matrix([0.5, 0.5])

# Define Transition Probabilities as numpy matrix
# Dimension -> [length of states, length of states]
# transition_probability[i,j] means probability of states[i] -> states[j]
transition_probability = np.matrix([[0.95, 0.05], [0.1, 0.90]])

# Define Emission Probabilities as numpy matrix
# Dimension -> [length of states, length of observations]
# emission_probability[i,j] means probability of states[i] emitting possible_observation[j]
emission_probability = np.matrix([[0.27, 0.24, 0.26, 0.23],
                                 [0.15, 0.33, 0.16, 0.36]])

# Initialize hmm model
hmm_model = hmm(states = states,
                observations = possible_observation,
                start_prob = start_probability,
                trans_prob = transition_probability,
                em_prob = emission_probability )
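
Before building the model, it can help to confirm that every probability row sums to 1. The check below is a small sketch using plain Python lists that mirror the matrices above; the helper name `is_row_stochastic` is an assumption introduced for illustration and is not part of the `hidden_markov` library.

```python
import math


def is_row_stochastic(matrix, tol=1e-9):
    """Return True if every row holds valid probabilities that sum to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row)
        and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in matrix
    )


# Same values as the model definition, written as nested lists.
start = [[0.5, 0.5]]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [[0.27, 0.24, 0.26, 0.23],
        [0.15, 0.33, 0.16, 0.36]]

for name, m in [("start", start), ("transition", trans), ("emission", emit)]:
    print(name, is_row_stochastic(m))
```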

# Task 2: Forward Algorithm

Using the forward algorithm, calculate the total probability of the sequence **CGCG**, summed over all
possible hidden-state paths that give rise to it.
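
To make the computation concrete, the forward recursion can be sketched in plain Python. This is an illustrative re-implementation under the Task 1 parameters, not the library's code; the names `forward_likelihood`, `symbols`, `start`, `trans`, and `emit` are assumptions chosen for this sketch, and the input is assumed to be non-empty.

```python
# Model parameters copied from Task 1 (plain lists instead of np.matrix).
states = ["non-island", "CpG islands"]
symbols = ["A", "C", "T", "G"]
start = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [[0.27, 0.24, 0.26, 0.23],
        [0.15, 0.33, 0.16, 0.36]]


def forward_likelihood(obs):
    """Return P(obs), summed over all hidden-state paths."""
    idx = [symbols.index(o) for o in obs]
    n = len(states)
    # Initialisation: alpha_1(i) = start[i] * emit[i][o_1]
    alpha = [start[i] * emit[i][idx[0]] for i in range(n)]
    # Recursion: alpha_t(j) = emit[j][o_t] * sum_i alpha_{t-1}(i) * trans[i][j]
    for o in idx[1:]:
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    # Termination: add up the probabilities of ending in each state.
    return sum(alpha)


print(forward_likelihood("CGCG"))  # approximately 0.0078382
```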

In [None]:
# Use forward algorithm from hidden_markov library

observations = ('C', 'G','C','G')
print(hmm_model.forward_algo(observations))



0.007838160412500001


# Task 3: Viterbi Algorithm

Using the HMM shown in Figure 2, predict the most probable hidden-state path that gives rise to the
sequence CGCG with the Viterbi algorithm.
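
The Viterbi recursion replaces the sum of the forward algorithm with a maximum and keeps back-pointers to recover the best path. The sketch below is a plain-Python illustration under the Task 1 parameters, not the library's implementation; `viterbi_path` and the parameter names are assumptions made for this example, and the input is assumed to be non-empty.

```python
# Model parameters copied from Task 1 (plain lists instead of np.matrix).
states = ["non-island", "CpG islands"]
symbols = ["A", "C", "T", "G"]
start = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [[0.27, 0.24, 0.26, 0.23],
        [0.15, 0.33, 0.16, 0.36]]


def viterbi_path(obs):
    """Return the most probable hidden-state path for obs."""
    idx = [symbols.index(o) for o in obs]
    n = len(states)
    # delta[i]: probability of the best path so far that ends in state i.
    delta = [start[i] * emit[i][idx[0]] for i in range(n)]
    backptr = []
    for o in idx[1:]:
        # For each target state j, pick the best predecessor state.
        prev = [max(range(n), key=lambda i: delta[i] * trans[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] * trans[prev[j]][j] * emit[j][o]
                 for j in range(n)]
        backptr.append(prev)
    # Backtrack from the most probable final state.
    path = [max(range(n), key=lambda i: delta[i])]
    for prev in reversed(backptr):
        path.append(prev[path[-1]])
    path.reverse()
    return [states[i] for i in path]


print(viterbi_path("CGCG"))
```

For long sequences the products underflow, so a practical version would typically work with log-probabilities instead.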

In [None]:
# Use viterbi algorithm from hidden_markov library

sequence = "CGCG"
observations = tuple(sequence)  # ('C', 'G', 'C', 'G')
print(f"The most probable path for the sequence {sequence} is {hmm_model.viterbi(observations)}")

The most probable path for the sequence CGCG is ['CpG islands', 'CpG islands', 'CpG islands', 'CpG islands']


# Task 4: Find K Best Possible Sequences

- Complete the **find_k_obs** function using the information below.
- Using the **find_k_obs** function with k = 5, print:
  - Sequences
  - Likelihoods
  - The most probable hidden-state path (by the Viterbi algorithm) for each sequence

**NOTE:** Take the length of the possible observation sequences as 4.

In [None]:
def find_k_obs(hmm_model, possible_observation, k_seq = 5):
  """
  Return the k most likely observation sequences and their likelihoods.

  Arguments:
  hmm_model: hmm model object used to run the forward and Viterbi algorithms
  possible_observation: Unique set of possible observations
  k_seq: The number of most likely observation sequences to return

  Return:
    best_seq (list): Includes the k most likely sequences
    best_likelihood (list): Includes the likelihoods of the k most likely sequences

  Example:

  possible_observation = ['X','Y']
  best_obs_seq, best_likelihood = find_k_obs(hmm_model, possible_observation, k_seq = 2)
  best_obs_seq: [['X','X'], ['Y','X']]
  best_likelihood: [0.09, 0.08]


  """


  return best_seq, best_likelihood
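
One possible approach to this task is brute-force enumeration: generate every observation sequence of the required length, score each with the forward algorithm, and keep the k highest. The sketch below illustrates the idea in plain Python under the Task 1 parameters. It is not the expected solution; `find_k_obs_sketch`, the `seq_len` parameter, and the stand-alone `forward_likelihood` helper are assumptions introduced so the example runs without the `hidden_markov` library.

```python
from itertools import product

# Model parameters copied from Task 1 (plain lists instead of np.matrix).
symbols = ["A", "C", "T", "G"]
start = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [[0.27, 0.24, 0.26, 0.23],
        [0.15, 0.33, 0.16, 0.36]]


def forward_likelihood(obs):
    """Return P(obs), summed over all hidden-state paths."""
    idx = [symbols.index(o) for o in obs]
    n = len(start)
    alpha = [start[i] * emit[i][idx[0]] for i in range(n)]
    for o in idx[1:]:
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)


def find_k_obs_sketch(possible_observation, k_seq=5, seq_len=4):
    """Return the k_seq most likely sequences of length seq_len."""
    # Score all len(possible_observation) ** seq_len candidate sequences.
    scored = [(forward_likelihood(seq), list(seq))
              for seq in product(possible_observation, repeat=seq_len)]
    # Sort by likelihood, highest first, and keep the top k_seq.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k_seq]
    best_seq = [seq for _, seq in top]
    best_likelihood = [lik for lik, _ in top]
    return best_seq, best_likelihood


best_seq, best_likelihood = find_k_obs_sketch(symbols, k_seq=5)
for seq, lik in zip(best_seq, best_likelihood):
    print("".join(seq), lik)
```

With four symbols and length 4 there are only 256 candidates, so exhaustive search is cheap here. The Viterbi path for each returned sequence could then be obtained with the `viterbi` call used in Task 3.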

In [None]:
# Print Sequences, Likelihoods, Most Probable Paths
...

# References

[1] Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis:
Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.