# HMM with TensorFlow

In [30]:
import tensorflow as tf
import numpy as np
import sys

## Weather HMM example
The weather where your overseas chat friend lives can be described by a Markov chain $X_1,X_2,\ldots$ with
state space $Q=\{\texttt{sun},\texttt{rain},\texttt{storm}\}$ and transition matrix
$$A=(A[r,s])_{\scriptsize r,s \in Q} = \begin{pmatrix}
0.7 & 0.2 & 0.1\\
0.3 & 0.5 & 0.2\\
0.2 & 0.6 & 0.2\\
\end{pmatrix}
$$
(rows and columns in the order `sun`, `rain`, `storm`).

Here $X_i$ is the weather on day $i$, and $X_1=\texttt{sun}$. Depending on the weather, your friend pursues activities either indoors
(`in`) or outdoors (`out`). Let $\Sigma:=\{\texttt{in},\texttt{out}\}$. The following matrix
gives the weather-dependent probabilities of the activities
$$B=\big(B[q,s]\big)_{\scriptsize\begin{array}{l}q\in Q\\s\in \Sigma\end{array}} = \begin{pmatrix}
0.4 & 0.6 \\
0.8 & 0.2 \\
0.9 & 0.1 \\
\end{pmatrix}
.$$
Reading example: your friend stays indoors with probability 0.9 if there is a storm that day (columns in the order `in`, `out`).
Answer the following question for the hidden Markov model given by $Q,\Sigma,A,B$ and $X_1$, where $Y_i$ denotes the activity on day $i$.
What is the probability
$$P(Y_1=Y_2=Y_3=\texttt{in})$$

that your friend stays indoors on all three days?

![forward DP table](forwardManually.png)

**Solution: P(Y=y) = 0.1308**
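
As a sanity check on this value, the likelihood can also be obtained by summing the joint probability over all $3^3$ hidden weather paths. The following is a minimal sketch in plain Python; the helper name `brute_force_likelihood` is an assumption made for illustration and is not part of the notebook.

```python
from itertools import product

# Model parameters copied from the text (state order: sun, rain, storm;
# emission order: in, out).
A = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.6, 0.2]]
B = [[0.4, 0.6],
     [0.8, 0.2],
     [0.9, 0.1]]
start = [1.0, 0.0, 0.0]  # X_1 = sun with certainty


def brute_force_likelihood(y):
    """Sum P(x, y) over every hidden state path x with the same length as y."""
    n = len(A)
    total = 0.0
    for path in product(range(n), repeat=len(y)):
        # probability of the first state and its emission
        p = start[path[0]] * B[path[0]][y[0]]
        # transitions and emissions for the remaining days
        for i in range(1, len(y)):
            p *= A[path[i - 1]][path[i]] * B[path[i]][y[i]]
        total += p
    return total


print(round(brute_force_likelihood([0, 0, 0]), 4))  # 0.1308
```

Enumeration grows exponentially with the sequence length, which is the reason for using the forward algorithm instead.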

## Specify the Model

### Example Model Parameters

In [31]:
n = 3 # number of states
s = 2 # emission alphabet size

In [32]:
A_init = np.array([[7, 2, 1], [3, 5, 2], [2, 6, 2]]) / 10.0
B_init = np.array([[4, 6], [8, 2], [9, 1]]) / 10.0
X1_dist = np.array([1., 0., 0.]) # starts with sun
n, s = B_init.shape # number of states, emission alphabet size
y = np.array([0, 0, 0]) # in, in, in

In [33]:
print("transitions:\n", A_init, "\nemissions:\n", B_init)

transitions:
 [[0.7 0.2 0.1]
 [0.3 0.5 0.2]
 [0.2 0.6 0.2]] 
emissions:
 [[0.4 0.6]
 [0.8 0.2]
 [0.9 0.1]]


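#### Forward recursion
In array notation, with $\alpha[i,q]$ the entry in row $i$ (day) and column $q$ (state) of the DP table:
$$\alpha[i,q] = B[q, y[i]] \sum_{q'} \alpha[i-1, q'] \cdot A[q',q] $$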

In [34]:
# tf variants of the transition and emission matrix
A = tf.Variable(A_init, trainable = True)
B = tf.Variable(B_init, trainable = True)

### Forward Variables and Algorithm
$$ \alpha(q,i) = \sum_{x_1,\ldots, x_{i-1}\in Q} P(x_1,\ldots, x_{i-1}, X_i=q, y_1,\ldots, y_i)$$
Initialization: 
$$ \alpha(q, 1) = P(X_1 = q)\cdot B[q,y_1]$$
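
For the weather example with $y=(\texttt{in},\texttt{in},\texttt{in})$ and $X_1=\texttt{sun}$, the initialization and the first recursion step work out as follows (a worked illustration using only the matrices $A$ and $B$ given above):

$$\begin{aligned}
\alpha(\texttt{sun},1) &= 1\cdot 0.4 = 0.4, \qquad \alpha(\texttt{rain},1) = 0\cdot 0.8 = 0, \qquad \alpha(\texttt{storm},1) = 0\cdot 0.9 = 0,\\
\alpha(\texttt{sun},2) &= B[\texttt{sun},\texttt{in}]\cdot \alpha(\texttt{sun},1)\cdot A[\texttt{sun},\texttt{sun}] = 0.4\cdot 0.4\cdot 0.7 = 0.112,\\
\alpha(\texttt{rain},2) &= B[\texttt{rain},\texttt{in}]\cdot \alpha(\texttt{sun},1)\cdot A[\texttt{sun},\texttt{rain}] = 0.8\cdot 0.4\cdot 0.2 = 0.064,\\
\alpha(\texttt{storm},2) &= B[\texttt{storm},\texttt{in}]\cdot \alpha(\texttt{sun},1)\cdot A[\texttt{sun},\texttt{storm}] = 0.9\cdot 0.4\cdot 0.1 = 0.036.
\end{aligned}$$

Only the `sun` term of the sum over predecessor states contributes on day 2, because the other forward variables of day 1 are zero.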

In [35]:
def forward(y):
    """Forward algorithm for computing the sequence likelihood.

    y: observation sequence (array of emission indices)
    """
    ell = y.shape[0]
    α = tf.Variable(np.zeros([ell, n]), trainable = False)
    
    # initialization
    α[0].assign(tf.multiply(B[:, y[0]], X1_dist))
    
    # forward algorithm
    for i in range(1, ell):
        # compute i-th row of DP table
        R = tf.linalg.matvec(A, α[i-1], transpose_a = True)
        α[i].assign(tf.multiply(B[:, y[i]], R))
    return α

def emiProb(α):
    return np.sum(α[-1,:])

In [36]:
α = forward(y)
α.numpy()

array([[0.4    , 0.     , 0.     ],
       [0.112  , 0.064  , 0.036  ],
       [0.04192, 0.0608 , 0.02808]])

In [37]:
Py = emiProb(α)
Py

0.1308
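
Since `A` and `B` are declared trainable, a natural next step is to differentiate the likelihood with respect to them. One way to do this, sketched below under the assumption that eager TensorFlow 2 is used, is to keep each DP row as an ordinary tensor so that a `tf.GradientTape` can trace the whole computation. The helper name `likelihood` is hypothetical, and the block redefines the model parameters so that it is self-contained.

```python
import numpy as np
import tensorflow as tf

# Same example parameters as above (float64, like the NumPy arrays).
A = tf.Variable(np.array([[7, 2, 1], [3, 5, 2], [2, 6, 2]]) / 10.0)
B = tf.Variable(np.array([[4, 6], [8, 2], [9, 1]]) / 10.0)
X1_dist = tf.constant([1., 0., 0.], dtype=tf.float64)
y = [0, 0, 0]  # in, in, in


def likelihood(y):
    """Forward algorithm that keeps only the current DP row, as a tensor."""
    row = B[:, y[0]] * X1_dist
    for obs in y[1:]:
        row = B[:, obs] * tf.linalg.matvec(A, row, transpose_a=True)
    return tf.reduce_sum(row)


with tf.GradientTape() as tape:
    log_py = tf.math.log(likelihood(y))

# Gradients of the log-likelihood with respect to both parameter matrices.
dA, dB = tape.gradient(log_py, [A, B])
print(float(tf.exp(log_py)))            # approximately 0.1308
print(tuple(dA.shape), tuple(dB.shape))  # (3, 3) (3, 2)
```

These are unconstrained gradients: a plain gradient step on `A` and `B` would generally leave the set of stochastic matrices, which is one motivation for the softmax parametrization used in the cell further below.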

## An HMM as a Special Case of a Recurrent Neural Network
We use RNN notation similar to that in [Dive into Deep Learning](https://d2l.ai/chapter_recurrent-neural-networks/bptt.html). $h_t$ is a vector of $n$ RNN "hidden states" (these are real numbers, not to be confused with the hidden states of the HMM, which are elements of $Q$), and $x_t$ is the input at time $t$.  
$$ h_t = f(x_t, h_{t-1}; A, B)$$
We choose the outputs
$$ o_t = \text{sum}(h_t) = h_t[0] + \cdots + h_t[n-1] \in [0,1]$$
so that the final output $o_T$ is just the likelihood $P(Y)$ of the sequence.
This RNN does not need to produce the intermediate outputs $o_t$ for $t<T$, as they are not used yet. However, they could be used in conjunction with a backward pass.
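
To make the recurrence concrete, the following NumPy sketch assumes that $x_t$ is the one-hot encoding of the observation $y_t$ and that $f(x_t, h_{t-1}; A, B) = (B x_t) \odot (A^\top h_{t-1})$, which is the forward recursion written in vector form. The first step is handled separately with the start distribution (called `pi` here, a name chosen for this illustration).

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.6, 0.2]])
B = np.array([[0.4, 0.6], [0.8, 0.2], [0.9, 0.1]])
pi = np.array([1.0, 0.0, 0.0])  # distribution of X_1 (sun with certainty)


def f(x_t, h_prev):
    """One recurrent step: emission term times the propagated previous state."""
    return (B @ x_t) * (A.T @ h_prev)


y = [0, 0, 0]                   # in, in, in
X = np.eye(B.shape[1])[y]       # one-hot inputs, shape (T, s)

h = (B @ X[0]) * pi             # first step uses the start distribution
outputs = [h.sum()]
for x_t in X[1:]:
    h = f(x_t, h)
    outputs.append(h.sum())     # o_t = sum(h_t)

print(outputs)                  # o_1, o_2, o_3 are approximately 0.4, 0.212, 0.1308
```

In this sketch each intermediate output $o_t$ equals the probability of the first $t$ observations, and the last one reproduces the solution of the weather example.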

### SimpleRNNCell
As a template we use the code for [tf.keras.layers.SimpleRNNCell](https://github.com/tensorflow/tensorflow/blob/v2.4.1/tensorflow/python/keras/layers/recurrent.py#L1222-L1420).

In [41]:
class HMMCell(tf.keras.layers.Layer):
  """Cell class for an HMM as an RNN.
  This class processes one step within the whole time sequence input.
  Arguments:
    n: positive integer number of hidden states, dimensionality of the output space.
  Call arguments:
    inputs: A 2D tensor, with shape of `[batch, feature]`. feature=s is the emission alphabet size.
    states: A 2D tensor with shape of `[batch, n]`, which is the forward variable alpha from
      the previous time step. For timestep 0, the initial state provided by the user
      will be fed to the cell.
  Examples:
  ```python
  inputs = np.random.random([32, 3, 2]).astype(np.float32)
  hmm = tf.keras.layers.RNN(HMMCell(3))
  output = hmm(inputs)  # The output has shape `[32, 3]`.
  hmm = tf.keras.layers.RNN(
      HMMCell(3),
      return_sequences = True,
      return_state = True)
  # whole_sequence_output has shape `[32, 3, 3]`.
  # final_state has shape `[32, 3]`.
  whole_sequence_output, final_state = hmm(inputs)
  ```
  """

  def __init__(self,
               n, # number of HMM hidden states, output size
               **kwargs):
    super().__init__(**kwargs)
    self.n = n
    self.state_size = self.n
    self.output_size = self.n

  def build(self, input_shape):
    self.emission_kernel = self.add_weight(
        shape=(self.n, input_shape[-1]),
        name='emission_kernel') # closely related to B
    self.transition_kernel = self.add_weight(
        shape=(self.n, self.n),
        name='transition_kernel') # closely related to A
    self.built = True

  def call(self, inputs, states, training=None):
    # tf.keras.layers.RNN passes the state as a nested structure,
    # whereas a direct call of the cell may pass a single tensor
    prev_output = states[0] if tf.nest.is_nested(states) else states
    # convert parameter matrices to stochastic matrices for transition (A) and emission probs (B)
    # TODO: this could be more efficient, maybe using tensorflow.python.keras.constraints?
    A = tf.nn.softmax(self.transition_kernel, axis=-1, name="A")
    B = tf.nn.softmax(self.emission_kernel, axis=-1, name="B")
    # debug output
    print ("A=\n", A)
    print ("prev_output=\n", prev_output)
    # R[b] = A^T alpha[b]: propagate the previous forward variables
    R = tf.linalg.matvec(A, prev_output, transpose_a = True)
    # E[b] = B x[b]: emission probabilities of the current input for every state
    E = tf.linalg.matvec(B, inputs, transpose_a=False, name="E")
    output = tf.multiply(E, R)

    new_state = [output] if tf.nest.is_nested(states) else output
    return output, new_state

  def get_initial_state(self, inputs=None, batch_size=None, dtype=None):
    # A zero state makes all subsequent outputs zero, so pass a meaningful
    # `initial_state` to the RNN layer when computing likelihoods.
    if batch_size is None:
      batch_size = tf.shape(inputs)[0]
    return tf.zeros([batch_size, self.n], dtype=dtype or tf.float32)

  def get_config(self):
    config = {'n': self.n}
    base_config = super().get_config()
    return dict(list(base_config.items()) + list(config.items()))

In [45]:
batch_size = 4
print (f"n={n}, s={s}, batch_size={batch_size}")
inputs = np.random.random([batch_size, n, s]).astype(np.float32)
yi = np.random.random([batch_size, s]).astype(np.float32)
states = np.random.random([batch_size, n]).astype(np.float32)
hmmC = HMMCell(n)

output = hmmC(yi, states)
print("output:\n", output[0])

n=3, s=2, batch_size=4
A=
 tf.Tensor(
[[0.09264284 0.41968825 0.48766893]
 [0.3085003  0.08489413 0.6066056 ]
 [0.48667473 0.28925818 0.22406708]], shape=(3, 3), dtype=float32)
prev_output=
 [[0.98690563 0.82522434 0.37491477]
 [0.32643446 0.64445996 0.10785879]
 [0.35583916 0.22308438 0.4507376 ]
 [0.16685499 0.19147485 0.9160768 ]]
output:
 tf.Tensor(
[[0.3833822  0.42148274 0.6673132 ]
 [0.25967288 0.20550083 0.52807736]
 [0.25829396 0.23246434 0.25593612]
 [0.40555367 0.27698553 0.33952895]], shape=(4, 3), dtype=float32)
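
The cell output above comes from randomly initialized kernels and random inputs. To relate a single cell step to the weather example, the NumPy sketch below emulates the arithmetic of `call` under two assumptions that are not part of the notebook: the kernels are set to the element-wise logarithms of `A_init` and `B_init` (so that the row-wise softmax returns exactly these matrices), and the input is the one-hot encoding of the observation.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, the NumPy analogue of tf.nn.softmax(z, axis=-1)."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


A_init = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.6, 0.2]])
B_init = np.array([[0.4, 0.6], [0.8, 0.2], [0.9, 0.1]])

# Hypothetical kernel values: softmax(log p) = p whenever each row of p
# is positive and sums to 1, as is the case for A_init and B_init.
transition_kernel = np.log(A_init)
emission_kernel = np.log(B_init)
A = softmax(transition_kernel)
B = softmax(emission_kernel)

x = np.array([[1.0, 0.0]])           # batch of one observation `in`, one-hot
state = np.array([[0.4, 0.0, 0.0]])  # forward variables after day 1

R = state @ A      # row-vector form of A^T applied to each state in the batch
E = x @ B.T        # selects column `in` of B for each batch entry
output = E * R
print(output)      # approximately [[0.112 0.064 0.036]]
```

The result matches the second row of the DP table computed by `forward`, which suggests that the cell performs one forward step when its kernels and inputs are chosen this way.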


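In [40]:
# Reference copy of the SimpleRNNCell source from TensorFlow v2.4.1, kept for
# comparison with HMMCell. It does not run in this notebook as is, because it
# relies on names from the TensorFlow source file (for example
# DropoutRNNCellMixin, activations, K, nest) that are not imported here.
class SimpleRNNCell(DropoutRNNCellMixin, Layer):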
  """Cell class for SimpleRNN.
  See [the Keras RNN API guide](https://www.tensorflow.org/guide/keras/rnn)
  for details about the usage of RNN API.
  This class processes one step within the whole time sequence input, whereas
  `tf.keras.layer.SimpleRNN` processes the whole sequence.
  Arguments:
    units: Positive integer, dimensionality of the output space.
    activation: Activation function to use.
      Default: hyperbolic tangent (`tanh`).
      If you pass `None`, no activation is applied
      (ie. "linear" activation: `a(x) = x`).
    use_bias: Boolean, (default `True`), whether the layer uses a bias vector.
    kernel_initializer: Initializer for the `kernel` weights matrix,
      used for the linear transformation of the inputs. Default:
      `glorot_uniform`.
    recurrent_initializer: Initializer for the `recurrent_kernel`
      weights matrix, used for the linear transformation of the recurrent state.
      Default: `orthogonal`.
    bias_initializer: Initializer for the bias vector. Default: `zeros`.
    kernel_regularizer: Regularizer function applied to the `kernel` weights
      matrix. Default: `None`.
    recurrent_regularizer: Regularizer function applied to the
      `recurrent_kernel` weights matrix. Default: `None`.
    bias_regularizer: Regularizer function applied to the bias vector. Default:
      `None`.
    kernel_constraint: Constraint function applied to the `kernel` weights
      matrix. Default: `None`.
    recurrent_constraint: Constraint function applied to the `recurrent_kernel`
      weights matrix. Default: `None`.
    bias_constraint: Constraint function applied to the bias vector. Default:
      `None`.
    dropout: Float between 0 and 1. Fraction of the units to drop for the linear
      transformation of the inputs. Default: 0.
    recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for
      the linear transformation of the recurrent state. Default: 0.
  Call arguments:
    inputs: A 2D tensor, with shape of `[batch, feature]`.
    states: A 2D tensor with shape of `[batch, units]`, which is the state from
      the previous time step. For timestep 0, the initial state provided by user
      will be feed to cell.
    training: Python boolean indicating whether the layer should behave in
      training mode or in inference mode. Only relevant when `dropout` or
      `recurrent_dropout` is used.
  Examples:
  ```python
  inputs = np.random.random([32, 10, 8]).astype(np.float32)
  rnn = tf.keras.layers.RNN(tf.keras.layers.SimpleRNNCell(4))
  output = rnn(inputs)  # The output has shape `[32, 4]`.
  rnn = tf.keras.layers.RNN(
      tf.keras.layers.SimpleRNNCell(4),
      return_sequences=True,
      return_state=True)
  # whole_sequence_output has shape `[32, 10, 4]`.
  # final_state has shape `[32, 4]`.
  whole_sequence_output, final_state = rnn(inputs)
  ```
  """

  def __init__(self,
               units,
               activation='tanh',
               use_bias=True,
               kernel_initializer='glorot_uniform',
               recurrent_initializer='orthogonal',
               bias_initializer='zeros',
               kernel_regularizer=None,
               recurrent_regularizer=None,
               bias_regularizer=None,
               kernel_constraint=None,
               recurrent_constraint=None,
               bias_constraint=None,
               dropout=0.,
               recurrent_dropout=0.,
               **kwargs):
    # By default use cached variable under v2 mode, see b/143699808.
    if ops.executing_eagerly_outside_functions():
      self._enable_caching_device = kwargs.pop('enable_caching_device', True)
    else:
      self._enable_caching_device = kwargs.pop('enable_caching_device', False)
    super(SimpleRNNCell, self).__init__(**kwargs)
    self.units = units
    self.activation = activations.get(activation)
    self.use_bias = use_bias

    self.kernel_initializer = initializers.get(kernel_initializer)
    self.recurrent_initializer = initializers.get(recurrent_initializer)
    self.bias_initializer = initializers.get(bias_initializer)

    self.kernel_regularizer = regularizers.get(kernel_regularizer)
    self.recurrent_regularizer = regularizers.get(recurrent_regularizer)
    self.bias_regularizer = regularizers.get(bias_regularizer)

    self.kernel_constraint = constraints.get(kernel_constraint)
    self.recurrent_constraint = constraints.get(recurrent_constraint)
    self.bias_constraint = constraints.get(bias_constraint)

    self.dropout = min(1., max(0., dropout))
    self.recurrent_dropout = min(1., max(0., recurrent_dropout))
    self.state_size = self.units
    self.output_size = self.units

  @tf_utils.shape_type_conversion
  def build(self, input_shape):
    default_caching_device = _caching_device(self)
    self.kernel = self.add_weight(
        shape=(input_shape[-1], self.units),
        name='kernel',
        initializer=self.kernel_initializer,
        regularizer=self.kernel_regularizer,
        constraint=self.kernel_constraint,
        caching_device=default_caching_device)
    self.recurrent_kernel = self.add_weight(
        shape=(self.units, self.units),
        name='recurrent_kernel',
        initializer=self.recurrent_initializer,
        regularizer=self.recurrent_regularizer,
        constraint=self.recurrent_constraint,
        caching_device=default_caching_device)
    if self.use_bias:
      self.bias = self.add_weight(
          shape=(self.units,),
          name='bias',
          initializer=self.bias_initializer,
          regularizer=self.bias_regularizer,
          constraint=self.bias_constraint,
          caching_device=default_caching_device)
    else:
      self.bias = None
    self.built = True

  def call(self, inputs, states, training=None):
    prev_output = states[0] if nest.is_nested(states) else states
    dp_mask = self.get_dropout_mask_for_cell(inputs, training)
    rec_dp_mask = self.get_recurrent_dropout_mask_for_cell(
        prev_output, training)

    if dp_mask is not None:
      h = K.dot(inputs * dp_mask, self.kernel)
    else:
      h = K.dot(inputs, self.kernel)
    if self.bias is not None:
      h = K.bias_add(h, self.bias)

    if rec_dp_mask is not None:
      prev_output = prev_output * rec_dp_mask
    output = h + K.dot(prev_output, self.recurrent_kernel)
    if self.activation is not None:
      output = self.activation(output)

    new_state = [output] if nest.is_nested(states) else output
    return output, new_state

  def get_initial_state(self, inputs=None, batch_size=None, dtype=None):
    return _generate_zero_filled_state_for_cell(self, inputs, batch_size, dtype)

  def get_config(self):
    config = {
        'units':
            self.units,
        'activation':
            activations.serialize(self.activation),
        'use_bias':
            self.use_bias,
        'kernel_initializer':
            initializers.serialize(self.kernel_initializer),
        'recurrent_initializer':
            initializers.serialize(self.recurrent_initializer),
        'bias_initializer':
            initializers.serialize(self.bias_initializer),
        'kernel_regularizer':
            regularizers.serialize(self.kernel_regularizer),
        'recurrent_regularizer':
            regularizers.serialize(self.recurrent_regularizer),
        'bias_regularizer':
            regularizers.serialize(self.bias_regularizer),
        'kernel_constraint':
            constraints.serialize(self.kernel_constraint),
        'recurrent_constraint':
            constraints.serialize(self.recurrent_constraint),
        'bias_constraint':
            constraints.serialize(self.bias_constraint),
        'dropout':
            self.dropout,
        'recurrent_dropout':
            self.recurrent_dropout
    }
    config.update(_config_for_enable_caching_device(self))
    base_config = super(SimpleRNNCell, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))



NameError: name 'DropoutRNNCellMixin' is not defined