In [1]:
import numpy as np
import warnings
from munch import Munch
import itertools
from mv_Viterbi import mv_Viterbi

# Split Version of Mediation Variables

The original formulation, in which we split a constraint across time, added some complexity and, importantly, made it very hard to condition on the constraint being unsatisfied ($C = 0$), since in the original $C=1$ was equivalent to setting every $C_t = 1$. In this version, we sidestep this issue by not splitting up $C$. For more info, refer to the notes on Overleaf.

### The Model

Same as the other notebook, the HMM here is designed specifically to demonstrate the effects of constraints. We have a three-state HMM with hidden states $a,b,c$ and binary emissions $A,C$. As the names would suggest, $a$ and $c$ are very likely to emit $A$ and $C$ respectively, while $b$ emits both with equal probability. The initial distribution is uniform over all 3 states, as is every row of the transition matrix.

### The precedence constraint

We'll be handling a precedence constraint: $a$ must happen before $c$. The mediation variable is a 2-dimensional binary vector $r_t = (m_1^t, m_2^t)$. $m_1^t$ tracks whether state $a$ has been visited yet. $m_2^t$ then tracks whether the constraint has held up to time $t$: it stays at 1 until state $c$ occurs before $a$ has been visited, and is 0 from then on.
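To make the bookkeeping concrete, here is a minimal sketch that traces $(m_1^t, m_2^t)$ along a given hidden-state sequence. It follows the convention used by the code later in this notebook, where $m_2^t = 1$ as long as no $c$ has occurred before $a$. The helper name `trace_mediation` is hypothetical and is not part of `mv_Viterbi`.

```python
def trace_mediation(states):
    """Return the list of (m1, m2) values along a hidden-state sequence."""
    m1, m2 = False, True  # before any state: a not seen, no violation yet
    trace = []
    for k in states:
        m1 = m1 or (k == 'a')              # has a been visited so far?
        m2 = m2 and (m1 or k != 'c')       # still no c before a?
        trace.append((int(m1), int(m2)))
    return trace

print(trace_mediation(['b', 'b', 'a', 'c']))  # [(0, 1), (0, 1), (1, 1), (1, 1)]
print(trace_mediation(['b', 'c', 'a', 'c']))  # [(0, 1), (0, 0), (1, 0), (1, 0)]
```

In the second sequence, $m_2$ drops to 0 at the first $c$ and never recovers, even after $a$ is visited.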

# Inference: Unconstrained vs. Constrained

In the unconstrained case, given an observation sequence $A,A,C,\cdots$, the MAP will simply be the corresponding hidden sequence $a,a,c,\cdots$. $b$ will never be encountered in the unconstrained MAP.

However, in the constrained case, say we know the constraint is satisfied. Then if we encounter a few initial $C$'s, the constraint forces us to choose $b$, which is allowed and more likely to emit $C$ than $a$. For example, $C,C,A,A,\cdots$ would give rise to $b,b,a,a,\cdots$, since we cannot have $c$ happen before $a$. Note that if the initial run of $C$'s has length 3 or more, we are instead incentivized to eat the cost of starting at $a$ so that $c$ is unlocked for the subsequent states: with the emission probabilities used below, $a,c,c$ scores $0.2 \cdot 0.8^2 = 0.128$ against $0.5^3 = 0.125$ for $b,b,b$.
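The trade-off can be checked with a few lines of arithmetic. The sketch below compares only two candidate explanations for $k$ initial $C$'s (all $b$'s, versus one $a$ followed by $c$'s) and ignores whatever follows the run. Since the initial and transition probabilities are uniform, only the emission terms matter. The emission values are assumed to match `emit_mat` in this notebook, and the helper names are hypothetical.

```python
# Probability of emitting 'C' from each hidden state (assumed to match emit_mat).
P_C = {'a': 0.2, 'b': 0.5, 'c': 0.8}

def prefix_scores(k):
    """Emission scores of two candidate explanations for k initial C's."""
    all_b = P_C['b'] ** k                       # b, b, ..., b
    a_then_c = P_C['a'] * P_C['c'] ** (k - 1)   # a, c, ..., c
    return all_b, a_then_c

def preferred_prefix(k):
    """Name of the higher-scoring candidate for a run of k C's."""
    all_b, a_then_c = prefix_scores(k)
    return 'all b' if all_b > a_then_c else 'a then c'

for k in range(1, 6):
    print(k, prefix_scores(k), preferred_prefix(k))
```

Under these assumptions the preference flips from `'all b'` to `'a then c'` at $k = 3$.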

On the other hand, say we know the constraint is NOT satisfied, which is equivalent to knowing that $c$ happens before $a$. Then the situation is symmetric, except that now we would infer $b$'s if we encounter a short initial sequence of $A$'s.

# Interesting Points

Already, we have demonstrated that with intermediate variables, we're able to run the Viterbi algorithm on the augmented model with small additional overhead. If you look at the code and/or the equations, we needed to track the state of a 2-dimensional binary vector, expanding the number of values tracked in Viterbi by a factor of 4. Notably, however, we did not need to blow up the transition matrix, and the "transitions"/"emissions" of these auxiliary variables were incorporated as 0-1 weights.

Also, note that it is now trivial to do constrained inference in the case where we know the constraint is NOT satisfied. This amounts to merely setting the constraint emission to 0 and running Viterbi, as demonstrated in an example later on.
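The two points above can be illustrated with a small self-contained sketch of Viterbi over the augmented pairs $(k, r)$. This is not the `mv_Viterbi` implementation imported at the top of the notebook; the names `step` and `augmented_viterbi` are hypothetical, and the probabilities are assumed to match the HMM defined in this notebook. Because the auxiliary update is deterministic, its 0-1 weight is applied implicitly: each $(k, r_{t-1})$ leads to exactly one $r_t$, so no enlarged transition matrix is ever built.

```python
STATES = ['a', 'b', 'c']
# Emission probabilities, assumed to match emit_mat in this notebook.
EMIT = {('a', 'A'): 0.8, ('a', 'C'): 0.2,
        ('b', 'A'): 0.5, ('b', 'C'): 0.5,
        ('c', 'A'): 0.2, ('c', 'C'): 0.8}
UNIFORM = 1 / 3  # every initial and transition probability

def step(k, r_past):
    """Deterministic update of r = (m1, m2) after visiting state k."""
    m1 = (k == 'a') or r_past[0]
    m2 = (m1 or k != 'c') and r_past[1]
    return (m1, m2)

def augmented_viterbi(obs, sat):
    """MAP hidden sequence given obs and the constraint value sat."""
    start = (False, True)  # a not yet seen, no violation yet
    # score[(k, r)]: best probability of a path ending in state k with aux value r
    score = {(k, step(k, start)): UNIFORM * EMIT[k, obs[0]] for k in STATES}
    back = []
    for y in obs[1:]:
        new_score, pointers = {}, {}
        for (j, r_past), p in score.items():
            for k in STATES:
                node = (k, step(k, r_past))
                cand = p * UNIFORM * EMIT[k, y]
                if cand > new_score.get(node, 0.0):
                    new_score[node] = cand
                    pointers[node] = (j, r_past)
        back.append(pointers)
        score = new_score
    # Constraint "emission": 0-1 weight on the final m2.
    final = {node: p for node, p in score.items() if node[1][1] == sat}
    node = max(final, key=final.get)
    path = [node[0]]
    for pointers in reversed(back):
        node = pointers[node]
        path.append(node[0])
    return path[::-1]

print(augmented_viterbi(['C', 'C', 'A', 'C', 'A', 'C'], sat=True))
print(augmented_viterbi(['A', 'A', 'C', 'A', 'C', 'A', 'C'], sat=False))
```

The `score` dictionary holds at most $3 \times 4 = 12$ entries per time step, and conditioning on the constraint being violated only changes the final filter from `sat=True` to `sat=False`.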

### Create the HMM

In [2]:
hidden_states = ['a','b','c']
emit_states = ['A','C']
hmm_transition = {}
for i in hidden_states:
    for j in hidden_states:
        hmm_transition[i,j] = 1/3

emit_mat = np.array([
    [.8,.2],
    [.5,.5],
    [.2,.8]
])
hmm_emit = {}
for i in range(3):
    for j in range(2):
        hmm_emit[hidden_states[i],emit_states[j]] = emit_mat[i,j].item()
        
hmm_startprob = {}
for i in hidden_states:
    hmm_startprob[i] = 1/3

hmm = Munch(states = hidden_states, emits = emit_states, tprob = hmm_transition, eprob = hmm_emit, initprob = hmm_startprob)

### Create the Constraint

In [3]:
def update_fun(r, k, r_past):
    '''
    0-1 "transition" weight of the auxiliary state r given the current state k and the previous auxiliary state r_past.
    m1^t = tau^t_a = a OR tau^{t-1}_a  # tracks if state a has happened yet
    m2^t = [1 - ((1 - tau^t_a) AND c)] AND m2^{t-1} = [tau^t_a OR (1 - c)] AND m2^{t-1}  # tracks if the arrival time of a is before c
    k is the current state
    r is the auxiliary state, a 2-tuple r = (m1, m2)
    '''
    m1 = (k == 'a') or r_past[0]
    m2 = (m1 or k != 'c') and r_past[1]

    return int(r == (m1, m2))

def init_fun(k, r):
    '''
    Initial 0-1 "prob" of r = (m1, m2) given the first state k; it is just an indicator.
    '''
    m1 = k == 'a'
    m2 = k != 'c'

    return int(r == (m1,m2))
    
def cst_fun(r, sat):
    '''
    Constraint is a boolean emission of the final auxiliary state. In this case it is just m2^T, i.e. whether c never occurred before a over the whole sequence.
    '''
    return int(r[1] == sat) 

In [4]:
cst = Munch(aux_size = 2, update_fun = update_fun, init_fun = init_fun, cst_fun = cst_fun)

### Inference when the Constraint is Satisfied

Here, we constrain $C=1$: $a$ must happen before $c$. As predicted, when encountering an initial sequence of $C$'s, our model chooses $b$, since $c$ is not allowed and $b$ has a higher chance of emitting $C$ than $a$. We see this behavior provided the initial number of $C$'s is at most 2. The admissible length of the run of $b$'s depends on the emission probabilities of $(a,A)$ and $(c,C)$: for instance, raising both from 0.8 to 0.85 makes $b,b,b$ ($0.5^3 = 0.125$) beat $a,c,c$ ($0.15 \cdot 0.85^2 \approx 0.108$) on three initial $C$'s.

In [11]:
obs = ['C','C','A','C','A','C']

In [12]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, cst, sat = True)

In [13]:
opt_state

['b', 'b', 'a', 'c', 'a', 'c']

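### Inference when the Constraint is Not Satisfied

Here, we instead condition on $C=0$: $c$ must happen before $a$. Symmetrically, the initial $A$'s are now explained by $b$, since $a$ is not allowed until $c$ has been visited.

In [8]: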
obs = ['A','A','C','A','C','A','C']

In [9]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, cst, sat = False)

In [10]:
opt_state

['b', 'b', 'c', 'a', 'c', 'a', 'c']
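As a cross-check that does not depend on `mv_Viterbi`, both results can be reproduced by brute force, since these sequences are short enough to enumerate all $3^T$ hidden paths. The sketch below assumes the emission values of `emit_mat` and drops the uniform initial and transition probabilities, which scale every path by the same constant. The names `satisfies` and `brute_force_map` are hypothetical.

```python
import itertools

# Emission probabilities, assumed to match emit_mat in this notebook.
EMIT = {('a', 'A'): 0.8, ('a', 'C'): 0.2,
        ('b', 'A'): 0.5, ('b', 'C'): 0.5,
        ('c', 'A'): 0.2, ('c', 'C'): 0.8}

def satisfies(path):
    """True if no c occurs before the first a."""
    seen_a = False
    for k in path:
        if k == 'a':
            seen_a = True
        elif k == 'c' and not seen_a:
            return False
    return True

def brute_force_map(obs, sat):
    """Highest-scoring hidden path whose constraint value equals sat."""
    best_path, best_score = None, -1.0
    for path in itertools.product('abc', repeat=len(obs)):
        if satisfies(path) != sat:
            continue
        score = 1.0
        for k, y in zip(path, obs):
            score *= EMIT[k, y]
        if score > best_score:
            best_path, best_score = list(path), score
    return best_path

print(brute_force_map(['C', 'C', 'A', 'C', 'A', 'C'], sat=True))
print(brute_force_map(['A', 'A', 'C', 'A', 'C', 'A', 'C'], sat=False))
```

Enumeration costs $O(3^T)$ and is only meant as a sanity check on short sequences.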