In [2]:
import numpy as np
import warnings
from munch import Munch
import itertools
from mv_Viterbi import mv_Viterbi
from cst_aggregate import cst_aggregate

# Unsplit Version of Mediation Variables

The original formulation, where we split a constraint across time, added some complexity and, importantly, made it very hard to condition on the constraint being unsatisfied ($C = 0$). In the original, $C=1$ was equivalent to setting every $C_t = 1$, so $C = 0$ only tells us that at least one $C_t = 0$. In this version, we sidestep this issue by not splitting up $C$. For more info, refer to the notes on Overleaf.

### The Model

As in the other notebook, the HMM here is designed specifically to demonstrate the effects of constraints. We have a three-state HMM with hidden states $a,b,c$ and binary emissions $A,C$. As the names suggest, $a$ and $c$ are very likely to emit $A$ and $C$ respectively, while $b$ emits both with equal probability. The initial distribution is also uniform over all 3 states. Here's a summary:

1. Uniform transition matrix
2. Uniform initial distribution
3. Binary emissions with the following emission matrix (rows $a,b,c$ top to bottom, columns $A,C$ left to right): $$\begin{bmatrix} .8 & .2\\ .5 & .5\\ .2 & .8 \end{bmatrix}$$



### The precedence constraint

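We'll be handling a precedence constraint: $a$ must happen before $c$. The mediatory variable is a 2-dimensional binary vector $r_t = (m_1^t, m_2^t)$. $m_1^t$ tracks whether state $a$ has been visited by time $t$. $m_2^t$ tracks whether the constraint still holds at time $t$: it stays 1 as long as no $c$ has occurred before the first visit to $a$, and drops to 0 (for good) as soon as such a $c$ appears. The constraint is satisfied iff $m_2^T = 1$.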
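To make the two bits concrete, here is a small sketch that traces $(m_1^t, m_2^t)$ along a given hidden-state path. The helper `trace_mediator` is hypothetical; it follows the same update rule as the `update_fun`/`init_fun` cells further down, starting from the values $(0, 1)$ "before" $t=1$ so that the first step reproduces `init_fun`.

```python
def trace_mediator(path):
    """Return r_t = (m1^t, m2^t) for each step of a hidden-state path (hypothetical helper).

    m1^t = 1 once state a has been visited (at or before t).
    m2^t = 1 while no c has appeared before the first a.
    """
    m1, m2 = 0, 1  # values "before" t=1, so step 1 matches init_fun
    trace = []
    for k in path:
        m1 = int(k == "a" or m1)
        m2 = int((m1 or k != "c") and m2)
        trace.append((m1, m2))
    return trace
```

For example, `trace_mediator(["b", "c", "a"])` gives `[(0, 1), (0, 0), (1, 0)]`: the $c$ at $t=2$ arrives before any $a$, so $m_2$ drops to 0 and stays there even after $a$ is visited.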

# Inference: Unconstrained vs. Constrained

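In the unconstrained case, given an observation sequence $A,A,C,\cdots$, the MAP is simply the corresponding hidden sequence $a,a,c,\cdots$: the uniform transitions favor no path, so each state is chosen by its emission alone, and $b$ (which emits either symbol with probability $0.5 < 0.8$) never appears in the unconstrained MAP.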
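Because the initial distribution and every transition equal $1/3$, each length-$T$ path picks up the same factor $(1/3)^T$, so the unconstrained MAP reduces to a per-step argmax over the emission column. A minimal sketch (the function name `unconstrained_map` is illustrative, not part of this notebook's code):

```python
import numpy as np

states = ["a", "b", "c"]
symbols = ["A", "C"]
# Rows a, b, c; columns A, C (same matrix as in the summary above).
emit = np.array([[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]])


def unconstrained_map(obs):
    """Per-step argmax of P(o_t | k); valid only because transitions/initial probs are uniform."""
    cols = [symbols.index(o) for o in obs]
    # emit[:, cols] has shape (3, T); argmax over rows picks the best state for each step.
    return [states[i] for i in emit[:, cols].argmax(axis=0)]
```

`unconstrained_map(["A", "A", "C"])` returns `["a", "a", "c"]`; since row $b$ is never the column maximum, $b$ cannot appear.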

However, in the constrained case, suppose we know the constraint is satisfied. Then if we encounter a few initial $C$'s, the constraint forces us to choose $b$, which is allowed and more likely than $a$ to emit $C$. For example, $C,C,A,A,\cdots$ gives rise to $b,b,a,a,\cdots$, since $c$ cannot happen before $a$. Note that if the initial run of $C$'s has length 3 or more, we are better off eating the cost of starting at $a$ so that $c$ is unlocked for the subsequent states.
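The length threshold comes from comparing the two ways of explaining an initial run of $n$ $C$'s under the constraint. Assuming the run is followed by an $A$ (or by the end of the sequence), the uniform initial and transition factors are shared by every path and cancel, so only the emission products over the run matter:

$$\frac{E(a\,c^{\,n-1})}{E(b^{\,n})} = \frac{0.2 \cdot 0.8^{\,n-1}}{0.5^{\,n}} = \frac{1.6^{\,n}}{4}, \qquad \frac{1.6^{\,n}}{4} < 1 \iff n < \frac{\ln 4}{\ln 1.6} \approx 2.95$$

So $b^n$ wins for $n \le 2$ (for $n=2$: $0.25$ vs. $0.16$), and $a\,c^{n-1}$ wins from $n = 3$ on ($0.125$ vs. $0.128$). Placing $a$ later, as in $b^k a\,c^{n-k-1}$, only multiplies $E(a\,c^{n-1})$ by $(0.5/0.8)^k < 1$, so it never helps.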

On the other hand, suppose we know the constraint is NOT satisfied, which is equivalent to knowing that $c$ happens before $a$ is first visited. The situation is then symmetric, except that we now infer $b$'s when we encounter a short initial run of $A$'s.

# Interesting Points

Already, we have demonstrated that with intermediate variables we can run the Viterbi algorithm on the augmented model with small additional overhead. If you look at the code and/or the equations, we needed to track the states of a binary 2D vector, which expands the number of values tracked in Viterbi by a factor of 4 (from 3 hidden states to $3 \times 4 = 12$ augmented states). Notably, however, we did not need to blow up the transition matrix: the "transitions" and "emissions" of these auxiliary variables were incorporated as 0-1 weights.
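In equations, a sketch of the augmented recursion over pairs $(k, r)$ looks as follows, writing $u(r \mid k, r')$, $\iota(r \mid k)$ and $\kappa(C \mid r)$ for the 0-1 weights returned by `update_fun`, `init_fun` and `cst_fun` (this is our notation and may differ from the Overleaf notes):

$$\delta_1(k, r) = \pi(k)\, \iota(r \mid k)\, P(o_1 \mid k)$$

$$\delta_t(k, r) = P(o_t \mid k)\, \max_{j,\, r'} \Big[\delta_{t-1}(j, r')\, P(k \mid j)\, u(r \mid k, r')\Big]$$

$$\text{score of the best path consistent with } C = \max_{k,\, r}\; \delta_T(k, r)\, \kappa(C \mid r)$$

The HMM transition $P(k \mid j)$ is used unchanged; the mediator only multiplies in 0-1 factors, and the table $\delta_t$ has $3 \times 4 = 12$ entries per step instead of 3.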

Also, note that it is now trivial to do constrained inference when we know the constraint is NOT satisfied. This amounts to merely setting the constraint emission to 0 and running Viterbi, as demonstrated in an example later on.
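`mv_Viterbi` itself is not shown in this notebook. As a rough illustration of the recursion, here is a minimal sketch, `viterbi_aug_sketch` (a hypothetical name, not the actual `mv_Viterbi`), assuming the same attributes this notebook puts on its `hmm` and constraint objects (`states`, `tprob`, `eprob`, `initprob`, `aux_size`, `update_fun`, `init_fun`, `cst_fun`) and binary mediator values. It works in log space, so a 0-1 weight of 0 becomes $-\infty$ and forbids the path. Conditioning on $C = 0$ is just a different `sat` passed to the final weight.

```python
import itertools
import math


def _log(x):
    # log of a probability or 0-1 weight; a zero weight becomes -inf (path forbidden)
    return math.log(x) if x > 0 else -math.inf


def viterbi_aug_sketch(obs, hmm, cst, sat):
    """Hypothetical Viterbi over augmented states (k, r), r in {0,1}^aux_size."""
    aux_vals = list(itertools.product((0, 1), repeat=cst.aux_size))
    keys = [(k, r) for k in hmm.states for r in aux_vals]

    # delta[(k, r)]: best log-score of a prefix ending in hidden state k with mediator r
    delta = {
        (k, r): _log(hmm.initprob[k]) + _log(cst.init_fun(k, r)) + _log(hmm.eprob[k, obs[0]])
        for k, r in keys
    }
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for k, r in keys:
            best, arg = -math.inf, None
            for j, rp in keys:
                # unchanged HMM transition times the 0-1 mediator "transition"
                s = delta[j, rp] + _log(hmm.tprob[j, k]) + _log(cst.update_fun(r, k, rp))
                if s > best:
                    best, arg = s, (j, rp)
            new_delta[k, r] = best + _log(hmm.eprob[k, o])
            ptr[k, r] = arg
        delta = new_delta
        backptrs.append(ptr)

    # The constraint is a 0-1 "emission" of the final mediator value.
    final = {kr: s + _log(cst.cst_fun(kr[1], sat)) for kr, s in delta.items()}
    last = max(final, key=final.get)
    if final[last] == -math.inf:
        raise ValueError("no path is consistent with the requested constraint value")

    aug_path = [last]
    for ptr in reversed(backptrs):
        aug_path.append(ptr[aug_path[-1]])
    aug_path.reverse()
    return aug_path, [k for k, _ in aug_path]
```

On the notebook's `hmm` and `precedence_cst`, this sketch should agree with `mv_Viterbi` whenever the MAP path is unique; when several paths tie, the two may break ties differently.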

### Create the HMM

In [2]:
hidden_states = ["a", "b", "c"]
emit_states = ["A", "C"]
hmm_transition = {}
for i in hidden_states:
    for j in hidden_states:
        hmm_transition[i, j] = 1 / 3

emit_mat = np.array([[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]])
hmm_emit = {}
for i in range(3):
    for j in range(2):
        hmm_emit[hidden_states[i], emit_states[j]] = emit_mat[i, j].item()

hmm_startprob = {}
for i in hidden_states:
    hmm_startprob[i] = 1 / 3

hmm = Munch(
    states=hidden_states,
    emits=emit_states,
    tprob=hmm_transition,
    eprob=hmm_emit,
    initprob=hmm_startprob,
)

### Create the Constraint

In [3]:
def update_fun(r, k, r_past):
    """
    m1^t = tau^t_a = a OR tau^{t-1}_a  # tracks whether state a has happened yet
    m2^t = [1 - ((1 - tau^t_a) AND c)] AND m2^{t-1} = [tau^t_a OR (1 - c)] AND m2^{t-1}  # tracks whether a arrived before any c
    k is the current hidden state
    r is the auxiliary state, a 2-tuple r = (m1, m2); r_past is its value at the previous step
    """
    m1 = (k == "a") or r_past[0]
    m2 = (m1 or (not k == "c")) and r_past[1]

    return int(r == (m1, m2))


def init_fun(k, r):
    """
    Initial "prob" of r = (m1, m2) given the first hidden state k; just a 0-1 indicator.
    """
    m1 = k == "a"
    m2 = not k == "c"

    return int(r == (m1, m2))


def cst_fun(r, sat):
    """
    The constraint is a boolean emission of the final auxiliary state. In this case it is just m2^T, i.e. whether a was visited before any c.
    """
    return int(r[1] == sat)

In [4]:
precedence_cst = Munch(
    name="a occurs before c",
    aux_size=2,
    update_fun=update_fun,
    init_fun=init_fun,
    cst_fun=cst_fun,
)
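To make the "0-1 weights" concrete, here is a sketch that tabulates the precedence update as a $4 \times 3 \times 4$ array `W[r_past, k, r]`. The names `W` and `precedence_update` are illustrative; `precedence_update` copies the logic of `update_fun` above so the block runs on its own.

```python
import itertools

import numpy as np

states = ["a", "b", "c"]
aux_vals = list(itertools.product((0, 1), repeat=2))  # (m1, m2): (0,0), (0,1), (1,0), (1,1)


def precedence_update(r, k, r_past):
    # same rule as update_fun above
    m1 = (k == "a") or r_past[0]
    m2 = (m1 or k != "c") and r_past[1]
    return int(r == (m1, m2))


# W[i, s, j] = weight of moving from aux_vals[i] to aux_vals[j] when the hidden state is states[s]
W = np.array([[[precedence_update(r, k, rp) for r in aux_vals] for k in states] for rp in aux_vals])
```

Each `(r_past, k)` slice of `W` contains exactly one 1, so the mediator is a deterministic function of the hidden path and adds no extra randomness. The array also shows that $(1,1)$ is absorbing and that once $m_2 = 0$ it never returns to 1.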

### Inference when the Constraint is Satisfied

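Here, we condition on $C=1$: $a$ must happen before $c$. As predicted, when encountering an initial run of $C$'s, the model chooses $b$, since $c$ is not allowed and $b$ is more likely than $a$ to emit $C$. We see this behavior as long as the initial run has at most 2 $C$'s; with 3 or more, starting at $a$ and then using $c$ wins. The admissible run of $b$'s can be lengthened by making $a$ and $c$ emit their own symbols more reliably, i.e. raising $P(A \mid a)$ and $P(C \mid c)$ (equivalently lowering $P(C \mid a)$ and $P(A \mid c)$), which makes it costlier to explain an early $C$ with $a$.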
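To see how the admissible run length depends on the emission matrix, here is a small sketch under a hypothetical symmetric parametrisation: $P(A \mid a) = P(C \mid c) = p$, $P(C \mid a) = P(A \mid c) = 1 - p$, and $b$ emitting each symbol with probability $q = 0.5$. With uniform transitions, a run of $n$ initial $C$'s is explained by $b^n$ (score $q^n$) rather than $a\,c^{n-1}$ (score $(1-p)\,p^{n-1}$) exactly when $q^n > (1-p)\,p^{n-1}$.

```python
def max_b_run(p, q=0.5, max_n=100):
    """Longest initial run of C's that the precedence-constrained MAP explains with b's.

    Hypothetical parametrisation: P(A|a) = P(C|c) = p, P(C|a) = P(A|c) = 1 - p,
    and b emits either symbol with probability q. Assumes p > q, so the loop terminates.
    """
    n = 0
    # extend the run while b^(n+1) still beats a c^n
    while n < max_n and q ** (n + 1) > (1 - p) * p ** n:
        n += 1
    return n
```

`[max_b_run(p) for p in (0.6, 0.8, 0.9, 0.95)]` gives `[2, 2, 3, 4]`: with this notebook's $p = 0.8$ the limit is 2, and it only grows as $a$ and $c$ become more reliable emitters of their own symbols.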

In [5]:
obs = ["C", "C", "A", "C", "A", "C"]

In [6]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, precedence_cst, sat=True)

In [7]:
opt_state

['b', 'b', 'a', 'c', 'a', 'c']

### Inference when the Constraint is NOT Satisfied

Now, we observe $C = 0$: the constraint is not satisfied. The negation of the constraint is just that $c$ happens before $a$, and the inference situation is symmetric. We see that encountering a short initial run of $A$'s makes us choose $b$ for the same reasons as above.

In [8]:
obs = ["A", "A", "C", "A", "C", "A", "C"]

In [9]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, precedence_cst, sat=False)

In [10]:
opt_state

['b', 'b', 'c', 'a', 'c', 'a', 'c']

# Occurrence Constraint

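Now, we create another constraint that enforces that state $b$ must be visited at some point. For this model, this is equivalent to replacing exactly one of the $a$'s or $c$'s in the unconstrained MAP with $b$. Every such swap costs the same factor $0.5/0.8$, so any time point is equally good, and the position returned below just reflects how ties are broken.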
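A quick numerical check of the tie: the sketch below swaps each position of the unconstrained MAP for $b$ in turn and compares emission products (the uniform transition factors are identical for all paths and are left out). The names `EMIT`, `obs_demo` and `emission_score` are illustrative.

```python
import math

# Emission probabilities P(symbol | state) from the model above.
EMIT = {("a", "A"): 0.8, ("a", "C"): 0.2,
        ("b", "A"): 0.5, ("b", "C"): 0.5,
        ("c", "A"): 0.2, ("c", "C"): 0.8}

obs_demo = ["C", "C", "A", "C", "A", "C"]
# Unconstrained MAP: per-step argmax, since uniform transitions give every path the same factor.
unconstrained = ["a" if o == "A" else "c" for o in obs_demo]


def emission_score(path):
    return math.prod(EMIT[s, o] for s, o in zip(path, obs_demo))


base = emission_score(unconstrained)
# Score after swapping position i to b, relative to the unconstrained MAP.
ratios = [
    emission_score(unconstrained[:i] + ["b"] + unconstrained[i + 1:]) / base
    for i in range(len(obs_demo))
]
```

Every entry of `ratios` is $0.5/0.8 = 0.625$, because each replaced step had emission probability $0.8$ and $b$ emits either symbol with probability $0.5$.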

In [11]:
def update_fun2(r, k, r_past):
    """
    m1^t = tau^t_b = b OR tau^{t-1}_b  # tracks whether state b has occurred yet
    """
    m1 = (k == "b") or r_past[0]

    return int(r == (m1,))


def init_fun2(k, r):
    """
    Initial "prob" of r = (m1,) given the first hidden state k; just a 0-1 indicator.
    """
    m1 = k == "b"

    return int(r == (m1,))


def cst_fun2(r, sat):
    """
    The constraint is a boolean emission of the final auxiliary state. In this case it is just m1^T, i.e. whether b has occurred.
    """

    return int(r[0] == sat)

In [12]:
occurence_cst = Munch(
    name="b must occur",
    aux_size=1,
    update_fun=update_fun2,
    init_fun=init_fun2,
    cst_fun=cst_fun2,
)

In [13]:
obs = ["C", "C", "A", "C", "A", "C"]

In [14]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, occurence_cst, sat=True)

In [15]:
opt_state

['b', 'c', 'a', 'c', 'a', 'c']

## Occurrence Constraint is False

If we condition on the constraint being false, this is equivalent to "$b$ is never visited". Since unconstrained inference never returns $b$, setting the constraint to False gives the same answer as unconstrained inference.

In [17]:
obs = ["C", "C", "A", "C", "A", "C"]

In [18]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, occurence_cst, sat=False)
opt_state

['c', 'c', 'a', 'c', 'a', 'c']

# Conditioning on Multiple Constraints and Their Values

Now, we'll introduce both the precedence constraint "$a$ happens before $c$" and the occurrence constraint "$b$ must happen at some point" into our model. Again, these are modeled as binary emissions, so we can play with their truth configurations.
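`cst_aggregate` is not shown in this notebook. One plausible way to combine constraints, sketched below as a hypothetical `aggregate_sketch` (not the actual implementation), is to concatenate the mediator tuples and multiply the per-constraint 0-1 weights, with `sat` becoming a tuple of truth values. Here the combined mediator has $2 + 1 = 3$ bits, so Viterbi would track $3 \times 2^3 = 24$ augmented states.

```python
from types import SimpleNamespace


def aggregate_sketch(csts):
    """Hypothetical combination of several mediator constraints into one."""
    sizes = [c.aux_size for c in csts]
    offsets = [sum(sizes[:i]) for i in range(len(sizes))]

    def split(r):
        # cut the concatenated mediator tuple back into per-constraint pieces
        return [tuple(r[o:o + n]) for o, n in zip(offsets, sizes)]

    def update_fun(r, k, r_past):
        # product of 0-1 weights: 1 only if every piece updates consistently
        return int(all(c.update_fun(ri, k, pi) for c, ri, pi in zip(csts, split(r), split(r_past))))

    def init_fun(k, r):
        return int(all(c.init_fun(k, ri) for c, ri in zip(csts, split(r))))

    def cst_fun(r, sat):
        # sat holds one truth value per constraint, e.g. (True, False)
        return int(all(c.cst_fun(ri, si) for c, ri, si in zip(csts, split(r), sat)))

    return SimpleNamespace(
        name=[c.name for c in csts],
        aux_size=sum(sizes),
        update_fun=update_fun,
        init_fun=init_fun,
        cst_fun=cst_fun,
    )
```

With the two constraints above, `aggregate_sketch([precedence_cst, occurence_cst]).name` would be the same list of names that `cst_aggregate` displays.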

In [34]:
cst_list = [precedence_cst, occurence_cst]
combined_cst = cst_aggregate(cst_list)
combined_cst.name

['a occurs before c', 'b must occur']

### Both True

First, we assume both constraints are true. Note that the observation sequence below is chosen so that the precedence constraint already makes $b$ appear first, so the occurrence constraint is satisfied automatically. Therefore, the answer should be the same as when conditioning on the precedence constraint alone.

In [25]:
obs = ["C", "C", "A", "C", "A", "C"]

In [26]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, combined_cst, sat=(True, True))

In [27]:
opt_state

['b', 'b', 'a', 'c', 'a', 'c']

### Precedence True, Occurrence False

Now here's an interesting scenario. The occurrence constraint being unsatisfied is equivalent to $b$ never occurring, so we can only choose $a$ or $c$. Combined with the precedence constraint, the first state must then be $a$ (neither $b$ nor $c$ is allowed at $t=1$), even though it emits a $C$; the remaining $C$'s are explained by $c$. Compare this with the $b,b,\ldots$ prefix we get when enforcing the precedence constraint by itself.
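Because the sequence is short, this can be cross-checked by brute force over all $3^6 = 729$ hidden paths. The sketch below (hypothetical helpers `precedence_holds`, `occurrence_holds`, `brute_force_map`) keeps only paths whose truth values match `sat` and maximizes the emission product; the uniform $(1/3)^T$ factor is common to all paths and dropped.

```python
import itertools
import math

EMIT = {("a", "A"): 0.8, ("a", "C"): 0.2,
        ("b", "A"): 0.5, ("b", "C"): 0.5,
        ("c", "A"): 0.2, ("c", "C"): 0.8}


def precedence_holds(seq):
    """True iff no c appears before the first a (vacuously true if neither appears)."""
    for s in seq:
        if s == "a":
            return True
        if s == "c":
            return False
    return True


def occurrence_holds(seq):
    return "b" in seq


def brute_force_map(obs, sat):
    """Exhaustive constrained MAP over all 3^T paths; only feasible for tiny T."""
    candidates = (
        seq for seq in itertools.product("abc", repeat=len(obs))
        if (precedence_holds(seq), occurrence_holds(seq)) == tuple(sat)
    )
    best = max(candidates, key=lambda seq: math.prod(EMIT[s, o] for s, o in zip(seq, obs)))
    return list(best)
```

`brute_force_map(["C", "C", "A", "C", "A", "C"], (True, False))` returns `['a', 'c', 'a', 'c', 'a', 'c']`, and with `(True, True)` it returns `['b', 'b', 'a', 'c', 'a', 'c']`; both optima are unique here, so tie-breaking plays no role.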

In [32]:
obs = ["C", "C", "A", "C", "A", "C"]

In [33]:
opt_aug, opt_state = mv_Viterbi(obs, hmm, combined_cst, sat=(True, False))
opt_state

['a', 'c', 'a', 'c', 'a', 'c']