# Probability distribution
Assuming each coin is equally likely to be chosen:

P(heads) = $\frac{1}{2}\theta_1 + \frac{1}{2}\theta_2$

## E-step

First, we should derive an expression for $q(h | v) = P(h | v, \theta^{t-1})$

For each possible value of $h \in \{1, 2\}$, $q(h |v) \propto \binom{v_n}{v_k} (\theta^{t-1}_{h})^{v_k} (1 - \theta^{t-1}_{h})^{v_n - v_k}$, where $v_n$ is the number of throws observed, $v_k$ is the number of heads observed, and $\theta^{t-1}_{h}$ is the current probability of heads for coin $h$. To arrive at the exact probability, we normalize by dividing by the sum of this expression over both values of $h$.
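
Written out, the normalized posterior takes the form below; the binomial coefficient is the same for both coins, so it cancels. The second line is a worked check for a trial with 23 throws and 10 heads, assuming $\theta^{t-1} = (0.6, 0.4)$.

\begin{align*}
q(h=i | v) &= \frac{(\theta^{t-1}_{i})^{v_k} (1 - \theta^{t-1}_{i})^{v_n - v_k}}{\sum_{j=1}^{2} (\theta^{t-1}_{j})^{v_k} (1 - \theta^{t-1}_{j})^{v_n - v_k}} \\
q(h=1 | v) &= \frac{0.6^{10} \cdot 0.4^{13}}{0.6^{10} \cdot 0.4^{13} + 0.4^{10} \cdot 0.6^{13}} = \frac{1}{1 + 1.5^{3}} \approx 0.2286
\end{align*}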

In [20]:
# scipy.misc.comb has been removed from recent SciPy releases; it lives in scipy.special
from scipy.special import comb

def e_step(theta, throws, heads):
    r"""
    throws: a list of the number of throws in each trial (v_n)
    heads: the number of heads observed in each trial (v_k)
    theta: the current value of theta (\theta^{t-1})
    """
    q = []
    for n, k in zip(throws, heads):
        n_choose_k = comb(n, k)
        # Unnormalized posterior of each coin: binomial likelihood under its theta
        probs = [n_choose_k * (p ** k) * ((1 - p) ** (n - k)) for p in theta]
        norm = sum(probs)
        # Normalize so the probabilities for this trial sum to 1
        q.append([prob / norm for prob in probs])

    return q
    

In [24]:
throws = [41, 43, 23, 23, 1, 23, 36, 37, 2, 131, 5, 29, 13, 47, 10, 58, 15, 14, 100, 113]
heads = [14, 33, 19, 10, 0, 17, 24, 17, 1, 36, 5, 6, 5, 13, 4, 35, 5, 5, 74, 34]

print(e_step([0.6, 0.4], throws, heads))

for x in zip(*e_step([0.6, 0.4], throws, heads)):
    print(x)

[[0.0051119646305962871, 0.99488803536940373], [0.99991090315086062, 8.9096849139461371e-05], [0.99772154495217003, 0.0022784550478299678], [0.22857142857142859, 0.77142857142857135], [0.40000000000000002, 0.59999999999999998], [0.98857110968498008, 0.011428890315019964], [0.99235160222356256, 0.007648397776437491], [0.22857142857142859, 0.77142857142857135], [0.5, 0.5], [4.0795824582382401e-11, 0.99999999995920419], [0.88363636363636366, 0.11636363636363642], [0.0010139301291582788, 0.99898606987084171], [0.22857142857142865, 0.77142857142857135], [0.00020044558672603654, 0.99979955441327395], [0.30769230769230776, 0.69230769230769229], [0.99235160222356245, 0.0076483977764374893], [0.11636363636363641, 0.88363636363636355], [0.16494845360824748, 0.8350515463917525], [0.99999999647126081, 3.5287392148869218e-09], [1.1909494750432793e-08, 0.99999998809050517]]
(0.0051119646305962871, 0.99991090315086062, 0.99772154495217003, 0.22857142857142859, 0.40000000000000002, 0.98857110968498008

## M-step

To compute the M-step, we should start by writing down an expression for the energy. By definition, the energy is $ \sum_{n=1}^{N} \langle \log p(h^n, v^n | \theta) \rangle_{q(h^n | v^n)} $.

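Since $q$ is discrete, the expectation is a sum over the values of $h$. Writing $p(h^n, v^n | \theta) = p(v^n | h^n, \theta) p(h^n)$ and dropping the $\log p(h^n)$ term, which does not depend on $\theta$, leaves $ \sum_{n=1}^{N} \sum_{i=1}^{H} \log p(v^n | h^n=i, \theta) q(h^n=i | v^n) $, where $H$ is the total number of possibilities for $h$.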

In this case, the probability is binomial, $p(v^n | h^n=i, \theta) = \binom{v^n_n}{v^n_k} (\theta^{t}_{i})^{v^n_k} (1 - \theta^{t}_{i})^{v^n_n - v^n_k} $, leaving us looking for $\theta^t$ that maximizes $ \sum_{i=1}^{H} \sum_{n=1}^{N} q(h^n=i | v^n) \log \Big( \binom{v^n_n}{v^n_k} (\theta^{t}_{i})^{v^n_k} (1 - \theta^{t}_{i})^{v^n_n - v^n_k} \Big) $. Note the switch in the order of summation: each $\theta_i$ appears only in its own inner sum, so each can be maximized separately.



To optimize, we can differentiate with respect to a particular $\theta^{t}_{i}$ and set the result to zero:
\begin{align*}
0 &= \frac{d}{d\theta^{t}_{i}} \Big[ \sum_{n=1}^{N} q(h^n=i | v^n) \log \Big( \binom{v^n_n}{v^n_k} (\theta^{t}_{i})^{v^n_k} (1 - \theta^{t}_{i})^{v^n_n - v^n_k} \Big) \Big] \\
&= \sum_{n=1}^{N} q(h^n=i | v^n) \frac{d}{d\theta^{t}_{i}} \Big[ \log \Big( \binom{v^n_n}{v^n_k} (\theta^{t}_{i})^{v^n_k} (1 - \theta^{t}_{i})^{v^n_n - v^n_k} \Big) \Big] \\
&= \sum_{n=1}^{N} q(h^n=i | v^n) \frac{d}{d\theta^{t}_{i}} \Big[ \log \Big( \binom{v^n_n}{v^n_k} \Big) + (v^n_k) \log \Big( (\theta^{t}_{i}) \Big) + (v^n_n - v^n_k) \log \Big( (1 - \theta^{t}_{i}) \Big) \Big] \\
&= \sum_{n=1}^{N} q(h^n=i | v^n) \Big[ \frac{v^n_k}{\theta^{t}_{i}} - \frac{v^n_n - v^n_k}{1 - \theta^{t}_{i}} \Big] \\
&= \sum_{n=1}^{N} q(h^n=i | v^n) \frac{v^n_k}{\theta^{t}_{i}} - \sum_{n=1}^{N} q(h^n=i | v^n)  \frac{v^n_n - v^n_k}{1 - \theta^{t}_{i}} \\
\frac{1}{\theta^{t}_{i}} \sum_{n=1}^{N} q(h^n=i | v^n) (v^n_k) &= \frac{1}{1 - \theta^{t}_{i}}\sum_{n=1}^{N} q(h^n=i | v^n) (v^n_n - v^n_k) \\
\sum_{n=1}^{N} q(h^n=i | v^n) (v^n_k) &= \theta^{t}_{i} \Big( \sum_{n=1}^{N} q(h^n=i | v^n) (v^n_n - v^n_k) + \sum_{n=1}^{N} q(h^n=i | v^n) (v^n_k) \Big) \\
\theta^{t}_{i} &= \frac{\sum_{n=1}^{N} q(h^n=i | v^n) (v^n_k)}{\sum_{n=1}^{N} q(h^n=i | v^n) (v^n_n - v^n_k) + \sum_{n=1}^{N} q(h^n=i | v^n) (v^n_k)} \\
\theta^{t}_{i} &= \frac{\sum_{n=1}^{N} q(h^n=i | v^n) (v^n_k)}{\sum_{n=1}^{N} q(h^n=i | v^n) (v^n_n)}
\end{align*}

In other words, the $\theta^{t}_{i}$ that maximizes the energy for the given $q(h | v)$ is the ratio of a weighted sum of heads to a weighted sum of throws, with each trial weighted by the probability under $q$ that coin $i$ was used for it.
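
As a quick sanity check of this formula, the sketch below evaluates it for a single coin on two made-up trials. The helper name `weighted_heads_ratio` and the toy numbers are illustrative assumptions, not part of the derivation above.

```python
def weighted_heads_ratio(q_i, throws, heads):
    """Closed-form M-step update for one coin (hypothetical helper).

    q_i: q(h^n = i | v^n), one weight per trial
    throws: number of throws per trial (v_n)
    heads: number of heads per trial (v_k)
    """
    weighted_heads = sum(w * k for w, k in zip(q_i, heads))
    weighted_throws = sum(w * n for w, n in zip(q_i, throws))
    return weighted_heads / weighted_throws


# Toy data (made up for illustration): two trials of 10 throws each
toy_throws = [10, 10]
toy_heads = [8, 2]

# Coin certainly used in trial 1 only: the update is that trial's head rate
theta_hard = weighted_heads_ratio([1.0, 0.0], toy_throws, toy_heads)

# Coin equally likely in both trials: the update pools both trials
theta_soft = weighted_heads_ratio([0.5, 0.5], toy_throws, toy_heads)

print(theta_hard, theta_soft)
```

With hard assignments the update reduces to an ordinary maximum-likelihood estimate on the assigned trials; soft weights interpolate between trials.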

In [17]:
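def m_step(q, throws, heads):
    """
    q: the output of the E-step, one [q(h=1|v), q(h=2|v)] pair per trial
    throws: the number of throws in each trial (v_n)
    heads: the number of heads observed in each trial (v_k)
    """
    theta = []
    # zip(*q) transposes q so that we iterate over coins rather than trials
    for q_for_theta in zip(*q):
        numerator = sum([v_k * q_h for v_k, q_h in zip(heads, q_for_theta)])
        denominator = sum([v_n * q_h for v_n, q_h in zip(throws, q_for_theta)])
        theta.append(numerator / denominator)

    return theta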


In [25]:
m_step(e_step([0.51, 0.49], throws, heads), throws, heads)

[0.58252000809913906, 0.3806030183001759]

In [32]:
def em(theta_zero, data=(throws, heads), max_t=1e3, min_diff=1e-7):
    theta = theta_zero
    for t in range(int(max_t)):
        q = e_step(theta, *data)
        next_theta = m_step(q, *data)
        # Stop once the total absolute change in theta falls below min_diff
        if sum([abs(new - old) for new, old in zip(next_theta, theta)]) < min_diff:
            print('Converged after {t} iterations, breaking'.format(t=t + 1))
            return next_theta

        theta = next_theta

    return theta
    

In [33]:
em([0.51, 0.49])

Converged after 10 iterations, breaking


[0.71253192057209336, 0.31411465036724373]

In [34]:
em([0.4, 0.6])

Converged after 9 iterations, breaking


[0.31411465064967259, 0.71253192173588253]

In [40]:
em([0.25, 0.25])

Converged after 2 iterations, breaking


[0.46727748691099474, 0.46727748691099474]
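
The last run settles on a symmetric fixed point rather than on two distinct coin biases: when both entries of $\theta$ start out equal, the E-step assigns $q = (0.5, 0.5)$ to every trial, and the M-step then gives both coins the pooled fraction of heads. A minimal check, assuming Python 3 division and the same `throws` and `heads` as above (repeated here so the snippet stands alone):

```python
throws = [41, 43, 23, 23, 1, 23, 36, 37, 2, 131, 5, 29, 13, 47, 10, 58, 15, 14, 100, 113]
heads = [14, 33, 19, 10, 0, 17, 24, 17, 1, 36, 5, 6, 5, 13, 4, 35, 5, 5, 74, 34]

# With identical thetas every trial is split 50/50 between the coins,
# so the weighted ratio in the M-step collapses to total heads / total throws.
pooled = sum(heads) / sum(throws)
print(pooled)
```

EM cannot break this symmetry on its own, which is why the other runs start from unequal values.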

# Probability distribution given bias
If we're more likely to pick a particular coin:

P(heads) = $\phi_1\theta_1 + \phi_2\theta_2$, where $\phi_h$ is the mixture weight of coin $h$ (the probability of picking it) and $\phi_1 + \phi_2 = 1$.

## E-step
To re-derive the E-step, we simply weight by the previous value of $\phi$:

$q(h |v) \propto \phi^{t-1}_h \binom{v_n}{v_k} (\theta^{t-1}_{h})^{v_k} (1 - \theta^{t-1}_{h})^{v_n - v_k}$ 

In [35]:

def e_step_with_bias(theta, phi, throws, heads):
    r"""
    throws: a list of the number of throws in each trial (v_n)
    heads: the number of heads observed in each trial (v_k)
    theta: the current value of theta (\theta^{t-1})
    phi: the current value of phi (\phi^{t-1}), a vector the same length as theta
    """
    q = []
    for n, k in zip(throws, heads):
        n_choose_k = comb(n, k)
        # Unnormalized posterior of each coin: mixture weight times binomial likelihood
        probs = [ph * n_choose_k * (p ** k) * ((1 - p) ** (n - k)) for p, ph in zip(theta, phi)]
        norm = sum(probs)
        # Normalize so the probabilities for this trial sum to 1
        q.append([prob / norm for prob in probs])

    return q

## M-step

To work out the M-step, note that the joint probability is now also multiplied by a factor of $\phi_h$. The derivation for $\theta$ doesn't change: like the binomial coefficient, $\phi_h$ does not depend on $\theta$, so $\frac{d\phi_h}{d\theta} = 0$. Deriving for $\phi$:

\begin{align*}
0 &= \frac{d}{d\phi^t_1} \Big[  \sum_{i=1}^{H} \sum_{n=1}^{N} q(h^n=i | v^n) \log \Big( \phi^t_i \binom{v^n_n}{v^n_k} (\theta^{t}_{i})^{v^n_k} (1 - \theta^{t}_{i})^{v^n_n - v^n_k} \Big) \Big] \\
&= \frac{d}{d\phi^t_1} \Big[ \sum_{i=1}^{H} \sum_{n=1}^{N} q(h^n=i | v^n) \Big( \log (\phi^t_i) + \log \binom{v^n_n}{v^n_k} + v^n_k \log (\theta^{t}_{i}) + (v^n_n - v^n_k) \log (1 - \theta^{t}_{i}) \Big) \Big] \\
&= \frac{d}{d\phi^t_1} \Big[ \sum_{n=1}^{N} q(h^n=1 | v^n) \log (\phi^t_1) + \sum_{n=1}^{N} q(h^n=2 | v^n) \log (\phi^t_2) \Big] \\
&\text{Recall that } \phi_1 + \phi_2 = 1 \text{, so } \phi^t_2 = 1 - \phi^t_1: \\
&= \frac{d}{d\phi^t_1} \Big[ \sum_{n=1}^{N} q(h^n=1 | v^n) \log (\phi^t_1) + \sum_{n=1}^{N} q(h^n=2 | v^n) \log (1 - \phi^t_1) \Big] \\
&= \frac{1}{\phi^{t}_{1}} \sum_{n=1}^{N} q(h^n=1 | v^n) - \frac{1}{1 - \phi^{t}_{1}}\sum_{n=1}^{N} q(h^n=2 | v^n) \\
&\text{Multiplying through by } \phi^t_1 (1 - \phi^t_1): \\
0 &= (1 - \phi^{t}_{1}) \sum_{n=1}^{N} q(h^n=1 | v^n) - \phi^{t}_{1} \sum_{n=1}^{N} q(h^n=2 | v^n) \\
\sum_{n=1}^{N} q(h^n=1 | v^n) &= \phi^{t}_{1} \sum_{n=1}^{N} q(h^n=1 | v^n) + \phi^{t}_{1} \sum_{n=1}^{N} q(h^n=2 | v^n) \\
\phi^{t}_{1} &= \frac{\sum_{n=1}^{N} q(h^n=1 | v^n)}{\sum_{n=1}^{N} q(h^n=1 | v^n) + \sum_{n=1}^{N} q(h^n=2 | v^n)} = \frac{1}{N} \sum_{n=1}^{N} q(h^n=1 | v^n)
\end{align*}

The last equality uses $q(h^n=1 | v^n) + q(h^n=2 | v^n) = 1$ for every $n$. By symmetry, $\phi^{t}_{2} = 1 - \phi^{t}_{1} = \frac{1}{N} \sum_{n=1}^{N} q(h^n=2 | v^n)$.
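
Setting the derivative to zero under the constraint $\phi_1 + \phi_2 = 1$ leads to $\phi^{t}_{i} = \frac{1}{N} \sum_{n=1}^{N} q(h^n=i | v^n)$, the average weight $q$ gives to coin $i$. A minimal sketch of the corresponding M-step follows; the name `m_step_with_bias` and the toy inputs are assumptions made for illustration, and the $\theta$ update is the same weighted ratio as in the unbiased case.

```python
def m_step_with_bias(q, throws, heads):
    """Return (theta, phi) for fixed q (hypothetical helper).

    q: one [q(h=1|v), q(h=2|v)] pair per trial
    throws: number of throws per trial (v_n)
    heads: number of heads per trial (v_k)
    """
    n_trials = len(q)
    theta, phi = [], []
    # zip(*q) transposes q so that we iterate over coins rather than trials
    for q_for_coin in zip(*q):
        weighted_heads = sum(k * w for k, w in zip(heads, q_for_coin))
        weighted_throws = sum(n * w for n, w in zip(throws, q_for_coin))
        theta.append(weighted_heads / weighted_throws)
        # Mixture weight: average of q for this coin over all trials
        phi.append(sum(q_for_coin) / n_trials)
    return theta, phi


# Toy inputs (made up): coin 1 used in trials 1 and 3, coin 2 in trial 2
toy_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
toy_throws = [10, 10, 10]
toy_heads = [8, 2, 6]

theta, phi = m_step_with_bias(toy_q, toy_throws, toy_heads)
print(theta, phi)
```

Because each row of `q` sums to one, the returned `phi` sums to one as well, so no separate normalization step is needed.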