**Maximum likelihood estimation from observed and unobserved data**

You are given a bag containing red and blue coins. All the red coins have the same probability of heads. All the blue coins have the same probability of heads (possibly different from that of the red coins).

Your task is to estimate the proportion of red coins in the bag and the probability of heads for both the red and the blue coins.

In [73]:
import ipywidgets as widgets
from IPython.display import display
prob_red = widgets.FloatSlider(min=0.0, max=1.0, description='prob_red')
prob_head_red = widgets.FloatSlider(min=0.0, max=1.0, description='head_red')
prob_head_blue = widgets.FloatSlider(min=0.0, max=1.0, description='head_blue')
display(prob_red, prob_head_red, prob_head_blue)

FloatSlider(value=0.0, description='prob_red', max=1.0)

FloatSlider(value=0.0, description='head_red', max=1.0)

FloatSlider(value=0.0, description='head_blue', max=1.0)

Use these widgets to control the model.

In [74]:
import random
def choose_coin():
    return 'R' if random.random() < prob_red.value else 'B'

def flip_coin(coin):
    uar = random.random()
    if coin == 'R':
        if uar < prob_head_red.value:
            return 'H'
    elif uar < prob_head_blue.value:
        return 'H'
    return 'T'

def flip_random_coin_n_times(n, hidden=False):
    coin = choose_coin()
    return ('_' if hidden else coin, ''.join([flip_coin(coin) for i in range(n)]))

def flip_m_random_coins_n_times(m, n, hidden=False):
    return [flip_random_coin_n_times(n, hidden) for i in range(m)]

Use the functions above to sample from the model. The optional parameter 'hidden' controls whether the colour of each coin is included in the samples.

In [75]:
flip_m_random_coins_n_times(5, 100)

[('B',
  'THTHHTTTTTHHHTTTHHTHTHHTHHTHTHHTHTTTHTTHHHHHHTHTTHHHHTTTHTHTTTTTTTHTTHTHTTTHTTHTTTTHHTTTTTTTHTTTHHTT'),
 ('R',
  'HTTTTTTHHTHHHTHHHHHHTHHTTTTTTTTHTHHHHTTHHTHHHTTTTTHHHHHTHHTHTTHHTHHHHHHHHHHHHTTTHTHHHHHTTTHHHTHHTHHH'),
 ('R',
  'HHTTHHTHHHHHHHHHTHHTTHTHHTTTTHHHHTHTHHTHTHTHHHHHTHTTTHHTTHHHHHHHTHHHTHHHHHHHHHHHHHHTTHHHHHHHHTHTHTHH'),
 ('B',
  'TTHTHTTHHHTTTTTTTTTTTTTTHTTTHTTHHTTTTHTHHTTHHHTHHHHTTHTHTHTTTTTHTHHTHTTHHTTTTTTTHTHHTHTTHHTHTHTHTTTH'),
 ('R',
  'THHHTHHHTHHHTHHTHHTHHHHTHHHHHHTHHTHTTHHTHTHTTTHHTTTHHHHHTHHTTTHHHTHHHHTHTTHTHTHHHHHHHTHHHHTHTHHHHHHH')]

In [76]:
flip_m_random_coins_n_times(5, 100, hidden=True)

[('_',
  'THTTTTTTHTHTTHTTTTTTTHTTHHTTTTTTHTTTTTTTHTHHTHHHHHTTHHHTTTHHTHTTHHTHHHHTTHTTHHTTTTTTHHHTTTTHHTTHTTTH'),
 ('_',
  'TTHHTTTTHTTHTTTHHTTTHHHHTHTTHTHTTTTTTTTTTTTTHTHHTTTTTTTTHHHTHTTTTTTTTTTTTTHTHTTTHTTTTTTHHHTHTTTHHHHT'),
 ('_',
  'HTTHHHTHHTTTHHTTHTTTTHTTHTTTTHTHTTHHHTTTTTHTHHHHHHTHTHTHTHTTTTHHTHTTTTTTHTTTTTHHTTTHTTTTHTTHHTTHTTHH'),
 ('_',
  'THTHTTTTHTTHHHTHTTTHTTTTTHTTTHTTTHHHHHHTTHTTTHHTTHTTTHTHTTTHTHHHHHTTTHTHHTTHTTHHHTTTHHTTTTHTHHHHHHHT'),
 ('_',
  'HTTHHTHTHHHHHTHTHHHTHHTTHHHHHHHTTTHHHHHTHHTTTTHHHHTHHHHHTHHHHHTTHHHHTHTTTTTTHHTHTHHTHHHHTHHHTHHHHHTT')]

**TASK 1** Implement the following two functions to estimate parameters for the model in the observed case. Splitting the work into two separate functions will simplify things for the next task. 

* How could you measure the error in your estimates?
* How does the error decrease with the sample size?
* If you were only allowed to flip coins a total of N times, how would you choose m (the number of coins) and n (the number of times to flip each coin)? Why?
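
One possible way to explore the first two questions is to simulate from known parameters and compare the estimates with the truth. The sketch below is self-contained and does not use the widgets or the notebook's own functions; the names `simulate`, `estimate` and `mean_abs_error_prob_red`, and the parameter values 0.3, 0.7 and 0.4, are assumptions chosen purely for illustration.

```python
import random

def simulate(m, n, p_red, p_head_red, p_head_blue, rng):
    """Draw m coins and flip each n times; returns (colour, head_count) pairs."""
    samples = []
    for _ in range(m):
        colour = 'R' if rng.random() < p_red else 'B'
        p_head = p_head_red if colour == 'R' else p_head_blue
        heads = sum(rng.random() < p_head for _ in range(n))
        samples.append((colour, heads))
    return samples

def estimate(samples, n):
    """Observed-case MLE: relative frequency of red, and of heads per colour."""
    red = [h for c, h in samples if c == 'R']
    blue = [h for c, h in samples if c == 'B']
    p_red = len(red) / len(samples)
    # A head probability is undefined (nan) if that colour never appears.
    p_head_red = sum(red) / (n * len(red)) if red else float('nan')
    p_head_blue = sum(blue) / (n * len(blue)) if blue else float('nan')
    return p_red, p_head_red, p_head_blue

def mean_abs_error_prob_red(m, n, true_params, trials, rng):
    """Average absolute error of the estimated red proportion over repeated trials."""
    errors = []
    for _ in range(trials):
        p_red_hat = estimate(simulate(m, n, *true_params, rng), n)[0]
        errors.append(abs(p_red_hat - true_params[0]))
    return sum(errors) / trials

rng = random.Random(0)           # fixed seed so the run is repeatable
true_params = (0.3, 0.7, 0.4)    # illustrative values, not taken from the widgets
for m in (10, 100, 1000):
    print(m, mean_abs_error_prob_red(m, 10, true_params, 50, rng))
```

The estimate of the red proportion uses only the m colour labels, so its standard error is sqrt(p(1-p)/m) and does not depend on n. The head-probability estimates, by contrast, depend on the total number of flips made with each colour.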

In [77]:
samples = flip_m_random_coins_n_times(2, 10)

In [78]:
samples

[('B', 'HTHTTHTHTT'), ('B', 'THHHHHHHTT')]

In [79]:
def compute_sufficient_statistics(samples):
    # Each sample is a (colour, flips) pair; all flip strings are assumed to have the same length.
    total_count = len(samples)
    sample_len = len(samples[0][1])
    red_count = sum(1 for colour, _ in samples if colour == 'R')
    blue_count = total_count - red_count
    red_head_count = sum(flips.count('H') for colour, flips in samples if colour == 'R')
    blue_head_count = sum(flips.count('H') for colour, flips in samples if colour == 'B')
    return total_count, red_count, blue_count, red_head_count, blue_head_count, sample_len

def mle(sufficient_statistics):
    total_count, red_count, blue_count, red_head_count, blue_head_count, sample_len = sufficient_statistics
    prob_red = red_count / total_count
    # A head probability is undefined (nan) when no coin of that colour was drawn.
    prob_red_head = red_head_count / (sample_len * red_count) if red_count else float('nan')
    prob_blue_head = blue_head_count / (sample_len * blue_count) if blue_count else float('nan')
    return prob_red, prob_red_head, prob_blue_head

In [80]:
mle(compute_sufficient_statistics(flip_m_random_coins_n_times(1000, 1000)))

(0.395, 0.7003493670886076, 0.40068595041322314)

**TASK 2** Given a sample from a single coin whose colour is unobserved, estimate the posterior probability that the coin is red, given some estimates of the model parameters.

* If you pass in the true model parameters (e.g. prob_red.value, prob_head_red.value and prob_head_blue.value), how quickly does the posterior change? Use the plot_distribution function to view this.
* How does this depend on the model parameters?
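
For reference, the posterior asked for here follows from Bayes' rule. The notation is an assumption introduced for this note: $\pi$ is the estimated proportion of red coins, $\theta_R$ and $\theta_B$ are the estimated head probabilities, and the sample $x$ contains $h$ heads and $t$ tails.

```latex
P(\text{red} \mid x)
  = \frac{\pi \, \theta_R^{h} (1 - \theta_R)^{t}}
         {\pi \, \theta_R^{h} (1 - \theta_R)^{t}
          + (1 - \pi) \, \theta_B^{h} (1 - \theta_B)^{t}}
```

The binomial coefficient for the number of orderings is the same for both colours, so it cancels and does not appear.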

In [247]:
def compute_posterior_prob_red(sample, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    total_count = len(sample[1])
    heads_count = sample[1].count('H')
    tails_count = total_count - heads_count
    joint_red = estimate_prob_red * estimate_prob_head_red**heads_count * (1-estimate_prob_head_red)**tails_count
    joint_blue = (1 - estimate_prob_red) * estimate_prob_head_blue**heads_count * (1-estimate_prob_head_blue)**tails_count
    return joint_red / (joint_red + joint_blue)
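
For long samples the direct products above can underflow to zero for both colours, in which case the final division fails. A minimal sketch of the same posterior computed in log space is shown below. The function name `posterior_prob_red_log` is hypothetical, it takes the flip string directly rather than a (colour, flips) pair, and it assumes all three parameters lie strictly between 0 and 1.

```python
import math

def posterior_prob_red_log(flips, p_red, p_head_red, p_head_blue):
    """Posterior probability of red for one flip string, computed in log space.

    Assumes all three parameters lie strictly between 0 and 1.
    """
    heads = flips.count('H')
    tails = len(flips) - heads
    log_joint_red = (math.log(p_red) + heads * math.log(p_head_red)
                     + tails * math.log(1 - p_head_red))
    log_joint_blue = (math.log(1 - p_red) + heads * math.log(p_head_blue)
                      + tails * math.log(1 - p_head_blue))
    # P(red | flips) = 1 / (1 + exp(log_joint_blue - log_joint_red))
    diff = log_joint_blue - log_joint_red
    if diff > 700:  # math.exp would overflow; the posterior is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))
```

Because only the difference of the two log joint probabilities is exponentiated, the result stays well defined even for thousands of flips.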

In [230]:
sample = flip_random_coin_n_times(1000)
statistics = mle(compute_sufficient_statistics(flip_m_random_coins_n_times(1000, 1000)))
print(statistics)
compute_posterior_prob_red(sample, *statistics)

(0.289, 0.300878892733564, 0.4999690576652602)


1.0

**TASK 3** Reusing your code from Tasks 1 and 2, implement the expectation-maximization (EM) algorithm to find (locally optimal) parameter estimates when the colours of the coins are not observed.
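
As a guide, one EM iteration for this model can be written as follows. The notation is an assumption introduced for this note: there are $m$ samples of $n$ flips each, $h_i$ is the number of heads in sample $i$, $\gamma_i$ is the posterior probability that sample $i$ came from a red coin under the current estimates, and $\pi$, $\theta_R$, $\theta_B$ are the updated estimates.

```latex
\text{E-step:}\quad
\gamma_i = P(\text{red} \mid x_i)
\qquad
\text{M-step:}\quad
\pi = \frac{1}{m} \sum_{i=1}^{m} \gamma_i,
\quad
\theta_R = \frac{\sum_{i} \gamma_i \, h_i}{n \sum_{i} \gamma_i},
\quad
\theta_B = \frac{\sum_{i} (1 - \gamma_i) \, h_i}{n \sum_{i} (1 - \gamma_i)}
```

The M-step has the same form as the observed-case estimates from Task 1, with the hard colour counts replaced by the expected counts $\sum_i \gamma_i$ and $\sum_i (1 - \gamma_i)$.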

In [303]:
def compute_expected_statistics(samples, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    # E-step: compute the expected sufficient statistics for these samples given the
    # current parameter estimates, weighting each sample by its posterior probability of being red.
    total_count = len(samples)
    sample_len = len(samples[0][1])
    red_count = 0
    blue_count = 0
    red_head_count = 0
    blue_head_count = 0
    for sample in samples:
        sample_post_prob = compute_posterior_prob_red(sample, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)
        heads_count = sample[1].count('H')
        red_count += sample_post_prob
        blue_count += 1 - sample_post_prob
        red_head_count += sample_post_prob * heads_count
        blue_head_count += (1 - sample_post_prob) * heads_count
    return total_count, red_count, blue_count, red_head_count, blue_head_count, sample_len

def expectation_maximization(samples, iterations, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    # Compute the mle parameter estimates for the model from samples without labels by
    # alternating the E-step (expected statistics) and the M-step (mle) for a fixed number of iterations.
    for _ in range(iterations):
        expected_statistics = compute_expected_statistics(samples, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)
        estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue = mle(expected_statistics)
    return estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue

In [322]:
samples = flip_m_random_coins_n_times(100, 100)
estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue = random.random(), random.random(), random.random()
print(estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)
expectation_maximization(samples, 10000, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)

0.8375530011124886 0.8728934211664945 0.8208035753530862


(0.2999181720694818, 0.6071680888003922, 0.39952366353355084)
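
A fixed iteration count is one way to stop EM. Another option is to monitor the marginal log-likelihood of the unlabelled data, which EM never decreases from one iteration to the next. The sketch below is one way to compute it; the function name `log_likelihood` is hypothetical, and it assumes the (colour, flips) sample format used above with all parameters strictly between 0 and 1.

```python
import math

def log_likelihood(samples, p_red, p_head_red, p_head_blue):
    """Marginal log-likelihood of unlabelled samples under the two-coin mixture.

    Assumes every parameter lies strictly between 0 and 1. The colour field of
    each sample is ignored.
    """
    total = 0.0
    for _, flips in samples:
        heads = flips.count('H')
        tails = len(flips) - heads
        log_red = (math.log(p_red) + heads * math.log(p_head_red)
                   + tails * math.log(1 - p_head_red))
        log_blue = (math.log(1 - p_red) + heads * math.log(p_head_blue)
                    + tails * math.log(1 - p_head_blue))
        # log(exp(a) + exp(b)), computed stably by factoring out the larger term.
        hi, lo = max(log_red, log_blue), min(log_red, log_blue)
        total += hi + math.log1p(math.exp(lo - hi))
    return total
```

The likelihood is unchanged if the two colours swap roles, so `log_likelihood(samples, p, a, b)` equals `log_likelihood(samples, 1 - p, b, a)`. For the same reason, EM started from a random point may converge to estimates in which "red" and "blue" are exchanged relative to the true parameters.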