**Maximum likelihood estimation from observed and unobserved data**

You are given a bag containing red and blue coins. All the red coins have the same probability of heads. All the blue coins have the same probability of heads (possibly different from that of the red coins).

Your task is to estimate the proportion of red coins in the bag and the probability of heads for both the red and the blue coins.

In [36]:
import ipywidgets as widgets
from IPython.display import display

prob_red = widgets.FloatSlider(min=0.3, max=0.99, description='prob_red')
prob_head_red = widgets.FloatSlider(min=0.1, max=0.99, description='head_red')
prob_head_blue = widgets.FloatSlider(min=0.1, max=0.99, description='head_blue')
display(prob_red, prob_head_red, prob_head_blue)

FloatSlider(value=0.3, description='prob_red', max=0.99, min=0.3)

FloatSlider(value=0.1, description='head_red', max=0.99, min=0.1)

FloatSlider(value=0.1, description='head_blue', max=0.99, min=0.1)

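Use these sliders to set the true parameters of the model: the proportion of red coins (prob_red) and the probability of heads for the red and blue coins (head_red, head_blue).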

In [2]:
import random

def choose_coin():
    # Draw a coin colour according to the proportion of red coins.
    return 'R' if random.random() < prob_red.value else 'B'

def flip_coin(coin):
    # Pick the head probability for this colour, then flip once.
    prob_head = prob_head_red.value if coin == 'R' else prob_head_blue.value
    return 'H' if random.random() < prob_head else 'T'

def flip_random_coin_n_times(n, hidden=False):
    coin = choose_coin()
    return ('_' if hidden else coin, ''.join([flip_coin(coin) for i in range(n)]))

def flip_m_random_coins_n_times(m, n, hidden=False):
    return [flip_random_coin_n_times(n, hidden) for i in range(m)]

Use the functions above to sample from the model. The optional parameter 'hidden' controls whether the colour of each coin is observed in the samples or replaced by '_'.

In [3]:
flip_m_random_coins_n_times(5, 100)

[('B',
  'TTTTTTTTTTTTTTTTTTTTTTTTTTTTTHTTTTTTTTTTTTTTHTTTHTTTTTHTTTTTTTTTTTTHTTTHHTTTTTTTHTHHTTTTTHTTTTTHTTTT'),
 ('R',
  'TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTHTHTTTHTTTTHTTTTTTTTTTTTTTTTTTTTTTTTHTTTTTTTTTTTTHTTTTTTTTTTHTTTHTTH'),
 ('B',
  'HTTTTTTHTTTTTTTTTTTTTTTHTTTTTTTTTTTHTTTTTHTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTHTTTTTTTTTTTTTTTTHTTTTT'),
 ('B',
  'TTTTHTTHTTTTTTTTTTHTTTTHTTTTTTTTTTTTTTTHTTTTTTTTTTTHTHTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTHTTTTTHT'),
 ('B',
  'THTTTHTTTHTTTTTTTTTTTHTTHTTTTTTTTTTTTTTTTTTTTHTTTTTTTTTHHHTTTTTTTTTTHTTHTTTTTTHTTHTTTTTTTTTTTTTTHTHT')]

In [4]:
flip_m_random_coins_n_times(5, 100, hidden=True)

[('_',
  'TTTTHTTTTTTTHTTTTTHTTTTTTTHTTHHTTTTTTTHTTHTTTTTTTTHTTTTTTTTTTTTTTTTTTTTTHHHTTTTTTTTTTHTTHTTTTTHTTTTT'),
 ('_',
  'TTTTTTTHTTTTTHTTTHHTTTTTTTTHTTTTTTTTTTTHHTHHTTTTTTTTTTTTHHTTTTTTTTTTTTTTTTHTTTTTTTTTTTTTTTTTTTTTTTTT'),
 ('_',
  'TTTTHTTTTTTTTTTTTTTTTHTTTTTTTTHHHTTTTTTTTTTTTTTTHHTTTTTTTTTTTTTTTTTTTTTTTTTTTTHHTTTTTTTTTTTHTTTTTHTT'),
 ('_',
  'TTTTTTTTTTTTTHTTTTTTTTTTTTTTTTTTTTTTHTTHTTTTTTTTHTTTTHTTTTTHTTTTTTHTTTTTTTTHTTTTTTTTTTHTTTTTTTTTTTTT'),
 ('_',
  'THTTTTTHHHTTTTTTHHHTTTTTTTTTTTTTTTTTTTTTTTTTTHTTTHTTTTTTTHTTTTTTTTTTTTTTTTTTTTTTTTTTTHTTTTTHTTTHTTTH')]

**TASK 1** Implement the following two functions to estimate parameters for the model in the observed case. Splitting the work into two separate functions will simplify things for the next task. 

* How could you measure the error in your estimates?
* How does the error decrease with the sample size?
* If you were only allowed to flip coins a total of N times, how would you choose m (the number of coins) and n (the number of times to flip each coin)? Why?
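
One way to explore these questions is to repeat the experiment many times with known parameters and average the absolute error of each estimate. The sketch below is self-contained and does not use the widgets: `simulate`, `estimate`, `mean_abs_error` and the values in `truth` are hypothetical names and numbers chosen for illustration, not part of the exercise.

```python
import random

def simulate(m, n, p_red, p_head_red, p_head_blue, rng):
    """Draw m coins and flip each n times; returns labelled samples."""
    samples = []
    for _ in range(m):
        coin = 'R' if rng.random() < p_red else 'B'
        p_head = p_head_red if coin == 'R' else p_head_blue
        flips = ''.join('H' if rng.random() < p_head else 'T' for _ in range(n))
        samples.append((coin, flips))
    return samples

def estimate(samples):
    """Labelled-data estimates: proportion of red coins and head frequencies."""
    red = ''.join(flips for coin, flips in samples if coin == 'R')
    blue = ''.join(flips for coin, flips in samples if coin == 'B')
    n_red_coins = sum(1 for coin, _ in samples if coin == 'R')
    p_red = n_red_coins / len(samples)
    # Fall back to 0.5 when a colour was never drawn (arbitrary choice for this sketch).
    p_head_red = red.count('H') / len(red) if red else 0.5
    p_head_blue = blue.count('H') / len(blue) if blue else 0.5
    return p_red, p_head_red, p_head_blue

def mean_abs_error(m, n, truth, trials=100, seed=0):
    """Average |estimate - truth| per parameter over repeated simulations."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(trials):
        est = estimate(simulate(m, n, *truth, rng=rng))
        for k in range(3):
            totals[k] += abs(est[k] - truth[k])
    return tuple(total / trials for total in totals)

truth = (0.4, 0.3, 0.2)  # hypothetical true parameters
for m in (16, 64, 256):
    print(m, mean_abs_error(m, 20, truth))
```

Under these assumptions the errors should shrink roughly in proportion to one over the square root of the sample size, which is the usual behaviour of a binomial proportion estimate. Varying m and n with m * n held fixed gives a way to probe the third question empirically.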

In [33]:
def compute_sufficient_statistics(samples):
    'Compute the sufficient statistics for the model from labelled samples.'
    # All counts are in flips, not coins. Because every coin is flipped the
    # same number of times, count_red / (count_red + count_blue) still equals
    # the proportion of red coins.
    count_red = 0
    count_blue = 0
    count_red_head = 0
    count_blue_head = 0
    for colour, flips in samples:
        if colour == 'R':
            count_red += len(flips)
            count_red_head += flips.count('H')
        else:
            count_blue += len(flips)
            count_blue_head += flips.count('H')

    return (count_red, count_blue, count_red_head, count_blue_head)

def mle(sufficient_statistics):
    'Compute MLE parameter estimates from the sufficient statistics.'
    count_red, count_blue, count_red_head, count_blue_head = sufficient_statistics
    total = count_red + count_blue
    prob_red = count_red / total
    # Note: these divisions fail if no coin of one colour was observed.
    prob_head_red = count_red_head / count_red
    prob_head_blue = count_blue_head / count_blue
    return prob_red, prob_head_red, prob_head_blue

**TASK 2** Given a sample from a single coin whose colour is unobserved, estimate the posterior probability that the coin is red, given some estimates of the model parameters.

* If you pass in the true model parameters (e.g. prob_red.value, prob_head_red.value and prob_head_blue.value), how quickly does the posterior change? Use the plot_distribution function to view this.
* How does this depend on the model parameters?
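
The posterior asked for here follows from Bayes' rule. As a sketch, write $\pi$ for estimate_prob_red, $\theta_R$ and $\theta_B$ for the two head probabilities, and $h$ and $t$ for the number of heads and tails in the sample (these symbols are introduced here for illustration and do not appear in the notebook):

```latex
P(R \mid x) =
  \frac{\pi\,\theta_R^{h}\,(1-\theta_R)^{t}}
       {\pi\,\theta_R^{h}\,(1-\theta_R)^{t} + (1-\pi)\,\theta_B^{h}\,(1-\theta_B)^{t}}

\log\frac{P(R \mid x)}{P(B \mid x)} =
  \log\frac{\pi}{1-\pi}
  + h\,\log\frac{\theta_R}{\theta_B}
  + t\,\log\frac{1-\theta_R}{1-\theta_B}
```

In the log-odds form each flip adds a fixed increment, so the posterior tends to move away from the prior faster when $\theta_R$ and $\theta_B$ are further apart.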

In [55]:
def compute_posterior_prob_red(sample, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    'Compute the posterior probability that the sample came from a red coin.'
    heads = sample.count('H')
    tails = sample.count('T')

    # Likelihood of the observed flip sequence under each colour.
    p_x_given_r = estimate_prob_head_red ** heads * (1 - estimate_prob_head_red) ** tails
    p_x_given_b = estimate_prob_head_blue ** heads * (1 - estimate_prob_head_blue) ** tails

    # Bayes' rule: weight each likelihood by its prior and normalise.
    joint_red = p_x_given_r * estimate_prob_red
    joint_blue = p_x_given_b * (1 - estimate_prob_red)
    return joint_red / (joint_red + joint_blue)
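
For long samples the two likelihoods in `compute_posterior_prob_red` can both underflow to 0.0, which makes the final division fail. A possible workaround, sketched below, is to work with log-probabilities. The function name `posterior_prob_red_log` is hypothetical, and the sketch assumes all three parameters lie strictly between 0 and 1.

```python
import math

def posterior_prob_red_log(sample, p_red, p_head_red, p_head_blue):
    """Posterior probability of a red coin, computed from log-probabilities."""
    heads = sample.count('H')
    tails = sample.count('T')
    # Log of prior times likelihood for each colour.
    log_red = (math.log(p_red) + heads * math.log(p_head_red)
               + tails * math.log(1 - p_head_red))
    log_blue = (math.log(1 - p_red) + heads * math.log(p_head_blue)
                + tails * math.log(1 - p_head_blue))
    # posterior = 1 / (1 + exp(log_blue - log_red)); branch so that exp()
    # is only ever called on a non-positive argument and cannot overflow.
    diff = log_blue - log_red
    if diff > 0:
        ratio = math.exp(-diff)
        return ratio / (1 + ratio)
    return 1 / (1 + math.exp(diff))
```

For short samples this should agree with the direct product form up to rounding error.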

In [38]:
sample = flip_m_random_coins_n_times(100, 100, hidden=False)
estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue = mle(compute_sufficient_statistics(sample))

print(estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)

0.39 0.29205128205128206 0.20278688524590163


These estimates are close to the values set on the widgets.

**TASK 3** Reusing your code from Tasks 1 and 2, implement the expectation-maximization (EM) algorithm to find a (locally optimal) estimate of the parameters when the colour of the coins is not observed.
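
As a sketch of what one EM iteration computes, let coin $i$ show $h_i$ heads in $n_i$ flips, and let $r_i$ be its posterior probability of being red under the current estimates (the Task 2 quantity). The symbols $r_i$, $h_i$, $n_i$, $\pi$, $\theta_R$ and $\theta_B$ are notation introduced here, not names from the notebook. The updates are then the Task 1 estimates with hard counts replaced by posterior-weighted counts:

```latex
\text{E-step:}\quad r_i = P(R \mid x_i;\ \pi, \theta_R, \theta_B)

\text{M-step:}\quad
\pi \leftarrow \frac{\sum_i r_i\, n_i}{\sum_i n_i},\qquad
\theta_R \leftarrow \frac{\sum_i r_i\, h_i}{\sum_i r_i\, n_i},\qquad
\theta_B \leftarrow \frac{\sum_i (1-r_i)\, h_i}{\sum_i (1-r_i)\, n_i}
```

When every coin is flipped the same number of times, the update for $\pi$ reduces to the average of the $r_i$.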

In [61]:
def compute_expected_statistics(samples, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    'Compute the expected sufficient statistics for unlabelled samples given these parameter estimates.'
    posterior_count_head_red = 0
    posterior_count_head_blue = 0
    posterior_count_red = 0
    posterior_count_blue = 0
    for sample in samples:
        
        posterior_prob = compute_posterior_prob_red(sample[1], estimate_prob_red, 
                                                    estimate_prob_head_red, estimate_prob_head_blue)
        
        posterior_count_head_red += posterior_prob * sample[1].count('H')
        posterior_count_head_blue +=  (1 - posterior_prob) * sample[1].count('H')
        
        posterior_count_red += posterior_prob * len(sample[1])
        posterior_count_blue += (1 - posterior_prob) * len(sample[1])
        
    return posterior_count_red, posterior_count_blue, posterior_count_head_red, posterior_count_head_blue
    
    
def expectation_maximization(samples, iterations, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    'Estimate the model parameters from samples without labels, starting from the given initial estimates.'

    for _ in range(iterations):
        # expectation: posterior-weighted counts under the current estimates
        expected_statistics = compute_expected_statistics(
            samples, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)

        # maximization: re-estimate the parameters from the expected counts
        estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue = mle(expected_statistics)

    return estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue

Estimation with visible labels

In [63]:
sample = flip_m_random_coins_n_times(1000, 100, hidden=False)
estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue = mle(compute_sufficient_statistics(sample))

print(estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)

0.404 0.2983910891089109 0.19552013422818793


Estimation with hidden labels using the EM algorithm

In [64]:
samples = flip_m_random_coins_n_times(1000, 100, hidden=True)
expectation_maximization(samples, 20,  0.5, 0.6, 0.1)

(0.4041371087959315, 0.2960913788601573, 0.19839880608971838)

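The EM estimates from unlabelled samples closely match the estimates from labelled samples: roughly (0.404, 0.296, 0.198) versus (0.404, 0.298, 0.196).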
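
Since EM only finds a local optimum, it can help to track the log-likelihood of the unlabelled data across iterations, which should never decrease. The following is a minimal self-contained sketch rather than the notebook's implementation: it works on head counts instead of flip strings, assumes every coin is flipped n times, and all function names and the simulated parameter values are hypothetical.

```python
import math
import random

def simulate_hidden(m, n, p_red, p_head_red, p_head_blue, seed=0):
    """Return m head counts, one per coin, with the coin colour discarded."""
    rng = random.Random(seed)
    counts = []
    for _ in range(m):
        p_head = p_head_red if rng.random() < p_red else p_head_blue
        counts.append(sum(rng.random() < p_head for _ in range(n)))
    return counts

def log_joint(h, n, prior, p_head):
    """log of prior * p_head^h * (1 - p_head)^(n - h)."""
    return math.log(prior) + h * math.log(p_head) + (n - h) * math.log(1 - p_head)

def log_mixture(h, n, params):
    """Return (log of red joint, log of the two-colour mixture) for one coin."""
    p_red, p_head_red, p_head_blue = params
    a = log_joint(h, n, p_red, p_head_red)
    b = log_joint(h, n, 1 - p_red, p_head_blue)
    top = max(a, b)  # log-sum-exp trick to avoid underflow
    return a, top + math.log(math.exp(a - top) + math.exp(b - top))

def log_likelihood(head_counts, n, params):
    """Log-likelihood of the unlabelled flip sequences under params."""
    return sum(log_mixture(h, n, params)[1] for h in head_counts)

def em_step(head_counts, n, params):
    """One E-step and M-step; assumes the estimates stay strictly in (0, 1)."""
    sum_r = sum_red_heads = sum_blue_heads = 0.0
    for h in head_counts:
        a, total = log_mixture(h, n, params)
        r = math.exp(a - total)  # posterior probability that this coin is red
        sum_r += r
        sum_red_heads += r * h
        sum_blue_heads += (1 - r) * h
    m = len(head_counts)
    return (sum_r / m,
            sum_red_heads / (n * sum_r),
            sum_blue_heads / (n * (m - sum_r)))

head_counts = simulate_hidden(500, 100, 0.4, 0.3, 0.2, seed=1)
params = (0.5, 0.6, 0.1)  # initial guess
trace = [log_likelihood(head_counts, 100, params)]
for _ in range(20):
    params = em_step(head_counts, 100, params)
    trace.append(log_likelihood(head_counts, 100, params))
print(params, trace[0], trace[-1])
```

A flat tail in `trace` suggests convergence. Restarting from several initial guesses and comparing the final log-likelihoods is one way to check for a poor local optimum. Swapping the two initial head probabilities would be expected to swap the roles of the colours in the result, since the labels are not identifiable from unlabelled data.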