**Maximum likelihood estimation from observed and unobserved data**

You are given a bag containing red and blue coins. All the red coins have the same probability of heads. All the blue coins have the same probability of heads (possibly different from that of the red coins).

Your task is to estimate the proportion of red coins in the bag and the probability of heads for both the red and the blue coins.

In [1]:
import ipywidgets as widgets
# prob_red = widgets.FloatSlider(min=0.0, max=1.0, description='prob_red')
# prob_head_red = widgets.FloatSlider(min=0.0, max=1.0, description='head_red')
# prob_head_blue = widgets.FloatSlider(min=0.0, max=1.0, description='head_blue')
# display(prob_red, prob_head_red, prob_head_blue)

In [2]:
prob_red, prob_head_red, prob_head_blue = 0.3, 0.6, 0.2

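These values are the true parameters of the model: the proportion of red coins and the probability of heads for each colour. The slider widgets above are commented out, so the parameters are fixed here.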

In [3]:
import random
def choose_coin():
    return 'R' if random.random() < prob_red else 'B'

def flip_coin(coin):
    uar = random.random()
    if coin == 'R':
        if uar < prob_head_red:
            return 'H'
    elif uar < prob_head_blue:
        return 'H'
    return 'T'

def flip_random_coin_n_times(n, hidden=False):
    coin = choose_coin()
    return ('_' if hidden else coin, ''.join([flip_coin(coin) for i in range(n)]))

def flip_m_random_coins_n_times(m, n, hidden=False):
    return [flip_random_coin_n_times(n, hidden) for i in range(m)]

Use the functions above to sample from the model. The optional parameter 'hidden' controls whether the colour of the coin is observed in the samples.

In [4]:
flip_m_random_coins_n_times(5, 100)

[('B',
  'TTTTTTTTTTHTHTTHTHTTTTTTTTTTHHTHTTTTTTTTHHTTTTTTTHTTTTTTHTHTTTHTTTHTTTTHTTHTTTTTTTTTTTTTHTTTTTHTTHHH'),
 ('B',
  'TTTTTTTTTTHHTTTTTTTTTTHTTTTTTTTTTTTTTTTTTTTTTTTTTTTTHTTHTTTTTTTTTHHHTTTTTTTHHHTTTHTTTTHTTHTTTTTHHTTT'),
 ('B',
  'TTTHHHTTHTTHTTTTTTTTTTTTHTHTTTTTTHTTTHHHTTTHHTTTTTTTTTTTTTTHTHHHTTTTTTTHTTHTTTTTTTHTTTTTTTTTTHTTTTTT'),
 ('R',
  'HTHHHTHTTTTHHTTHTHTHTHHTHHTTHHTHTHTHHHHTHHHHTTTHHTHHTHTHHHHTHTTHHHTTTTTTTTHHHTHHTHHTHTHTHTHTTTTHHHHH'),
 ('R',
  'TTTHTHTTHHTTTHTHHTHHTHHHHHHHTTHHTHHTHHHHTHHTHHHHHHHHTHTHHHHTHTHHHHTHHHHHHHTHTHHHHHTTHHTHHHTHHHTHHHHH')]

In [5]:
flip_m_random_coins_n_times(5, 100, hidden=True)

[('_',
  'HTTTTHTHTTTTTTHTTTTTTHTTTTTHHTTTHTTTHTTTTTTTHTTTTTTTTTTTTTHTTTTTTHHTTTTTTTTHTTHHTTTTTTTTTTTHTTTHHTTT'),
 ('_',
  'HHHTTTHTHHHHHHTHTHHTHHHHHHHHTTTTTHTTTHHTTTTTHTTTTHHTHHTHTHHHHHHHTHHHHTHTHHTHTTHTTTHTHHTTHHTTHHTTTHTH'),
 ('_',
  'TTTTTTHTTTTHTHHTHTTHTTTHTTTHTTHTTTHHTTTTTTTTTTTTHTTTHTTTTHTTHTTTTTTHTTHTHHTHTTTHTHTTHTHTTTTTTTTTTHTT'),
 ('_',
  'TTHTTTTHTHTTHTTTTTTHTHHHHTTTTTTTTTTTTTTTHHTTHHTTTTTTTTTTHTHTHTTTTTTTTTTTHTTHTTHTTTHHTTHHTHTTHHTTTTTT'),
 ('_',
  'THHHHHHTHTHHTTTTHHHHHHTHHHHTTTHTHTHTHTHHHHTHHHHHTHHTTHTHTHTHHTHHTHTTTTHHHHHTTHHTHHTTHHTHHTTHHTHTTTHH')]

**TASK 1** Implement the following two functions to estimate parameters for the model in the observed case. Splitting the work into two separate functions will simplify things for the next task. 

* How could you measure the error in your estimates?
* How does the error decrease with the sample size?
* If you were only allowed to flip coins a total of N times, how would you choose m (the number of coins) and n (the number of times to flip each coin)? Why?

In [6]:
from collections import Counter, defaultdict

In [7]:
def compute_sufficient_statistics(samples):
    """Return one (colour, number of heads, number of flips) tuple per coin."""
    statistics = []
    for coin_color, coin_throws in samples:
        heads_tails_counter = Counter(coin_throws)
        statistics.append((coin_color, heads_tails_counter['H'], len(coin_throws)))
    return statistics

def mle(sufficient_statistics):
    """Compute MLE estimates from per-coin (colour, heads, flips) tuples."""
    color_counter = Counter()
    heads_counter = defaultdict(Counter)

    for color, heads_count, total_count in sufficient_statistics:
        color_counter[color] += 1
        heads_counter[color]['H'] += heads_count
        heads_counter[color]['total'] += total_count

    # Proportion of red coins among all sampled coins.
    prob_red = color_counter['R'] / sum(color_counter.values())

    # Pooled fraction of heads per colour; None if that colour was never drawn.
    probs = []
    for color in ['R', 'B']:
        try:
            heads_prob = heads_counter[color]['H'] / heads_counter[color]['total']
        except ZeroDivisionError:
            heads_prob = None
        probs.append(heads_prob)

    prob_head_red, prob_head_blue = probs
    return prob_red, prob_head_red, prob_head_blue

In [8]:
def compute_sufficient_statistics(samples):
    """Return (total flips, red flips, red heads, blue heads).

    Counts are in flips rather than coins, so red / total equals the
    proportion of red coins only when every coin is flipped equally often.
    """
    color_counter = Counter()             # flips per colour
    heads_counter = defaultdict(Counter)  # 'H'/'T' counts per colour

    for coin_color, coin_throws in samples:
        heads_counter[coin_color] += Counter(coin_throws)
        color_counter[coin_color] += len(coin_throws)

    total = sum(color_counter.values())
    red, red_head, blue_head = color_counter['R'], heads_counter['R']['H'], heads_counter['B']['H']
    return total, red, red_head, blue_head

def mle(total, red, red_head, blue_head):
    """Compute MLE estimates; requires at least one red and one blue coin."""
    prob_red = red / total
    prob_head_red = red_head / red
    prob_head_blue = blue_head / (total - red)
    return prob_red, prob_head_red, prob_head_blue

In [9]:
mle(*compute_sufficient_statistics(flip_m_random_coins_n_times(1000, 1000, hidden=False)))

# 1.06 in vid

(0.263, 0.6000760456273764, 0.20019267299864316)
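The error questions from Task 1 can be explored by repeating the experiment many times and comparing the estimates with the known true value. The sketch below does this for the proportion of red coins only. The helper names (`estimate_prob_red`, `rmse_prob_red`, `theoretical_std`), the number of trials, and the seed are illustrative assumptions, not part of the notebook.

```python
import math
import random

def estimate_prob_red(m, true_prob_red, rng):
    """Fraction of red coins among m coins whose colours are observed."""
    reds = sum(1 for _ in range(m) if rng.random() < true_prob_red)
    return reds / m

def rmse_prob_red(m, true_prob_red=0.3, trials=1000, seed=0):
    """Root mean squared error of the estimate over repeated experiments."""
    rng = random.Random(seed)
    squared_errors = [
        (estimate_prob_red(m, true_prob_red, rng) - true_prob_red) ** 2
        for _ in range(trials)
    ]
    return math.sqrt(sum(squared_errors) / trials)

def theoretical_std(m, true_prob_red=0.3):
    """Standard deviation of a binomial proportion with m draws."""
    return math.sqrt(true_prob_red * (1 - true_prob_red) / m)

# Compare the simulated error with the binomial standard deviation.
for m in (10, 100, 1000):
    print(m, round(rmse_prob_red(m), 4), round(theoretical_std(m), 4))
```

In this simulation the error in the proportion of red coins depends only on m and shrinks roughly as 1/sqrt(m). The head probabilities are pooled over all flips of one colour, so in the observed case their error depends on the number of flips of that colour rather than on how those flips are split across coins.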

**TASK 2** Given a sample from a single coin whose colour is unobserved, estimate the posterior probability that the coin is red, given some estimates of the model parameters.

* If you pass in the true model parameters (e.g. prob_red, prob_head_red and prob_head_blue), how quickly does the posterior change? Use the plot_distribution function to view this.
* How does this depend on the model parameters?
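The posterior asked for in Task 2 follows from Bayes' rule. As a sketch, write $\pi$ for the estimated proportion of red coins, $\theta_R$ and $\theta_B$ for the estimated head probabilities, and $h$ and $t$ for the number of heads and tails in the sample $x$. These symbols are introduced here for illustration and do not appear in the notebook code.

$$
P(\text{red} \mid x) = \frac{\pi \, \theta_R^{h} (1 - \theta_R)^{t}}{\pi \, \theta_R^{h} (1 - \theta_R)^{t} + (1 - \pi) \, \theta_B^{h} (1 - \theta_B)^{t}}
$$

The binomial coefficient $\binom{h + t}{h}$ multiplies both the numerator and each term of the denominator, so it cancels and can be left out.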

In [10]:
def compute_posterior_prob_red(sample, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    count_head = sample.count('H')
    count_tail = len(sample) - count_head
    
    # Joint probability of each colour and the observed flips. The binomial
    # coefficient is the same for both colours, so it cancels in the ratio.
    joint_red = estimate_prob_red * estimate_prob_head_red**count_head * (1 - estimate_prob_head_red)**count_tail
    joint_blue = (1 - estimate_prob_red) * estimate_prob_head_blue**count_head * (1 - estimate_prob_head_blue)**count_tail
    return joint_red / (joint_red + joint_blue)


In [12]:
estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue = mle(*compute_sufficient_statistics(flip_m_random_coins_n_times(10, 100, hidden=False)))
estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue

(0.4, 0.585, 0.20333333333333334)

In [13]:
true_color, sample = flip_m_random_coins_n_times(1, 30, hidden=False)[0]
true_color

'B'

In [14]:
compute_posterior_prob_red(sample, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)

6.026721083450672e-05
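The direct product of probabilities used above works for short samples, but for long ones (for example 1000 flips) terms such as 0.2**600 underflow to 0.0 in floating point. A possible alternative is to work with log probabilities. The sketch below is one way to do that; the function name `log_space_posterior_prob_red` is hypothetical, and it assumes all three parameters lie strictly between 0 and 1.

```python
import math

def log_space_posterior_prob_red(sample, p_red, p_head_red, p_head_blue):
    """Posterior probability that the coin is red, computed via log probabilities.

    Assumes 0 < p_red < 1, 0 < p_head_red < 1 and 0 < p_head_blue < 1.
    """
    heads = sample.count('H')
    tails = len(sample) - heads

    # Log of the joint probability of each colour and the observed flips.
    log_red = (math.log(p_red)
               + heads * math.log(p_head_red)
               + tails * math.log(1 - p_head_red))
    log_blue = (math.log(1 - p_red)
                + heads * math.log(p_head_blue)
                + tails * math.log(1 - p_head_blue))

    # posterior = 1 / (1 + exp(log_blue - log_red)); branch so that exp
    # is only ever called with a non-positive argument and cannot overflow.
    diff = log_blue - log_red
    if diff > 0:
        ratio = math.exp(-diff)
        return ratio / (1 + ratio)
    return 1 / (1 + math.exp(diff))
```

For short samples this should agree with `compute_posterior_prob_red` up to rounding; for long samples it avoids the case where both joint probabilities underflow to zero.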

**TASK 3** Reusing your code from Tasks 1 and 2, implement the expectation maximization algorithm to find (locally optimal) estimates of the parameters when the colour of the coins is not observed.
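One common way to write the two steps of expectation maximization for this model is sketched below. The notation is an assumption made for illustration: coin $i$ has $n_i$ flips of which $h_i$ are heads, $\pi$ is the proportion of red coins, $\theta_R$ and $\theta_B$ are the head probabilities, and $w_i$ is the posterior probability that coin $i$ is red under the current estimates.

$$
\text{E-step:} \quad w_i = P(\text{red} \mid x_i;\ \pi, \theta_R, \theta_B)
$$

$$
\text{M-step:} \quad
\pi = \frac{\sum_i w_i n_i}{\sum_i n_i}, \qquad
\theta_R = \frac{\sum_i w_i h_i}{\sum_i w_i n_i}, \qquad
\theta_B = \frac{\sum_i (1 - w_i) h_i}{\sum_i (1 - w_i) n_i}
$$

When every coin is flipped the same number of times, the update for $\pi$ reduces to the average of the $w_i$. These are the observed-case estimates with each coin's counts weighted by $w_i$ instead of by a known colour.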

In [30]:
def mle(total, count_red, count_red_head, count_blue_head):
    estimate_prob_red = count_red / total
    estimate_prob_head_red = count_red_head / count_red
    estimate_prob_head_blue = count_blue_head / (total - count_red)
    return estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue

In [31]:
def compute_expected_statistics(samples, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    # E-step: weight each coin's counts by the posterior probability that it is red,
    # given the current parameter estimates.
    total, red, red_head, blue_head = 0, 0.0, 0.0, 0.0
    for _, throws in samples:
        posterior_prob_red = compute_posterior_prob_red(throws, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)
        count_head = throws.count('H')
        total += len(throws)
        red += posterior_prob_red * len(throws)
        red_head += posterior_prob_red * count_head
        blue_head += (1 - posterior_prob_red) * count_head

    return total, red, red_head, blue_head
    
def expectation_maximization(samples, iterations, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue):
    # Alternate the E-step (expected statistics) and the M-step (mle) for a fixed number of iterations.
    for _ in range(iterations):
        total, expected_red, expected_red_head, expected_blue_head = compute_expected_statistics(samples, estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue)
        estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue = mle(total, expected_red, expected_red_head, expected_blue_head)
    return estimate_prob_red, estimate_prob_head_red, estimate_prob_head_blue

In [32]:
samples = flip_m_random_coins_n_times(1000, 1000, hidden=True)

In [36]:
expectation_maximization(samples, 100, 0.5, 0.5, 0.6)
# EM cannot know which colour is which: the result may come out with 'red' and 'blue' swapped, depending on the initial estimates.

(0.7, 0.19970285714285715, 0.5999466666666666)

In [37]:
prob_red, prob_head_red, prob_head_blue

(0.3, 0.6, 0.2)
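The EM result above matches the true parameters once the two colours are exchanged. A minimal sketch of that comparison follows; the helpers `swap_labels` and `align_to_reference` are hypothetical names introduced here, and they assume the estimate is a tuple in the order (prob_red, prob_head_red, prob_head_blue).

```python
def swap_labels(estimate):
    """Exchange the roles of red and blue in a parameter tuple."""
    p_red, p_head_red, p_head_blue = estimate
    return 1 - p_red, p_head_blue, p_head_red

def align_to_reference(estimate, reference):
    """Return whichever labelling of the estimate is closer to the reference."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    candidates = (estimate, swap_labels(estimate))
    return min(candidates, key=lambda e: squared_distance(e, reference))

# EM output from the cell above, compared with the true parameters.
em_estimate = (0.7, 0.19970285714285715, 0.5999466666666666)
print(align_to_reference(em_estimate, (0.3, 0.6, 0.2)))
```

Outside a simulation the true parameters are not available, so some convention would be needed instead, for example calling the colour with the higher head probability 'red'.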