## Case study 1

_Finding the winning strategy in a card game_

Cards are flipped over one at a time. The player can halt the game at any point, and the next card flipped then counts as the last card. If that last card is red, the player wins.

#### Overview
To address the problem at hand, we will need to know how to:
1. Compute the probabilities of observable events using sample space analysis
2. Plot the probabilities of events across a range of interval values
3. Simulate random processes, such as coin flips and card shuffling, using Python
4. Evaluate our confidence in decisions drawn from simulations using confidence interval analysis

### Computing probabilities using Python

- Basics of probability theory
- Computing probabilities of a single observation
- Computing probabilities across a range of observations



__Sample space__

The set of possible measurable outcomes of an action. In Python, a set is denoted with `{curly brackets}`.
- Sets are unordered, so you cannot be sure in which order the items will appear.
- Set items are unchangeable, but you can remove items and add new items.
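
The set properties listed above can be checked in a few lines. This is a minimal sketch; `coin_faces` and `indexing_error` are hypothetical names used only for illustration.

```python
# Hypothetical example set; not part of the original notebook.
coin_faces = {'Heads', 'Tails'}

# Order is not part of a set's identity, so these two sets are equal.
assert coin_faces == {'Tails', 'Heads'}

# Duplicate items collapse into a single element.
assert len({'Heads', 'Heads', 'Tails'}) == 2

# Items can be added and removed, but not modified in place.
coin_faces.add('Edge')
coin_faces.remove('Edge')
assert coin_faces == {'Heads', 'Tails'}

# Sets have no defined order, so they cannot be indexed.
try:
    coin_faces[0]
except TypeError as error:
    indexing_error = str(error)
```

Because the display order of a set is not guaranteed, printed output such as `{'Tails', 'Heads'}` may differ between runs.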

In [1]:
# Create a sample space of coin flips
sample_space = {'Heads', 'Tails'}
print(f'Sample space: {sample_space}')

# Computing the probability of heads:
probability_heads = 1 / len(sample_space)
print(f'Probability of choosing heads is {probability_heads}')

Sample space: {'Tails', 'Heads'}
Probability of choosing heads is 0.5


__Events__

To find more rigorous answers, we need to define the concept of an *event*. An event is the subset of those elements within `sample_space` that satisfy an *event condition*. For a coin flip, we consider four event conditions: heads or tails, heads, tails, and neither.

In [2]:
# Defining event conditions
def is_heads_or_tails(outcome):
    return outcome in {'Heads', 'Tails'}

def is_neither(outcome):
    return not is_heads_or_tails(outcome)

def is_heads(outcome):
    return outcome == 'Heads'

def is_tails(outcome):
    return outcome == 'Tails'

We can now pass event conditions into a generalized `get_matching_event` function. The function iterates through a generic sample space and returns the set of outcomes for which `event_condition(outcome)` is `True`.

In [3]:
def get_matching_event(event_condition, sample_space):
    return set([outcome for outcome in sample_space if event_condition(outcome)])

In [4]:
# Detecting events using event conditions
event_conditions = [is_heads_or_tails, is_heads, is_tails, is_neither]

for event_condition in event_conditions:
    print(f"Event Condition: {event_condition.__name__}")
    event = get_matching_event(event_condition, sample_space)
    print(f'Event: {event}\n')

Event Condition: is_heads_or_tails
Event: {'Tails', 'Heads'}

Event Condition: is_heads
Event: {'Heads'}

Event Condition: is_tails
Event: {'Tails'}

Event Condition: is_neither
Event: set()
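
The same pattern applies to any sample space, not only to coins. As a hedged illustration, the sketch below applies it to a six-sided die. The names `die_sample_space`, `is_even`, and `is_greater_than_four` are hypothetical, and `get_matching_event` is repeated so the block runs on its own.

```python
# Hypothetical die example; these names are not defined in the original notebook.
die_sample_space = {1, 2, 3, 4, 5, 6}

def get_matching_event(event_condition, sample_space):
    # Same logic as the function defined earlier, repeated for self-containment.
    return {outcome for outcome in sample_space if event_condition(outcome)}

def is_even(outcome):
    return outcome % 2 == 0

def is_greater_than_four(outcome):
    return outcome > 4

even_event = get_matching_event(is_even, die_sample_space)
high_event = get_matching_event(is_greater_than_four, die_sample_space)

# sorted() gives a stable display order, since sets are unordered.
print(f'Even rolls: {sorted(even_event)}')
print(f'Rolls above four: {sorted(high_event)}')
```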



We've successfully extracted four events from the sample space `{'Heads', 'Tails'}`. We know that the probability of a single-element outcome is `1 / len(sample_space)`. This can be generalized to include multi-element events: the probability of an event is `len(event) / len(sample_space)`, but only if all outcomes are known to occur with equal likelihood. In other words, the probability of a multi-element event for a fair coin is equal to the event size divided by the sample space size. We now use event size to compute the four event probabilities:

In [6]:
# Computing event probabilities
def compute_probability(event_condition, generic_sample_space):
    # The compute_probability function extracts the event associated with
    # an inputted event condition to compute its probability
    event = get_matching_event(event_condition, generic_sample_space)
    # Probability is equal to event size divided by sample space size
    # print(f"{event}: {len(event)} / {generic_sample_space}: {len(generic_sample_space)}")
    return len(event) / len(generic_sample_space)

for event_condition in event_conditions:
    prob = compute_probability(event_condition, sample_space)
    name = event_condition.__name__
    print(f"Probability of event arising from '{name}' is {prob}")

Probability of event arising from 'is_heads_or_tails' is 1.0
Probability of event arising from 'is_heads' is 0.5
Probability of event arising from 'is_tails' is 0.5
Probability of event arising from 'is_neither' is 0.0
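
Floating-point division is fine for a coin, but exact fractions can make the size-ratio rule easier to inspect. The sketch below is one possible variant, assuming equally likely outcomes. `compute_exact_probability` and `is_not_heads` are hypothetical names that do not appear in the original notebook.

```python
from fractions import Fraction

def compute_exact_probability(event_condition, sample_space):
    # Hypothetical variant of compute_probability that returns a Fraction.
    # Valid only when every outcome is equally likely.
    event = {outcome for outcome in sample_space if event_condition(outcome)}
    return Fraction(len(event), len(sample_space))

def is_heads(outcome):
    return outcome == 'Heads'

def is_not_heads(outcome):
    return not is_heads(outcome)

coin = {'Heads', 'Tails'}
prob_heads = compute_exact_probability(is_heads, coin)
prob_not_heads = compute_exact_probability(is_not_heads, coin)

print(prob_heads)  # 1/2

# An event and its complement together cover the whole sample space.
assert prob_heads + prob_not_heads == 1
```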


### Analyzing a biased coin

How do we compute the likelihoods of outcomes that are not equally weighted? We construct a *weighted sample space* represented by a Python dictionary. Each outcome is treated as a key whose value maps to the associated weight. This allows us to redefine the sample space size as the sum of all dictionary weights. Within `weighted_sample_space`, that sum will equal 5.

In [7]:
weighted_sample_space = {'Heads' : 4, 'Tails' : 1}

In [8]:
# Checking the weighted sample space size
sample_space_size = sum(weighted_sample_space.values())
assert sample_space_size == 5

In [9]:
# Checking the weighted event size
event = get_matching_event(is_heads_or_tails, weighted_sample_space)
event_size = sum(weighted_sample_space[outcome] for outcome in event)
assert event_size == 5

Our generalized definitions of sample space size and event size permit us to create a `compute_event_probability` function. The function takes as input a `generic_sample_space` variable that can be either a weighted dictionary or an unweighted set.

In [10]:
# Defining a generalized event probability function
def compute_event_probability(event_condition, generic_sample_space):
    event = get_matching_event(event_condition, generic_sample_space)
    # An unweighted sample space is a set: every outcome counts once
    if isinstance(generic_sample_space, set):
        return len(event) / len(generic_sample_space)

    # A weighted sample space is a dictionary: sum the weights of matching outcomes
    event_size = sum(generic_sample_space[outcome] for outcome in event)
    return event_size / sum(generic_sample_space.values())

We can now output all the event probabilities for the biased coin without needing to redefine our four event condition functions.

In [11]:
# Computing weighted event probabilities
for event_condition in event_conditions:
    prob = compute_event_probability(event_condition, weighted_sample_space)
    name = event_condition.__name__
    print(f"Probability of an event arising from '{name}' is {prob}")

Probability of an event arising from 'is_heads_or_tails' is 1.0
Probability of an event arising from 'is_heads' is 0.8
Probability of an event arising from 'is_tails' is 0.2
Probability of an event arising from 'is_neither' is 0.0
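
One way to read these results is that dividing each weight by the total weight turns the weighted sample space into a table of per-outcome probabilities. A minimal sketch, where `outcome_probabilities` is a hypothetical name introduced for illustration:

```python
# Same biased coin as above: heads is four times as likely as tails.
weighted_sample_space = {'Heads': 4, 'Tails': 1}
total_weight = sum(weighted_sample_space.values())

# Normalize each weight by the total to obtain a probability per outcome.
outcome_probabilities = {
    outcome: weight / total_weight
    for outcome, weight in weighted_sample_space.items()
}
print(outcome_probabilities)

# The probabilities of all outcomes must add up to 1 (within rounding error).
assert abs(sum(outcome_probabilities.values()) - 1) < 1e-9
```

The values match the `is_heads` and `is_tails` probabilities printed above.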


### Computing nontrivial probabilities
We will now solve several problems using `compute_event_probability`

#### Problem 1: Analyzing a family with four children
We assume each child is equally likely to be either a boy or a girl. Thus we can construct an unweighted sample space where each outcome represents one possible sequence of four children.

In [12]:
# Computing the sample space of children
possible_children = ['Boy', 'Girl']
sample_space = set()
for child1 in possible_children:
    for child2 in possible_children:
        for child3 in possible_children:
            for child4 in possible_children:
                outcome = (child1, child2, child3, child4)
                sample_space.add(outcome)

In [13]:
sample_space

{('Boy', 'Boy', 'Boy', 'Boy'),
 ('Boy', 'Boy', 'Boy', 'Girl'),
 ('Boy', 'Boy', 'Girl', 'Boy'),
 ('Boy', 'Boy', 'Girl', 'Girl'),
 ('Boy', 'Girl', 'Boy', 'Boy'),
 ('Boy', 'Girl', 'Boy', 'Girl'),
 ('Boy', 'Girl', 'Girl', 'Boy'),
 ('Boy', 'Girl', 'Girl', 'Girl'),
 ('Girl', 'Boy', 'Boy', 'Boy'),
 ('Girl', 'Boy', 'Boy', 'Girl'),
 ('Girl', 'Boy', 'Girl', 'Boy'),
 ('Girl', 'Boy', 'Girl', 'Girl'),
 ('Girl', 'Girl', 'Boy', 'Boy'),
 ('Girl', 'Girl', 'Boy', 'Girl'),
 ('Girl', 'Girl', 'Girl', 'Boy'),
 ('Girl', 'Girl', 'Girl', 'Girl')}

Four nested loops are verbose and do not generalize to an arbitrary number of children. Using `itertools.product`, we can generate our sample space more easily. The `product` function returns _all combinations of elements drawn one from each input list_ (the Cartesian product). We input four instances of the `possible_children` list into `itertools.product`. The function then iterates over all four instances of the list, computing all the combinations of list elements. The final output equals our sample space.

The `*` operator unpacks multiple arguments stored within a list. These arguments are then passed into a specified function. Thus, calling `product(*(4 * [possible_children]))` is equivalent to calling `product(possible_children, possible_children, possible_children, possible_children)`.
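
The equivalence described above can be checked directly. In this sketch, `repeated_lists`, `unpacked`, and `explicit` are hypothetical names used only for the comparison.

```python
from itertools import product

possible_children = ['Boy', 'Girl']

# 4 * [possible_children] builds a list holding four references to the same list.
repeated_lists = 4 * [possible_children]
assert len(repeated_lists) == 4

# Unpacking with * passes each inner list as a separate positional argument.
unpacked = list(product(*repeated_lists))
explicit = list(product(possible_children, possible_children,
                        possible_children, possible_children))

# Both calls yield the same 16 tuples in the same order.
assert unpacked == explicit
print(len(unpacked))  # 16
```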

In [14]:
from itertools import product
all_combinations = product(*(4 * [possible_children]))
assert set(all_combinations) == sample_space

In [18]:
print(set(all_combinations))

set()


Note that after running this code, `all_combinations` will be empty. That is because `product` returns a Python iterator, which can be iterated over only once. For us, this isn't an issue. We are about to compute the sample space even more efficiently, and `all_combinations` will not be used in future code.
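
The single-pass behavior is easy to reproduce on a smaller input. A minimal sketch, with hypothetical names `pairs`, `first_pass`, and `second_pass`:

```python
from itertools import product

possible_children = ['Boy', 'Girl']
pairs = product(possible_children, possible_children)

first_pass = list(pairs)   # Consumes every item the iterator can yield
second_pass = list(pairs)  # The iterator is exhausted, so nothing is left

assert len(first_pass) == 4
assert second_pass == []
```

If the results are needed more than once, storing them in a list or set (as `first_pass` does here) avoids the problem.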

We can make this more efficient by executing `set(product(possible_children, repeat=4))`. In general, running `product(possible_children, repeat=n)` returns an iterable over all possible combinations of n children.

In [19]:
sample_space_efficient = set(product(possible_children, repeat=4))
assert sample_space == sample_space_efficient

In [21]:
sample_space_efficient

{('Boy', 'Boy', 'Boy', 'Boy'),
 ('Boy', 'Boy', 'Boy', 'Girl'),
 ('Boy', 'Boy', 'Girl', 'Boy'),
 ('Boy', 'Boy', 'Girl', 'Girl'),
 ('Boy', 'Girl', 'Boy', 'Boy'),
 ('Boy', 'Girl', 'Boy', 'Girl'),
 ('Boy', 'Girl', 'Girl', 'Boy'),
 ('Boy', 'Girl', 'Girl', 'Girl'),
 ('Girl', 'Boy', 'Boy', 'Boy'),
 ('Girl', 'Boy', 'Boy', 'Girl'),
 ('Girl', 'Boy', 'Girl', 'Boy'),
 ('Girl', 'Boy', 'Girl', 'Girl'),
 ('Girl', 'Girl', 'Boy', 'Boy'),
 ('Girl', 'Girl', 'Boy', 'Girl'),
 ('Girl', 'Girl', 'Girl', 'Boy'),
 ('Girl', 'Girl', 'Girl', 'Girl')}
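
Since each child has two possible values, a family of n children should produce 2 to the power n outcomes. The sketch below checks this for small families; `num_children` and `sample_space_n` are hypothetical names.

```python
from itertools import product

possible_children = ['Boy', 'Girl']

for num_children in range(1, 6):
    sample_space_n = set(product(possible_children, repeat=num_children))
    # Two choices per child, so the sample space doubles with each added child.
    assert len(sample_space_n) == 2 ** num_children
    print(f'{num_children} children: {len(sample_space_n)} outcomes')
```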

We now calculate the fraction of `sample_space` that is composed of families with two boys. We define a `has_two_boys` event condition and then pass that condition into `compute_event_probability`.

In [22]:
def has_two_boys(outcome):
    return len([child for child in outcome if child == 'Boy']) == 2

prob = compute_event_probability(has_two_boys, sample_space)
print(f"Probability of 2 boys is {prob}")

Probability of 2 boys is 0.375


The probability of exactly two boys being born in a family of four children is 0.375. We expect 37.5% of families with four children to contain an equal number of boys and girls. Of course, the actual observed percentage of families with two boys will vary due to random chance.
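
As a cross-check that does not enumerate the sample space, the same value follows from counting: there are "4 choose 2" ways to place two boys among four children, out of 2^4 equally likely sequences. The sketch assumes Python 3.8 or later for `math.comb`; `prob_two_boys` is a hypothetical name.

```python
from math import comb

num_children = 4
num_boys = 2

# Ways to pick which two of the four children are boys: C(4, 2) = 6
favorable_outcomes = comb(num_children, num_boys)

# Each child is a boy or a girl, giving 2 ** 4 = 16 equally likely sequences
total_outcomes = 2 ** num_children

prob_two_boys = favorable_outcomes / total_outcomes
assert prob_two_boys == 0.375
```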

#### Problem 2: Analyzing multiple die rolls
Suppose we're shown a fair six-sided die whose faces are numbered from 1 to 6. The die is rolled six times. What is the probability that these six die rolls add up to 21?
We begin by defining the possible values of any single roll.

In [15]:
# Defining all possible rolls of a six-sided die
possible_rolls = list(range(1,7))
print(possible_rolls)

[1, 2, 3, 4, 5, 6]


Next, we create the sample space for six consecutive die rolls.

In [16]:
# Sample space for 6 consecutive die rolls
sample_space = set(product(possible_rolls, repeat=6))

Finally, we define a `has_sum_of_21` event condition that we'll subsequently pass into `compute_event_probability`.

In [17]:
# Computing the probability of a die roll sum.
def has_sum_of_21(outcome):
    return sum(outcome) == 21

prob = compute_event_probability(has_sum_of_21, sample_space)
print(f"6 rolls sum to 21 with a probability of {prob}")

6 rolls sum to 21 with a probability of 0.09284979423868313


The die rolls will sum to 21 _more than_ 9% of the time. Our analysis can be coded more concisely using a _lambda_ expression. Lambda expressions are one-line anonymous functions. In this book, lambda expressions are used to pass short functions into other functions.

In [18]:
# Computing the probability using a lambda expression
prob = compute_event_probability(lambda x: sum(x) == 21, sample_space)
assert prob == compute_event_probability(has_sum_of_21, sample_space)

#### Problem 3: Computing die-roll probabilities using weighted sample spaces

Let's compute the probability using a weighted sample space. We need to convert the unweighted sample space set into a weighted sample space dictionary; this will require us to identify all possible die-roll combinations. These combinations are already stored in our computed `sample_space` set. By mapping the die-roll sums to their occurrence counts, we will produce a `weighted_sample_space` result.

In [19]:
# Mapping die-roll sums to occurrence counts
from collections import defaultdict
weighted_sample_space = defaultdict(int)  # Missing keys start with a count of 0
for outcome in sample_space:
    total = sum(outcome)
    weighted_sample_space[total] += 1
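
The same mapping could also be built with `collections.Counter`, which counts occurrences directly. This is an alternative sketch rather than the approach used above; `sum_counts` is a hypothetical name, and the sample space is regenerated so the block runs on its own.

```python
from collections import Counter
from itertools import product

possible_rolls = range(1, 7)

# Count how many six-roll sequences produce each possible sum.
sum_counts = Counter(
    sum(outcome) for outcome in product(possible_rolls, repeat=6)
)

# Every one of the 6 ** 6 sequences contributes to exactly one sum.
assert sum(sum_counts.values()) == 6 ** 6

# Only one sequence (six ones) produces the minimum sum.
assert sum_counts[6] == 1
```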

Some properties of `weighted_sample_space`:
 - Not all weights are equal.
 - For example, there is only one way to roll a sum of 6 (six ones) and only one way to roll a sum of 36 (six sixes).

In [20]:
assert weighted_sample_space[6] == 1
assert weighted_sample_space[36] == 1 

In [21]:
import itertools
dict(itertools.islice(weighted_sample_space.items(), 10))

{21: 4332,
 22: 4221,
 27: 1666,
 12: 456,
 24: 3431,
 19: 3906,
 18: 3431,
 20: 4221,
 23: 3906,
 26: 2247}

The length of `weighted_sample_space` equals the number of distinct sums that six die rolls can produce. The lowest possible sum is 6 (six ones) and the highest is 36 (`6 * 6`), so the sums 1 through 5 are unreachable and the length is 36 - 5 = 31.

In [22]:
len(weighted_sample_space)

31
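
The count of 31 is an instance of a general pattern: n rolls can sum to any value from n to 6n, which gives 6n - n + 1 distinct sums. The sketch below checks this by brute force for up to six rolls; `count_distinct_sums` is a hypothetical helper.

```python
from itertools import product

def count_distinct_sums(num_rolls):
    # Hypothetical helper: enumerate every roll sequence and collect the sums.
    sums = {sum(outcome) for outcome in product(range(1, 7), repeat=num_rolls)}
    return len(sums)

for num_rolls in range(1, 7):
    # Sums run from num_rolls (all ones) to 6 * num_rolls (all sixes).
    assert count_distinct_sums(num_rolls) == 6 * num_rolls - num_rolls + 1

print(count_distinct_sums(6))  # 31
```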

In [23]:
# Checking a more common die-roll combination
num_combinations = weighted_sample_space[21]
print(f"There are {num_combinations} ways for 6 die rolls to sum to 21")

There are 4332 ways for 6 die rolls to sum to 21


The output shows there are 4332 ways for six die rolls to sum up to 21. A sum of 21 is therefore far more likely than a sum of 6, which can occur in only one way.

In [24]:
# Exploring different ways of summing to 21
assert sum([4, 4, 4, 4, 3, 2]) == 21
assert sum([4, 4, 4, 5, 3, 1]) == 21

Note that the observed count of 4332 is equal to the length of the unweighted event whose die rolls add up to 21. Also, the sum of values in `weighted_sample_space` is equal to the length of `sample_space`. Hence, a direct link exists between the unweighted and weighted event probability computations.

In [25]:
# Comparing weighted events and regular events
event = get_matching_event(lambda x: sum(x) == 21, sample_space)
assert weighted_sample_space[21] == len(event)
assert sum(weighted_sample_space.values()) == len(sample_space)

Let's now recompute the probability using the `weighted_sample_space` dictionary. The final probability of rolling 21 should remain unchanged.