# Collecting Coupons

- Coupons in cereal boxes are numbered 1 to 5
    - All 5 coupons must be collected to receive a prize
- *With one coupon per box, and each coupon equally likely, how many boxes must we open on average to collect a complete set?*

_____

- First, we'll think of the probability that we get all 5 coupons in our first 5 boxes
    - There are $5!$ different permutations of the ordering of coupons, and $5^{5}$ different scenarios overall for the 5 boxes
    
$$
P(\text{5 distinct coupons in 5 boxes}) = \frac{5!}{5^{5}} = \frac{4!}{5^{4}}
$$
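
- As a sanity check on the count above, the $5^{5}$ outcomes are few enough to enumerate directly. The sketch below is one way to do that with the standard library only; the variable names are illustrative, and it assumes each coupon is equally likely in every box.

```python
from fractions import Fraction
from itertools import product

# Enumerate every possible sequence of 5 boxes, each holding a coupon 1..5.
outcomes = list(product(range(1, 6), repeat=5))

# A sequence completes the set when all 5 coupons appear in it.
complete = sum(1 for seq in outcomes if len(set(seq)) == 5)

p_five = Fraction(complete, len(outcomes))
print(complete, len(outcomes))  # 120 3125
print(p_five, float(p_five))    # 24/625 0.0384
```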

- Next, we consider the case where it takes us exactly 6 boxes to get all 5 coupons (the set is completed by the 6th box)
    - The easy part is knowing that there are $5^{6}$ equally likely sequences of 6 coupons
- Now, counting the number of ways we can get 5 coupons in 6 boxes:
    - For the first coupon, it's impossible for it already to be a duplicate, so there are 5 possible "first coupons"
    - Next, **if the second coupon is a duplicate**, there's only one possible coupon it can be (the first one)
        - So, the number of ways this can happen is $5\cdot1\cdot4\cdot3\cdot2\cdot1$
    - Assuming the **second coupon is NOT a duplicate**, then there are 4 possible coupons to get in the second box
    - If the **third coupon is a duplicate**, there are 2 options now (the first or the second coupon)
        - Therefore, the number of ways this can happen is $5\cdot4\cdot2\cdot3\cdot2\cdot1$
    - If the **fourth coupon is a duplicate**, there are now 3 options so the number of ways it can happen is $5\cdot4\cdot3\cdot3\cdot2\cdot1$
    - The final scenario is that the **fifth is a duplicate** which has $5\cdot4\cdot3\cdot2\cdot4\cdot1$ ways of happening
        - *Why can't the 6th be the duplicate?*
            - Because we wouldn't open a 6th box if we hadn't had a duplicate yet (since we'd already have all 5)

- Therefore the probability is:

$$
P(\text{5 distinct coupons in 6 boxes}) = \frac{(1 + 2 + 3 + 4)\cdot5\cdot4\cdot3\cdot2\cdot1}{5^{6}} = \frac{10 \cdot 4!}{5^{5}} = 2 \frac{4!}{5^{4}}
$$
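
- The case-by-case count for 6 boxes can also be tallied mechanically. This is a small sketch of that arithmetic rather than a general method: the names `ways_by_duplicate_box` and `p_six` are made up for illustration, and it relies on the observation above that a duplicate in box $d$ has $d-1$ possible values while the five new coupons contribute $5!$ choices.

```python
from fractions import Fraction
from math import factorial

# A duplicate in box d (d = 2..5) repeats one of the d-1 distinct coupons seen so far;
# the other five boxes hold new coupons, contributing 5*4*3*2*1 = 5! choices.
ways_by_duplicate_box = {d: (d - 1) * factorial(5) for d in range(2, 6)}
total_ways = sum(ways_by_duplicate_box.values())

p_six = Fraction(total_ways, 5**6)
print(ways_by_duplicate_box)  # {2: 120, 3: 240, 4: 360, 5: 480}
print(total_ways)             # 1200
print(p_six)                  # 48/625
```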

- Now, let's consider the case of getting 5 distinct in 7 boxes
    - First can't be a duplicate
    - With 7 boxes and 5 distinct coupons there are exactly two duplicates, so if the **second is a duplicate**, we need to consider where the other duplicate goes
        - If it's the third, it can only be one coupon (the first)
        - If it's the fourth, it can be one of two (the first or the third)
        - If it's the fifth, it can be one of three (the first, the third, or the fourth)
        - If it's the sixth, it can be one of four (the first, the third, the fourth, or the fifth)
        - The subtotal number of ways this can happen is $(1+2+3+4)\cdot5\cdot1\cdot4\cdot3\cdot2\cdot1 = 10\cdot1\cdot5!$
    - If the second isn't a duplicate and the **third is** (2 options), the other duplicate can be the fourth, fifth, or sixth, with 2, 3, or 4 options respectively
        - The subtotal number of ways this can happen is $(2+3+4)\cdot5\cdot4\cdot2\cdot3\cdot2\cdot1 = 9\cdot2\cdot5!$
    - If the **first duplicate is the fourth**, the subtotal number of ways this can happen is $(3+4)\cdot5\cdot4\cdot3\cdot3\cdot2\cdot1 = 7\cdot3\cdot5!$
    - If the **first duplicate is the fifth**, the subtotal number of ways this can happen is $(4)\cdot5\cdot4\cdot3\cdot2\cdot4\cdot1=4\cdot4\cdot5!$
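
- Since the two-duplicate bookkeeping is easy to get wrong, it may help to check the subtotals by brute force. The sketch below enumerates all $5^{7}$ sequences, keeps those that complete the set exactly on the 7th box, and groups them by the box holding the first duplicate. The helper name `first_duplicate_box` is hypothetical.

```python
from collections import Counter
from itertools import product


def first_duplicate_box(seq):
    """Return the 1-based position of the first box whose coupon was already seen."""
    seen = set()
    for position, coupon in enumerate(seq, start=1):
        if coupon in seen:
            return position
        seen.add(coupon)
    return None  # no duplicate in the sequence


subtotals = Counter()
for seq in product(range(1, 6), repeat=7):
    # Completed exactly on box 7: all 5 coupons present, but only 4 after 6 boxes.
    if len(set(seq)) == 5 and len(set(seq[:6])) == 4:
        subtotals[first_duplicate_box(seq)] += 1

print(sorted(subtotals.items()))  # [(2, 1200), (3, 2160), (4, 2520), (5, 1920)]
print(sum(subtotals.values()))    # 7800
```

- These counts agree with $10\cdot1\cdot5!$, $9\cdot2\cdot5!$, $7\cdot3\cdot5!$ and $4\cdot4\cdot5!$, which sum to $65\cdot5!$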

- Therefore, the probability is equal to:

$$
P(\text{5 distinct coupons in 7 boxes}) = \frac{(10 + 18 + 21 + 16)\cdot5!}{5^{7}} = \frac{65\cdot4!}{5^{6}} = \frac{13}{5}\frac{4!}{5^{4}}
$$
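
- The three probabilities so far can be cross-checked with a different technique from the counting used above: inclusion-exclusion over which coupons are still missing. This is a swapped-in method, not the argument in these notes, and the function names `p_complete_within` and `p_complete_exactly` are illustrative.

```python
from fractions import Fraction
from math import comb


def p_complete_within(n_boxes, n_coupons=5):
    """P(all coupons seen within n_boxes), by inclusion-exclusion on the missing coupons."""
    return sum(
        (-1) ** k * comb(n_coupons, k) * Fraction(n_coupons - k, n_coupons) ** n_boxes
        for k in range(n_coupons + 1)
    )


def p_complete_exactly(n_boxes, n_coupons=5):
    """P(the set is completed on exactly box number n_boxes)."""
    return p_complete_within(n_boxes, n_coupons) - p_complete_within(n_boxes - 1, n_coupons)


for n in (5, 6, 7):
    print(n, p_complete_exactly(n))
# 5 24/625
# 6 48/625
# 7 312/3125
```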

____

- Let's simulate this

In [20]:
from math import factorial
import numpy as np

In [21]:
def sim():
    """Open boxes until all 5 coupons are seen; return the number of boxes opened."""
    seen = set()
    count = 0
    while len(seen) < 5:
        count += 1
        seen.add(np.random.randint(1, 6))  # uniform on 1..5 (upper bound is exclusive)
    return count

In [27]:
n_trials = 1000000
list_results = []

for trial in range(n_trials):
    list_results.append(sim())

In [35]:
list_results.count(7)/n_trials

0.099944

In [36]:
13*factorial(4)/5**5

0.09984
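
- For a broader check, the same simulation idea can be compared against all three exact values at once, and it also gives an estimate of the average asked about at the top. The sketch below uses the standard-library `random` module with a fixed seed instead of NumPy, purely so that it is self-contained; `boxes_until_complete` is a hypothetical name and the trial count is an arbitrary choice. The closed form $5\left(1 + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} + \tfrac{1}{5}\right) = \tfrac{137}{12} \approx 11.42$ is the standard coupon-collector expectation, quoted here without derivation.

```python
import random
from collections import Counter
from fractions import Fraction

rng = random.Random(0)  # fixed seed so the run is repeatable


def boxes_until_complete(n_coupons=5):
    """Open boxes until every coupon has been seen; return how many were opened."""
    seen = set()
    boxes = 0
    while len(seen) < n_coupons:
        boxes += 1
        seen.add(rng.randint(1, n_coupons))  # random.randint includes both endpoints
    return boxes


n_trials = 100_000
results = [boxes_until_complete() for _ in range(n_trials)]
counts = Counter(results)

# Exact values derived above for completing the set on box 5, 6, and 7.
exact = {5: Fraction(24, 625), 6: Fraction(48, 625), 7: Fraction(312, 3125)}
for n, p in exact.items():
    print(n, counts[n] / n_trials, float(p))  # simulated frequency vs exact value

sample_mean = sum(results) / n_trials
expected_mean = 5 * sum(Fraction(1, k) for k in range(1, 6))  # 137/12
print(sample_mean, float(expected_mean))
```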