# Exercises on Probability

## The Binomial Law (or Distribution)

The binomial distribution is a probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials with the same probability of success. A Bernoulli trial is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted.

The probability of getting exactly `k` successes (hits, correct guesses, etc.) in `n` trials (repeated attempts) is given by the formula:

$$ P(X=k) = C(n, k) \cdot p^k \cdot (1-p)^{n-k} $$

where:

- `P(X=k)` is the probability of `k` successes in `n` trials,
- `C(n, k)` is the number of combinations of `n` items taken `k` at a time,
- `p` is the probability of success in a given trial, and
- `(1-p)` is the probability of failure (which is 1 minus the probability of success).

`C(n, k)` can be calculated as `n! / [k!(n-k)!]` where `!` denotes factorial.

The binomial distribution applies when each trial has exactly two mutually exclusive outcomes, usually called "success" and "failure". Its parameters are `n`, the total number of trials, and `p`, the probability of success in a single trial. It is a suitable model only when the outcome of one trial does not affect the outcome of any other.
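The formula above translates almost directly into Python. The following is a minimal sketch, assuming Python 3.8 or later (for `math.comb`); the function name `binomial_pmf` and the coin-flip example are illustrative choices, not part of the exercises.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    # C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of exactly 2 heads in 4 fair coin flips
print(binomial_pmf(2, 4, 0.5))  # 0.375

# Sanity check: the probabilities over k = 0..n should sum to 1
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))
print(total)
```

Summing the probabilities over all possible values of `k` is a quick way to catch mistakes in the formula.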

# Exercise 1
## Dice Roll Simulation

In this exercise, you will simulate the roll of two dice and calculate the probability of the sum of the faces being a certain number.

### Task

1. Simulate the roll of two dice a large number of times (e.g., 10,000 times).
2. For each roll, calculate the sum of the two faces.
3. Calculate the probability of the sum being 7 and the probability of the sum being 11.

Here's a skeleton of how you might structure your code:


In [None]:
import random

def simulate_dice_roll(num_trials):
    # Initialize counts
    count_7 = 0
    count_11 = 0

    # Simulate num_trials iterations of dice roll
    for _ in range(num_trials):
        # Roll two dice
        dice1 = ...
        dice2 = ...

        # Calculate the sum
        dice_sum = ...

        # If the sum is 7, increment count_7
        if dice_sum == 7:
            count_7 += 1

        # If the sum is 11, increment count_11
        if dice_sum == 11:
            count_11 += 1

    # Calculate probabilities
    prob_7 = count_7 / num_trials
    prob_11 = count_11 / num_trials

    return prob_7, prob_11

# Run simulation
num_trials = 10000
prob_7, prob_11 = simulate_dice_roll(num_trials)

print(f"Probability of sum being 7: {prob_7}")
print(f"Probability of sum being 11: {prob_11}")

## The full code

In [33]:
import random

def simulate_dice_roll(num_trials):
    # Initialize counts
    count_7 = 0
    count_11 = 0

    # Simulate num_trials iterations of dice roll
    for _ in range(num_trials):
        # Roll two dice
        dice1 = random.randint(1, 6)
        dice2 = random.randint(1, 6)

        # Calculate the sum
        dice_sum = dice1 + dice2

        # If the sum is 7, increment count_7
        if dice_sum == 7:
            count_7 += 1

        # If the sum is 11, increment count_11
        if dice_sum == 11:
            count_11 += 1

    # Calculate probabilities
    prob_7 = count_7 / num_trials
    prob_11 = count_11 / num_trials

    return prob_7, prob_11

# Run simulation
num_trials = 10000
prob_7, prob_11 = simulate_dice_roll(num_trials)

print(f"Probability of sum being 7: {prob_7}")
print(f"Probability of sum being 11: {prob_11}")

Probability of sum being 7: 0.1684
Probability of sum being 11: 0.0571
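To judge how close the simulated values are, it helps to compare them with the exact probabilities. The sketch below enumerates all 36 equally likely outcomes of two fair six-sided dice; the helper name `exact_sum_probability` is an assumption introduced for this illustration.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

def exact_sum_probability(target):
    """Exact probability that the two faces add up to target."""
    favorable = sum(1 for d1, d2 in outcomes if d1 + d2 == target)
    return Fraction(favorable, len(outcomes))

exact_7 = exact_sum_probability(7)
exact_11 = exact_sum_probability(11)

print(f"Exact probability of sum 7: {exact_7} = {float(exact_7):.4f}")
print(f"Exact probability of sum 11: {exact_11} = {float(exact_11):.4f}")
```

The exact values are 1/6 (about 0.1667) and 1/18 (about 0.0556), so the simulated estimates above are within a few thousandths of them.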


# Exercise 2

A multiple-choice exam has 10 questions. Each question has four possible answers, and only one of them is correct. A student hasn't studied for the exam at all and decides to randomly guess the answers.

1. What is the probability that the student will answer exactly 5 questions correctly?
2. What is the probability that the student will answer at least 1 question correctly?

Use the binomial distribution to solve this problem. Assume that the student's guesses are independent, with the probability of guessing a question correctly being 0.25 for each question.

Let's solve the exercise using the binomial distribution.

1. What is the probability that the student will answer exactly 5 questions correctly?

In this case, `n` (the number of trials) is 10, `k` (the number of successes) is 5, and `p` (the probability of success on each trial) is 0.25. We can use the formula for the binomial distribution:



$$ P(X=k) = C(n, k) \cdot p^k \cdot (1-p)^{n-k} $$



So, the probability of getting exactly 5 correct answers is:



$$ P(X=5) = C(10, 5) \cdot (0.25^5) \cdot ((1-0.25)^{10-5}) $$



We evaluate the three factors of this expression in turn.

First, calculate `C(n, k)`, the number of combinations of 10 items taken 5 at a time. This is given by the formula `n! / [k!(n-k)!]`. So `C(10, 5) = 10! / [5!(10-5)!] = 252`.

Next, calculate `p^k`, which is `(0.25)^5 = 0.0009765625`.

Then, calculate `(1-p)^(n-k)`, which is `(1 - 0.25)^(10 - 5) = 0.75^5 = 0.2373046875`.

Finally, multiply these three values together to get the probability:



$$ P(X=5) = 252 \cdot 0.0009765625 \cdot 0.2373046875 = 0.058399200439453125 $$

So, the probability that the student will answer exactly 5 questions correctly is approximately 0.058 or 5.8%.
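The hand calculation can be checked with a few lines of Python. This is a small sketch assuming Python 3.8 or later for `math.comb`; the variable name `prob_exactly_5` is chosen here for illustration.

```python
from math import comb

n, k, p = 10, 5, 0.25

# C(10, 5) * 0.25^5 * 0.75^5
prob_exactly_5 = comb(n, k) * p**k * (1 - p)**(n - k)

print(f"C({n}, {k}) = {comb(n, k)}")
print(f"P(X = {k}) = {prob_exactly_5:.4f}")
```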



2. What is the probability that the student will answer at least 1 question correctly?

This is the complement of the event that the student answers all questions incorrectly. So, we can find the probability of getting all answers wrong and subtract it from 1.

The probability of getting all answers wrong is:



$$ P(X=0) = C(10, 0) \cdot (0.25^0) \cdot ((1-0.25)^{10-0}) $$

Which is `P(X=0) = C(10, 0) * (0.25^0) * ((1 - 0.25)^10) = 1 * 1 * (0.75^10) = 0.056313514709472656`.



So, the probability of getting at least one correct answer is:



$$ P(X \geq 1) = 1 - P(X=0) $$

Which is `P(X>=1) = 1 - P(X=0) = 1 - 0.056313514709472656 = 0.9436864852905273`.

So, the probability that the student will answer at least 1 question correctly is approximately 0.944 or 94.4%.
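As a cross-check, the complement shortcut should agree with adding up the probabilities of 1, 2, ..., 10 correct answers directly. The sketch below compares both routes; `binomial_pmf`, `via_complement` and `via_direct_sum` are names introduced for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.25

# Route 1: complement of "all answers wrong"
via_complement = 1 - binomial_pmf(0, n, p)

# Route 2: add up P(X = 1) + P(X = 2) + ... + P(X = 10)
via_direct_sum = sum(binomial_pmf(k, n, p) for k in range(1, n + 1))

print(f"Complement: {via_complement:.4f}")
print(f"Direct sum: {via_direct_sum:.4f}")
```

Both routes give the same value up to floating-point rounding; the complement is simply less work.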

# Exercise 3
## Monty Hall Problem

The Monty Hall problem is a probability puzzle based on a game show where a contestant is asked to choose one of three doors. Behind one door is a car, and behind the other two doors are goats. After the contestant chooses a door, the host, who knows what's behind each door, opens one of the other two doors to reveal a goat. The contestant is then given the option to switch their choice to the remaining unopened door or stick with their initial choice. The question is: Should they stick to their choice? Should they switch? Does it matter?

The Monty Hall problem works like a statistical illusion: our intuitive process for evaluating the probabilities rests on a false assumption. Most people assume that the two remaining doors are equally likely to hide the prize, so the door they chose seems to have a 50/50 chance. With no perceived reason to change, most stick with their initial choice.

It turns out that there are only nine different combinations of choices and outcomes (for a three-doors problem). Therefore, we can just show them all and calculate the percentage for each outcome:

| You Pick | Prize Door | Don't Switch | Switch |
| :---------------: | :---------------: | :---------------: | :---------------: |
| 1 | 1 | Win | Lose |
| 1 | 2 | Lose  | Win |
| 1 | 3 | Lose  | Win |
| 2 | 1 | Lose | Win |
| 2 | 2 | Win | Lose |
| 2 | 3 | Lose | Win |
| 3 | 1 | Lose | Win |
| 3 | 2 | Lose | Win |
| 3 | 3 | Win | Lose |
| | | **3 Wins (33.3%)** | **6 Wins (66.7%)** |

And there we have it. If we switch doors, we double our probability of winning! 🤯
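The nine rows of the table can also be generated programmatically. The following sketch enumerates every (pick, prize) pair and applies the host's rule; the variable names are illustrative assumptions, and the enumeration is exact rather than a random simulation.

```python
from fractions import Fraction
from itertools import product

doors = [1, 2, 3]
cases = list(product(doors, repeat=2))  # all (pick, prize) pairs

stay_wins = 0
switch_wins = 0

for pick, prize in cases:
    # The host opens a goat door that is not the contestant's pick. When
    # pick == prize there are two such doors; either one gives the same result.
    host = next(d for d in doors if d != pick and d != prize)
    # Switching means taking the one door that is neither picked nor opened.
    switched = next(d for d in doors if d != pick and d != host)

    stay_wins += (pick == prize)
    switch_wins += (switched == prize)

print(f"Stay wins:   {stay_wins} of {len(cases)} = {Fraction(stay_wins, len(cases))}")
print(f"Switch wins: {switch_wins} of {len(cases)} = {Fraction(switch_wins, len(cases))}")
```

The counts reproduce the totals in the last row of the table: 3 wins out of 9 for staying and 6 out of 9 for switching.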

To understand the solution, we need to see why the 50/50 answer is so tempting. It rests on incorrect assumptions. We usually reason about probabilities as if events were independent and random. For that to hold, the process must be **random** and have **probabilities that do not change**. The Monty Hall problem satisfies neither requirement: the host _doesn't_ choose randomly (he knows where the car is and always opens a goat door), and his choice _does_ change the probabilities, because the 2/3 chance that the car is behind one of the two doors you did not pick is now concentrated on the single door he left closed.
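One way to see how much the host's knowledge matters is to look at a hypothetical variant, not part of the original game, in which the host opens one of the other two doors at random and games where he accidentally reveals the car are discarded. The sketch below enumerates that variant exactly; all names in it are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

doors = [1, 2, 3]
goat_revealed = 0
stay_wins = 0
switch_wins = 0

# Every (prize, pick, opened) triple with opened != pick is equally likely
# when the host picks blindly between the two doors the contestant left.
for prize, pick in product(doors, repeat=2):
    for opened in doors:
        if opened == pick:
            continue  # the host never opens the contestant's door
        if opened == prize:
            continue  # car revealed by accident: discard this game
        goat_revealed += 1
        switched = next(d for d in doors if d not in (pick, opened))
        stay_wins += (pick == prize)
        switch_wins += (switched == prize)

print(f"Stay:   {Fraction(stay_wins, goat_revealed)}")
print(f"Switch: {Fraction(switch_wins, goat_revealed)}")
```

In this random-host variant both strategies win half of the remaining games, which is the 50/50 answer intuition expects. The 2/3 advantage for switching appears only because the real host deliberately avoids the car.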

### Task

Simulate the Monty Hall problem in Python and calculate the probabilities of winning if the contestant sticks with their initial choice and if they switch their choice after the host reveals a goat.

Here's a skeleton of how you might structure your code:

In [None]:
import random

def simulate_monty_hall(num_trials=1000, switch_choice=True):
    # Initialize win count
    win_count = 0

    # Simulate num_trials iterations of the game
    for _ in range(num_trials):
        # Randomly place the car behind one of the doors
        car_location = ...

        # Contestant makes an initial choice
        contestant_choice = ...

        # Host reveals a goat behind one of the other doors
        revealed_door = ...

        # If switch_choice is True, contestant switches their choice
        if switch_choice:
            contestant_choice = ...

        # If contestant's choice is the car location, increment win count
        if contestant_choice == car_location:
            win_count += 1

    # Calculate win probability
    win_probability = win_count / num_trials

    return win_probability

# Run simulations
num_trials = 10000
stay_probability = simulate_monty_hall(num_trials, switch_choice=False)
switch_probability = simulate_monty_hall(num_trials, switch_choice=True)

print(f"Probability of winning if stay with initial choice: {stay_probability}")
print(f"Probability of winning if switch choice: {switch_probability}")

## The full code

In [34]:
import random

def simulate_monty_hall(num_trials, switch_choice):
    # Initialize win count
    win_count = 0

    # Simulate num_trials iterations of the game
    for _ in range(num_trials):
        # Randomly place the car behind one of the doors
        car_location = random.randint(1, 3)

        # Contestant makes an initial choice
        contestant_choice = random.randint(1, 3)

        # Host reveals a goat behind one of the other doors
        while True:
            revealed_door = random.randint(1, 3)
            if revealed_door != car_location and revealed_door != contestant_choice:
                break

        # If switch_choice is True, contestant switches their choice
        if switch_choice:
            doors = [1, 2, 3]
            doors.remove(contestant_choice)
            doors.remove(revealed_door)
            contestant_choice = doors[0]

        # If contestant's choice is the car location, increment win count
        if contestant_choice == car_location:
            win_count += 1

    # Calculate win probability
    win_probability = win_count / num_trials

    return win_probability

# Run simulations
num_trials = 10000
stay_probability = simulate_monty_hall(num_trials, switch_choice=False)
switch_probability = simulate_monty_hall(num_trials, switch_choice=True)

print(f"Probability of winning if stay with initial choice: {stay_probability}")
print(f"Probability of winning if switch choice: {switch_probability}")

Probability of winning if stay with initial choice: 0.3343
Probability of winning if switch choice: 0.6602


## Breaking down the code

1. **Importing the random module**: This module is used to generate random numbers.



In [None]:
import random



2. **Defining the function**: The function `simulate_monty_hall` is defined with two parameters: `num_trials` (the number of times the game is played) and `switch_choice` (a boolean indicating whether the contestant switches their choice after the host reveals a door).



In [None]:
def simulate_monty_hall(num_trials, switch_choice):



3. **Initializing the win count**: A variable `win_count` is initialized to 0. This will keep track of the number of times the contestant wins.



In [None]:
win_count = 0



4. **Simulating the game**: The game is simulated `num_trials` times using a for loop.



In [None]:
for _ in range(num_trials):



5. **Placing the car and making the initial choice**: The car is randomly placed behind one of the three doors, and the contestant also randomly chooses one of the three doors.



In [None]:
car_location = random.randint(1, 3)
contestant_choice = random.randint(1, 3)



6. **Revealing a door**: The host reveals a door that is neither the car location nor the contestant's initial choice, as the rules of the game require. The code draws a door at random and repeats the draw until it satisfies both conditions, at which point it breaks out of the loop.



In [None]:
while True:
    revealed_door = random.randint(1, 3)
    if revealed_door != car_location and revealed_door != contestant_choice:
        break



7. **Switching choice**: If `switch_choice` is True, the contestant switches their choice to the other unopened door.



In [None]:
if switch_choice:
    doors = [1, 2, 3]
    doors.remove(contestant_choice)
    doors.remove(revealed_door)
    contestant_choice = doors[0]



8. **Checking for a win**: If the contestant's choice is the car location, `win_count` is incremented by 1.



In [None]:
if contestant_choice == car_location:
    win_count += 1



9. **Calculating win probability**: After all trials are completed, the win probability is calculated as the number of wins divided by the number of trials.



In [None]:
win_probability = win_count / num_trials



10. **Running simulations**: The function is called twice, once with `switch_choice` set to False and once with it set to True. The results are printed out.



In [None]:
num_trials = 10000
stay_probability = simulate_monty_hall(num_trials, switch_choice=False)
switch_probability = simulate_monty_hall(num_trials, switch_choice=True)

print(f"Probability of winning if stay with initial choice: {stay_probability}")
print(f"Probability of winning if switch choice: {switch_probability}")
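The retry loop in step 6 could alternatively be written as a single random choice among the doors the host is allowed to open. This is a sketch of that alternative, not the approach used in the code above; the helper name `reveal_goat_door` and its `num_doors` parameter are hypothetical.

```python
import random

def reveal_goat_door(car_location, contestant_choice, num_doors=3):
    """Pick the door the host opens without retrying rejected draws."""
    # Doors the host may open: not the car, not the contestant's pick
    eligible = [
        door for door in range(1, num_doors + 1)
        if door != car_location and door != contestant_choice
    ]
    return random.choice(eligible)

# Contestant picked a goat: the host has only one door left to open
print(reveal_goat_door(car_location=1, contestant_choice=2))  # always 3

# Contestant picked the car: the host chooses between the two goat doors
print(reveal_goat_door(car_location=1, contestant_choice=1))  # 2 or 3
```

Both versions select uniformly among the eligible doors, so the simulated probabilities are unaffected. The list-based version just makes the host's constraint explicit and avoids wasted draws.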

## More doors
Here we alter the number of doors to see how the probabilities of winning change.
The host still opens only one door after the first choice is made, and that door still reveals a goat, as discussed above. With more than three doors, a contestant who switches picks at random among the remaining unopened doors. As expected, switching still increases the chance of winning. However, the more doors there are, the smaller the difference between the two probabilities becomes.

In [36]:
import random

def simulate_monty_hall(num_trials, num_doors, switch_choice):
    win_count = 0

    for _ in range(num_trials):
        car_location = random.randint(1, num_doors)
        contestant_choice = random.randint(1, num_doors)

        while True:
            revealed_door = random.randint(1, num_doors)
            if revealed_door != car_location and revealed_door != contestant_choice:
                break

        if switch_choice:
            doors = list(range(1, num_doors + 1)) # +1 because range() is exclusive of the stop value
            doors.remove(contestant_choice)
            doors.remove(revealed_door)
            contestant_choice = random.choice(doors)

        if contestant_choice == car_location:
            win_count += 1

    win_probability = win_count / num_trials

    return win_probability

num_trials = 10000
num_doors = 5
stay_probability = simulate_monty_hall(num_trials, num_doors, switch_choice=False)
switch_probability = simulate_monty_hall(num_trials, num_doors, switch_choice=True)

print(f"Probability of winning if stay with initial choice: {stay_probability}")
print(f"Probability of winning if switch choice: {switch_probability}")

Probability of winning if stay with initial choice: 0.2067
Probability of winning if switch choice: 0.2701
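These estimates can be compared with exact values. Under the rules simulated above (the host opens one goat door and a switching contestant picks uniformly among the other unopened doors), staying wins with probability 1/n. Switching wins when the first pick was wrong, which has probability (n-1)/n, and the random second pick then lands on the car, which has probability 1/(n-2). The sketch below evaluates these expressions; the function name `theoretical_probabilities` is an assumption for this illustration.

```python
from fractions import Fraction

def theoretical_probabilities(num_doors):
    """Exact win probabilities when the host opens a single goat door."""
    # Staying wins only if the first pick was the car
    stay = Fraction(1, num_doors)
    # Switching wins if the first pick was wrong AND the random second pick,
    # among the num_doors - 2 remaining closed doors, hits the car
    switch = Fraction(num_doors - 1, num_doors) * Fraction(1, num_doors - 2)
    return stay, switch

for n in (3, 5, 10, 100):
    stay, switch = theoretical_probabilities(n)
    print(f"{n:>3} doors: stay = {float(stay):.4f}, switch = {float(switch):.4f}")
```

For five doors this gives 1/5 = 0.2 and 4/15 (about 0.2667), in line with the simulated 0.2067 and 0.2701. The gap between the two strategies is 1/(n(n-2)), which shrinks quickly as the number of doors grows.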
