- Student name: Duy Hieu Nguyen
- SID: 33694052

# 2. **Probability**

## Question 4: Bayes Rule
### Scenario: Picking Fruits from Colored Boxes
Three colored boxes: red, blue, and yellow. Each box contains a mix of apples and oranges:
- Red box: 3 apples, 5 oranges
- Blue box: 4 apples, 4 oranges
- Yellow box: 1 apple, 1 orange

We randomly select one box and then randomly pick a fruit from it. The question is: if we picked an apple, what is the probability it came from the yellow box?

### Task 1: 
Implement a Python function that simulates the above experiment (using a suitable method of a numpy random number generator obtained via `numpy.random.default_rng`). For instance, you could name the function `fruit_experiment`, and it could take a parameter for the number of repeated simulations.

This function simulates the described experiment for the particular scenario.

In [4]:
import numpy as np

def fruit_experiment(num_simulations):
    # Initialize the experiment
    rng = np.random.default_rng()
    boxes = {
        "red": ["apple"] * 3 + ["orange"] * 5,
        "blue": ["apple"] * 4 + ["orange"] * 4,
        "yellow": ["apple"] * 1 + ["orange"] * 1,
    }

    # Arrays to store the results of each simulation
    selected_boxes = []
    selected_fruits = []

    for _ in range(num_simulations):
        # Randomly select a box
        box = rng.choice(list(boxes.keys()))

        # Randomly select a fruit from the chosen box
        fruit = rng.choice(boxes[box])

        selected_boxes.append(box)
        selected_fruits.append(fruit)

    return np.array(selected_boxes, dtype='object'), np.array(selected_fruits, dtype='object')

# Test the function
print(fruit_experiment(4))


(array(['red', 'red', 'yellow', 'blue'], dtype=object), array(['orange', 'orange', 'apple', 'orange'], dtype=object))


### Task 2: 
Answer the following question with a formal derivation in a markdown cell (ideally using LaTeX for clean typesetting): If the picked fruit is an apple, what is the probability that it was picked from the yellow box?


We want to find the probability that an apple was picked from the yellow box. Let's define the following events:
 
- $A$: Event that an apple is picked.
- $Y$: Event that a fruit is picked from the yellow box.
- $B$: Event that a fruit is picked from the blue box.
- $R$: Event that a fruit is picked from the red box.

The target is the conditional probability $P(Y|A)$: the probability that the fruit came from the yellow box, given that it is an apple.

Using Bayes' theorem:

\begin{equation*}
P(Y|A) = \frac{P(A|Y) \times P(Y)}{P(A)}
\end{equation*}

Where:

1. $P(Y)$: Probability of picking the yellow box = $\frac{1}{3}$ (since the 3 boxes are chosen uniformly at random).
2. $P(A|Y)$: Probability of picking an apple given that it's from the yellow box = $\frac{1}{2}$ (since there's 1 apple and 1 orange in the yellow box).
3. $P(A)$: Total probability of picking an apple.

Using the law of total probability:
\begin{align*}
P(A) &= P(A|R)P(R) + P(A|B)P(B) + P(A|Y)P(Y) \\
&= \frac{3}{8} \times \frac{1}{3} + \frac{4}{8} \times \frac{1}{3} + \frac{1}{2} \times \frac{1}{3} \\
&= \frac{11}{24}
\end{align*}

Substituting these values into Bayes' theorem:

\begin{equation*}
P(Y|A) = \frac{\frac{1}{2} \times \frac{1}{3}}{\frac{11}{24}} = \frac{1/6}{11/24} = \frac{4}{11}
\end{equation*}

Thus, given that an apple was picked, the probability that it came from the yellow box is $\frac{4}{11} \approx 0.3636$.
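
The arithmetic in this derivation can also be checked with exact fractions. The following is a minimal sketch rather than part of the required solution; the names `box_contents`, `prior`, `likelihood`, `p_apple`, and `posterior_yellow` are introduced here for illustration only, and the box contents are taken from the scenario.

```python
from fractions import Fraction

# (apples, oranges) per box, as given in the scenario
box_contents = {"red": (3, 5), "blue": (4, 4), "yellow": (1, 1)}

# Each box is chosen uniformly at random
prior = Fraction(1, len(box_contents))

# P(A | box): share of apples in each box
likelihood = {name: Fraction(a, a + o) for name, (a, o) in box_contents.items()}

# Law of total probability: P(A) = sum over boxes of P(A | box) * P(box)
p_apple = sum(likelihood[name] * prior for name in box_contents)

# Bayes' theorem: P(Y | A) = P(A | Y) * P(Y) / P(A)
posterior_yellow = likelihood["yellow"] * prior / p_apple

print(p_apple)           # 11/24
print(posterior_yellow)  # 4/11
```

Working with `Fraction` avoids floating-point rounding, so the printed values can be compared directly with the hand derivation.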

To double-check the solution, I run a large number of simulations (100,000), count the number of times an apple comes from the yellow box, and divide that count by the total number of times an apple is picked.

In [5]:
# Run the simulation for a large number of times
num_simulations = 100000
boxes, fruits = fruit_experiment(num_simulations)

# Count the number of times we get an apple from the yellow box
apple_from_yellow = np.sum((boxes == 'yellow') & (fruits == 'apple'))

# Count the total number of times we get an apple
total_apples = np.sum(fruits == 'apple')

# Calculate the probability
probability = apple_from_yellow / total_apples

print(f"Empirical Probability: {round(probability, 4)}")

Empirical Probability: 0.3675


## Question 5: Expected Values
### Scenario: Rolling Dice
One-player game: the player first rolls a fair six-sided die and then determines her score as the sum of the outcomes of a number of additional die rolls, where the number of additional dice rolled equals the number shown on the first die.
- $X$: Outcome of the first die roll.
- $Y_i$ for $i = 1, \dots, 6$: Outcome of the $i$-th subsequent die roll if $i \le X$, or 0 otherwise.
- $Z = Y_1 + Y_2 + \dots + Y_6$: The final score of the player.

Question: What is the expected value $E[Z]$?


### Task 1:
Implement a Python function `die_experiment` that simulates the above game for a desired number of repetitions and returns the array of scores achieved by the player for each repetition.

This function simulates the game described above.

In [6]:
import numpy as np

def die_experiment(repetitions):
    rng = np.random.default_rng()  # Initialize a random number generator
    scores = []
    
    for _ in range(repetitions):
        X = rng.integers(1, 7)  # Roll the first die
        Y = [rng.integers(1, 7) if i < X else 0 for i in range(6)]  # Roll subsequent dice based on X
        Z = sum(Y)  # Calculate the score
        scores.append(Z)
        
    return np.array(scores)
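
The loop above calls the generator once per die, which can be slow for many repetitions. As an alternative, here is a vectorized sketch under the same game rules. The function name `die_experiment_vectorized` and the optional `seed` parameter are assumptions added for illustration; they are not part of the task statement.

```python
import numpy as np

def die_experiment_vectorized(repetitions, seed=None):
    """Simulate the game for `repetitions` rounds without a Python loop."""
    rng = np.random.default_rng(seed)
    # First die: decides how many additional dice count toward the score
    first = rng.integers(1, 7, size=repetitions)
    # Draw six candidate rolls per game; unused ones are masked out below
    rolls = rng.integers(1, 7, size=(repetitions, 6))
    # Column i (0-based) counts only if i < X, i.e. the first X rolls
    mask = np.arange(6) < first[:, None]
    return (rolls * mask).sum(axis=1)

scores_vec = die_experiment_vectorized(200_000, seed=0)
print(scores_vec.mean())
```

This version always draws six dice per game and discards the ones beyond $X$, trading a little wasted randomness for a single batched call. Passing a fixed `seed` makes a run reproducible.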


### Task 2:
Estimate the expected player score from 10,000 simulated games, together with a 95% confidence interval.

In [7]:
repetitions = 10000
scores = die_experiment(repetitions)
expected_score = scores.mean()

# Calculate the 95% confidence interval for the expected score
confidence_level = 0.95
z = 1.96  # z-score for 95% confidence
std_dev = scores.std(ddof=1)  # sample standard deviation
std_error = std_dev / np.sqrt(repetitions) # std error
margin_of_error = z * std_error
confidence_interval = (expected_score - margin_of_error, expected_score + margin_of_error)

print(f"Estimated Expected Score: {expected_score:.2f}")
print(f"95% Confidence Interval: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")


Estimated Expected Score: 12.25
95% Confidence Interval: (12.12, 12.39)


### Task 3:
Analytically derive the expected value $E[Z]$.

Recall:
- $X$: Outcome of the first die roll.
- $Y_i$ for $i = 1, \dots, 6$: Outcome of the $i$-th subsequent die roll if $i \le X$, or 0 otherwise.
- $Z = Y_1 + Y_2 + \dots + Y_6$: The final score of the player.

Given that the first die roll is $x$, the player rolls $x$ more dice. Each of these dice has the expected value:

\begin{align*}
E[\text{one die roll}] &= \frac{1 + 2 + 3 + 4 + 5 + 6}{6} \\
&= \frac{21}{6} \\
&= 3.5
\end{align*}
By linearity of expectation, the expected value of the sum of $x$ die rolls is $x$ times the expected value of one die roll:
\begin{align*}
E[Z|X=x] &= x \times E[\text{one die roll}] \\
&= x \times 3.5 \\
&= 3.5x
\end{align*}

Now, using the rule of total expectation:

\begin{align*}
E[Z] &= \sum_{x=1}^{6} E[Z|X=x] \times P(X=x) \\
&= \sum_{x=1}^{6} 3.5x \times \frac{1}{6} \\
&= \frac{1}{6} \times (3.5 + 7 + 10.5 + 14 + 17.5 + 21) \\
&= \frac{73.5}{6} \\
&= 12.25
\end{align*}

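Thus, analytically, the expected score is $E[Z] = 12.25$, which agrees with the simulated estimate of 12.25 and lies inside its 95% confidence interval $(12.12, 12.39)$.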
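
As an independent check on this derivation, the exact distribution of $Z$ can be built by convolution and its mean computed with exact fractions. This is a sketch, not the required solution; the helper `sum_distribution` and the names `score_dist` and `expected_score_exact` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def sum_distribution(num_dice):
    """Exact distribution of the sum of `num_dice` fair six-sided dice."""
    dist = {0: Fraction(1)}
    for _ in range(num_dice):
        new_dist = {}
        for total, p in dist.items():
            for face in range(1, 7):
                key = total + face
                new_dist[key] = new_dist.get(key, Fraction(0)) + p * Fraction(1, 6)
        dist = new_dist
    return dist

# Mix the conditional distributions of Z given X = x, each with weight 1/6
score_dist = {}
for x in range(1, 7):
    for total, p in sum_distribution(x).items():
        score_dist[total] = score_dist.get(total, Fraction(0)) + p * Fraction(1, 6)

expected_score_exact = sum(total * p for total, p in score_dist.items())
print(expected_score_exact)  # 49/4
```

Since $49/4 = 12.25$, the enumeration matches the value obtained from the law of total expectation without relying on the shortcut $E[Z|X=x] = 3.5x$.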
