# Section 2
Name: Michelle Fong  


## Question 4

In [10]:
import numpy as np

def fruit_experiment(num_simulations):
    # input details
    boxes = ['red', 'blue', 'yellow']
    # P(choosing each box): all three boxes are equally likely
    p_box = [1/3, 1/3, 1/3]  

    fruits = {
        'red': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange', 'orange', 'orange'],
        'blue': ['apple', 'apple', 'apple', 'apple', 'orange', 'orange', 'orange', 'orange'],
        'yellow': ['apple', 'orange']
    }

    # random part 
    rng = np.random.default_rng()

    # random indices for drawing a box, weighted by p_box
    box_indices = rng.choice(len(boxes), size=num_simulations, p=p_box)
    
    # create an array with drawn boxes
    boxes = np.array(boxes, dtype='object')[box_indices]

    # indices of fruit drawn
    fruit_indices = [rng.integers(0, len(fruits[box])) for box in boxes]
    # build an array of drawn fruits: look up each fruit by its drawn box and fruit index
    fruits = np.array([fruits[box][idx] for box, idx in zip(boxes, fruit_indices)], dtype='object')

    return boxes, fruits

num_simulations = 4
boxes, fruits = fruit_experiment(num_simulations)
output = [boxes, fruits]
output


[array(['blue', 'yellow', 'red', 'red'], dtype=object),
 array(['orange', 'apple', 'orange', 'apple'], dtype=object)]

<div style="background-color: lightgrey; padding: 10px;">
Question:  If the picked fruit is an apple, what is the probability that it was picked
from the yellow box?
</div> 

Representation:  
$B \in \{r, b, y\}$  
$F \in \{a, o\}$  

Notation:  
- $B$: a Random Variable representing the identity of the selected box  
- $r$, $b$, $y$: correspond to the red, blue, and yellow boxes

- $F$: a Random Variable representing the identity of the selected fruit  
- $a$, $o$: correspond to apple and orange

Given that the picked fruit is an apple ($F=a$), what is $P(y|a) = P(B=y \mid F=a)$?  

$$\begin{align}
P(y|a) &= \frac{P(y \cap a)}{P(a)}\\
& = \frac{P(a|y)P(y)}{P(a|r)P(r) + P(a|b)P(b) + P(a|y)P(y)}\\
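& = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{3}{8} \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3}}\\
& = \frac{1/6}{11/24}\\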
& = \frac{4}{11}\\
& \approx 0.3636
\end{align}$$
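
The arithmetic in the derivation can be checked exactly with rational numbers. The following is a minimal sketch, not part of the original solution; the names `prior`, `likelihood`, `p_apple`, and `posterior_yellow` are introduced here for illustration only.

```python
from fractions import Fraction

# Prior probability of each box (uniform, as in the simulation).
prior = {"red": Fraction(1, 3), "blue": Fraction(1, 3), "yellow": Fraction(1, 3)}
# P(apple | box), read off the box contents: 3/8, 4/8 and 1/2.
likelihood = {"red": Fraction(3, 8), "blue": Fraction(4, 8), "yellow": Fraction(1, 2)}

# Law of total probability: P(a) = sum over boxes of P(a | box) * P(box).
p_apple = sum(likelihood[b] * prior[b] for b in prior)

# Bayes' rule: P(y | a) = P(a | y) * P(y) / P(a).
posterior_yellow = likelihood["yellow"] * prior["yellow"] / p_apple

print(p_apple)           # 11/24
print(posterior_yellow)  # 4/11
```

Using `Fraction` avoids floating-point rounding, so the result can be compared directly with the hand-derived $4/11$.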


In [12]:
num_simulations = 100000
boxes, fruits = fruit_experiment(num_simulations)

# p(y & a)
apple_yellow = np.sum((fruits == 'apple') & (boxes == 'yellow'))
# p(a)
apple = np.sum(fruits == 'apple')

simulated_prob = apple_yellow / apple

print("Simulated probability:", round(simulated_prob,4))

Simulated probability: 0.3668


The simulated result (0.3668) is very close to the analytical value $4/11 \approx 0.3636$, which cross-checks the derivation above.
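
To judge whether the gap between the simulated and analytical values is plausible sampling noise, one can estimate the standard error of the simulated proportion. This is a rough sketch under a stated assumption: the number of apples drawn is not shown in the output, so it is approximated by its expected value $n \cdot P(a)$. The variable names are illustrative.

```python
import math

n = 100_000          # number of simulated draws, as in the cell above
p_apple = 11 / 24    # P(a) from the analytical derivation
p = 4 / 11           # analytical P(y | a)

# Assumption: the apple count is close to its expectation n * P(a).
expected_apples = n * p_apple

# Binomial standard error of a proportion estimated from that many apples.
se = math.sqrt(p * (1 - p) / expected_apples)

simulated = 0.3668   # value printed by the simulation above
z = (simulated - p) / se  # deviation measured in standard errors

print(f"standard error ~ {se:.4f}, deviation ~ {z:.1f} standard errors")
```

Under these assumptions the simulated value sits well within two standard errors of $4/11$, which is consistent with ordinary sampling variation.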

## Question 5

#### I

In [14]:
def die_experiment(num_simulations):
    scores = []

    for _ in range(num_simulations):
        # first die roll
        X = np.random.randint(1, 7)  

        # roll X further dice; their sum is the final score
        Yi = np.random.randint(1, 7, size=X)
        score = np.sum(Yi)

        scores.append(score)

    return np.array(scores)

results = die_experiment(100)
results

array([17, 18, 21, 15, 12,  7,  5, 20, 14, 20, 26,  1,  8, 17,  7,  2, 14,
       15, 19, 20, 14,  7, 18,  8,  8, 17,  6, 16, 10, 24, 15, 14, 12,  4,
       20, 13, 12,  4, 15,  8, 12,  2, 23, 10,  5, 13, 16,  5, 18,  3, 18,
       10,  9,  3, 22, 19,  5, 10, 20, 14, 17,  5, 10,  9, 16,  5,  1,  1,
       15,  3,  8, 12, 10,  3, 12,  6, 14,  1, 17, 19,  9, 10, 10, 14, 10,
       12, 12,  6, 13, 12, 15,  6, 16,  1,  8, 15, 10, 14,  4, 16])

#### II
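With 10,000 repetitions, the sample mean of the scores is approximately normally distributed by the central limit theorem (the law of large numbers only guarantees that it converges to E[Z]). For a 95% confidence interval, $\alpha = 0.05$, so each tail holds $\alpha/2 = 2.5\%$, which corresponds to 1.96 standard errors under the normal distribution.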

In [27]:
import numpy as np
from scipy import stats
num_simulations = 10000
scores = die_experiment(num_simulations)

# expected final score
expected_z = np.mean(scores)

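# standard error of the mean = sample sd / sqrt(n)
se = np.std(scores, ddof=1) / np.sqrt(num_simulations)

# 95% CI: alpha = 0.05, two-sided critical value z ~ 1.96
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)
ci = [expected_z - z_crit*se,
      expected_z + z_crit*se]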

print("Estimated expected final score:", expected_z)
print(f"95% Confidence interval: {ci[0]:.4f} to {ci[1]:.4f}")

Estimated expected final score: 12.2198
95% Confidence interval: 12.0875 to 12.3521


<div style="background-color: lightgrey; padding: 10px;">
Analytically derive the E[Z]   
    
Hint: Determine first the conditional expectation E[Z|X = x] given a specific value of x for X. Then use an appropriate rule of probability to obtain the marginal expectation E[Z].
</div>

$$\begin{align}
Z &= \sum_{i=1}^{X} Y_{i} \\
E(Z|X=x) &= E(Y_{1} + Y_{2} + \dots + Y_{x} | X=x)
\end{align}$$

Since the die is fair, each face from 1 to 6 is equally likely, and each roll $Y_i$ is independent of $X$, so every roll has the same expectation:
$$\begin{align}
E(Y_{1}| X=x) &= \frac{1}{6}(1+2+3+4+5+6)\\
&=\frac{21}{6}\\
&\\
E(Z| X=x) &= x \cdot E(Y_1|X=x) = \frac{21}{6}x\\
&\\
E(Z) &= \sum_x{E(Z|X=x) \cdot P(X=x)}\\
&= \sum_x {x \cdot \frac{21}{6} \cdot P(X=x)}\\
& = \sum_{x=1}^{6}x \cdot \frac{21}{6} \cdot \frac{1}{6}\\
& = \frac{21}{6} \cdot \frac{21}{6}\\
& = 12.25
\end{align}$$

The analytical result, $E[Z] = 12.25$, agrees with the simulation above: it lies inside the simulated 95% confidence interval (12.0875 to 12.3521).
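
The law of total expectation used in the derivation can also be evaluated exactly in a few lines. This is a minimal sketch for checking the arithmetic; `faces`, `p_face`, `e_y`, and `e_z` are illustrative names, not part of the original solution.

```python
from fractions import Fraction

faces = range(1, 7)
p_face = Fraction(1, 6)  # fair die: each face has probability 1/6

# E[Y_i]: expected value of a single roll.
e_y = sum(face * p_face for face in faces)

# E[Z | X = x] = x * E[Y_i]; average over X (law of total expectation).
e_z = sum(x * e_y * p_face for x in faces)

print(e_y)         # 7/2
print(e_z)         # 49/4
print(float(e_z))  # 12.25
```

Because $X$ and each $Y_i$ share the same distribution, the result reduces to $E[X] \cdot E[Y_i] = 3.5 \cdot 3.5$.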