<a href="https://colab.research.google.com/github/r-doz/PML2025/blob/main/./01_probability_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook 1: **Probability** basics exercises


### **Exercise 1**

We would like to estimate the effect of a drug D (D=1 the patient took the drug, D=0 didn't) on heart attack H (H=1 had heart attack, H=0 didn't) by looking at an observational dataset which also contains information about the sex of the patients (S = M/F).

- Control group (D=0):


|        | H = 1 | H = 0 |
|--------|-------|-------|
| Female |   1   |   19  |
| Male   |   12  |   28  |
| Total  |   13  |   47  |

- Treatment group (D=1):

|        | H = 1 | H = 0 |
|--------|-------|-------|
| Female |   3   |   37  |
| Male   |   8   |   12  |
| Total  |   11  |   49  |

- Among males, what is the difference between the probability of having heart attack given that the patient took the drug and given that he didn't? Is the treatment working?
- What about females?
- What happens if you consider the whole population? 
- How can you explain that?

Let's calculate the probabilities step by step.

### **1. Among males**  
We compute the probability of having a heart attack (\( H = 1 \)) for males in both groups.

- **Control group (D = 0, Male)**:  
  
  $P(H=1 | D=0, S=M) = \frac{12}{12+28} = \frac{12}{40} = 0.30$
  
- **Treatment group (D = 1, Male)**:  
  $
  P(H=1 | D=1, S=M) = \frac{8}{8+12} = \frac{8}{20} = 0.40
  $

The difference is:  
$
0.40 - 0.30 = 0.10
$
Since the probability of heart attack is higher in the treatment group, the drug appears to be harmful for males.

### **2. Among females**  
- **Control group (D = 0, Female)**:  
  
  $P(H=1 | D=0, S=F) = \frac{1}{1+19} = \frac{1}{20} = 0.05$
  
- **Treatment group (D = 1, Female)**:  
  $
  P(H=1 | D=1, S=F) = \frac{3}{3+37} = \frac{3}{40} = 0.075
  $

The difference is:  
$
0.075 - 0.05 = 0.025
$
Again, the probability of a heart attack is slightly higher in the treatment group, so, as for males, the drug appears to increase the risk for females, although by a smaller amount.

### **3. Whole population**  
- **Control group (D = 0, Total)**:  
  $
  P(H=1 | D=0) = \frac{13}{13+47} = \frac{13}{60} \approx 0.217
  $
- **Treatment group (D = 1, Total)**:  
  $
  P(H=1 | D=1) = \frac{11}{11+49} = \frac{11}{60} \approx 0.183
  $

The difference is:  
$
\frac{11}{60} - \frac{13}{60} = -\frac{2}{60} \approx -0.033
$
Here, the probability of a heart attack is **lower** in the treatment group, suggesting that the drug appears to be beneficial when looking at the whole population.

### **4. Explanation (Simpson’s Paradox)**
We observe that for both males and females separately, the drug seems to **increase** the probability of a heart attack. However, when we aggregate across the entire population, the drug appears to **decrease** the probability of a heart attack. 

This is a classic case of **Simpson’s Paradox**. It arises because sex acts as a confounder: it is associated both with who took the drug and with the baseline risk of heart attack. In this case:
- **The treatment group is mostly female (40 females vs. 20 males), the control group is mostly male (20 females vs. 40 males), and females have a much lower heart attack rate overall.**  
- Since low-risk females are overrepresented in the treatment group, they pull down its overall heart attack rate, making it look like the drug is working.  

In reality, when broken down by sex, the drug actually increases the heart attack risk for both groups.

### **Conclusion**
The drug seems harmful for both males and females when analyzed separately, despite appearing beneficial in the overall population. This highlights the importance of accounting for confounders like sex before drawing conclusions from aggregated data.
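The numbers above can be reproduced with a short `numpy` sketch. It is an illustration rather than part of the exercise: the array layout (rows Female/Male, columns H=1/H=0) and the helper name `attack_rate` are choices made here. The last lines check that each aggregated rate is a weighted average of the per-sex rates, weighted by the sex composition of that group, which is exactly what makes the reversal possible.

```python
import numpy as np

# Counts from the tables: rows = sex (Female, Male), columns = (H=1, H=0)
control = np.array([[1, 19],
                    [12, 28]])
treatment = np.array([[3, 37],
                      [8, 12]])

def attack_rate(counts):
    """Return P(H=1) for each row (sex) and the aggregated P(H=1) over all rows."""
    per_sex = counts[:, 0] / counts.sum(axis=1)
    overall = counts[:, 0].sum() / counts.sum()
    return per_sex, overall

rate_c, overall_c = attack_rate(control)
rate_t, overall_t = attack_rate(treatment)

print("Female: control %.3f, treated %.3f" % (rate_c[0], rate_t[0]))
print("Male:   control %.3f, treated %.3f" % (rate_c[1], rate_t[1]))
print("All:    control %.3f, treated %.3f" % (overall_c, overall_t))

# The aggregated rate is a weighted average of the per-sex rates,
# with weights given by the sex composition of each group.
w_c = control.sum(axis=1) / control.sum()      # [1/3, 2/3]: control is mostly male
w_t = treatment.sum(axis=1) / treatment.sum()  # [2/3, 1/3]: treatment is mostly female
assert np.isclose(w_c @ rate_c, overall_c) and np.isclose(w_t @ rate_t, overall_t)
```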

### **Exercise 2**

Donated blood is screened for AIDS. Suppose the test has 99% accuracy,
and that one in ten thousand people in your age group are HIV positive. The
test also has a 5% false positive rate. Suppose the test screens you as
positive. What is the probability you have AIDS? Is it 99%? (Hint: 99% refers
to P(test positive|you have AIDS). You want to find P(you have AIDS|test
is positive).)

We can solve this problem using **Bayes' Theorem**, which is:

$
P(A | B) = \frac{P(B | A) P(A)}{P(B)}
$

where:
- $ A $ is the event of **having HIV**.
- $ B $ is the event of **testing positive**.
- $ P(A) $ is the **prior probability** of having HIV.
- $ P(B | A) $ is the **sensitivity** (true positive rate).
- $ P(B | A^c) $ is the **false positive rate**.
- $ P(B) $ is the total probability of testing positive.

### **Step 1: Define Given Probabilities**
- $ P(A) = 1/10,000 = 0.0001 $ (prior probability of being HIV positive).
- $ P(B | A) = 0.99 $ (sensitivity: probability of testing positive if HIV positive).
- $ P(B | A^c) = 0.05 $ (false positive rate: probability of testing positive if HIV negative).
- $ P(A^c) = 1 - P(A) = 0.9999 $ (probability of being HIV negative).

### **Step 2: Compute $ P(B) $ (Total Probability of Testing Positive)**
Using the **law of total probability**:

$
P(B) = P(B | A) P(A) + P(B | A^c) P(A^c)
$

$
= (0.99 \times 0.0001) + (0.05 \times 0.9999)
$

$
= 0.000099 + 0.049995
$

$
= 0.050094
$

### **Step 3: Compute $ P(A | B) $ (Probability of Having HIV Given a Positive Test)**
Applying **Bayes' Theorem**:

$
P(A | B) = \frac{P(B | A) P(A)}{P(B)}
$

$
= \frac{0.99 \times 0.0001}{0.050094}
$

$
= \frac{0.000099}{0.050094}
$

$
\approx 0.001976 \quad \text{or} \quad 0.1976\%
$

### **Conclusion**
If you test positive, the probability that you actually have HIV is **only about 0.2%**, not 99%! 

### **Intuition (Why So Low?)**
Even though the test detects **99% of true positives**, the false positive rate of **5%** is quite high, and **since HIV is very rare in your population, most of the positive tests are false positives**. This is a classic example of how a test with high sensitivity can still have a very low positive predictive value when the condition is rare: the few true positives are swamped by false positives from the much larger healthy population.
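A minimal sketch of the same computation in Python; the function name `posterior_positive` is hypothetical. The second part restates the result with natural frequencies (an imagined population of 1,000,000 people), which often makes the low posterior easier to see: about 99 true positives are swamped by roughly 50,000 false positives.

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Law of total probability for P(positive)
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

post = posterior_positive(prior=1e-4, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(HIV | positive) = {post:.6f}")  # ≈ 0.001976

# Same result with "natural frequencies": imagine screening 1,000,000 people
population = 1_000_000
sick = population * 1e-4                 # about 100 people are HIV positive
true_pos = 0.99 * sick                   # about 99 of them test positive
false_pos = 0.05 * (population - sick)   # about 49,995 healthy people test positive
post_freq = true_pos / (true_pos + false_pos)
print(f"Fraction of positives who are truly positive = {post_freq:.6f}")
```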


### **Exercise 3**

You are given either a fair coin (with probability $0.5$) or an unfair coin with $P(\text{head})=0.8$ (with probability $0.5$). Then you toss it twice, obtaining results $H_1$ and $H_2$.

Let's call $C$ the random variable describing if the coin is fair or not.
1. Is $H_2$ independent from $H_1$?
2. Factorize $p(h_1, h_2 | c)$ (write it as a product of simpler terms)
3. Factorize $p(h_1, h_2, c)$
4. Compute $p(h_1)$
5. Compute $p(c | h_1)$
6. Compute $p(h_2 | h_1)$


Let's go step by step.

### **Given Data**
- We have two types of coins:
  - **Fair coin** ($ C = F $): $ P(H) = 0.5 $, $ P(T) = 0.5 $
  - **Unfair coin** ($ C = U $): $ P(H) = 0.8 $, $ P(T) = 0.2 $
- The probability of choosing each coin is:
  $
  P(C = F) = P(C = U) = 0.5
  $
- We flip the chosen coin twice, getting outcomes $ H_1 $ and $ H_2 $.

### **1. Is $ H_2 $ Independent from $ H_1 $?**
To check if $ H_2 $ is independent from $ H_1 $, we need to see if:

$
P(H_2 | H_1) = P(H_2)
$

If we knew which coin we were using ($ C $), then $ H_1 $ and $ H_2 $ would be independent. But since $ C $ is **unknown**, the result of $ H_1 $ gives us information about which coin is likely being used, which in turn affects the probability of $ H_2 $.

Thus, **$ H_1 $ and $ H_2 $ are not independent** because knowing $ H_1 $ changes our belief about which coin we are using, which then affects $ H_2 $.

### **2. Factorize $ p(h_1, h_2 | c) $**
Since the flips are independent given $ C $, we can write:

$
P(H_1, H_2 | C) = P(H_1 | C) P(H_2 | C)
$

For example, the probability of two heads with a fair coin ($ C = F $) is:

$
P(H_1, H_2 | C = F) = 0.5 \times 0.5 = 0.25
$

and with the unfair coin ($ C = U $):

$
P(H_1, H_2 | C = U) = 0.8 \times 0.8 = 0.64
$

This factorization holds because the flips are independent **given** $ C $; as discussed in part 1, they are not independent once $ C $ is unknown.

### **3. Factorize $ p(h_1, h_2, c) $**
Using the chain rule:

$
P(H_1, H_2, C) = P(H_1 | H_2, C) P(H_2 | C) P(C)
$

But since flips are independent given $ C $:

$
P(H_1, H_2, C) = P(H_1 | C) P(H_2 | C) P(C)
$

### **4. Compute $ p(h_1) $**

We use the law of total probability (here $ P(H_1) $ denotes the probability that the first toss is a head):

$
P(H_1) = P(H_1 | C = F) P(C = F) + P(H_1 | C = U) P(C = U)
$

$
= (0.5)(0.5) + (0.8)(0.5)
$

$
= 0.25 + 0.4 = 0.65
$

### **5. Compute $ p(c | h_1) $** (Posterior probability of the coin type given $ H_1 $)

Using **Bayes' theorem**:

$
P(C = U | H_1) = \frac{P(H_1 | C = U) P(C = U)}{P(H_1)}
$

$
= \frac{(0.8)(0.5)}{0.65}
$

$
= \frac{0.4}{0.65} = \frac{8}{13} \approx 0.615
$

Similarly,

$
P(C = F | H_1) = \frac{P(H_1 | C = F) P(C = F)}{P(H_1)}
$

$
= \frac{(0.5)(0.5)}{0.65}
$

$
= \frac{0.25}{0.65} = \frac{5}{13} \approx 0.385
$

### **6. Compute $ p(h_2 | h_1) $**

Using the law of total probability:

$
P(H_2 | H_1) = P(H_2 | H_1, C = F) P(C = F | H_1) + P(H_2 | H_1, C = U) P(C = U | H_1)
$

Since $ H_2 $ is independent of $ H_1 $ given $ C $:

$
P(H_2 | H_1) = P(H_2 | C = F) P(C = F | H_1) + P(H_2 | C = U) P(C = U | H_1)
$

$
= (0.5) \times \frac{5}{13} + (0.8) \times \frac{8}{13}
$

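$
= \frac{5}{26} + \frac{64}{130} = \frac{25}{130} + \frac{64}{130}
$

$
= \frac{89}{130} \approx 0.685
$

Since $ P(H_2 | H_1) \approx 0.685 \neq P(H_2) = 0.65 $ (by symmetry, $ P(H_2) = P(H_1) $), this confirms that $ H_1 $ and $ H_2 $ are not independent: a first head makes the unfair coin more likely, and hence a second head more likely.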

### **Final Answers:**
1. $ H_1 $ and $ H_2 $ are **not independent**.
2. $ P(H_1, H_2 | C) = P(H_1 | C) P(H_2 | C) $.
3. $ P(H_1, H_2, C) = P(H_1 | C) P(H_2 | C) P(C) $.
4. $ P(H_1) = 0.65 $.
5. $ P(C = U | H_1) = 8/13 $, $ P(C = F | H_1) = 5/13 $.
6. $ P(H_2 | H_1) = 89/130 \approx 0.685 $ (larger than $ P(H_2) = 0.65 $).
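To double-check these results, one can enumerate the full joint $ p(h_1, h_2, c) $ exactly with Python's `fractions` module. This is a sketch built on the factorization of part 3; the dictionary layout and the helper name `p_toss` are illustrative choices, and `"H"`/`"T"` denote head/tail.

```python
from fractions import Fraction as F
from itertools import product

p_c = {"fair": F(1, 2), "unfair": F(1, 2)}     # prior over the coin
p_head = {"fair": F(1, 2), "unfair": F(4, 5)}  # P(head | coin)

def p_toss(outcome, c):
    """P(single toss = outcome | coin c), with outcome 'H' or 'T'."""
    return p_head[c] if outcome == "H" else 1 - p_head[c]

# Joint p(h1, h2, c) = p(h1 | c) p(h2 | c) p(c)
joint = {(h1, h2, c): p_toss(h1, c) * p_toss(h2, c) * p_c[c]
         for h1, h2, c in product("HT", "HT", p_c)}

p_h1 = sum(v for (h1, _, _), v in joint.items() if h1 == "H")
p_unfair_given_h1 = sum(v for (h1, _, c), v in joint.items()
                        if h1 == "H" and c == "unfair") / p_h1
p_h1_h2 = sum(v for (h1, h2, _), v in joint.items() if h1 == "H" and h2 == "H")
p_h2_given_h1 = p_h1_h2 / p_h1

print(p_h1, p_unfair_given_h1, p_h2_given_h1)  # 13/20 8/13 89/130
```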


### **Exercise 4**

Given $p(x,y)=$

|   | Y=0 | Y=1 |
|---|-----|-----|
| X=0 | 0.2  | 0.1  |
| X=1 | 0.15 | 0.0  |
| X=2 | 0.25  | 0.3 |



Calculate
- $p(y)$
- $p(x)$
- $p(x|y)$
- $p(y|x)$
- $\mathbb{E}[x]$
- $\mathbb{E}[y]$
- $\mathbb{E}[x|y]$
- $\text{cov}[x,y]$

You can do it by hand on a piece of paper, but I also suggest doing it with the `numpy` library: each of these computations takes a single line of code.

```python
import numpy as np

# Joint p(x, y): rows X = 0, 1, 2; columns Y = 0, 1
p = np.array([[0.2, 0.1],
              [0.15, 0.0],
              [0.25, 0.3],
             ])

x = np.array([0, 1, 2])
y = np.array([0, 1])
```

### **Exercise 5**



Suppose we are interested in the relation between an exposure A (has been affected by coronavirus: 0 no, 1 yes) and an outcome Y (has myocarditis: 0 no, 1 yes). We conduct an observational study on a representative population and obtain the following proportions (N.B. this data is made up and does not come from a study):

|A \ Y| 0 | 1 |
|-----|---|---|
| 0 |0.75|0.07|
| 1 |0.15|0.03|

so that, for example, 3% of all subjects have been affected by coronavirus and have myocarditis.

- Among the exposed subjects, what is the proportion of individuals that have the outcome?
- Are A and Y independent?
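One way to check your answers numerically (a sketch, not an official solution): compute the marginals and the conditional proportion $P(Y=1 \mid A=1)$, then compare the joint table with the product of its marginals. The array layout (rows indexed by A, columns by Y) is a choice made here.

```python
import numpy as np

# Joint proportions: rows A = 0, 1; columns Y = 0, 1
p_ay = np.array([[0.75, 0.07],
                 [0.15, 0.03]])

p_a = p_ay.sum(axis=1)               # P(A)
p_y = p_ay.sum(axis=0)               # P(Y)
p_y1_given_a1 = p_ay[1, 1] / p_a[1]  # P(Y=1 | A=1)

# Independence requires p(a, y) = p(a) p(y) for every cell
independent = np.allclose(p_ay, np.outer(p_a, p_y))

print("P(Y=1 | A=1) =", p_y1_given_a1)
print("P(Y=1)       =", p_y[1])
print("independent: ", independent)
```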

### **Exercise 6**

Given the distribution $p(x,y) = x + y \text{ if } x \in [0,1], y \in [0,1], 0 \text{ otherwise}$

Calculate
- $\mathbb{E}[x|y]$
- $\rho[X,Y]$

This time, I suggest trying the `sympy` library, which can help you with symbolic computations.

```python
from sympy import symbols, integrate, log, sqrt

# Define the symbols
x, y = symbols('x y')

# Define the joint distribution
p_xy = (x + y)

# Example of integral: check that p(x, y) is normalized on the unit square
Z = integrate(p_xy, (x, 0, 1), (y, 0, 1))

print("\nZ:", Z)
```

Output: `Z: 1`
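Building on the cell above, here is a possible `sympy` sketch for both quantities. It assumes $p(x,y)$ is already normalized (as $Z = 1$ confirms) and uses a small helper, `expect`, introduced here for illustration. By hand, the same steps give $p(y) = y + \frac{1}{2}$, $\mathbb{E}[x|y] = \frac{3y+2}{6y+3}$ and $\rho[X,Y] = -\frac{1}{11}$, which the symbolic output should reproduce (possibly in a different but equivalent algebraic form).

```python
from sympy import symbols, integrate, sqrt, simplify

x, y = symbols('x y')
p_xy = x + y  # normalized on the unit square (Z = 1)

# Conditional expectation: E[x | y] = (integral of x p(x, y) dx) / p(y)
p_y = integrate(p_xy, (x, 0, 1))
E_x_given_y = simplify(integrate(x * p_xy, (x, 0, 1)) / p_y)

def expect(f):
    """E[f(x, y)] under p(x, y) on the unit square."""
    return integrate(f * p_xy, (x, 0, 1), (y, 0, 1))

E_x, E_y = expect(x), expect(y)
var_x = expect(x**2) - E_x**2
var_y = expect(y**2) - E_y**2
cov_xy = expect(x * y) - E_x * E_y

# Pearson correlation coefficient
rho = simplify(cov_xy / sqrt(var_x * var_y))

print("E[x|y] =", E_x_given_y)
print("rho =", rho)
```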


**Solution to Exercise 4.** Let's compute each requested quantity for the joint distribution \( p(x, y) \) of Exercise 4, step by step, using both mathematical derivation and `numpy` code.

---

### **1. Compute \( p(y) \) (Marginal Probability of \( Y \))**
The marginal probability is obtained by summing over all values of \( X \):

\[
p(y) = \sum_{x} p(x, y)
\]

For \( Y=0 \):

\[
p(Y=0) = 0.2 + 0.15 + 0.25 = 0.6
\]

For \( Y=1 \):

\[
p(Y=1) = 0.1 + 0.0 + 0.3 = 0.4
\]

In `numpy`:

```python
p_y = p.sum(axis=0)  # Sum over rows (X)
```

---

### **2. Compute \( p(x) \) (Marginal Probability of \( X \))**
The marginal probability is obtained by summing over all values of \( Y \):

\[
p(x) = \sum_{y} p(x, y)
\]

For \( X=0 \):

\[
p(X=0) = 0.2 + 0.1 = 0.3
\]

For \( X=1 \):

\[
p(X=1) = 0.15 + 0.0 = 0.15
\]

For \( X=2 \):

\[
p(X=2) = 0.25 + 0.3 = 0.55
\]

In `numpy`:

```python
p_x = p.sum(axis=1)  # Sum over columns (Y)
```

---

### **3. Compute \( p(x|y) \) (Conditional Probability of \( X \) Given \( Y \))**
Using:

\[
p(x | y) = \frac{p(x, y)}{p(y)}
\]

For \( Y=0 \):

\[
p(X=0 | Y=0) = \frac{0.2}{0.6} = \frac{1}{3}
\]

\[
p(X=1 | Y=0) = \frac{0.15}{0.6} = 0.25
\]

\[
p(X=2 | Y=0) = \frac{0.25}{0.6} \approx 0.4167
\]

For \( Y=1 \):

\[
p(X=0 | Y=1) = \frac{0.1}{0.4} = 0.25
\]

\[
p(X=1 | Y=1) = \frac{0.0}{0.4} = 0
\]

\[
p(X=2 | Y=1) = \frac{0.3}{0.4} = 0.75
\]

In `numpy`:

```python
p_x_given_y = p / p_y  # Divide each column by p_y
```

---

### **4. Compute \( p(y|x) \) (Conditional Probability of \( Y \) Given \( X \))**
Using:

\[
p(y | x) = \frac{p(x, y)}{p(x)}
\]

For \( X=0 \):

\[
p(Y=0 | X=0) = \frac{0.2}{0.3} = \frac{2}{3}
\]

\[
p(Y=1 | X=0) = \frac{0.1}{0.3} = \frac{1}{3}
\]

For \( X=1 \):

\[
p(Y=0 | X=1) = \frac{0.15}{0.15} = 1
\]

\[
p(Y=1 | X=1) = \frac{0.0}{0.15} = 0
\]

For \( X=2 \):

\[
p(Y=0 | X=2) = \frac{0.25}{0.55} \approx 0.4545
\]

\[
p(Y=1 | X=2) = \frac{0.3}{0.55} \approx 0.5455
\]

In `numpy`:

```python
p_y_given_x = p / p_x[:, None]  # Divide each row by p_x
```

---

### **5. Compute \( \mathbb{E}[X] \) (Expected Value of \( X \))**
\[
E[X] = \sum_x x \cdot p(x)
\]

\[
E[X] = 0(0.3) + 1(0.15) + 2(0.55) = 0 + 0.15 + 1.1 = 1.25
\]

In `numpy`:

```python
E_x = (x @ p_x)
```

---

### **6. Compute \( \mathbb{E}[Y] \) (Expected Value of \( Y \))**
\[
E[Y] = \sum_y y \cdot p(y)
\]

\[
E[Y] = 0(0.6) + 1(0.4) = 0 + 0.4 = 0.4
\]

In `numpy`:

```python
E_y = (y @ p_y)
```

---

### **7. Compute \( \mathbb{E}[X | Y] \)**
\[
E[X | Y] = \sum_x x \cdot p(x | y)
\]

For \( Y=0 \):

\[
E[X | Y=0] = 0 \cdot \frac{1}{3} + 1 \cdot 0.25 + 2 \cdot 0.4167 = 0 + 0.25 + 0.8333 = 1.0833
\]

For \( Y=1 \):

\[
E[X | Y=1] = 0 \cdot 0.25 + 1 \cdot 0 + 2 \cdot 0.75 = 0 + 0 + 1.5 = 1.5
\]

In `numpy`:

```python
E_x_given_y = (x @ p_x_given_y)
```

---

### **8. Compute \( \text{cov}(X, Y) \) (Covariance)**
Using:

\[
\text{cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y]
\]

First, compute \( \mathbb{E}[XY] \):

\[
E[XY] = \sum_x \sum_y x y p(x, y)
\]

\[
= 0 \cdot 0 \cdot 0.2 + 0 \cdot 1 \cdot 0.1 + 1 \cdot 0 \cdot 0.15 + 1 \cdot 1 \cdot 0.0 + 2 \cdot 0 \cdot 0.25 + 2 \cdot 1 \cdot 0.3
\]

\[
= 0 + 0 + 0 + 0 + 0 + 0.6 = 0.6
\]

Then,

\[
\text{cov}(X, Y) = 0.6 - (1.25 \times 0.4) = 0.6 - 0.5 = 0.1
\]

In `numpy`:

```python
E_xy = x @ p @ y  # sum over x, y of x * y * p(x, y)
cov_xy = E_xy - E_x * E_y
```

---

### **Final Computed Values:**
- \( p(y) = [0.6, 0.4] \)
- \( p(x) = [0.3, 0.15, 0.55] \)
- \( p(x|y) = \begin{bmatrix} \frac{1}{3} & 0.25 \\ 0.25 & 0 \\ 0.4167 & 0.75 \end{bmatrix} \)
- \( p(y|x) = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ 1 & 0 \\ 0.4545 & 0.5455 \end{bmatrix} \)
- \( \mathbb{E}[X] = 1.25 \)
- \( \mathbb{E}[Y] = 0.4 \)
- \( \mathbb{E}[X | Y] = [1.0833, 1.5] \)
- \( \text{cov}(X, Y) = 0.1 \)
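For convenience, the one-liners above can be collected into a single self-contained cell. This sketch reuses the array layout of the starter cell (rows indexed by X, columns by Y); the extra check verifies the law of total expectation, \( \mathbb{E}[\mathbb{E}[X|Y]] = \mathbb{E}[X] \), which is a useful sanity test for \( p(x|y) \).

```python
import numpy as np

# Joint p(x, y): rows X = 0, 1, 2; columns Y = 0, 1
p = np.array([[0.2, 0.1],
              [0.15, 0.0],
              [0.25, 0.3]])
x = np.array([0, 1, 2])
y = np.array([0, 1])

p_y = p.sum(axis=0)              # p(y): sum over X
p_x = p.sum(axis=1)              # p(x): sum over Y
p_x_given_y = p / p_y            # column j divided by p(y_j)
p_y_given_x = p / p_x[:, None]   # row i divided by p(x_i)

E_x = x @ p_x
E_y = y @ p_y
E_x_given_y = x @ p_x_given_y    # one conditional mean per value of y
E_xy = x @ p @ y                 # sum over x, y of x * y * p(x, y)
cov_xy = E_xy - E_x * E_y

# Sanity check: law of total expectation, E[E[X | Y]] = E[X]
assert np.isclose(E_x_given_y @ p_y, E_x)

print("p(y) =", p_y)
print("p(x) =", p_x)
print("E[X] =", E_x, " E[Y] =", E_y)
print("E[X|Y] =", E_x_given_y)
print("cov(X, Y) =", cov_xy)
```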


### **Exercise 7**

Two alternative definitions of conditional independence were given; prove that they are equivalent:
$$p(a | b, c)=p(a | c) \Longleftrightarrow p(a, b | c)=p(a|c)p(b|c)$$

To prove the equivalence of the two definitions of conditional independence:

1. \( p(a | b, c) = p(a | c) \)
2. \( p(a, b | c) = p(a | c) p(b | c) \)

we will show that if one holds, the other must also hold. 

### **Step 1: Assume \( p(a | b, c) = p(a | c) \) holds and prove \( p(a, b | c) = p(a | c) p(b | c) \)**

We are given that:

\[
p(a | b, c) = p(a | c)
\]

By the definition of conditional probability (assuming \( p(b | c) > 0 \)), we can write:

\[
p(a | b, c) = \frac{p(a, b | c)}{p(b | c)}
\]

From the assumption, we know that:

\[
\frac{p(a, b | c)}{p(b | c)} = p(a | c)
\]

Multiplying both sides by \( p(b | c) \), we get:

\[
p(a, b | c) = p(a | c) p(b | c)
\]

Thus, we have shown that if \( p(a | b, c) = p(a | c) \), then \( p(a, b | c) = p(a | c) p(b | c) \).

---

### **Step 2: Assume \( p(a, b | c) = p(a | c) p(b | c) \) holds and prove \( p(a | b, c) = p(a | c) \)**

Now, assume:

\[
p(a, b | c) = p(a | c) p(b | c)
\]

By the definition of conditional probability (assuming \( p(b | c) > 0 \)), we have:

\[
p(a | b, c) = \frac{p(a, b | c)}{p(b | c)}
\]

Substitute the assumption into the right-hand side:

\[
p(a | b, c) = \frac{p(a | c) p(b | c)}{p(b | c)}
\]

The \( p(b | c) \) terms cancel out, leaving:

\[
p(a | b, c) = p(a | c)
\]

Thus, we have shown that if \( p(a, b | c) = p(a | c) p(b | c) \), then \( p(a | b, c) = p(a | c) \).

---

### **Conclusion**

We have shown that:

\[
p(a | b, c) = p(a | c) \Longleftrightarrow p(a, b | c) = p(a | c) p(b | c)
\]

Therefore, the two definitions of conditional independence are equivalent.
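The proof is purely algebraic, but a quick numerical sanity check can make the statement concrete. The sketch below (the helper `check_definitions`, the variable sizes and the seed are arbitrary choices made here) builds one joint \( p(a, b, c) \) that factorizes as \( p(c)p(a|c)p(b|c) \) and one generic random joint, then tests both definitions on each. Both should hold for the first and both should fail for the second; a check like this illustrates the equivalence but does not prove it.

```python
import numpy as np

rng = np.random.default_rng(0)

def check_definitions(p_abc):
    """Return (def_1, def_2) for a joint array p_abc with axes ordered (a, b, c)."""
    p_ab_given_c = p_abc / p_abc.sum(axis=(0, 1), keepdims=True)  # p(a, b | c)
    p_a_given_c = p_ab_given_c.sum(axis=1, keepdims=True)         # p(a | c)
    p_b_given_c = p_ab_given_c.sum(axis=0, keepdims=True)         # p(b | c)
    p_a_given_bc = p_abc / p_abc.sum(axis=0, keepdims=True)       # p(a | b, c)
    def_1 = np.allclose(p_a_given_bc, p_a_given_c)                # p(a|b,c) = p(a|c)
    def_2 = np.allclose(p_ab_given_c, p_a_given_c * p_b_given_c)  # p(a,b|c) = p(a|c) p(b|c)
    return def_1, def_2

# Case 1: p(a, b, c) = p(c) p(a|c) p(b|c), so A and B are independent given C by construction
p_c = rng.dirichlet(np.ones(2))             # |C| = 2
p_a_c = rng.dirichlet(np.ones(3), size=2)   # p(a|c), rows indexed by c, |A| = 3
p_b_c = rng.dirichlet(np.ones(2), size=2)   # p(b|c), rows indexed by c, |B| = 2
p_ci = np.einsum('c,ca,cb->abc', p_c, p_a_c, p_b_c)

# Case 2: an arbitrary joint over the same variables, with no such structure
p_generic = rng.dirichlet(np.ones(12)).reshape(3, 2, 2)

print(check_definitions(p_ci))       # both definitions hold
print(check_definitions(p_generic))  # both definitions fail for a generic joint
```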

### **Exercise 8**

Compute the variance of the following unnormalized distribution (numerically, use `scipy.integrate`)
$$p(x) \propto \sin^2(x) e^{-|x|}$$

To compute the variance of the unnormalized distribution \( p(x) \propto \sin^2(x) e^{-|x|} \), we need to perform the following steps:

### **Steps to compute variance**

1. **Find the normalized distribution**: First, we need to normalize the given distribution to make it a valid probability distribution.
   \[
   p(x) = \frac{\sin^2(x) e^{-|x|}}{Z}
   \]
   where \( Z \) is the normalization constant (partition function), computed as:
   \[
   Z = \int_{-\infty}^{\infty} \sin^2(x) e^{-|x|} \, dx
   \]

2. **Compute the expected value \( \mathbb{E}[X] \)**: The expected value of \( X \) is given by:
   \[
   \mathbb{E}[X] = \int_{-\infty}^{\infty} x p(x) \, dx
   \]

3. **Compute the second moment \( \mathbb{E}[X^2] \)**: The second moment is:
   \[
   \mathbb{E}[X^2] = \int_{-\infty}^{\infty} x^2 p(x) \, dx
   \]

4. **Compute the variance**: Finally, the variance \( \text{Var}(X) \) is given by:
   \[
   \text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2
   \]

Now, let's compute these quantities numerically using `scipy.integrate`.

### **Python Code**

```python
import numpy as np
from scipy.integrate import quad

# Define the unnormalized distribution
def p_unnormalized(x):
    return np.sin(x)**2 * np.exp(-np.abs(x))

# Normalization constant Z
def normalization_constant():
    # Integrate to find the normalization constant
    result, _ = quad(p_unnormalized, -np.inf, np.inf)
    return result

# Normalized distribution
def p(x, Z):
    return p_unnormalized(x) / Z

# Expected value E[X]
def expected_value(Z):
    result, _ = quad(lambda x: x * p(x, Z), -np.inf, np.inf)
    return result

# Second moment E[X^2]
def second_moment(Z):
    result, _ = quad(lambda x: x**2 * p(x, Z), -np.inf, np.inf)
    return result

# Compute Z (normalization constant)
Z = normalization_constant()

# Compute expected value E[X] and second moment E[X^2]
E_X = expected_value(Z)
E_X2 = second_moment(Z)

# Compute variance
variance = E_X2 - E_X**2

print(f"Variance: {variance}")
```

### **Explanation**:

- `p_unnormalized(x)` defines the unnormalized probability distribution.
- `normalization_constant()` computes the constant \( Z \) by integrating the unnormalized distribution over \( (-\infty, \infty) \).
- `expected_value(Z)` computes \( \mathbb{E}[X] \) by integrating \( x p(x) \).
- `second_moment(Z)` computes \( \mathbb{E}[X^2] \) by integrating \( x^2 p(x) \).
- Finally, the variance is computed as \( \text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \).
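A variant of the code above that exploits symmetry (a sketch, not a required change): since \( \sin^2(x) e^{-|x|} \) is even, \( \mathbb{E}[X] = 0 \) and it suffices to integrate over \( [0, \infty) \), which also moves the kink of \( |x| \) to an endpoint. Using \( \sin^2 x = \frac{1 - \cos 2x}{2} \) and \( \int_0^\infty x^n e^{-x} \cos(2x)\,dx = \mathrm{Re}\left[\frac{n!}{(1-2i)^{n+1}}\right] \), the integrals can also be done by hand, giving \( Z = \frac{4}{5} \) and \( \text{Var}(X) = \frac{68}{25} = 2.72 \), so the numerical result can be checked against these values.

```python
import numpy as np
from scipy.integrate import quad

def p_unnormalized(x):
    return np.sin(x) ** 2 * np.exp(-np.abs(x))

# The density is even: E[X] = 0, and each even moment is twice the integral over [0, inf)
Z = 2 * quad(p_unnormalized, 0, np.inf)[0]
E_X2 = 2 * quad(lambda x: x ** 2 * p_unnormalized(x), 0, np.inf)[0] / Z
variance = E_X2  # since E[X] = 0

print(f"Z = {Z:.6f}, Var(X) = {variance:.6f}")  # compare with 4/5 and 68/25
```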


### **Exercise 9**

You are offered the following game: you toss a coin; if tail appears, you win 1€ and the game ends; if head appears, you win 2€ and you can continue playing. From the second toss on, if tail appears you stop playing; if head appears, the amount you have already won doubles and you can keep playing.

Example:
1) You toss the coin and **head** appears (you are winning 2€)
2) You toss the coin again and **head** appears (you are winning 4€)
3) You toss the coin again and **head** appears (you are winning 8€)
4) You toss the coin again and **tail** appears (the game ends and you won a total amount of 8€)

Let's call $X$ the amount of money you win playing this game.
1. What is the expected amount you win?
2. What is the expected value of $\log_2(X)$?
3. How much would you pay for playing this game?

Hint: $ \sum_{i=1}^{\infty} i q^i = \frac{q}{(1-q)^2}$

Let's carefully go through the problem and compute the expected values and costs of the game.

### **1. Expected amount won ( \( \mathbb{E}[X] \) )**

First, we need to define the problem more clearly:

- On the first toss:
  - If **tail** (probability = \( \frac{1}{2} \)), you win 1€ and the game ends.
  - If **head** (probability = \( \frac{1}{2} \)), you win 2€, and then you continue to the second toss.

- On the second toss:
  - If **tail** (probability = \( \frac{1}{2} \)), you stop and the game ends, winning 2€.
  - If **head** (probability = \( \frac{1}{2} \)), the amount doubles to 4€ and you continue the game.

- On the \( n \)-th toss:
  - If **tail**, the game ends, and you win \( 2^{n-1} \) €.
  - If **head**, the amount doubles, and you continue the game.

Now, let's express the expected value \( \mathbb{E}[X] \) of the game.

The probability that you stop after \( n \) tosses is:

- On the \( n \)-th toss, you need exactly \( (n-1) \) heads followed by a tail. The probability of this happening is \( \left( \frac{1}{2} \right)^n \).

For each such case, the amount won is \( 2^{n-1} \).

Thus, the expected value is the sum of the winnings over all possible outcomes:

\[
\mathbb{E}[X] = \sum_{n=1}^{\infty} 2^{n-1} \cdot \left( \frac{1}{2} \right)^n
\]

Simplifying, every term equals \( \frac{1}{2} \):

\[
\mathbb{E}[X] = \sum_{n=1}^{\infty} 2^{n-1} \cdot 2^{-n} = \sum_{n=1}^{\infty} \frac{1}{2} = \frac{1}{2} \sum_{n=1}^{\infty} 1
\]

This is **not** a convergent geometric series: it adds \( \frac{1}{2} \) infinitely many times, so the partial sums \( \frac{N}{2} \) grow without bound and

\[
\mathbb{E}[X] = \infty
\]

This is the **St. Petersburg paradox**: the game has an infinite expected payoff, even though the probability of winning more than 1000€ is only \( 2^{-10} \approx 0.001 \).

### **2. Expected value of \( \log_2(X) \) ( \( \mathbb{E}[\log_2(X)] \) )**

Now, we need to compute the expected value of \( \log_2(X) \). 

The expected value of \( \log_2(X) \) is:

\[
\mathbb{E}[\log_2(X)] = \sum_{n=1}^{\infty} \log_2(2^{n-1}) \cdot \left( \frac{1}{2} \right)^n
\]

Since \( \log_2(2^{n-1}) = n - 1 \), this becomes:

\[
\mathbb{E}[\log_2(X)] = \sum_{n=1}^{\infty} (n-1) \cdot \left( \frac{1}{2} \right)^n
\]

We can split the sum:

\[
\mathbb{E}[\log_2(X)] = \sum_{n=1}^{\infty} n \cdot \left( \frac{1}{2} \right)^n - \sum_{n=1}^{\infty} \left( \frac{1}{2} \right)^n
\]

The second sum is a simple geometric series:

\[
\sum_{n=1}^{\infty} \left( \frac{1}{2} \right)^n = 1
\]

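The first sum can be evaluated with the hint \( \sum_{i=1}^{\infty} i q^i = \frac{q}{(1-q)^2} \) with \( q = \frac{1}{2} \) (it is also the mean of a geometric distribution with parameter \( \frac{1}{2} \)):

\[
\sum_{n=1}^{\infty} n \cdot \left( \frac{1}{2} \right)^n = \frac{1/2}{(1/2)^2} = 2
\]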

Thus, we have:

\[
\mathbb{E}[\log_2(X)] = 2 - 1 = 1
\]

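### **3. How much would you pay for playing this game?**

Since \( \mathbb{E}[X] = \infty \), a player who only cares about the expected amount won should be willing to pay any finite price. This clashes with intuition: half of the games end with a win of 1€, and three quarters with at most 2€.

A common resolution is to value money logarithmically and look at \( \mathbb{E}[\log_2(X)] \) instead. The sure amount \( c \) that gives the same utility as the game satisfies \( \log_2(c) = \mathbb{E}[\log_2(X)] = 1 \), i.e. \( c = 2^1 = 2 \)€ (ignoring your current wealth). Under this criterion, a reasonable price for playing is at most about 2€.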

---

### **Summary of Results:**

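1. The expected amount you win is infinite: \( \mathbb{E}[X] = \sum_{n \geq 1} \frac{1}{2} = \infty \).
2. The expected value of \( \log_2(X) \) is \( \mathbb{E}[\log_2(X)] = 1 \).
3. Judged by the expected amount, any finite price would be worth paying; with logarithmic utility, the game is worth about \( 2^1 = 2 \)€.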
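A small simulation can make both results tangible. In this sketch (the seed and the number of games are arbitrary choices), the toss on which the first tail appears is drawn from `numpy`'s geometric distribution with \( p = \frac{1}{2} \) (support \( 1, 2, \dots \)), so \( X = 2^{n-1} \). The average of \( \log_2(X) \) settles near 1, while the sample mean of \( X \) does not settle down as more games are played; the truncated sum shows why: stopping the series for \( \mathbb{E}[X] \) after \( N \) terms gives exactly \( \frac{N}{2} \).

```python
import numpy as np

rng = np.random.default_rng(42)

def play(rng, n_games):
    """Simulate n_games rounds and return the winnings X of each."""
    # Toss on which the first tail appears: geometric with p = 1/2, support 1, 2, ...
    n = rng.geometric(0.5, size=n_games)
    return 2.0 ** (n - 1)  # winnings X = 2^(n-1)

X = play(rng, 1_000_000)
print("mean of log2(X):", np.log2(X).mean())  # close to E[log2 X] = 1
print("sample mean of X:", X.mean())          # unstable: E[X] is infinite

# Truncated expectation: first N terms of sum_n 2^(n-1) (1/2)^n, each equal to 1/2
N = 40
truncated = sum(2 ** (n - 1) * 0.5 ** n for n in range(1, N + 1))
print("E[X] truncated at", N, "terms:", truncated)  # = N/2 = 20.0
```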

### **Exercise 10**

It's night and you are looking into the sky waiting to see a falling star. A friend of yours tells you that the waiting time $T$ (hours) is distributed exponentially: $p(t) = 2e^{-2t}$.
1. What is the probability of seeing the first falling star within 1 hour? How much time do you expect to wait?
2. You have not seen anything in one hour. What is the probability of seeing the first falling star in the next hour? (justify your answer)
3. Is the waiting time dependent on how much you have already waited? Is the answer the same for any distribution?
4. What is the probability of seeing at least two falling stars in the first 1 hour?

5. Let's say the distribution is instead $p(t) = Uniform(0, 4)$ (for example, there is a known comet expected to show up at a certain point); how do the answers to questions 1 and 2 change?

### **1. Probability of seeing the first falling star within 1 hour and expected waiting time**

#### a. Probability of seeing the first falling star within 1 hour

The given distribution is exponential with a rate parameter \( \lambda = 2 \), i.e., the probability density function (PDF) is:

\[
p(t) = 2e^{-2t} \quad \text{for} \quad t \geq 0
\]

The probability of seeing the first falling star within 1 hour is:

\[
P(T \leq 1) = \int_0^1 2e^{-2t} \, dt
\]

We can calculate this integral as follows:

\[
\int 2e^{-2t} \, dt = -e^{-2t}
\]

Evaluating from 0 to 1:

\[
P(T \leq 1) = \left[-e^{-2t}\right]_0^1 = -e^{-2(1)} + e^{-2(0)} = 1 - e^{-2}
\]

Numerically, \( e^{-2} \approx 0.1353 \), so:

\[
P(T \leq 1) \approx 1 - 0.1353 = 0.8647
\]

Thus, the probability of seeing the first falling star within 1 hour is approximately **0.8647**.

#### b. Expected waiting time

The expected value \( \mathbb{E}[T] \) of an exponentially distributed random variable with rate \( \lambda \) is given by:

\[
\mathbb{E}[T] = \frac{1}{\lambda}
\]

For \( \lambda = 2 \), we have:

\[
\mathbb{E}[T] = \frac{1}{2} = 0.5 \, \text{hours}
\]

Thus, the expected waiting time is **0.5 hours**.

---

### **2. Probability of seeing the first falling star in the next 1 hour, given no star in the first hour**

The key idea here is that the exponential distribution **has no memory**, which means that the probability of seeing the first falling star in the next hour is the same regardless of how long you've already waited. This property is called the **memoryless property**.

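So, the probability of seeing the first falling star in the next hour, given that you've already waited 1 hour, is (using \( P(T > t) = e^{-2t} \)):

\[
P(T \leq 2 \mid T > 1) = \frac{P(1 < T \leq 2)}{P(T > 1)} = \frac{e^{-2} - e^{-4}}{e^{-2}} = 1 - e^{-2} = P(T \leq 1)
\]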

We already computed this in part 1, so the probability is approximately **0.8647**.
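Parts 1 and 2 can be checked with `scipy.stats`. Note that `scipy` parametrizes the exponential by `scale` \( = 1/\lambda \), so rate 2 corresponds to `scale=0.5`. The conditional probability is computed directly from its definition, \( P(1 < T \leq 2) / P(T > 1) \), using the survival function `sf` (equal to `1 - cdf`).

```python
from scipy.stats import expon

T = expon(scale=1 / 2)  # exponential with rate lambda = 2

p_within_1h = T.cdf(1)                          # P(T <= 1) = 1 - e^{-2}
expected_wait = T.mean()                        # E[T] = 1 / lambda
p_next_hour = (T.cdf(2) - T.cdf(1)) / T.sf(1)   # P(1 < T <= 2) / P(T > 1)

print(f"P(T <= 1)          = {p_within_1h:.4f}")
print(f"E[T]               = {expected_wait} hours")
print(f"P(T <= 2 | T > 1)  = {p_next_hour:.4f}")  # same as P(T <= 1): memoryless
```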

---

### **3. Dependence of waiting time on previous wait time (Memoryless Property)**

As mentioned above, the exponential distribution is **memoryless**. This means that the waiting time does not depend on how much you have already waited. Mathematically:

\[
P(T > t + s \mid T > s) = P(T > t)
\]

This holds for exponential distributions, but it **does not** hold for all distributions: among continuous distributions on \( [0, \infty) \), the exponential is the only memoryless one. For example, for a **uniform** or **normal** distribution, the remaining waiting time depends on how much time has already passed.

---

### **4. Probability of seeing at least two falling stars in the first 1 hour**

For this, we need the probability of observing at least 2 falling stars in the first hour. If the waiting times between consecutive falling stars are independent and exponential with rate \( \lambda = 2 \) (i.e. the stars arrive as a Poisson process), then the number of stars seen in a time interval of length \( t \) follows a **Poisson distribution** with mean \( \lambda t \). The probability of observing \( k \) events in time \( t \) is:

\[
P(k \text{ events in time } t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}
\]

For \( t = 1 \) hour and \( \lambda = 2 \), the probability of seeing at least two falling stars is:

\[
P(\text{at least 2 stars in 1 hour}) = 1 - P(0 \text{ stars}) - P(1 \text{ star})
\]

First, we calculate \( P(0) \) and \( P(1) \):

- \( P(0) = \frac{(2 \cdot 1)^0 e^{-2 \cdot 1}}{0!} = e^{-2} \)
- \( P(1) = \frac{(2 \cdot 1)^1 e^{-2 \cdot 1}}{1!} = 2e^{-2} \)

Thus:

\[
P(\text{at least 2 stars}) = 1 - e^{-2} - 2e^{-2}
\]

Using \( e^{-2} \approx 0.1353 \), we get:

\[
P(\text{at least 2 stars}) = 1 - 3e^{-2} \approx 1 - 0.4060 = 0.5940
\]

Thus, the probability of seeing at least two falling stars in the first hour is approximately **0.5940**.
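The Poisson answer relies on modelling the stars as a Poisson process, i.e. independent exponential gaps between consecutive stars. A Monte Carlo sketch under exactly that assumption (the seed, the number of simulated nights and the cap of 20 gaps per night are arbitrary choices) should give a frequency close to \( 1 - 3e^{-2} \approx 0.594 \).

```python
import numpy as np

rng = np.random.default_rng(0)
rate, horizon, n_nights = 2.0, 1.0, 200_000

# Simulate a Poisson process: i.i.d. Exponential(rate) gaps between consecutive stars.
# 20 gaps per night is plenty: more than 20 stars within one hour at rate 2 is negligible.
gaps = rng.exponential(scale=1 / rate, size=(n_nights, 20))
arrival_times = np.cumsum(gaps, axis=1)
counts = (arrival_times <= horizon).sum(axis=1)  # stars seen within the first hour

p_at_least_two = (counts >= 2).mean()
print("simulated P(at least 2 stars in 1 hour):", p_at_least_two)
```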

---

### **5. Changing distribution to Uniform(0, 4)**

Now, let's assume the distribution of waiting time is uniform over the interval \( [0, 4] \), i.e.,

\[
p(t) = \frac{1}{4} \quad \text{for} \quad 0 \leq t \leq 4
\]

#### a. Probability of seeing the first falling star within 1 hour

For a uniform distribution, the probability of seeing the first falling star within 1 hour is simply the proportion of the interval \( [0, 4] \) that is within 1 hour:

\[
P(T \leq 1) = \frac{1}{4} \times 1 = 0.25
\]

Thus, the probability of seeing the first falling star within 1 hour is **0.25**, and the expected waiting time is \( \mathbb{E}[T] = \frac{0 + 4}{2} = 2 \) hours.

#### b. Probability of seeing the first falling star in the next 1 hour, given no star in the first hour

The uniform distribution is **not** memoryless, so we have to condition explicitly:

\[
P(T \leq 2 \mid T > 1) = \frac{P(1 < T \leq 2)}{P(T > 1)} = \frac{1/4}{3/4} = \frac{1}{3}
\]

Thus, the probability of seeing the first falling star in the next 1 hour, given no star in the first hour, is **1/3 ≈ 0.333**. This is higher than the unconditional 0.25: having waited makes the star more likely to appear soon, since it must show up before \( t = 4 \).
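The same conditional computation for \( Uniform(0, 4) \), as a sketch with `scipy.stats.uniform` (parametrized by `loc` and `scale`, here 0 and 4). The helper `p_next_hour_given_waited` is introduced here for illustration; evaluating it after waiting 1 hour and after waiting 2 hours shows that, unlike the exponential, the answer depends on how long you have already waited.

```python
from scipy.stats import uniform

def p_next_hour_given_waited(dist, waited=1.0, window=1.0):
    """P(T <= waited + window | T > waited) for a frozen scipy.stats distribution."""
    return (dist.cdf(waited + window) - dist.cdf(waited)) / dist.sf(waited)

T_uniform = uniform(loc=0, scale=4)  # Uniform(0, 4)

print("P(T <= 1) =", T_uniform.cdf(1), " E[T] =", T_uniform.mean())  # 0.25 and 2.0 hours
print("after waiting 1 h:", p_next_hour_given_waited(T_uniform, waited=1.0))  # 1/3
print("after waiting 2 h:", p_next_hour_given_waited(T_uniform, waited=2.0))  # 1/2
```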

---

### **Summary of Answers:**

1. Probability of seeing the first falling star within 1 hour: **0.8647**, expected waiting time: **0.5 hours**.
2. Probability of seeing the first falling star in the next 1 hour, given no star in the first hour: **0.8647** (memoryless property).
3. The waiting time is independent of how much you have already waited for the exponential distribution (memoryless). This is **not true for all distributions**.
4. Probability of seeing at least two falling stars in the first 1 hour: **0.5940**.
5. With a uniform distribution \( Uniform(0, 4) \):
   - Probability of seeing the first falling star within 1 hour: **0.25**, expected waiting time: **2 hours**.
   - Probability of seeing the first falling star in the next 1 hour, given no star in the first hour: **1/3** (the uniform distribution is not memoryless).