# STT 441: Probability, Section 002, Dr. Yuying Xie
## Homework 9 - 11/08/2024

This notebook contains the computational work completed by Lowell Monis toward Homework 9.

**Question 17.** In the textbook, we calculate the confidence interval of $p$ via the normal approximation:

$P(-\epsilon < p - \hat{p} < \epsilon) \approx P \left( -\frac{\sqrt{n} \epsilon}{\sqrt{p(1 - p)}} < Z < \frac{\sqrt{n} \epsilon}{\sqrt{p(1 - p)}} \right)$

Here, to handle the unknown $p$, we use a lower-bound trick **(Type I)**:

$P \left( -\frac{\sqrt{n} \epsilon}{\sqrt{p(1 - p)}} < Z < \frac{\sqrt{n} \epsilon}{\sqrt{p(1 - p)}} \right) > 2 \Phi(2 \epsilon \sqrt{n}) - 1$

Another way is to simply replace $p$ with $\hat{p} = S_n / n$ **(Type II)**. Now, let’s compare these two types of confidence intervals.

1. **Step 1**: Simulate $n = 64$ Bernoulli random variables with $p = 0.3$ and calculate $\hat{p}$. Calculate the two types of $95\%$ confidence intervals described above and check whether they contain the true $p$.
2. **Step 2**: Repeat Step 1 $m = 1000$ times and check which type of $95\%$ confidence interval covers the true $p$ more than $95\%$ of the time.
3. **Step 3**: Repeat Step 2 100 times. What conclusion can you draw?

***

### Solution

#### Define parameters and set up the simulation

1. Sample size $n = 64$.
2. True proportion $p = 0.3$.
3. Number of simulations per trial $m = 1000$: each simulation draws one sample of size $n$ and computes both types of confidence interval from it.
4. Each simulation checks whether each interval covers $p$, at a confidence level of $95\%$.

#### Define confidence intervals

##### Type I (Lower-bound Trick)

$$P \left( -\frac{\sqrt{n} \epsilon}{\sqrt{p(1 - p)}} < Z < \frac{\sqrt{n} \epsilon}{\sqrt{p(1 - p)}} \right) > 2 \Phi(2 \epsilon \sqrt{n}) - 1$$

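Despite appearances, this interval does not require knowing $p$. The bound uses only $p(1 - p) \le \frac{1}{4}$, so that $\frac{\sqrt{n}\epsilon}{\sqrt{p(1-p)}} \ge 2\epsilon\sqrt{n}$. Setting $2\Phi(2\epsilon\sqrt{n}) - 1 = 0.95$ gives $2\epsilon\sqrt{n} = z_{0.975}$, where $z_{0.975} = \Phi^{-1}(0.975) \approx 1.96$, so the Type I interval is $\hat{p} \pm \frac{z_{0.975}}{2\sqrt{n}}$.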
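The Type I interval can be written as a small helper. This is a sketch; the name `type1_interval` and its `conf` parameter are hypothetical choices of mine. It shows that the half-width depends only on $n$ and the confidence level, not on $\hat{p}$ or $p$.

```python
import numpy as np
from scipy.stats import norm


def type1_interval(p_hat, n, conf=0.95):
    """Type I (lower-bound) interval: p_hat +/- z / (2 sqrt(n)).

    Uses p(1 - p) <= 1/4, so the half-width depends only on n and conf.
    """
    z = norm.ppf(1 - (1 - conf) / 2)  # z_{0.975} ~ 1.96 for conf = 0.95
    half = z / (2 * np.sqrt(n))
    return p_hat - half, p_hat + half


# with n = 64 the half-width is 1.96 / 16 ~ 0.1225, whatever p_hat is
lo, hi = type1_interval(0.3, 64)
print(round(hi - lo, 4))  # full width -> 0.245
```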

##### Type II

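We replace $p$ with the estimate $\hat{p} = S_n / n$ in the standard error, which gives the interval $\hat{p} \pm z_{0.975}\sqrt{\hat{p}(1 - \hat{p})/n}$.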
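A matching sketch for Type II, under the same assumptions (the helper name `type2_interval` is hypothetical). It also checks, over every attainable value $\hat{p} = k/n$, that the Type II half-width never exceeds the Type I half-width $z_{0.975}/(2\sqrt{n})$, since $\hat{p}(1 - \hat{p}) \le \frac{1}{4}$ with equality only at $\hat{p} = \frac{1}{2}$.

```python
import numpy as np
from scipy.stats import norm


def type2_interval(p_hat, n, conf=0.95):
    """Type II (plug-in) interval: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    z = norm.ppf(1 - (1 - conf) / 2)
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


n = 64
z = norm.ppf(0.975)
type1_half = z / (2 * np.sqrt(n))  # Type I half-width from the lower-bound trick

# every attainable p_hat = k / n, k = 0, ..., n (works elementwise on arrays)
p_hats = np.arange(n + 1) / n
lo, hi = type2_interval(p_hats, n)
type2_half = (hi - lo) / 2

print(np.all(type2_half <= type1_half + 1e-12))  # -> True
print(round(float(type2_half[n // 2]), 4), round(float(type1_half), 4))  # -> 0.1225 0.1225
```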

#### Run Simulation

For each of the $1000$ simulations, we generate $64$ Bernoulli random variables with $p = 0.3$, compute $\hat{p}$, and construct both types of $95\%$ confidence interval. We then check whether each interval contains the true value $p = 0.3$.
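One batch of $m$ simulations can also be written without an inner loop. This is a vectorized sketch, not the method used in the main code: it assumes NumPy's `Generator` API, draws $S_n \sim \text{Binomial}(n, p)$ directly (which has the same distribution as the sum of $n$ Bernoulli($p$) variables), and uses $|\hat{p} - p| \le \text{half-width}$, which is equivalent to the interval containing $p$. The function name `batch_coverage` and the seed are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def batch_coverage(rng, p=0.3, n=64, m=1000, conf=0.95):
    """Coverage of both interval types over m simulated samples of size n.

    Draws S_n ~ Binomial(n, p) directly instead of n separate Bernoulli(p)
    variables; the two have the same distribution.
    """
    z = norm.ppf(1 - (1 - conf) / 2)
    p_hat = rng.binomial(n, p, size=m) / n         # m sample proportions
    half1 = z / (2 * np.sqrt(n))                   # Type I half-width (scalar)
    half2 = z * np.sqrt(p_hat * (1 - p_hat) / n)   # Type II half-widths (array)
    cover1 = np.abs(p_hat - p) <= half1            # does each Type I interval contain p?
    cover2 = np.abs(p_hat - p) <= half2            # does each Type II interval contain p?
    return cover1.mean(), cover2.mean()


rng = np.random.default_rng(441)  # arbitrary seed for reproducibility
cov1, cov2 = batch_coverage(rng)
print(cov1, cov2)
```

Because both intervals share the center $\hat{p}$ and Type I is never narrower, `cov1` is at least `cov2` in every batch.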

We repeat this batch of $1000$ simulations $100$ times to see how much the coverage varies from batch to batch, and report the average percentage of simulations in which each type of interval covers the true $p$.
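As a cross-check on the simulated percentages: for fixed $p$ and $n$, $\hat{p}$ only takes the values $k/n$, so the exact coverage of each interval is the sum of $P(S_n = k)$ over the counts $k$ whose interval contains $p$. A sketch, assuming `scipy.stats.binom`; the function name `exact_coverage` is hypothetical. The coverage is exact for the binomial model, while the intervals themselves still come from the normal approximation.

```python
import numpy as np
from scipy.stats import binom, norm


def exact_coverage(p=0.3, n=64, conf=0.95):
    """Exact coverage of both intervals: sum Binomial(n, p) probabilities
    over the counts k whose interval (centered at k / n) contains p."""
    z = norm.ppf(1 - (1 - conf) / 2)
    k = np.arange(n + 1)
    p_hat = k / n
    pmf = binom.pmf(k, n, p)                        # P(S_n = k)
    half1 = z / (2 * np.sqrt(n))                    # Type I half-width
    half2 = z * np.sqrt(p_hat * (1 - p_hat) / n)    # Type II half-width for each k
    cov1 = pmf[np.abs(p_hat - p) <= half1].sum()
    cov2 = pmf[np.abs(p_hat - p) <= half2].sum()
    return cov1, cov2


print(exact_coverage())
```

The averages over many simulated batches should be close to these two values, up to Monte Carlo error.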

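```python
# setup, importing required modules
import numpy as np
from scipy.stats import norm

# parameters
p = 0.3       # true proportion
n = 64        # sample size for each simulation
m = 1000      # number of simulations per trial (Step 2)
trials = 100  # number of times to repeat the entire process (Step 3)

# confidence level
c = 0.95
alpha = 1 - c
z = norm.ppf(1 - alpha / 2)  # z_{0.975} ~ 1.96

# Type I half-width from the lower-bound trick: 2 * Phi(2 * eps * sqrt(n)) - 1 = c
# gives eps = z / (2 * sqrt(n)), which depends on neither p nor p_hat
type1_margin = z / (2 * np.sqrt(n))

# result storage: coverage proportion of each trial
type1 = []
type2 = []

# run the trial for the specified number of times
for _ in range(trials):
    type1c = 0
    type2c = 0

    # running m simulations for each trial
    for _ in range(m):
        # Step 1: simulate n Bernoulli(p) variables and estimate p
        sample = np.random.binomial(1, p, n)
        p_hat = np.mean(sample)  # sample proportion S_n / n

        # type I: centered at p_hat, fixed half-width
        type1_lower = p_hat - type1_margin
        type1_upper = p_hat + type1_margin

        # type II: replace p with p_hat in the standard error
        type2_margin = z * np.sqrt(p_hat * (1 - p_hat) / n)
        type2_lower = p_hat - type2_margin
        type2_upper = p_hat + type2_margin

        # check if the intervals contain the true p
        if type1_lower <= p <= type1_upper:
            type1c += 1
        if type2_lower <= p <= type2_upper:
            type2c += 1

    # coverage proportion for this trial
    type1.append(type1c / m)
    type2.append(type2c / m)

type1 = np.array(type1)
type2 = np.array(type2)

# average coverage over all trials (in percent)
type1_avg_coverage = type1.mean() * 100
type2_avg_coverage = type2.mean() * 100

# fraction of trials whose coverage exceeded 95% (Steps 2 and 3)
type1_above = np.mean(type1 > c)
type2_above = np.mean(type2 > c)

type1_avg_coverage, type2_avg_coverage, type1_above, type2_above
```

#### Conclusion

Both intervals are centered at $\hat{p}$, and because $\hat{p}(1 - \hat{p}) \le \frac{1}{4}$, the Type II half-width $z_{0.975}\sqrt{\hat{p}(1 - \hat{p})/n}$ never exceeds the Type I half-width $z_{0.975}/(2\sqrt{n}) \approx 0.1225$. So in every simulation the Type I interval contains the Type II interval, and its coverage is at least as high. The lower-bound trick makes Type I conservative, with coverage of approximately at least $95\%$, while the Type II interval only targets $95\%$ approximately and can fall short of it in a given batch of $m$ simulations. Therefore, **I will rely more on the Type I confidence interval in this scenario in terms of coverage**, at the cost of a slightly wider interval.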

#### References

I referred to the SciPy documentation to learn more about the statistical functions in `scipy.stats`.