# Homework 6.2 - Coding

This is the coding portion of the homework assignment for Section 6.2.

In [179]:
import numpy as np

In [180]:
a = np.array([1.1, 2.2, 3.3, 4.4])
a.dtype

dtype('float64')

## Problem 6.8

Consider a process where we sample from Bernoulli(0.5) $n=1000$ times, simulating a repeated coin flip, and then compute the sample mean $\hat{\mu}_{1000}$.

### Part (i)

In the following markdown cell, find and write the upper bound from the law of large numbers for $n=1000$ and a given $\varepsilon$. 

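**Response:** Bernoulli(0.5) has variance $\sigma^2 = 0.5 \cdot 0.5 = 0.25$, and the law of large numbers (via Chebyshev's inequality) bounds the deviation probability by $\sigma^2/(n\varepsilon^2)$, so $P(|\hat{\mu}_{1000} - 0.5| \geq \varepsilon) \leq \dfrac{0.25}{1000\,\varepsilon^2} = \dfrac{1}{4000\,\varepsilon^2}$.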
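The bound can be evaluated numerically for each $\varepsilon$ used below. A minimal sketch, assuming only the formula $\sigma^2/(n\varepsilon^2)$; the helper name `lln_bound` is hypothetical and not part of the assignment:

```python
def lln_bound(variance: float, n: int, eps: float) -> float:
    """Chebyshev bound on P(|sample mean - mu| >= eps) for n i.i.d. samples."""
    # Var(sample mean) = variance / n, so Chebyshev gives variance / (n * eps**2).
    return variance / (n * eps**2)


# Bernoulli(0.5) has variance p * (1 - p) = 0.25.
for eps in [0.1, 0.01, 0.001]:
    bound = lln_bound(0.25, 1000, eps)
    # A bound above 1 carries no information, since a probability is at most 1.
    print(f"eps = {eps}: bound = {bound:g}")
```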

### Part (ii)

We proceed in a few steps:

(Step 1): Write code inside of the function `bernoulli_trial()` to sample from the Bernoulli(0.5) distribution 1000 times, simulating a repeated coin flip, and return the sample mean $\hat{\mu}_{1000}$.

Your function should accept a numpy random number generator `rng`, and should use this to produce the desired samples.

In [181]:
def bernoulli_trial(rng: np.random.Generator) -> float | np.floating:
    """Sample from Bernoulli(0.5) 1000 times, and return the sample mean.
    
    Args:
        rng (np.random.Generator) - A random number generator
            created by np.random.default_rng(). This should be used 
            for any sampling from distributions.
    
    Returns:
        float - The sample mean of all 1000 samples.
    """

    return rng.binomial(1, 0.5, size=1000).mean()
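Because the number of heads in 1000 independent Bernoulli(0.5) flips follows a Binomial(1000, 0.5) distribution, the sample mean can also be drawn with a single scalar call. A minimal sketch of that alternative; `bernoulli_trial_binomial` is a hypothetical name, and it matches `bernoulli_trial()` in distribution but should not be expected to reproduce its exact values for a given seed:

```python
import numpy as np


def bernoulli_trial_binomial(rng: np.random.Generator, n: int = 1000, p: float = 0.5) -> float:
    """Sample mean of n Bernoulli(p) flips, drawn as Binomial(n, p) / n."""
    # The number of heads in n independent flips is Binomial(n, p).
    heads = rng.binomial(n, p)
    return heads / n


rng = np.random.default_rng(0)
print(bernoulli_trial_binomial(rng))
```

The result is always a multiple of $1/1000$, just like the mean of 1000 zeros and ones.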

(Step 2): Now, write a function `bernoulli_experiment()` that repeats this experiment $n$ times, calling `bernoulli_trial()` $n$ times, and saves the resulting sample mean of each trial as an entry in an output array.

Your function should accept a numpy random number generator `rng`, as well as the number of trials `n` to perform.

It should return a numpy array with $n$ entries, each one being the sample mean of the corresponding trial.

In [182]:
def bernoulli_experiment(rng: np.random.Generator, n: int) -> np.ndarray:
    """Repeats the bernoulli trial n times.

    Args:
        rng (np.random.Generator) - A random number generator
            created by np.random.default_rng(). This should be used 
            for any sampling from distributions.
        n (int) - The number of trials to perform
    
    Returns:
        np.ndarray - An array where the i'th entry is the sample mean of the 
            i'th trial
    """
    return np.array([bernoulli_trial(rng) for _ in range(n)])
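The law-of-large-numbers bound rests on the fact that the sample mean of $n$ draws has variance $\sigma^2/n$. A rough sanity check of that fact, sketched under the assumption that 2000 trials (an arbitrary choice, more than the 100 used below) give a stable variance estimate; the flips are generated as one `(trials, samples)` array instead of calling `bernoulli_trial()` in a loop:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 2000, 1000
# Each row is one 1000-flip trial; averaging across a row gives that trial's sample mean.
means = rng.binomial(1, 0.5, size=(n_trials, n_samples)).mean(axis=1)

empirical_var = means.var()
theoretical_var = 0.25 / n_samples  # sigma^2 / n with sigma^2 = p * (1 - p) = 0.25
print(empirical_var, theoretical_var)
```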


(Step 3): Finally, write a function `threshold_comparison()` which takes an array of sample means `trials`, an actual mean `mu`, and a threshold value `eps`, and returns the percentage of entries in `trials` that satisfy $|\text{trials} - \text{mu}| \geq \text{eps}$.

In [None]:
def threshold_comparison(trials: np.ndarray, mu: float, eps: float) -> float | np.floating:
    """Returns the percentage of trials that satisfy |trials - mu| >= eps
    
    Args:
        trials (np.ndarray) - An array of sample means of trials of an experiment
        mu (float) - The actual mean of the underlying distribution the trials were drawn from
        eps (float) - The threshold for comparison
    
    Returns:
        float - The percentage of sample means in trials that are at least eps
            away from the underlying distribution's means; that is, the percentage
            of entries in trials such that |trials - mu| >= eps
    """
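    # Boolean mask of trials whose sample mean is at least eps away from mu
    mask = np.abs(trials - mu) >= eps
    # The mean of a boolean array is the fraction of True entries;
    # scale by 100 so the result is a percentage, as documented
    return 100.0 * np.mean(mask)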
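A small worked example of the rule $|\text{trials} - \text{mu}| \geq \text{eps}$, using a hypothetical stand-alone copy named `percent_far` that returns a percentage as the docstring above describes. The values are multiples of $1/8$, which are exact in binary floating point, so the boundary case $|0.625 - 0.5| = 0.125$ is compared without rounding error:

```python
import numpy as np


def percent_far(trials: np.ndarray, mu: float, eps: float) -> float:
    """Percentage of entries with |trials - mu| >= eps."""
    mask = np.abs(trials - mu) >= eps
    # The mean of a boolean array is the fraction of True entries.
    return 100.0 * float(np.mean(mask))


trials = np.array([0.5, 0.625, 0.25, 0.375])
# Distances from 0.5 are 0, 0.125, 0.25, 0.125, so three of four entries are >= 0.125.
print(percent_far(trials, mu=0.5, eps=0.125))
```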




Now, in the following code cell, do the following:

1. Create a random-number generator `rng`.
2. Using `rng`, call `bernoulli_experiment()` to run 100 trials of `bernoulli_trial()`, and store the results.
3. For $\varepsilon \in \{0.1, 0.01, 0.001\}$, pass the results of step (2) to `threshold_comparison()`, along with the true mean $\mu = 0.5$ and $\varepsilon$, to see the percentage of 1000-sample trials whose sample means are at least $\varepsilon$ away from $\mu$.
4. Print the percentages to the screen in a readable way.

In [184]:
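rng = np.random.default_rng()
results = bernoulli_experiment(rng, 100)
for eps in [0.1, 0.01, 0.001]:
    percent = threshold_comparison(results, 0.5, eps)
    # threshold_comparison returns a percentage in [0, 100]
    print(f"eps = {eps}: {percent:.1f}% of trials have |mean - 0.5| >= eps")

eps = 0.1: 0.0% of trials have |mean - 0.5| >= eps
eps = 0.01: 65.0% of trials have |mean - 0.5| >= eps
eps = 0.001: 98.0% of trials have |mean - 0.5| >= eps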
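Because `np.random.default_rng()` is called without a seed, the percentages change every time the notebook is restarted and rerun. A minimal sketch of making a run reproducible by passing a seed; the value 12345 is arbitrary, and the flips are drawn inline so the snippet stands alone:

```python
import numpy as np

# Two generators built from the same seed produce identical random streams.
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)
a = rng_a.binomial(1, 0.5, size=(100, 1000)).mean(axis=1)
b = rng_b.binomial(1, 0.5, size=(100, 1000)).mean(axis=1)
print(np.array_equal(a, b))
```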


Compare your results to the bound from the law of large numbers. What do you notice? Write a sentence or two in the following markdown cell describing your observations.

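**Observations:** _The bound is $1/(4000\varepsilon^2)$. For $\varepsilon = 0.1$ it equals $1/40 = 2.5\%$, and the observed 0% is consistent with it. For $\varepsilon = 0.01$ and $\varepsilon = 0.001$ the bound is $1/0.4 = 2.5$ and $250$, both greater than 1, so it holds trivially and says nothing about the observed 65% and 98%._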
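The comparison can be automated by labeling each $\varepsilon$ as satisfied, violated, or vacuous. A sketch with a hypothetical helper `compare_to_bound`; the observed percentages are copied from the run above, so an unseeded rerun will give different numbers:

```python
def compare_to_bound(observed_pct: dict[float, float], variance: float, n: int) -> dict[float, str]:
    """Label each eps against the Chebyshev bound variance / (n * eps**2)."""
    labels = {}
    for eps, pct in observed_pct.items():
        bound = variance / (n * eps**2)
        if bound >= 1:
            labels[eps] = "vacuous"  # any probability is <= 1, so the bound says nothing
        elif pct / 100 <= bound:
            labels[eps] = "satisfied"
        else:
            labels[eps] = "violated"
        print(f"eps = {eps:<6} observed = {pct / 100:.3f}  bound = {bound:.4g}  -> {labels[eps]}")
    return labels


# Observed percentages from the Bernoulli(0.5) run above.
labels = compare_to_bound({0.1: 0.0, 0.01: 65.0, 0.001: 98.0}, variance=0.25, n=1000)
```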

---

## Problem 6.9

We will follow a process similar to the previous problem, using the Beta(1,9) distribution instead of Bernoulli(0.5).

### Part (i)

In the following markdown cell, find and write the upper bound from the law of large numbers for this new distribution, $n=1000$, and a given $\varepsilon$. 

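**Response:** Beta(1,9) has mean $\mu = \frac{1}{1+9} = 0.1$ and variance $\sigma^2 = \frac{1 \cdot 9}{(1+9)^2(1+9+1)} = \frac{9}{1100}$, so $P(|\hat{\mu}_{1000} - 0.1| \geq \varepsilon) \leq \dfrac{9/1100}{1000\,\varepsilon^2} = \dfrac{9}{1{,}100{,}000\,\varepsilon^2}$.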
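The mean, variance, and bound for Beta(1,9) can be checked with exact rational arithmetic. A minimal sketch using the standard Beta moment formulas and Python's `fractions.Fraction`; the helper name `beta_mean_var` is hypothetical:

```python
from fractions import Fraction


def beta_mean_var(a: int, b: int) -> tuple[Fraction, Fraction]:
    """Exact mean and variance of Beta(a, b) for integer parameters."""
    mean = Fraction(a, a + b)
    var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
    return mean, var


mean, var = beta_mean_var(1, 9)
n = 1000
for eps in [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]:
    # Chebyshev / law-of-large-numbers bound var / (n * eps^2), kept exact.
    bound = var / (n * eps**2)
    print(f"eps = {float(eps)}: bound = {float(bound):.4g}")
```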

### Part (ii)

We proceed in a few steps:

(Step 1): Write code inside of the function `beta_trial()` to sample from the Beta(1,9) distribution 1000 times, returning the sample mean $\hat{\mu}_{1000}$.

Your function should accept a numpy random number generator `rng`, and should use this to produce the desired samples.

In [185]:
def beta_trial(rng: np.random.Generator) -> float | np.floating:
    """Sample from Beta(1,9) 1000 times, and return the sample mean.
    
    Args:
        rng (np.random.Generator) - A random number generator
            created by np.random.default_rng(). This should be used 
            for any sampling from distributions.
    
    Returns:
        float - The sample mean of all 1000 samples.
    """
    return rng.beta(1, 9, size=1000).mean()
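For Beta(1, b) specifically, the CDF is $F(x) = 1 - (1 - x)^b$, which can be inverted in closed form. The sketch below illustrates inverse-transform sampling as an alternative to `rng.beta`; `beta_1_b_inverse_cdf` is a hypothetical name, and `rng.beta(1, 9, ...)` remains the simpler choice for the assignment:

```python
import numpy as np


def beta_1_b_inverse_cdf(rng: np.random.Generator, b: float, size: int) -> np.ndarray:
    """Draw Beta(1, b) samples by inverting the CDF F(x) = 1 - (1 - x)**b."""
    u = rng.random(size)  # uniform on [0, 1)
    # Solving u = 1 - (1 - x)**b for x gives x = 1 - (1 - u)**(1 / b).
    return 1.0 - (1.0 - u) ** (1.0 / b)


rng = np.random.default_rng(0)
samples = beta_1_b_inverse_cdf(rng, 9, 100_000)
print(samples.mean())  # should be close to 1 / (1 + 9) = 0.1
```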
    

(Step 2): Now, write a function `beta_experiment()` that repeats this experiment $n$ times, calling `beta_trial()` $n$ times, and saves the resulting sample mean of each trial as an entry in an output array.

Your function should accept a numpy random number generator `rng`, as well as the number of trials `n` to perform.

It should return a numpy array with $n$ entries, each one being the sample mean of the corresponding trial.

In [186]:
def beta_experiment(rng: np.random.Generator, n: int) -> np.ndarray:
    """Repeats the beta trial n times.

    Args:
        rng (np.random.Generator) - A random number generator
            created by np.random.default_rng(). This should be used 
            for any sampling from distributions.
        n (int) - The number of trials to perform
    
    Returns:
        np.ndarray - An array where the i'th entry is the sample mean of the 
            i'th trial
    """
    return np.array([beta_trial(rng) for _ in range(n)])
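`bernoulli_experiment()` and `beta_experiment()` differ only in the distribution they sample. One way to factor out the shared structure, sketched with a hypothetical `run_experiment` that takes a sampler callable and draws every trial in one `(n, samples)` array:

```python
from typing import Callable

import numpy as np

# A sampler takes (rng, shape) and returns an array of draws with that shape.
Sampler = Callable[[np.random.Generator, tuple[int, int]], np.ndarray]


def run_experiment(rng: np.random.Generator, sampler: Sampler, n: int, samples: int = 1000) -> np.ndarray:
    """Return n sample means, each computed over `samples` draws from `sampler`."""
    draws = sampler(rng, (n, samples))  # row i holds the draws for trial i
    return draws.mean(axis=1)


rng = np.random.default_rng(0)
beta_means = run_experiment(rng, lambda g, shape: g.beta(1, 9, size=shape), 100)
coin_means = run_experiment(rng, lambda g, shape: g.binomial(1, 0.5, size=shape), 100)
print(beta_means.mean(), coin_means.mean())
```

Passing the distribution in as a function keeps the trial and experiment logic in one place, so adding a third distribution only needs a new lambda.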

Now, in the following code cell, do the following:

1. Create a random-number generator `rng`.
2. Using `rng`, call `beta_experiment()` to run 100 trials of `beta_trial()`, and store the results.
3. For $\varepsilon \in \{0.1, 0.01, 0.001\}$, pass the results of step (2) to `threshold_comparison()`, along with (i) the true mean $\mu$ of Beta(1,9) and (ii) $\varepsilon$, to see the percentage of 1000-sample trials whose sample means are at least $\varepsilon$ away from $\mu$.
4. Print the percentages to the screen in a readable way.

In [187]:
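rng = np.random.default_rng()
results = beta_experiment(rng, 100)
mu = 1 / (1 + 9)  # mean of Beta(a, b) is a / (a + b)
for eps in [0.1, 0.01, 0.001]:
    percent = threshold_comparison(results, mu, eps)
    print(f"eps = {eps}: {percent:.1f}% of trials have |mean - {mu}| >= eps")

eps = 0.1: 0.0% of trials have |mean - 0.1| >= eps
eps = 0.01: 0.0% of trials have |mean - 0.1| >= eps
eps = 0.001: 69.0% of trials have |mean - 0.1| >= eps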


Compare your results to the bound from the law of large numbers. What do you notice? Write a sentence or two in the following markdown cell describing your observations.

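**Observations:** _The results are consistent with the bound $9/(1{,}100{,}000\,\varepsilon^2)$. For $\varepsilon = 0.1$ the bound is about $0.0008$ and for $\varepsilon = 0.01$ about $0.082$; the observed 0% satisfies both. For $\varepsilon = 0.001$ the bound is $9/(1100 \cdot 1000 \cdot 10^{-6}) = 9/1.1 \approx 8.2 > 1$, so it is vacuous and places no constraint on the observed 69%._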

---

IMPORTANT: Please "Restart and Run All" and ensure there are no errors. Then, submit this .ipynb file to Gradescope.