# Simulation Lab 1: Derivation

## Book exercises

### Chapter 3

Exercises 3.1 - 3.8

### Chapter 4

Exercises 4.1 - 4.9

### Chapter 5

Exercises 5.1 - 5.6

## Coding exercises

In [1]:
from typing import Literal

import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure
import numpy as np
import scipy.stats

N_SAMPLES = 1000
# Random number generator:
# Set the seed to None to get different sequences of values between restarts
# Set it to a number to get the same sequences of values between restarts
RNG = np.random.default_rng(seed=0)

### 1. Empirical mean/variance checks

Consider Binomial distributions with $n=10$.

Generate a random parameterization $p$ for it, without inspecting the value.

Draw a large number of samples (e.g., 1000).

Estimate the mean and variance from the samples.

Knowing the distribution type, try to recover the parameterization from the mean estimate.

Inspect the true parameterization and compute the corresponding theoretical mean and variance.

Compare the estimated mean/variance/parameterization with the true one.

Repeat for the other distributions from class: Bernoulli $(p)$, Geometric $(p)$, and Poisson $(\lambda)$.
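The recovery step relies on each distribution's mean being an invertible function of its parameter: $np$ for the Binomial, $p$ for the Bernoulli, $1/p$ for the Geometric, and $\lambda$ for the Poisson. The sketch below shows that inversion on its own. The helper `param_from_mean` is hypothetical and not part of the lab scaffold, and the Geometric case assumes the convention used by `scipy.stats.geom` (number of trials up to and including the first success, support $1, 2, \dots$); if the class defines it as the number of failures, the mean is $(1-p)/p$ instead and the inversion changes accordingly.

```python
import numpy as np
import scipy.stats


def param_from_mean(distr_name: str, mean: float, n: int = 1) -> float:
    """Invert the mean formula of a distribution to recover its parameter.

    Hypothetical helper for illustration; `n` is only used for the Binomial.
    """
    if distr_name == "binomial":
        return mean / n        # E[X] = n * p
    if distr_name == "bernoulli":
        return mean            # E[X] = p
    if distr_name == "geometric":
        return 1.0 / mean      # E[X] = 1 / p (trials-until-success convention)
    if distr_name == "poisson":
        return mean            # E[X] = lambda
    raise ValueError(f"Unknown distribution: {distr_name}")


# Illustration with a fixed, known parameter (the exercise itself uses a hidden one).
rng = np.random.default_rng(seed=0)
true_p = 0.25
geom_samples = scipy.stats.geom(p=true_p).rvs(size=1000, random_state=rng)

sample_mean = float(geom_samples.mean())
sample_var = float(geom_samples.var(ddof=1))  # ddof=1: unbiased sample variance
p_from_mean = param_from_mean("geometric", sample_mean)
```

With 1000 samples, `p_from_mean` should land close to `true_p`, but not exactly on it; the gap shrinks as the sample size grows.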

In [2]:
def get_rand_param(low: float, high: float, n_decimals: int = 2) -> np.floating:
    ...

In [3]:
def estimate_mean(samples: np.ndarray) -> np.floating:
    ...

In [4]:
def estimate_var(samples: np.ndarray) -> np.floating:
    ...

In [5]:
def compare_est_stat(
        distr: scipy.stats.distributions.rv_frozen,
        est_mean: np.floating, est_var: np.floating,
        true_param: np.floating, mean_est_param: np.floating) -> None:
    true_mean: np.floating
    true_var: np.floating
    true_mean, true_var = distr.stats(moments='mv')
    print(
        f"True mean: {true_mean.round(4)} | "
        f"Estimated mean: {est_mean.round(4)} | "
        f"Absolute difference: {np.absolute(true_mean - est_mean).round(4)}")
    print(
        f"True variance: {true_var.round(4)} | "
        f"Estimated variance: {est_var.round(4)} | "
        f"Absolute difference: {np.absolute(true_var - est_var).round(4)}")
    print(f"True parameter: {true_param}")
    print(
        f"Mean-estimated parameter: {mean_est_param.round(4)} | "
        f"Absolute difference: {np.absolute(true_param - mean_est_param).round(4)}")

#### Binomial

In [None]:
binom_n = 10
binom_true_param = get_rand_param(low=0.1, high=0.9, n_decimals=2)
binom_distr = scipy.stats.binom(n=binom_n, p=binom_true_param)
binom_samples = binom_distr.rvs(size=N_SAMPLES, random_state=RNG)
binom_est_mean = estimate_mean(binom_samples)
binom_est_var = estimate_var(binom_samples)
binom_mean_est_p: np.floating = binom_est_mean / binom_n
compare_est_stat(
    binom_distr,
    binom_est_mean, binom_est_var,
    binom_true_param, binom_mean_est_p)

#### Bernoulli

In [None]:
...

#### Geometric

In [None]:
...

#### Poisson

In [None]:
...

### 2. P.M.F. and C.D.F. plotting

Again, consider the samples from each distribution in the last exercise.

Use them to plot each distribution's estimated probability mass function (P.M.F.) and cumulative distribution function (C.D.F.).
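Before plotting, it may help to see how the two estimates relate numerically: the empirical P.M.F. is the relative frequency of each observed value, and the empirical C.D.F. is its running sum. The following is a minimal sketch under that assumption; `empirical_pmf_cdf` is a hypothetical helper and is not the `plot` function requested in this exercise.

```python
import numpy as np


def empirical_pmf_cdf(samples: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return the observed values, their relative frequencies, and the running sum."""
    # np.unique returns the sorted distinct values and how often each occurs
    values, counts = np.unique(samples, return_counts=True)
    pmf = counts / counts.sum()
    cdf = np.cumsum(pmf)
    return values, pmf, cdf


values, pmf, cdf = empirical_pmf_cdf(np.array([0, 1, 1, 2, 2, 2, 2, 5]))
print(values)  # [0 1 2 5]
print(pmf)     # [0.125 0.25  0.5   0.125]
print(cdf)     # [0.125 0.375 0.875 1.   ]
```

Values that never occur in the samples (3 and 4 here) are absent from `values`, so their estimated probability is implicitly zero. One plausible way to draw the results is `Axes.bar` for the P.M.F. and `Axes.step` with `where="post"` for the C.D.F.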

In [10]:
def plot(samples: np.ndarray, type: Literal["pmf", "cdf"], title: str) -> Figure:
    ...

#### Binomial

In [None]:
fig_binom_pmf = plot(
    binom_samples, "pmf", f"Binomial (n={binom_n}, p={binom_true_param}) estimated P.M.F.")

In [None]:
fig_binom_cdf = plot(
    binom_samples, "cdf", f"Binomial (n={binom_n}, p={binom_true_param}) estimated C.D.F.")

#### Bernoulli

In [None]:
...

In [None]:
...

#### Geometric

In [None]:
...

In [None]:
...

#### Poisson

In [None]:
...

In [None]:
...

### 3. Linearity of expectation

Consider the example of the drinking game from the lectures.

People at the party do not remember which cup was originally theirs, so they grab one at random.

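Because cups are taken without replacement, the per-person indicator variables (whether a given person gets their own cup back) are not independent.

Linearity of expectation does not require independence, so verify via simulation that it still holds: the mean number of people who get their own cup back should be close to $n \cdot \frac{1}{n} = 1$.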
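As a cross-check that is separate from the simulation, the expectation can be computed exactly for small groups by enumerating every possible assignment of cups. Writing $X = \sum_i X_i$, where $X_i = 1$ if person $i$ gets their own cup, each $E[X_i] = 1/n$, so linearity gives $E[X] = 1$ regardless of dependence. The sketch below assumes every assignment (permutation) is equally likely; `exact_mean_matches` is a hypothetical helper, and the enumeration is only practical for small `n_people` because it visits all $n!$ permutations.

```python
from fractions import Fraction
from itertools import permutations


def exact_mean_matches(n_people: int) -> Fraction:
    """Exact expected number of people who get their own cup back."""
    total_matches = 0
    n_assignments = 0
    # assignment[person] is the index of the cup that person ends up with
    for assignment in permutations(range(n_people)):
        total_matches += sum(person == cup for person, cup in enumerate(assignment))
        n_assignments += 1
    # Fraction keeps the result exact instead of a rounded float
    return Fraction(total_matches, n_assignments)


for n in range(1, 7):
    print(n, exact_mean_matches(n))  # the second column is 1 for every n
```

The simulated mean over `N_SAMPLES` runs should fluctuate around this exact value rather than match it precisely.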

In [19]:
n_people = 10
# Index i represents the i-th simulation
# The value at the index is the number of people who got back their cup
drinking_samples = np.empty(N_SAMPLES, dtype=int)

for i in range(N_SAMPLES):
    ...

In [None]:
drinking_simulations_mean = estimate_mean(drinking_samples)
f"Mean for the number of people who get their own cup back over the simulations: {drinking_simulations_mean:.4f}"