# Bayesian statistics

In [None]:
# Imports
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Bayesian Analysis of Emergency Room Patient Arrivals

This notebook illustrates how to use Bayesian inference on a practical example:

- **The estimation of the average rate of patient arrivals at an emergency room per hour**


## Bayesian framework: prior, likelihood and posterior

For a Bayesian analysis, we need to define:
- a likelihood for the observed data, and
- a prior distribution over the unknown parameters of that likelihood.

Overall, we have:
- Data: $D = \{x_1, x_2, \ldots, x_n\}$
    - in this case, the data are the numbers of patients arriving per hour:
        - $x_i$ is the number of patients in the $i$-th hour, where $x_i \in \mathbb{N} = \{0, 1, 2, \ldots\}$
- Model: $p(x_i|\theta)$ is a distribution over the number of patients per hour
    - We need to decide which model makes sense
    - $\theta$ are the parameters of the data distribution
- Prior: $p(\theta)$ is the distribution of the parameters before observing the data
- Likelihood: $p(D|\theta)$ is the probability of observing the data given the parameters; for independent observations, $p(D|\theta) = \prod_{i=1}^{n} p(x_i|\theta)$
- Posterior: $p(\theta|D) = \frac{p(D|\theta)p(\theta)}{p(D)}$ is the distribution of the parameters given the data, where $p(D) = \int p(D|\theta)p(\theta)\,d\theta$ is the evidence (a normalizing constant)
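Bayes' rule can be checked numerically by evaluating the prior and the likelihood on a grid of parameter values and normalizing their product. The sketch below is a minimal illustration, assuming the Poisson model and Gamma prior introduced in the next sections; the prior values $\alpha = 2$, $\beta = 3$, the small dataset, and the grid bounds are illustrative choices.

```python
import numpy as np
import scipy.stats as stats

# Illustrative data: patient counts for a few hours
data = np.array([3, 2, 5, 4, 3])

# Grid of candidate parameter values theta (here, the Poisson rate lambda)
theta_grid = np.linspace(1e-3, 15, 3000)
d_theta = theta_grid[1] - theta_grid[0]

# Prior p(theta): Gamma with shape 2 and rate 3 (SciPy uses scale = 1/rate)
prior = stats.gamma(2, scale=1 / 3).pdf(theta_grid)

# Likelihood p(D | theta): product of Poisson pmfs over independent hours,
# computed in log space for numerical stability
log_lik = stats.poisson.logpmf(data[:, None], theta_grid[None, :]).sum(axis=0)
likelihood = np.exp(log_lik)

# Evidence p(D): integral of likelihood * prior, approximated by a Riemann sum
unnormalized = likelihood * prior
evidence = unnormalized.sum() * d_theta

# Posterior p(theta | D) on the grid, and its mean
posterior = unnormalized / evidence
posterior_mean = (theta_grid * posterior).sum() * d_theta
print(f"Evidence p(D) ~ {evidence:.3e}")
print(f"Grid posterior mean ~ {posterior_mean:.3f}")
```

For a conjugate pair like Poisson-Gamma, the grid result should agree closely with the closed-form posterior; the grid approach is mainly useful when no conjugate prior is available and the parameter is low-dimensional.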

### Model Definition

* **Model:** We assume that patient arrivals follow a Poisson distribution.
    * **Why?** It is a common distribution for count data, and it has a single parameter, $\lambda$, which is the average arrival rate.
    * $X \sim \text{Poisson}(\lambda)$, where $X$ is the number of patient arrivals in an hour.
    * $p(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$, where:
        * $x \in \{0, 1, 2, \ldots\}$ is a specific number of arrivals;
        * $\lambda > 0$ is the average arrival rate (our parameter of interest), with $E[X] = \lambda$ (and also $\text{Var}[X] = \lambda$).
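As a quick sanity check, the closed-form pmf above can be compared with SciPy's implementation, and the mean of simulated draws compared with $\lambda$. This is a minimal sketch; the rate $\lambda = 10$, the random seed, and the sample size are arbitrary choices.

```python
import math
import numpy as np
import scipy.stats as stats

lam = 10.0  # illustrative arrival rate (patients per hour)

def poisson_pmf(x, lam):
    """p(x | lambda) = lambda^x * exp(-lambda) / x!, straight from the formula."""
    return lam**x * math.exp(-lam) / math.factorial(x)

ks = np.arange(0, 30)
manual = np.array([poisson_pmf(int(k), lam) for k in ks])
scipy_pmf = stats.poisson(lam).pmf(ks)
max_abs_diff = np.max(np.abs(manual - scipy_pmf))

# E[X] = Var[X] = lambda: sample moments of many simulated hours should be close to lam
rng = np.random.default_rng(0)
draws = rng.poisson(lam, size=100_000)
print(f"max |formula - scipy| = {max_abs_diff:.2e}")
print(f"sample mean = {draws.mean():.3f}, sample variance = {draws.var():.3f}")
```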

In [None]:
# Let us understand the Poisson distribution with an example.

# Define the Poisson distribution
poisson_lambda = 10
poisson_dist = stats.poisson(poisson_lambda)

# x-space for the Poisson distribution
x = np.arange(0, 20)

In [None]:
# Plot the probability mass function
plt.bar(
    x,
    poisson_dist.pmf(x),
    color='blue',
    alpha=0.6,
    align='center',
    label='Poisson PMF'
)
plt.xlabel('Number of events')
plt.ylabel('Probability')
plt.title(f'Poisson PMF (lambda = {poisson_lambda})')
plt.legend()
plt.show()

In [None]:
# Plot the CDF
plt.bar(
    x,
    poisson_dist.cdf(x),
    color='blue',
    alpha=0.6,
    align='center',
    label='Poisson CDF',
)
plt.xlabel('Number of events')
plt.ylabel('Cumulative probability')
plt.title(f'Poisson CDF (lambda = {poisson_lambda})')
plt.legend()
plt.show()

### Prior Distribution

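- We will use a Gamma distribution as our prior for $\lambda$:
    - **Why?** The Gamma distribution is the conjugate prior for the Poisson likelihood: with a Gamma prior and Poisson-distributed data, the posterior is also a Gamma distribution. It is also supported on $(0, \infty)$, which matches the constraint $\lambda > 0$.

- Mathematically,
    - $\lambda \sim \text{Gamma}(\alpha, \beta)$, with density $p(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda}$
    - Where:
        * $\alpha > 0$ is the shape parameter.
        * $\beta > 0$ is the rate parameter (SciPy uses the scale $1/\beta$ instead).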
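SciPy's `stats.gamma` is parameterized by a shape `a` and a `scale`, so a rate $\beta$ has to be passed as `scale=1/beta`. A minimal check, assuming the illustrative values $\alpha = 2$, $\beta = 3$, is that SciPy's mean and variance match the closed forms $E[\lambda] = \alpha/\beta$ and $\text{Var}[\lambda] = \alpha/\beta^2$, and that its density matches $\frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}$.

```python
import math
import numpy as np
import scipy.stats as stats

alpha, beta = 2.0, 3.0  # illustrative shape and rate

# SciPy: shape a = alpha, scale = 1 / beta (NOT the rate)
prior = stats.gamma(alpha, scale=1 / beta)

# Closed-form moments of Gamma(alpha, beta) in the shape-rate parameterization
mean_closed = alpha / beta
var_closed = alpha / beta**2

# Density from the formula, to compare with SciPy at a few points
lam = np.array([0.1, 0.5, 1.0, 2.0])
pdf_formula = beta**alpha / math.gamma(alpha) * lam**(alpha - 1) * np.exp(-beta * lam)

print(f"mean: scipy={prior.mean():.4f}, closed form={mean_closed:.4f}")
print(f"var:  scipy={prior.var():.4f}, closed form={var_closed:.4f}")
```

Passing `beta` directly as `scale` is an easy mistake: it silently gives a prior with mean $\alpha\beta$ instead of $\alpha/\beta$.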


In [None]:
# Let us understand the Gamma distribution with an example.

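# Define the Gamma distribution with shape alpha and rate beta.
# SciPy's gamma takes the shape `a` and a `scale`, where scale = 1 / rate.
gamma_alpha = 2
gamma_beta = 3
gamma_dist = stats.gamma(
    gamma_alpha,
    scale=1/gamma_beta
)

# Grid of lambda values (named theta here) on which to evaluate the Gamma distribution
theta = np.linspace(0, 10, 1000)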

In [None]:
# Plot the probability density function


In [None]:
# Plot the CDF


### Numerical simulation

In [None]:
# Define the Gamma prior, based on our prior knowledge


### Data Collection

- We observe patient arrivals for $n$ hours: $x = (x_1, x_2, \ldots, x_n)$.

In [None]:
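# Observed data: number of patient arrivals in each observed hour
data = np.array([3, 2, 5, 4, 3])
n = len(data)
# These counts were written down by hand, not simulated from a Poisson
# distribution, so there is no known true rate to compare against
true_lambda = None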
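For the Poisson model, the data enter the posterior only through the total count $\sum x_i$ and the number of hours $n$ (the sufficient statistics). As a reference point for the Bayesian estimates, the maximum-likelihood estimate of $\lambda$ is the sample mean $\sum x_i / n$. A minimal sketch with the observed counts:

```python
import numpy as np

# Observed hourly counts (same values as in the cell above)
data = np.array([3, 2, 5, 4, 3])

n = len(data)            # number of observed hours
total = int(data.sum())  # total number of arrivals, sum of x_i
lambda_mle = total / n   # maximum-likelihood estimate of lambda

print(f"n = {n}, sum x_i = {total}, MLE of lambda = {lambda_mle}")
```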

### Posterior Distribution, after observing data

- Due to Gamma-Poisson conjugacy, the posterior is also a Gamma distribution, with updated hyperparameters:
    * $\lambda | x \sim \text{Gamma}(\alpha^\prime, \beta^\prime)$, with $\alpha^\prime = \alpha + \sum_{i=1}^{n} x_i$ and $\beta^\prime = \beta + n$
    * Where:
        * $\alpha$ and $\beta$ are the hyperparameters of the prior Gamma distribution;
        * $\sum_{i=1}^{n} x_i$ is the total number of observed arrivals;
        * $n$ is the number of observed hours;
        * $\alpha^\prime$ and $\beta^\prime$ are the updated (posterior) hyperparameters.
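The update rule follows from multiplying the Poisson likelihood by the Gamma prior and keeping only the factors that depend on $\lambda$:

$$
\begin{aligned}
p(\lambda \mid x) &\propto p(x \mid \lambda)\, p(\lambda)
  = \left[\prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}\right]
    \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda} \\
&\propto \lambda^{\sum_i x_i} e^{-n\lambda} \cdot \lambda^{\alpha-1} e^{-\beta\lambda}
  = \lambda^{(\alpha + \sum_i x_i) - 1} e^{-(\beta + n)\lambda},
\end{aligned}
$$

which is the kernel of a $\text{Gamma}(\alpha + \sum_i x_i,\ \beta + n)$ density.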

In [None]:
# Compute posterior parameters
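A minimal sketch of the conjugate update, assuming the illustrative prior $\text{Gamma}(2, 3)$ from the Gamma example above (substitute the prior chosen from your own prior knowledge). It also checks that the posterior mean $\alpha^\prime/\beta^\prime$ is a weighted average of the prior mean $\alpha/\beta$ and the sample mean $\bar{x}$, with weight $n/(\beta + n)$ on the data.

```python
import numpy as np

data = np.array([3, 2, 5, 4, 3])  # observed hourly counts
alpha, beta = 2.0, 3.0            # illustrative prior Gamma(alpha, beta), rate parameterization

n = len(data)
alpha_post = alpha + data.sum()  # alpha' = alpha + sum(x_i)
beta_post = beta + n             # beta'  = beta + n

# Posterior mean as a compromise between prior mean and sample mean
w = n / (beta + n)  # weight given to the data
mean_as_weighted_avg = (1 - w) * (alpha / beta) + w * data.mean()

print(f"posterior: Gamma({alpha_post}, {beta_post})")
print(f"posterior mean = {alpha_post / beta_post:.4f} "
      f"= weighted average {mean_as_weighted_avg:.4f}")
```

With $n = 5$ hours and $\beta = 3$, the data receive weight $5/8$, so the posterior mean ($19/8 = 2.375$) lies between the prior mean ($2/3$) and the sample mean ($3.4$); as $n$ grows, the data dominate.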


## Analysis and Interpretation

In [None]:
# Calculate the prior mean and variance

# Calculate the posterior mean and variance

# Print the results
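A sketch of these summaries, assuming the same illustrative prior $\text{Gamma}(2, 3)$ and the observed data. It uses the `mean()` and `var()` methods of SciPy's frozen distributions, which should agree with the closed forms $\alpha/\beta$ and $\alpha/\beta^2$.

```python
import numpy as np
import scipy.stats as stats

data = np.array([3, 2, 5, 4, 3])
alpha, beta = 2.0, 3.0  # illustrative prior hyperparameters (shape, rate)
alpha_post, beta_post = alpha + data.sum(), beta + len(data)

prior = stats.gamma(alpha, scale=1 / beta)
posterior = stats.gamma(alpha_post, scale=1 / beta_post)

print(f"Prior:     mean = {prior.mean():.4f}, variance = {prior.var():.4f}")
print(f"Posterior: mean = {posterior.mean():.4f}, variance = {posterior.var():.4f}")
```

With this particular prior, the posterior variance ($19/64$) comes out slightly larger than the prior variance ($2/9$): the Gamma variance equals mean/rate, and the data pull the mean well above what this prior expected.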

In [None]:
# Plot the prior and the posterior distributions
x = np.linspace(0, 10, 200)
# Prior pdf

# Prior mean

# Posterior pdf

# Posterior mean


In [None]:
# Calculate the probability of observing a new data point greater than a threshold
threshold = 2

# Using the prior distribution

# Using the posterior distribution


# Print the results
print(f"Probability of observing a new data point greater than {threshold}:")
print(f"Using the prior distribution: {probability_greater_than_threshold_prior}")
print(f"Using the posterior distribution: {probability_greater_than_threshold_posterior}")

# Plot the probability of observing a new data point greater than threshold over the prior and the posterior distributions' pdfs
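The phrase "a new data point greater than a threshold" can be read in two ways, and the sketch below computes both under the illustrative prior $\text{Gamma}(2, 3)$: (1) $P(\lambda > t)$, the probability that the arrival *rate* exceeds $t$, from the Gamma survival function; and (2) $P(X_{\text{new}} > t)$, the probability that the *count* in a new hour exceeds $t$, from the predictive distribution. For a Poisson likelihood with a $\text{Gamma}(\alpha, \beta)$ distribution on $\lambda$, the predictive distribution is negative binomial with $r = \alpha$ and $p = \beta/(\beta + 1)$, which maps onto SciPy's `stats.nbinom(n, p)`; the code checks this against direct numerical integration.

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad

data = np.array([3, 2, 5, 4, 3])
threshold = 2
alpha, beta = 2.0, 3.0  # illustrative prior (shape, rate)
alpha_post, beta_post = alpha + data.sum(), beta + len(data)

def rate_exceeds(a, b, t):
    """P(lambda > t) for lambda ~ Gamma(shape a, rate b)."""
    return stats.gamma(a, scale=1 / b).sf(t)

def predictive(a, b):
    """Predictive distribution of a new hourly count: Poisson mixed over Gamma(a, b)."""
    return stats.nbinom(a, b / (b + 1))

p_rate_prior = rate_exceeds(alpha, beta, threshold)
p_rate_post = rate_exceeds(alpha_post, beta_post, threshold)
p_count_prior = predictive(alpha, beta).sf(threshold)  # P(X_new > t)
p_count_post = predictive(alpha_post, beta_post).sf(threshold)

# Check: predictive pmf at k equals the integral of Poisson(k | lam) * Gamma(lam) over lam
k = 3
post_dist = stats.gamma(alpha_post, scale=1 / beta_post)

def integrand(lam):
    return stats.poisson.pmf(k, lam) * post_dist.pdf(lam)

pmf_by_quad, _ = quad(integrand, 0, 20)

print(f"P(lambda > {threshold}): prior {p_rate_prior:.4f}, posterior {p_rate_post:.4f}")
print(f"P(X_new > {threshold}):  prior {p_count_prior:.4f}, posterior {p_count_post:.4f}")
```

The two readings answer different questions: the rate probability shrinks toward 0 or 1 as data accumulate, while the count probability keeps the Poisson noise of a single hour.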


In [None]:
# Calculate the 95% credible interval
# Using the prior distribution

# Using the posterior distribution


# Plot the 95% credible interval on the prior and posterior distributions
x = np.linspace(0, 10, 200)
# The prior
# and its credible interval
# The posterior
# and its credible interval
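A minimal sketch of the 95% equal-tailed credible interval, assuming the illustrative prior $\text{Gamma}(2, 3)$: it takes the 2.5% and 97.5% quantiles of each distribution with the frozen distribution's `ppf`; SciPy's `interval(0.95)` returns the same pair. Equal-tailed is one common choice; the shortest (highest-density) interval is another, and the two differ when the density is skewed.

```python
import numpy as np
import scipy.stats as stats

data = np.array([3, 2, 5, 4, 3])
alpha, beta = 2.0, 3.0  # illustrative prior (shape, rate)
prior = stats.gamma(alpha, scale=1 / beta)
posterior = stats.gamma(alpha + data.sum(), scale=1 / (beta + len(data)))

level = 0.95
tail = (1 - level) / 2
ci_prior = prior.ppf([tail, 1 - tail])  # 2.5% and 97.5% quantiles
ci_post = posterior.ppf([tail, 1 - tail])

print(f"95% credible interval, prior:     [{ci_prior[0]:.3f}, {ci_prior[1]:.3f}]")
print(f"95% credible interval, posterior: [{ci_post[0]:.3f}, {ci_post[1]:.3f}]")
```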


## Replicate the above with

- Data from a Poisson distribution with a known rate parameter

- An uninformative prior distribution
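For the first variation, data can be simulated from a Poisson distribution with a rate we pick, so the posterior can be compared with the truth. The sketch assumes a true rate of 4 patients per hour, 1000 simulated hours, a fixed random seed, and a vague prior $\text{Gamma}(0.001, 0.001)$ as one common (but not unique) "uninformative" choice; with this prior, the posterior mean is essentially the sample mean.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)
true_lambda = 4.0  # assumed true rate, patients per hour
n_hours = 1000
sim_data = rng.poisson(true_lambda, size=n_hours)

# Vague prior: tiny shape and rate give a very diffuse Gamma (mean 1, variance 1000)
alpha0, beta0 = 1e-3, 1e-3
alpha_post = alpha0 + sim_data.sum()
beta_post = beta0 + n_hours
posterior = stats.gamma(alpha_post, scale=1 / beta_post)

lo, hi = posterior.interval(0.95)
print(f"true lambda = {true_lambda}, sample mean = {sim_data.mean():.3f}")
print(f"posterior mean = {posterior.mean():.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```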
