# Statistical Concepts: From Theory to Practice

This notebook serves as an interactive cheat sheet connecting key statistical concepts. We start with a central reference table outlining the core formulas and their relationships. Then, we use Python to demonstrate how these theoretical **population parameters** relate to the practical **sample statistics** we calculate from data.

## 📘 Core Concepts and Formulas Table

| Concept & Notation | Standard Formula | Rearrangements / Computational Form | Connection to Other Concepts |
|:---|:---|:---|:---|
| **Population Mean** ($\mu$) | $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ | N/A | This is the true, fixed average of the entire population. It's the value that the **Sample Mean** ($\bar{x}$) aims to estimate. It's also the **Expected Value** ($E[X]$) for a random variable drawn from the population. |
| **Sample Mean** ($\bar{x}$) | $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ | N/A | The average of a sample. It's the primary unbiased estimator for the **Population Mean** ($\mu$). It is used to calculate the **Sample Variance** ($s^2$). |
| **Expected Value** ($E[X]$) | $E[X] = \sum x \cdot P(x)$ | For a discrete uniform distribution (like a fair die roll), each of the $N$ values has $P(x) = 1/N$, so this simplifies to the **Population Mean** formula. | Conceptually, it's the long-run average of a random variable. If $X$ is a value drawn uniformly at random from a population, $E[X]$ is identical to $\mu$. |
| **Population Variance** ($\sigma^2$) | $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ | $\sigma^2 = \frac{\sum x^2 - (\sum x)^2/N}{N} = E[X^2] - (E[X])^2$ | Measures the true spread of the entire population around the **Population Mean** ($\mu$). The square root of this is the **Standard Deviation** ($\sigma$). |
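| **Sample Variance** ($s^2$) | $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ | $s^2 = \frac{\sum x^2 - (\sum x)^2/n}{n-1}$ | An unbiased estimator of the **Population Variance** ($\sigma^2$) when the observations are independent draws (e.g., sampling with replacement): averaged over all possible samples, $s^2$ equals $\sigma^2$. It uses the **Sample Mean** ($\bar{x}$) in its calculation. Because $\bar{x}$ is computed from the same data, squared deviations around $\bar{x}$ are on average smaller than those around the unknown $\mu$; the $n-1$ denominator (Bessel's correction) compensates for this downward bias. |
| **Standard Deviation** ($\sigma$, $s$) | $\sigma = \sqrt{\sigma^2}$ <br> $s = \sqrt{s^2}$ | N/A | The square root of the corresponding variance ($\sigma^2$ or $s^2$). It expresses spread in the original units of the data, which makes it easier to interpret than variance. Note that $s$ is not an unbiased estimator of $\sigma$: taking the square root reintroduces a slight downward bias, even though $s^2$ is unbiased for $\sigma^2$. |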
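As a quick check of the table, the sketch below uses Python's standard `fractions.Fraction` to compute exact values for a fair six-sided die (an illustrative assumption that matches the population used later). It confirms that $E[X] = \mu$ and that the definitional, computational, and $E[X^2] - (E[X])^2$ forms of the variance agree. The sample `[4, 3, 5, 1]` is reused from the Part 2 output to check the $s^2$ rearrangement.

```python
from fractions import Fraction

# Assumed population: the faces of a fair die, each with P(x) = 1/6.
faces = [1, 2, 3, 4, 5, 6]
N = len(faces)
p = Fraction(1, N)

# Expected value: E[X] = sum of x * P(x). With equal weights this is the population mean.
expected_value = sum(x * p for x in faces)
population_mean = Fraction(sum(faces), N)

# Population variance in three equivalent forms.
var_definitional = sum((x - population_mean) ** 2 for x in faces) / N
var_computational = (sum(x * x for x in faces) - Fraction(sum(faces) ** 2, N)) / N
var_moments = sum(x * x * p for x in faces) - expected_value ** 2

# Sample variance: definitional form vs. computational rearrangement (both divide by n - 1).
sample = [4, 3, 5, 1]
n = len(sample)
sample_mean = Fraction(sum(sample), n)
s2_definitional = sum((x - sample_mean) ** 2 for x in sample) / (n - 1)
s2_computational = (sum(x * x for x in sample) - Fraction(sum(sample) ** 2, n)) / (n - 1)

print(expected_value, population_mean)                    # 7/2 7/2
print(var_definitional, var_computational, var_moments)   # 35/12 35/12 35/12
print(s2_definitional, s2_computational)                  # 35/12 35/12
```

Exact fractions are used so the forms compare exactly equal. With floating point, the computational form can lose precision when the values are large relative to their spread, which is why libraries generally prefer the definitional form.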

## ⚙️ Python Demonstrations

Now, let's use Python to bring these formulas to life. We'll start by defining a complete, known population and calculating its true parameters.

### Part 1: Calculating Population Parameters (Theory)

In [1]:
import numpy as np

# Imagine this is our entire, known population (e.g., the numbers on a standard die).
population = np.array([1, 2, 3, 4, 5, 6])

# 1. Calculate the true Population Mean (μ)
pop_mean = np.mean(population)

# 2. Calculate the true Population Variance (σ²)
# np.var() uses the population formula (divides by N) by default.
pop_variance = np.var(population) 

print(f"--- Population Parameters (True Values) ---")
print(f"Population Mean (μ): {pop_mean:.2f}")
print(f"Population Variance (σ²): {pop_variance:.2f}")

--- Population Parameters (True Values) ---
Population Mean (μ): 3.50
Population Variance (σ²): 2.92
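To connect the cell above back to the table, the sketch below writes the population formulas out by hand, confirms they match `np.mean`, `np.var`, and `np.std` (which all divide by $N$ by default, `ddof=0`), and takes the square root of $\sigma^2$ to get the **Standard Deviation** ($\sigma$). For contrast it also prints the `ddof=1` value, which is the sample formula and not appropriate when the full population is known.

```python
import numpy as np

population = np.array([1, 2, 3, 4, 5, 6])
N = population.size

# Population mean and variance written out from the table's formulas (divide by N).
mu = population.sum() / N
sigma2 = ((population - mu) ** 2).sum() / N

# Standard deviation: square root of the variance, back in the data's original units.
sigma = np.sqrt(sigma2)

# NumPy's defaults (ddof=0) correspond to the population formulas.
assert np.isclose(mu, np.mean(population))
assert np.isclose(sigma2, np.var(population))
assert np.isclose(sigma, np.std(population))

print(f"Population Std Dev (σ): {sigma:.2f}")                  # 1.71
print(f"With ddof=1 instead: {np.var(population, ddof=1):.2f}")  # 3.50
```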


### Part 2: Calculating Sample Statistics (Practice)

In [3]:
# Now, let's pretend we don't know the whole population.
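# We'll take a small random sample to *estimate* the population parameters.
# Draw with replacement, i.e. independent rolls of the die, so every draw comes
# from the same distribution (the setting in which s² is unbiased for σ²).
sample_size = 4
sample = np.random.choice(population, size=sample_size, replace=True)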

# 1. Calculate the Sample Mean (x̄)
sample_mean = np.mean(sample)

# 2. Calculate the Sample Variance (s²)
# We must use ddof=1 (Delta Degrees of Freedom) to divide by (n-1) for an unbiased estimate.
sample_variance = np.var(sample, ddof=1)

print(f"--- Sample Statistics (Estimates) ---")
print(f"Sample Data: {sample}")
print(f"Sample Mean (x̄): {sample_mean:.2f}")
print(f"Sample Variance (s²): {sample_variance:.2f}")

print("\nNote: These sample statistics are estimates of the true population parameters.")
print("If you re-run this cell, you'll get different estimates each time!")

--- Sample Statistics (Estimates) ---
Sample Data: [4 3 5 1]
Sample Mean (x̄): 3.25
Sample Variance (s²): 2.92

Note: These sample statistics are estimates of the true population parameters.
If you re-run this cell, you'll get different estimates each time!
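Rather than re-running the cell by hand, the sketch below draws 100,000 samples of size 4 at once (independent rolls, i.e. with replacement; the trial count and the seed are arbitrary choices) and averages the resulting statistics. The averages of $\bar{x}$ and of $s^2$ (`ddof=1`) should land close to $\mu = 3.5$ and $\sigma^2 \approx 2.92$, while dividing by $n$ (`ddof=0`) should land near $\frac{n-1}{n}\sigma^2 \approx 2.19$.

```python
import numpy as np

population = np.array([1, 2, 3, 4, 5, 6])
sigma2 = np.var(population)   # true population variance, 35/12 ≈ 2.92
sample_size = 4
n_trials = 100_000

# Seeded generator so the result is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(42)

# Each row is one sample of 4 independent die rolls (drawn with replacement).
samples = rng.choice(population, size=(n_trials, sample_size), replace=True)

sample_means = samples.mean(axis=1)
s2_unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)    # divide by n

expected_biased = (sample_size - 1) / sample_size * sigma2
print(f"Average x̄: {sample_means.mean():.3f}  (μ = {population.mean():.3f})")
print(f"Average s² (ddof=1): {s2_unbiased.mean():.3f}  (σ² = {sigma2:.3f})")
print(f"Average ddof=0 variance: {s2_biased.mean():.3f}  ((n-1)/n·σ² = {expected_biased:.3f})")
```

Any single sample can miss badly (the spread of individual $s^2$ values is large for $n = 4$), but the averages settle on the target values, which is what "unbiased" means.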


## ✅ Summary: Tying It All Together

This notebook demonstrates the fundamental relationship between population and sample.

- The **Population parameters** ($\mu$, $\sigma^2$) described in the table are fixed, theoretical values that define the entire group. Our first Python cell calculated these exact values because we had access to the full population.
- The **Sample statistics** ($\bar{x}$, $s^2$) are our real-world estimates of those parameters. As shown in the second Python cell, they are calculated from a subset of the data and vary from sample to sample. Neither is guaranteed to match its parameter in any single sample, but both are unbiased: averaged over all possible samples of independent draws, $\bar{x}$ equals $\mu$ and $s^2$ equals $\sigma^2$. The $n-1$ denominator is what makes $s^2$ unbiased; dividing by $n$ would systematically underestimate $\sigma^2$.
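To check the unbiasedness claim exactly rather than approximately, the sketch below exploits the tiny die population and enumerates every possible sample of size 4 with exact `Fraction` arithmetic (`sample_variance` is a hypothetical helper written for this illustration). With replacement (all $6^4 = 1296$ ordered samples equally likely), the mean of $s^2$ is exactly $\sigma^2 = 35/12$, and dividing by $n$ gives exactly $\frac{n-1}{n}\sigma^2 = 35/16$. As a contrast, drawing without replacement from a finite population (all 15 subsets equally likely) gives a mean $s^2$ of $\frac{N}{N-1}\sigma^2 = 7/2$, which is why the unbiasedness of $s^2$ for $\sigma^2$ is usually stated for independent draws.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # 35/12


def sample_variance(sample, ddof=1):
    """Hypothetical helper: exact variance of a sample, dividing by len(sample) - ddof."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / (n - ddof)


n = 4

# With replacement: all 6**4 ordered samples are equally likely.
with_repl = list(product(population, repeat=n))
mean_s2_with = sum(sample_variance(s) for s in with_repl) / len(with_repl)
mean_biased_with = sum(sample_variance(s, ddof=0) for s in with_repl) / len(with_repl)

# Without replacement: all C(6, 4) = 15 subsets are equally likely.
without_repl = list(combinations(population, n))
mean_s2_without = sum(sample_variance(s) for s in without_repl) / len(without_repl)

print(sigma2, mean_s2_with, mean_biased_with, mean_s2_without)  # 35/12 35/12 35/16 7/2
```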