# **Sample Variance and Unbiased Estimation**

This notebook explains the concept of **Sample Variance**, why we divide by *(n − 1)*, and how it relates to **Population Variance**.

## **1. Sample Variance Formula**

For a **sample**, the variance measures how far the data points are spread out from the sample mean.

### **Formula:**

s² = ( Σ (xi − x̄)² ) / (n − 1)

Where:
- **xi** → each data point
- **x̄** → sample mean
- **n** → number of data points in the sample

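### **Why divide by (n − 1)?**
This is known as **Bessel’s Correction**. Deviations are measured from the sample mean x̄ rather than the true population mean μ, and x̄ is by construction the value closest to the sample's own points, so the sum of squared deviations comes out too small on average.

If we divide by **n**, the result **underestimates** the true population variance on average, by a factor of (n − 1) / n. The effect is largest for small samples and fades as n grows.

So, dividing by **(n − 1)** makes it an **unbiased estimator** of the population variance.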
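
The bias described above can be checked with a small simulation. This is a minimal sketch, not part of the original notebook: the normal population with variance 4, the sample size of 5, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

true_variance = 4.0   # assumed population: normal with standard deviation 2
n = 5                 # deliberately small sample size
trials = 200_000      # number of independent samples to draw

# Each row is one sample of size n drawn from the assumed population.
samples = rng.normal(loc=0.0, scale=np.sqrt(true_variance), size=(trials, n))

# Variance of every sample, computed once with each divisor.
divide_by_n = samples.var(axis=1, ddof=0)
divide_by_n_minus_1 = samples.var(axis=1, ddof=1)

# Average the estimates over all trials to approximate their expected values.
mean_biased = divide_by_n.mean()
mean_unbiased = divide_by_n_minus_1.mean()

print("True population variance:    ", true_variance)
print("Average when dividing by n:  ", round(mean_biased, 3))
print("Average when dividing by n-1:", round(mean_unbiased, 3))
```

Under these assumptions the average of the divide-by-n estimates should land near (n − 1) / n × 4 = 3.2, while the divide-by-(n − 1) average should land near 4.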

## **2. Population Variance Formula**

For the entire population, we use:

σ² = ( Σ (xi − μ)² ) / N

Where:
- **μ** → population mean
- **N** → total number of population data points

## **3. Comparison Table**

| Type | Formula | Divisor | Used For |
|------|----------|----------|-----------|
| **Population Variance (σ²)** | Σ(xi − μ)² / N | N | Entire population |
| **Sample Variance (s²)** | Σ(xi − x̄)² / (n − 1) | n − 1 | Sample data |

---

**Key Point:** Sample variance (with n − 1) gives an unbiased estimate of population variance.
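
A short derivation sketch of why the divisor n − 1 removes the bias. It assumes the xi are independent draws from a population with mean μ and finite variance σ², an assumption the text does not state explicitly:

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\!\left[\sum_{i=1}^{n} (x_i - \mu)^2\right] &= n\sigma^2, \qquad E\!\left[n(\bar{x} - \mu)^2\right] = n \cdot \frac{\sigma^2}{n} = \sigma^2 \\
E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] &= (n - 1)\sigma^2
\end{aligned}
$$

Dividing the sum by n − 1 therefore gives an expected value of exactly σ², while dividing by n gives (n − 1)σ² / n.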

## **4. Example Calculation**

Sample data: {1, 1, 5, 5}

1. Compute sample mean:
   
   x̄ = (1 + 1 + 5 + 5) / 4 = 3

2. Compute squared deviations:

| xi | (xi − x̄) | (xi − x̄)² |
|----|-----------|-----------|
| 1  | -2        | 4         |
| 1  | -2        | 4         |
| 5  | 2         | 4         |
| 5  | 2         | 4         |

Σ(xi − x̄)² = 16

3. Sample variance:

s² = 16 / (4 − 1) = 16 / 3 ≈ **5.33**
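
The three steps above can be sketched in plain Python, without any libraries, to make each intermediate value visible. The variable names are illustrative only.

```python
# Step-by-step version of the worked example, using only built-in Python.
sample = [1, 1, 5, 5]
n = len(sample)

# Step 1: sample mean
mean = sum(sample) / n

# Step 2: squared deviation of each point from the mean
squared_deviations = [(x - mean) ** 2 for x in sample]
sum_of_squares = sum(squared_deviations)

# Step 3: divide by n - 1 (Bessel's correction)
sample_variance = sum_of_squares / (n - 1)

print("Mean:", mean)                                  # 3.0
print("Squared deviations:", squared_deviations)      # [4.0, 4.0, 4.0, 4.0]
print("Sum of squares:", sum_of_squares)              # 16.0
print("Sample variance:", round(sample_variance, 2))  # 5.33
```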

## **5. Python Implementation**

```python
import numpy as np

# Sample data
sample = np.array([1, 1, 5, 5])

# Sample variance (unbiased, divides by n-1)
sample_variance = np.var(sample, ddof=1)

# Population variance (divides by n)
population_variance = np.var(sample)

print("Sample Variance (unbiased):", sample_variance)
print("Population Variance (biased):", population_variance)
```

Output:

```
Sample Variance (unbiased): 5.333333333333333
Population Variance (biased): 4.0
```
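
As a cross-check that does not depend on NumPy, the same two quantities can be computed with Python's standard `statistics` module. This is a minimal sketch reusing the sample from the example above; `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by n.

```python
import statistics

sample = [1, 1, 5, 5]

# statistics.variance divides by n - 1 (sample variance).
sample_variance = statistics.variance(sample)

# statistics.pvariance divides by n (population variance).
population_variance = statistics.pvariance(sample)

print("statistics.variance: ", sample_variance)
print("statistics.pvariance:", population_variance)
```

The two values should agree with the NumPy results for `ddof=1` and the default `ddof=0` respectively.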


## **6. Interview Tip 💡**

- **Population Variance** → divide by **N**
- **Sample Variance** → divide by **n − 1** (unbiased)
- **Bessel’s Correction** makes the sample variance an unbiased estimate of the population variance; the difference from dividing by n matters most when the sample size is small.

This is a **very common interview question**, especially in data analysis and statistics-related roles!