## Helper notebook for calculating Sample Variance 
### Course 2, Week 4, 03-Decision tree learning, Regression Trees (optional)

This notebook recomputes the variance of each node so you can check the values shown in the lectures.

### Sample Variance Formula

The sample variance $ s^2 $ is calculated using the formula:

$$
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$

Where:
- $ n $ is the number of data points.
- $ x_i $ represents each individual data point.
- $ \bar{x} $ is the mean of the data, calculated as $ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $.
- The expression $ (x_i - \bar{x})^2 $ represents the squared difference between each data point and the mean.
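
As a worked instance of the formula, here are the steps for the four values listed for the whiskers left branch in the code cell ($ n = 4 $):

$$
\bar{x} = \frac{7.2 + 8.8 + 9.2 + 8.4}{4} = \frac{33.6}{4} = 8.4
$$

$$
s^2 = \frac{(7.2 - 8.4)^2 + (8.8 - 8.4)^2 + (9.2 - 8.4)^2 + (8.4 - 8.4)^2}{4 - 1} = \frac{1.44 + 0.16 + 0.64 + 0}{3} = \frac{2.24}{3} \approx 0.75
$$

This agrees with the value printed for the whiskers left branch.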

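Dividing by $ n - 1 $ instead of $ n $ (Bessel's correction) makes $ s^2 $ an unbiased estimator of the population variance. Dividing by $ n $ would systematically underestimate it, because the deviations are measured from the sample mean $ \bar{x} $, which is always at least as close to the data as the true population mean.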
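
One way to see the effect of the $ n - 1 $ divisor is to compute both versions for the same node and cross-check them against Python's standard-library `statistics` module, where `pvariance` divides by $ n $ and `variance` divides by $ n - 1 $. This is a minimal sketch that reuses the whiskers left branch values from the notebook; the variable names `biased` and `unbiased` are illustrative only.

```python
import statistics

# Values for the "Whiskers left branch" node (same data as in the code cell)
data = [7.2, 8.8, 9.2, 8.4]
n = len(data)
mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)

biased = sum_of_squares / n          # divide by n: population formula
unbiased = sum_of_squares / (n - 1)  # divide by n - 1: sample variance

# The standard library implements both conventions
assert abs(biased - statistics.pvariance(data)) < 1e-9
assert abs(unbiased - statistics.variance(data)) < 1e-9

print(f"divide by n:     {biased:.4f}")
print(f"divide by n - 1: {unbiased:.4f}")
```

Running it prints 0.5600 for the divide-by-$ n $ version and 0.7467 for the sample variance. If NumPy is available, `numpy.var(data, ddof=1)` gives the same sample variance.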


```python
def sample_variance(data: list[float]) -> float:
    """Return the sample variance of `data`, dividing by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance requires at least two data points")
    mean = sum(data) / n  # Mean of the data
    sum_of_squares = sum((x - mean) ** 2 for x in data)  # Sum of squared deviations from the mean
    variance = sum_of_squares / (n - 1)  # Divide by (n - 1) for sample variance

    return variance


# Values in the root node and in the left/right branches of each candidate split
data_sets: dict[str, list[float]] = {
    "Root node": [7.2, 9.2, 8.4, 7.6, 10.2, 8.8, 15, 11, 18, 20],
    "Ear shape left branch": [7.2, 9.2, 8.4, 7.6, 10.2],
    "Ear shape right branch": [8.8, 15, 11, 18, 20],
    "Face shape left branch": [7.2, 15, 8.4, 7.6, 10.2, 18, 20],
    "Face shape right branch": [8.8, 9.2, 11],
    "Whiskers left branch": [7.2, 8.8, 9.2, 8.4],
    "Whiskers right branch": [15, 7.6, 11, 10.2, 18, 20],
}

# Compute and print the sample variance of every node
for label, data in data_sets.items():
    variance = sample_variance(data)
    print(f"The sample variance for {label} is: {variance:.2f}")
```

```text
The sample variance for Root node is: 20.51
The sample variance for Ear shape left branch is: 1.47
The sample variance for Ear shape right branch is: 21.87
The sample variance for Face shape left branch is: 27.80
The sample variance for Face shape right branch is: 1.37
The sample variance for Whiskers left branch is: 0.75
The sample variance for Whiskers right branch is: 23.32
```
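
In a regression tree, node variances like these are typically used to score candidate splits. The block below is a minimal sketch, assuming the split criterion is the reduction in variance: the parent's variance minus the size-weighted average of the two child variances. `variance_reduction` is a hypothetical helper name, and `statistics.variance` stands in for `sample_variance` so the block runs on its own.

```python
from statistics import variance

# Node values copied from the code cell: root node and (left, right) branches per split
root = [7.2, 9.2, 8.4, 7.6, 10.2, 8.8, 15, 11, 18, 20]
splits = {
    "Ear shape": ([7.2, 9.2, 8.4, 7.6, 10.2], [8.8, 15, 11, 18, 20]),
    "Face shape": ([7.2, 15, 8.4, 7.6, 10.2, 18, 20], [8.8, 9.2, 11]),
    "Whiskers": ([7.2, 8.8, 9.2, 8.4], [15, 7.6, 11, 10.2, 18, 20]),
}


def variance_reduction(parent: list[float], left: list[float], right: list[float]) -> float:
    """Parent sample variance minus the size-weighted average of the child sample variances."""
    w_left = len(left) / len(parent)    # fraction of examples sent to the left branch
    w_right = len(right) / len(parent)  # fraction of examples sent to the right branch
    weighted = w_left * variance(left) + w_right * variance(right)
    return variance(parent) - weighted


reductions = {
    name: variance_reduction(root, left, right) for name, (left, right) in splits.items()
}
for name, value in reductions.items():
    print(f"Reduction in variance for {name} split: {value:.2f}")

# Under this criterion, the split with the largest reduction is preferred
best = max(reductions, key=reductions.get)
print(f"Largest reduction: {best}")
```

It prints reductions of about 8.84 (ear shape), 0.64 (face shape) and 6.22 (whiskers), so under this criterion the ear-shape split would be chosen at the root.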
