### Degrees of Freedom (df)

In statistics, degrees of freedom (df) are the number of values in the final calculation of a statistic that are free to vary. They come up in many statistical tests and procedures, such as the t-test and the chi-square test, and in variance estimation. The Julia example below illustrates the concept for the sample variance.

### Estimating Sample Variance

The unbiased sample variance ($s^2$) of a dataset is the sum of squared deviations from the sample mean divided by $n - 1$, where $n$ is the sample size. The $n - 1$ term is the degrees of freedom in this context. Dividing by $n - 1$ rather than $n$ (Bessel's correction) is necessary because the sample mean is itself estimated from the data: the deviations from it must sum to zero, so only $n - 1$ of them are free to vary.
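Written out, with $x_i$ for the observations and $\bar{x}$ for the sample mean (notation assumed here for illustration; the code below calls the mean `m̄`), the estimator and the constraint that costs one degree of freedom can be sketched as:

$$
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
\sum_{i=1}^{n} (x_i - \bar{x}) = 0
\;\Longrightarrow\;
x_n - \bar{x} = -\sum_{i=1}^{n-1} (x_i - \bar{x}).
$$

Once any $n - 1$ deviations are known, the last one is fixed by the constraint.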




In [1]:
using Statistics

In [2]:
# Alias ∑ to the built-in sum so the code can mirror the math notation
∑ = sum;

In [3]:
# Sample data: a small fixed dataset (any numeric vector works)
X₀ = [2, 4, 4, 4, 5, 5, 7, 9]

# Function to calculate sample variance
function sample_variance(Xₙ)
    n = length(Xₙ) # Sample size
    m̄ = mean(Xₙ) # Sample mean
    Diff² = ∑((x - m̄)^2 for x in Xₙ) # Sum of squared deviations
    return Diff² / (n - 1) # Divide by n - 1 (degrees of freedom)
end

# Calculate and print the sample variance
variance = sample_variance(X₀)
println("Sample Variance: $variance")

# Degrees of freedom in this context
df = length(X₀) - 1
println("Degrees of Freedom: $df")

Sample Variance: 4.571428571428571
Degrees of Freedom: 7


In this example:
- The `sample_variance` function computes the sample variance of an array of data.
- The `∑((x - m̄)^2 for x in Xₙ)` part calculates the sum of the squared differences between each data point and the sample mean.
- We divide by `n - 1` to get the unbiased sample variance, where `n` is the number of observations in the sample. The degrees of freedom (`df`) in this context are `n - 1`: one degree of freedom is used up estimating the mean, so `n - 1` values remain free to vary.
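The "free to vary" claim can also be checked numerically. The following is a minimal sketch, not part of the original notebook: the names `deviations` and `last_from_rest` are introduced here for illustration. It reuses the same data and compares the result with `var` from the `Statistics` standard library, which divides by `n - 1` by default and by `n` when called with `corrected=false`.

```julia
using Statistics

# Same data as in the example above
X₀ = [2, 4, 4, 4, 5, 5, 7, 9]
n = length(X₀)

# Deviations of each observation from the sample mean
deviations = X₀ .- mean(X₀)

# The deviations sum to zero (up to floating-point error)
println("Sum of deviations: ", sum(deviations))

# So the last deviation is fully determined by the first n - 1
last_from_rest = -sum(deviations[1:end-1])
println("Last deviation:     ", deviations[end])
println("Recovered from rest: ", last_from_rest)

# Built-in variance: divides by n - 1 by default, by n with corrected=false
println("var(X₀)                  = ", var(X₀))
println("var(X₀; corrected=false) = ", var(X₀; corrected=false))
```

The default `var(X₀)` should agree with `sample_variance(X₀)` above, while the `corrected=false` version divides by `n` and comes out smaller.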

This example demonstrates how degrees of freedom are used in the computation of the sample variance, a fundamental concept in many statistical analyses.