# 03: Measures of Variability 

**Measures of Variability** are descriptions of the amounts by which scores are dispersed or scattered within a distribution.

- Statistics flourishes because we live in a world of variability.
- When summarizing data, it is critical that we specify both measures of central tendency and measures of variability.

- No numerical measure of variability exists for nominal data. Instead, it is best to describe variability informally as maximum, intermediate, or minimum.

- For ordinal data, variability can be described by reporting the extreme (lowest and highest) scores.


- **Range**: The difference between the largest and smallest scores. It is not a stable measure of variability because its size tends to increase with the size of the group (larger groups are more likely to include extreme scores). Moreover, it ignores the information provided by the scores within the range.

- **Variance**: The mean of all squared deviation scores.

- **Standard Deviation**: A rough measure of the average (or standard) amount by which scores deviate on either side of their mean. It is the square root of the variance.

- **Interquartile Range (IQR)**: The range for the middle 50 percent of the scores. It equals the distance between the third quartile (or 75th percentile) and the first quartile (or 25th percentile). It is not sensitive to the distorting effect of extreme scores. 
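The four definitions above can be sketched in a few lines. This is a minimal illustration, assuming pandas is available; the helper name `variability` and the two score lists are hypothetical, chosen with nine scores so that both quartiles land exactly on observed scores. Replacing the largest score with an extreme value shows which measures are distorted by it.

```python
import math

import pandas as pd


def variability(scores):
    """Return the range, variance, standard deviation and IQR of a list of scores.

    Variance here is the mean of the squared deviation scores (dividing by N).
    """
    s = pd.Series(scores)
    squared_deviations = (s - s.mean()) ** 2
    variance = squared_deviations.mean()
    q1 = s.quantile(0.25, interpolation="nearest")  # first quartile (25th percentile)
    q3 = s.quantile(0.75, interpolation="nearest")  # third quartile (75th percentile)
    return {
        "range": s.max() - s.min(),
        "variance": variance,
        "sd": math.sqrt(variance),
        "iqr": q3 - q1,
    }


scores = [2, 3, 4, 5, 5, 5, 6, 7, 8]
scores_with_outlier = [2, 3, 4, 5, 5, 5, 6, 7, 80]  # largest score replaced by an extreme one

for label, data in [("original", scores), ("with outlier", scores_with_outlier)]:
    stats = variability(data)
    print(f"{label}: range = {stats['range']}, IQR = {stats['iqr']}, SD = {stats['sd']:.2f}")
```

The range jumps from 6 to 78 and the standard deviation also rises sharply, while the IQR stays at 2 because the middle 50 percent of the scores is unchanged.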


```python
import math
import pandas as pd
```

### Worked Example 

Determine the values of the range and the IQR for the following sets of data.

(a) Retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63

(b) Residence changes: 1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4

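```python
ret_ages = pd.Series([60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63])

# Named data_range so that Python's built-in range() is not shadowed
data_range = ret_ages.max() - ret_ages.min()
# interpolation='nearest' makes each quartile one of the observed scores
iqr = ret_ages.quantile(.75, interpolation='nearest') - ret_ages.quantile(.25, interpolation='nearest')

print(f'Range = {data_range}, IQR = {iqr}')
```

```text
Range = 25, IQR = 5
```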


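```python
res_changes = pd.Series([1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4])

# Named data_range so that Python's built-in range() is not shadowed
data_range = res_changes.max() - res_changes.min()
# interpolation='nearest' makes each quartile one of the observed scores
iqr = res_changes.quantile(.75, interpolation='nearest') - res_changes.quantile(.25, interpolation='nearest')

print(f'Range = {data_range}, IQR = {iqr}')
```

```text
Range = 11, IQR = 3
```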
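The IQR depends on the rule used to locate the quartiles. As a sketch, the loop below compares the `interpolation` options of pandas' `Series.quantile` on the retirement ages; with pandas' default `'linear'` rule the IQR comes out as 4 rather than the 5 obtained with `'nearest'`. Textbook hand calculations may use yet another convention, so small disagreements between methods are expected.

```python
import pandas as pd

ret_ages = pd.Series([60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63])

# 'lower', 'higher' and 'nearest' always return an observed score;
# 'linear' (the pandas default) and 'midpoint' may fall between two scores.
iqr_by_method = {}
for method in ["linear", "lower", "higher", "midpoint", "nearest"]:
    q1 = ret_ages.quantile(0.25, interpolation=method)
    q3 = ret_ages.quantile(0.75, interpolation=method)
    iqr_by_method[method] = q3 - q1
    print(f"{method:>8}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```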


## Calculating Standard Deviations 

- For most frequency distributions, a majority (often as many as 68 percent) of all scores are within one standard deviation on either side of the mean. 

- For most frequency distributions, a small minority (often as small as 5 percent) of all scores deviate more than two standard deviations on either side of the mean.

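- The mean is a measure of position, whereas the standard deviation is a measure of distance (on either side of the mean of the distribution).

- The value of the standard deviation can never be negative: it is the square root of a variance, which is built from squared deviation scores. It equals zero only when every score is the same.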
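The 68 percent and roughly 5 percent figures above come from the normal distribution, where about 68.3 percent of scores fall within one standard deviation of the mean and about 4.6 percent fall more than two standard deviations away. A quick simulation makes this concrete; it assumes normally distributed scores with an arbitrary mean of 50 and standard deviation of 10, and uses a fixed seed so the run is repeatable. Skewed or heavy-tailed distributions will give somewhat different proportions.

```python
import numpy as np

# Assumption: scores are normally distributed (mean 50, SD 10); seeded for repeatability.
rng = np.random.default_rng(seed=0)
scores = rng.normal(loc=50, scale=10, size=100_000)

mean = scores.mean()
sd = scores.std()  # numpy's default ddof=0, i.e. divides by N
distance = np.abs(scores - mean)  # distance of each score from the mean

within_1_sd = np.mean(distance <= sd)  # proportion within one SD of the mean
beyond_2_sd = np.mean(distance > 2 * sd)  # proportion more than two SDs from the mean
print(f"within 1 SD: {within_1_sd:.1%}, beyond 2 SD: {beyond_2_sd:.1%}")
```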


- **Sum of Squares (SS)**: The sum of squared deviation scores. 

$$ 
SS = \sum{(X-\mu)^2}
$$

1. Subtract the population mean, $\mu$, from each original score, $X$, to obtain a deviation score, $X - \mu$.
2. Square each deviation score, $(X-\mu)^2$, to eliminate negative signs. 
3. Sum all squared deviation scores, $\sum{(X-\mu)^2}$. 
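The three steps translate directly into code. This sketch applies them to the same population used in the worked examples of this section, so the result can be compared with the computation formula there.

```python
population = [1, 3, 7, 2, 0, 4, 7, 3]

mu = sum(population) / len(population)  # population mean
deviations = [x - mu for x in population]  # step 1: X - mu
squared_deviations = [d ** 2 for d in deviations]  # step 2: (X - mu)^2
ss = sum(squared_deviations)  # step 3: sum of the squared deviations

print(f"mu = {mu}, SS = {ss}")  # mu = 3.375, SS = 45.875
```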

A more efficient computation formula, algebraically equivalent to the definition formula above, is:

$$
SS = \sum{X^2} - \dfrac{(\sum{X})^2}{N}
$$
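The equivalence of the two formulas follows from expanding the square and substituting $\mu = \sum{X}/N$:

$$
\begin{aligned}
SS &= \sum{(X-\mu)^2} = \sum{\left(X^2 - 2\mu X + \mu^2\right)} \\
   &= \sum{X^2} - 2\mu\sum{X} + N\mu^2 \\
   &= \sum{X^2} - 2\dfrac{(\sum{X})^2}{N} + \dfrac{(\sum{X})^2}{N} \\
   &= \sum{X^2} - \dfrac{(\sum{X})^2}{N}
\end{aligned}
$$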

- For samples, the formulas are the same, with the sample notation ($\bar{X}$, $n$) replacing the population symbols ($\mu$, $N$):

$$
SS = \sum{(X - \bar{X})^2}
$$

$$
SS = \sum{X^2} - \dfrac{(\sum{X})^2}{n}
$$

$$
\text{variance} = \dfrac{\text{sum of all squared deviation scores}}{\text{number of scores}}
$$

##### Variance for population 
$$
\sigma^2 = \dfrac{SS}{N}
$$

##### Standard deviation for population 
$$
\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{SS}{N}}
$$

##### Variance for sample

Although $SS$ is calculated in the same way, the denominator changes to $n-1$, the degrees of freedom:

$$
s^2 = \dfrac{SS}{n-1}
$$

##### Standard deviation for sample 
$$
s = \sqrt{s^2} = \sqrt{\dfrac{SS}{n-1}}
$$
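The four formulas can be wrapped in small helpers. This is a sketch with hypothetical function names (`sum_of_squares`, `variance`, `standard_deviation`); it uses the computation formula for $SS$ and a `sample` flag to switch the denominator between $n-1$ and $N$, then cross-checks the results against Python's `statistics` module, whose `variance`/`stdev` divide by $n-1$ and whose `pvariance`/`pstdev` divide by $N$.

```python
import math
import statistics


def sum_of_squares(scores):
    """Computation formula: SS = sum(X^2) - (sum(X))^2 / N."""
    return sum(x * x for x in scores) - sum(scores) ** 2 / len(scores)


def variance(scores, sample=True):
    """s^2 = SS / (n - 1) for a sample; sigma^2 = SS / N for a population."""
    denominator = len(scores) - 1 if sample else len(scores)
    return sum_of_squares(scores) / denominator


def standard_deviation(scores, sample=True):
    """Square root of the corresponding variance."""
    return math.sqrt(variance(scores, sample))


scores = [10, 8, 5, 0, 1, 1, 7, 9, 2]

# Cross-check against the standard library implementations
print(variance(scores), statistics.variance(scores))
print(standard_deviation(scores, sample=False), statistics.pstdev(scores))
```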

- In most instances, the standard deviation is less than one-half the size of the range. 
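This rule of thumb is a rough check rather than a theorem. As a sketch, the loop below applies it to the data sets used in this section, computing each sample standard deviation with `statistics.stdev` and comparing it with the range.

```python
import statistics

data_sets = {
    "retirement ages": [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63],
    "residence changes": [1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4],
    "worked-example sample": [10, 8, 5, 0, 1, 1, 7, 9, 2],
}

sd_to_range = {}
for name, scores in data_sets.items():
    sd = statistics.stdev(scores)  # sample standard deviation (divides SS by n - 1)
    data_range = max(scores) - min(scores)
    sd_to_range[name] = sd / data_range
    print(f"{name}: SD = {sd:.2f}, range = {data_range}, SD / range = {sd / data_range:.2f}")
```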

### Worked Examples 

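```python
samples = [1, 3, 4, 4]

# Computation formula: SS = sum(X^2) - (sum(X))^2 / n
ss = sum([x**2 for x in samples]) - ( (sum(samples)**2) / len(samples))

# Sample variance divides SS by the degrees of freedom, n - 1
variance = ss/(len(samples) - 1)

sd = math.sqrt(variance)

print(f'SS = {ss}, Variance = {variance}, SD = {sd}')
```

```text
SS = 6.0, Variance = 2.0, SD = 1.4142135623730951
```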


```python
population = [1, 3, 7, 2, 0, 4, 7, 3]

# Computation formula: SS = sum(X^2) - (sum(X))^2 / N
ss = sum([x**2 for x in population]) - ( (sum(population)**2) / len(population))

# Population variance divides SS by N
variance = ss/len(population)

sd = math.sqrt(variance)

print(f'SS = {ss}, Variance = {variance}, SD = {sd}')
```

```text
SS = 45.875, Variance = 5.734375, SD = 2.394655507583502
```


```python
ser = pd.Series([1, 3, 7, 2, 0, 4, 7, 3])

# ddof=0 divides SS by N, giving the population standard deviation
ser.std(ddof=0)
```

```text
2.394655507583502
```

```python
samples = [10, 8, 5, 0, 1, 1, 7, 9, 2]

# Computation formula: SS = sum(X^2) - (sum(X))^2 / n
ss = sum([x**2 for x in samples]) - ( (sum(samples)**2) / len(samples))

# Sample variance divides SS by the degrees of freedom, n - 1
variance = ss/(len(samples) - 1)

sd = math.sqrt(variance)

print(f'SS = {ss}, Variance = {variance}, SD = {sd}')
```

```text
SS = 119.55555555555554, Variance = 14.944444444444443, SD = 3.865804501581067
```


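```python
ser = pd.Series([10, 8, 5, 0, 1, 1, 7, 9, 2])

# pandas defaults to ddof=1, i.e. the sample standard deviation (SS divided by n - 1)
ser.std()
```

```text
3.8658045015810676
```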

## Degrees of Freedom

- **Degrees of freedom (df)**: The number of values that are free to vary, given one or more mathematical restrictions, in a sample being used to estimate a population characteristic.

- Using $n$ instead of $n-1$ when calculating the sample variance would usually underestimate the population variance. The deviations are measured from $\bar{X}$, which is the value that minimizes the sum of squared deviations for that sample, so $\sum{(X-\bar{X})^2}$ is never larger than $\sum{(X-\mu)^2}$.

- When $\mu$ is unknown and deviations are taken from $\bar{X}$, only $n-1$ of the deviations are free to vary: because of the zero-sum restriction, the remaining deviation must equal the negative of the sum of the other $n-1$. Since one degree of freedom is lost when the sample mean replaces the population mean, dividing $SS$ by $n-1$ gives an unbiased estimate of the population variance.

- Degrees of freedom matter only when scores in a sample are used to *estimate* some unknown characteristic of the population.

- **Zero-sum restriction**: When all observations are expressed as deviations from their mean, the deviations must sum to zero.
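Both ideas can be checked exactly, without random sampling. The sketch below enumerates every ordered sample of size 3 drawn with replacement from the population used in the worked examples (an assumption: the unbiasedness result holds for independent draws, which sampling with replacement provides) and averages $SS/n$ and $SS/(n-1)$ over all of them. It then shows the zero-sum restriction on a small hypothetical sample.

```python
from itertools import product

population = [1, 3, 7, 2, 0, 4, 7, 3]
N = len(population)
mu = sum(population) / N
sigma_sq = sum((x - mu) ** 2 for x in population) / N  # population variance


def ss(sample):
    """Sum of squared deviations from the sample mean."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample)


n = 3
# Every ordered sample of size n drawn with replacement, so each one is equally likely.
samples = list(product(population, repeat=n))
avg_divide_by_n = sum(ss(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(f"sigma^2 = {sigma_sq:.4f}")
print(f"average of SS / n       = {avg_divide_by_n:.4f}")
print(f"average of SS / (n - 1) = {avg_divide_by_n_minus_1:.4f}")

# Zero-sum restriction: once n - 1 deviations are known, the last one is fixed.
sample = [4, 0, 8]
x_bar = sum(sample) / len(sample)
deviations = [x - x_bar for x in sample]
print(deviations, sum(deviations))  # [0.0, -4.0, 4.0] 0.0
print(-sum(deviations[:-1]) == deviations[-1])  # True
```

Averaged over all possible samples, $SS/(n-1)$ reproduces $\sigma^2$ exactly, while $SS/n$ falls short by the factor $(n-1)/n$.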
