# The Dispersion

Dispersion tells us how spread out the numbers in a data set are, in particular how far they lie from the mean of the set. Common measures of dispersion include the range, the variance, and the standard deviation.

## The Range of a Set of Numbers

Two groups of numbers can have exactly the same mean but vastly different ranges, so the range tells us something about a set of numbers that the mean, median, and mode alone do not. The range is the difference between the highest and the lowest number in the set. To find the range:

In [3]:
'''
Find the range
'''

def find_range(numbers):
    lowest = min(numbers)
    highest = max(numbers)

    # the range
    r = highest - lowest
    return lowest, highest, r

list_numbers = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
# store the result in r, not range, so the built-in range() is not shadowed
lowest, highest, r = find_range(list_numbers)
print("Lowest: {}\nHighest: {}\nRange: {}".format(lowest, highest, r))

Lowest: 60
Highest: 1200
Range: 1140


The function find_range() accepts a list as a parameter and finds the range. First, it finds the lowest and the highest numbers using the min() and max() functions. It then calculates the range as the difference between the highest and the lowest numbers, using the label r to refer to this difference. At the end, it returns all three numbers.
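
To see the point about equal means and different ranges in practice, here is a minimal sketch. The two lists, group_a and group_b, are made-up values chosen only for illustration, and find_range() is repeated so that the snippet runs on its own.

```python
def find_range(numbers):
    # same logic as the function above
    lowest = min(numbers)
    highest = max(numbers)
    return lowest, highest, highest - lowest

# hypothetical data: both groups have a mean of 50
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

for name, group in (("group_a", group_a), ("group_b", group_b)):
    mean = sum(group) / len(group)
    lowest, highest, r = find_range(group)
    print("{}: mean = {}, range = {}".format(name, mean, r))
```

Both groups share the same mean, but the range of group_b is twenty times that of group_a, which the mean alone would never reveal.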

## The Variance and Standard Deviation

But what if we want to know more about how the individual numbers vary from the mean? Were they all similar, clustered near the mean, or were they all different, closer to the extremes? Two measures of dispersion tell us more about that: the variance and the standard deviation. To find the variance, we take the difference between each number and the mean, square each of those differences, and then average the squares.

$$ Variance = \dfrac{\Sigma (x_i -  x_{mean})^2}{n} $$

$x_i$ - individual numbers

$x_{mean}$ - the mean of these numbers

$n$ - number of values in the list
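
As a quick worked example, take the three numbers 2, 4, and 6 (values chosen here only for illustration). Their mean is 4, so the differences from the mean are -2, 0, and 2, and the formula gives:

$$ Variance = \dfrac{(2 - 4)^2 + (4 - 4)^2 + (6 - 4)^2}{3} = \dfrac{4 + 0 + 4}{3} = \dfrac{8}{3} \approx 2.67 $$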

If we want to calculate the standard deviation as well, all we have to do is take the square root of the variance. Values that are within one standard deviation of the mean can be thought of as fairly typical, whereas values that are three or more standard deviations away from the mean can be considered much more atypical—we call such values _outliers_.

$$ Standard\ Deviation = \sqrt{Variance} = \sqrt{\dfrac{\Sigma (x_i -  x_{mean})^2}{n}}$$

The following program puts all of this together:

In [10]:
'''
The variance and standard deviation
'''

def calculate_mean(numbers):
    s = sum(numbers)
    N = len(numbers)
    # the mean
    mean = s/N
    return mean

def find_differences(numbers):
    # mean
    mean = calculate_mean(numbers)

    # find differences
    diff = []
    for number in numbers:
        diff.append(number - mean)

    return diff

def calculate_variance(numbers):
    # list of differences
    diff = find_differences(numbers)

    # apply the formula
    squared_diff = []
    for number in diff:
        squared_diff.append(number**2)

    sum_squared_diff = sum(squared_diff)
    variance = sum_squared_diff/len(numbers)
    return variance

list_numbers = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
variance = calculate_variance(list_numbers)
print('The variance of the list of numbers is {:.2f}'.format(variance))

std_deviation = variance**0.5
print('The standard deviation of the list of numbers is {:.2f}'.format(std_deviation))

The variance of the list of numbers is 141047.35
The standard deviation of the list of numbers is 375.56
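
As a sanity check, the same results can be reproduced with Python's standard statistics module. This is an optional sketch rather than part of the program above. pvariance() and pstdev() compute the population variance and standard deviation, which divide by n as in the formula used here, whereas statistics.variance() and statistics.stdev() divide by n - 1 and would give slightly larger values.

```python
import statistics

list_numbers = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]

# population variance and standard deviation (both divide by n)
variance = statistics.pvariance(list_numbers)
std_deviation = statistics.pstdev(list_numbers)

print('The variance of the list of numbers is {:.2f}'.format(variance))
print('The standard deviation of the list of numbers is {:.2f}'.format(std_deviation))
```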


The variance and the standard deviation are both large: the standard deviation, 375.56, is of the same order as the mean itself (477.75), meaning that the individual numbers vary greatly from the mean. Now, let’s compare the variance and the standard deviation for a different set of numbers:

In [12]:
list_numbers2 = [382, 389, 377, 397, 396, 368, 369, 392, 398, 367, 393, 396]
variance = calculate_variance(list_numbers2)
print('The variance of the list of numbers is {:.2f}'.format(variance))

std_deviation = variance**0.5
print('The standard deviation of the list of numbers is {:.2f}'.format(std_deviation))

The variance of the list of numbers is 135.39
The standard deviation of the list of numbers is 11.64


The much lower values for the variance and the standard deviation tell us that the individual numbers in this list are closer to the mean.
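
The rule of thumb about typical values and outliers can be sketched in code as well. The helper below, split_by_deviation(), is a hypothetical name introduced here, and the thresholds of one and three standard deviations are simply the ones mentioned in the text, not a universal definition of an outlier.

```python
def split_by_deviation(numbers):
    # mean and population standard deviation, as in the program above
    mean = sum(numbers) / len(numbers)
    variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)
    std_deviation = variance ** 0.5
    if std_deviation == 0:
        # all values are identical, so every value counts as typical
        return list(numbers), []

    typical = []
    outliers = []
    for x in numbers:
        # distance from the mean, measured in standard deviations
        distance = abs(x - mean) / std_deviation
        if distance <= 1:
            typical.append(x)
        elif distance >= 3:
            outliers.append(x)
    return typical, outliers

list_numbers = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
typical, outliers = split_by_deviation(list_numbers)
print("Within one standard deviation: {}".format(typical))
print("Three or more standard deviations away: {}".format(outliers))
```

For the first list, only 200, 500, 500, 503, and 600 fall within one standard deviation of the mean, and no value is three or more standard deviations away, so the list of outliers comes back empty.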

---

[Describing Data with Statistics](statistics.ipynb)

[Main Page](../README.md)