# Describing Data with Statistics

With statistics we can study, describe, and better understand sets of data.

## The Mean

The mean is a common and intuitive way to summarize a set of numbers. It’s what we might simply call the __“average”__ in everyday use, although as we’ll see, there are other kinds of averages as well. Let’s take a sample set of numbers and calculate the mean.

We’ll write a program that calculates and prints the mean for a collection of numbers. To calculate the mean, we’ll need to take the sum of the list of numbers and divide it by the number of items in the list. We have two Python functions that make both of these operations very easy: sum() and len(). So our code will look like:

In [5]:
'''
Calculating the mean
'''

def calculate_mean(numbers):
    s = sum(numbers)
    N = len(numbers)

    mean = s/N
    return mean

donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
mean = calculate_mean(donations)

print("The mean of our set of data is {:.2f}.".format(mean))

The mean of our set of data is 477.75.


The calculate_mean() function works on any list of numbers, so we can reuse it to calculate the mean for other sets of numbers, too.
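As a quick illustration of that reuse, the sketch below applies the same function to a second list. The test_scores values are made up for this example and are not part of the donations data. The call to statistics.mean() from the standard library is only there as a sanity check.

```python
import statistics

def calculate_mean(numbers):
    # Same approach as before: sum of the items divided by the item count
    s = sum(numbers)
    N = len(numbers)
    return s/N

# Hypothetical data, invented for this example
test_scores = [82, 91, 77, 68, 95]
mean = calculate_mean(test_scores)

print("The mean test score is {:.2f}.".format(mean))

# Sanity check against the standard library
print(statistics.mean(test_scores))
```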

## The Median

The median of a collection of numbers is another kind of average. To find the median, we sort the numbers in ascending order. If the length of the list is odd, the number in the middle of the list is the median. If the length is even, we get the median by taking the mean of the two middle numbers. Let’s find the median of the previous list of donations and, to see the odd-length case as well, of a second list that adds a 13th donation.

Before we write a program to find the median of a list of numbers, let’s think about how we could automatically calculate the middle elements of a list in either case. If the length of a list ($N$) is odd, the middle number is the one in position $(N + 1)/2$. If $N$ is even, the two middle elements are in positions $N/2$ and $(N/2) + 1$. These positions are counted from 1, whereas Python list indices start at 0, so we’ll subtract 1 when we look the elements up.
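To make the position rule concrete, here is a small sketch. The helper name middle_positions() is our own invention for this illustration. It returns positions counted from 1, as in the formulas above, so each one has to be reduced by 1 before it can be used as a Python list index.

```python
def middle_positions(N):
    """Return the 1-based position(s) of the middle element(s)
    of a sorted list of length N (assumes N >= 1)."""
    if N % 2 == 0:
        # Even length: positions N/2 and (N/2) + 1
        return [N // 2, (N // 2) + 1]
    # Odd length: position (N + 1)/2
    return [(N + 1) // 2]

print(middle_positions(12))  # [6, 7]
print(middle_positions(13))  # [7]
```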

To write a function that calculates the median, we’ll also need to sort a list in ascending order. Luckily, the list sort() method does just that (it sorts the list in place), so our program will look like:

In [10]:
'''
Calculating the median
'''

def calculate_median(numbers):
    N = len(numbers)
    numbers.sort()

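    if (N % 2) == 0:
        # N is even: the middle elements are at positions N/2 and (N/2) + 1
        m1 = N // 2
        m2 = (N // 2) + 1

        # subtract 1 to convert 1-based positions to 0-based list indices
        m1 = m1 - 1
        m2 = m2 - 1
        median = (numbers[m1] + numbers[m2])/2
    else:
        # N is odd: the middle element is at position (N + 1)/2
        m = (N + 1) // 2
        # subtract 1 to convert the position to a list index
        m = m - 1
        median = numbers[m]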

    return median

donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
donations2 = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200, 300]

print('Median with 12 items: ', calculate_median(donations))
print('Median with 13 items: ', calculate_median(donations2))

Median with 12 items:  500.0
Median with 13 items:  500


As you can see, the mean (477.75) and the median (500) are pretty close in this particular list, but the median is a little higher.
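As a cross-check, the standard library’s statistics module provides mean() and median() functions that we can run on the same data. The sketch below also includes a variant of the median function, named calculate_median_sorted() here purely for illustration, that uses sorted() instead of sort() so the caller’s list is left in its original order.

```python
import statistics

def calculate_median_sorted(numbers):
    # sorted() returns a new list, so the list passed in is not modified
    ordered = sorted(numbers)
    N = len(ordered)
    mid = N // 2
    if N % 2 == 0:
        # Even length: mean of the two middle items (indices mid - 1 and mid)
        return (ordered[mid - 1] + ordered[mid]) / 2
    # Odd length: the single middle item
    return ordered[mid]

donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]

print(statistics.mean(donations))          # 477.75
print(statistics.median(donations))        # 500.0
print(calculate_median_sorted(donations))  # 500.0
print(donations[0])                        # 100, the original order is unchanged
```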