# Topic 7: Describing Data with Statistics:
$\bullet$ Mean, Median, Mode, 

$\bullet$ Creating a frequency table, 

$\bullet$ Dispersion, 

$\bullet$ Variance, 

$\bullet$ Standard deviation.

(Reference book: *Doing Math with Python* by Amit Saha)


In this topic, we'll use Python to explore
statistics so we can study, describe, and
better understand sets of data. After looking
at some basic statistical measures (the mean,
median, mode, and range), we'll move on to some
more advanced measures, such as variance and standard
deviation. Then, we'll see how to calculate the
correlation coefficient, which allows you to quantify the relationship
between two sets of data.



# 1. Finding the Mean




The **mean** of a list of numbers is the sum of the numbers divided by the number of items in the list.

To calculate the mean, we’ll need to take the sum of the list of numbers and divide it by the number of items in the list. Let's look at two Python functions that make both of these operations very easy: sum() and len().
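As a quick illustration of these two functions, the following sketch applies them to a small made-up list. The values are illustrative only and are not part of the donations data.

```python
# Illustrative values only; any list of numbers works the same way.
values = [1, 2, 3]

total = sum(values)   # adds up every item in the list
count = len(values)   # counts the items in the list

print(total)          # 6
print(count)          # 3
print(total / count)  # 2.0
```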

Functions that come with Python are called **built-in functions**; examples include print() and sum(). We can also create our own functions, which are known as **user-defined functions**. The next program defines one.

We also use **if __name__ == '__main__'**, which executes a block of code only when the file is run directly, and not when it is imported as a module.
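A minimal sketch of this pattern is shown below. The function square() and the file name myfile.py mentioned in the comments are hypothetical and serve only to illustrate the idea.

```python
def square(x):
    # Hypothetical helper, defined only for this illustration.
    return x * x

if __name__ == '__main__':
    # This block runs when the file is executed directly
    # (for example, with "python myfile.py"), but it is skipped
    # when another program does "import myfile".
    print(square(4))
```

Either way, square() itself stays available to any program that imports the file.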



### Q 1. A school charity has been taking donations over the last 12 days. The following 12 numbers are the total dollar amounts of donations received on each day: 100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, and 1200. Write a program that calculates and prints the mean donation per day.

In [2]:
''' Calculating the mean of donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200] '''

def calculate_mean(numbers):
    s = sum(numbers)
    N = len(numbers)
    mean = s/N  # Calculate the mean
    return mean

if __name__ == '__main__':
    donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
    mean = calculate_mean(donations)
    n = len(donations)
    print('Mean donation over the last {0} days is {1}'.format(n, mean))

Mean donation over the last 12 days is 477.75


# 2. Finding the Median

The **median** of a collection of numbers is another kind of average. To find the median, we sort the numbers in ascending order. 

$\bullet$ If the length of the list of numbers is odd, the number in the middle of the list is the median.\
$\bullet$ If the length of the list of numbers is even, we get the median by taking the mean of the two middle numbers.



As an example, take the list of donations from Q 1. Sorted from smallest to largest, it becomes 60, 70, 100, 100, 200, 500, 500, 503, 600, 900, 1000, and 1200. The list has an even number of items (12), so the median is the mean of the two middle numbers. Here the middle numbers are the sixth and seventh, 500 and 500, and their mean is (500 + 500)/2 = 500. The median is therefore 500.
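The odd-length case can be sketched in the same way. The list below is made up for illustration; integer division (//) gives the index of the middle item directly.

```python
# Illustrative values only (an odd number of items).
values = [7, 3, 9, 1, 5]

ordered = sorted(values)      # [1, 3, 5, 7, 9]
middle = len(ordered) // 2    # integer division gives index 2
median = ordered[middle]

print(median)                 # 5
```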


### Q 2. Let's find the median of the previous list of donations: 100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, and 1200.

In [3]:
''' Calculating the median '''
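def calculate_median(numbers):
    N = len(numbers)
    # Sort a copy so that the caller's list is left unchanged
    numbers = sorted(numbers)
    # Find the median
    if N % 2 == 0:
        # If N is even, average the two middle numbers
        m1 = N/2
        m2 = (N/2) + 1
        # Convert the 1-based positions to 0-based integer indexes
        m1 = int(m1) - 1
        m2 = int(m2) - 1
        median = (numbers[m1] + numbers[m2])/2
    else:
        # If N is odd, take the middle number
        m = (N+1)/2
        # Convert the 1-based position to a 0-based integer index
        m = int(m) - 1
        median = numbers[m]
    return median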
if __name__ == '__main__':
    donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
    median = calculate_median(donations)
    N = len(donations)
    print('Median donation over the last {0} days is {1}'.format(N, median))

Median donation over the last 12 days is 500.0
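As a cross-check, Python's standard library also provides a statistics module with mean() and median() functions. The sketch below is not part of the original programs; it only confirms the hand-written results on the same donations list.

```python
import statistics

donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]

print(statistics.mean(donations))    # 477.75
print(statistics.median(donations))  # 500.0
```

Writing the functions by hand, as above, is still useful for seeing how each measure is defined.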


# 3. Finding the Mode 

The **mode** of a set of numbers is the number that occurs most frequently.


To write a program to calculate the mode, we'll need to have Python
count how many times each number occurs within a list and print the one
that occurs most frequently. For this we use the Counter class from the collections module.
The most_common() method of the Counter class returns a list of (element, count) pairs,
ordered from the most common element to the least common.

In [5]:
# Description of the functions to be used:

simplelist = [4, 2, 1, 3, 4]
from collections import Counter
c = Counter(simplelist)
c.most_common()

[(4, 2), (2, 1), (1, 1), (3, 1)]

In [8]:
# To find only the most common element, call most_common() with the argument 1:
c.most_common(1)


[(4, 2)]

In [9]:
# To find the two most common elements, call most_common() with the argument 2:
c.most_common(2)
# This gives the most common element, followed by the second most common element.
# Elements with equal counts are returned in the order they were first encountered.

[(4, 2), (2, 1)]

### Q 3. Consider the test scores of a math test (out of 10 points) in a class of 20 students: 7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5, 6, 7, 8, 6, 1, and 10. Find the mode of this list.

In [10]:
''' Calculating the mode '''
from collections import Counter
def calculate_mode(numbers):
    c = Counter(numbers)
    mode = c.most_common(1)
    return mode[0][0]
if __name__=='__main__':
    scores = [7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5, 6, 7, 8, 6, 1, 10]
    mode = calculate_mode(scores)
    print('The mode of the list of numbers is: {0}'.format(mode))

The mode of the list of numbers is: 9


### $\bullet$ Multiple modes:

A set of data has multiple modes if two or more numbers occur the same maximum number of times.

### Q 4. In the list of numbers 5, 5, 5, 4, 4, 4, 9, 1, and 3, both 4 and 5 are present three times. In such cases, the list of numbers is said to have multiple modes, and our program should find and print all the modes. 

In [13]:
''' Calculating the mode when the list of numbers may have multiple modes
'''
from collections import Counter
def calculate_mode(numbers):
    c = Counter(numbers)
    numbers_freq = c.most_common()
    max_count = numbers_freq[0][1]
    modes = []
    for num in numbers_freq:
        if num[1] == max_count:
            modes.append(num[0])
    return modes
if __name__ == '__main__':
    scores = [5, 5, 5, 4, 4, 4, 9, 1, 3]
    modes = calculate_mode(scores)
    print('The mode(s) of the list of numbers are:')
    for mode in modes:
        print(mode)

The mode(s) of the list of numbers are:
5
4
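For comparison, the standard library's statistics module offers multimode(), which returns every mode of a list. This is a sketch that assumes Python 3.8 or later, where multimode() was added; it is not part of the original program.

```python
import statistics

scores = [5, 5, 5, 4, 4, 4, 9, 1, 3]

# multimode() returns all of the most frequent values,
# in the order they first appear in the data.
modes = statistics.multimode(scores)
print(modes)  # [5, 4]
```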


# 4.  Creating a Frequency Table

Let’s consider the list of test scores again: 7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5,
6, 7, 8, 6, 1, and 10. **The frequency table** for this list is shown in the following table.
For each number, we list the number of times it occurs in the second column.


| Score | Frequency |
| --- | --- |
| 1 | 2 |
| 2 | 1 |
| 4 | 1 |
| 5 | 2 |
| 6 | 3 |
| 7 | 2 |
| 8 | 2 |
| 9 | 5 |
| 10 | 2 |


### Q 5. Create a frequency table for the list of test scores: 7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5, 6, 7, 8, 6, 1, and 10.

In [18]:
''' Frequency table for a list of numbers '''

from collections import Counter

def frequency_table(numbers):
    table = Counter(numbers)
    numbers_freq = table.most_common()
    numbers_freq.sort()
    print('Number\tFrequency')
    for number in numbers_freq:
        print('{0}\t{1}'.format(number[0], number[1]))
if __name__=='__main__':
    scores = [7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5, 6, 7, 8, 6, 1, 10]
    frequency_table(scores)

Number	Frequency
1	2
2	1
4	1
5	2
6	3
7	2
8	2
9	5
10	2
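A slightly shorter variant is sketched below. It assumes we would rather get the table back as a list of (number, frequency) pairs than print it inside the function. The helper name frequency_rows() is hypothetical; it sorts the pairs from Counter.items() by number.

```python
from collections import Counter

def frequency_rows(numbers):
    # Hypothetical helper: returns (number, frequency) pairs sorted by number.
    return sorted(Counter(numbers).items())

scores = [7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5, 6, 7, 8, 6, 1, 10]

print('Number\tFrequency')
for number, frequency in frequency_rows(scores):
    print('{0}\t{1}'.format(number, frequency))
```

Returning the rows keeps the counting separate from the printing, so the same table can be reused elsewhere, for example to find the mode.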


### Homework:
#### 1. Find the mean, median, and mode of each of the following lists, and create a frequency table for each:
1. [100, 200, 34, 56, 230, 57, 890]
2. [1, 5, 9, 14, 54, 789, 20002, 12, 245677, 13]
3. [20, 30, 56, 78, 29, 234, 12, 2, 7]
4. [9, 10, 34, 12, 23, 34, 45, 67, 78, 1, 12, 14]
5. [1, 2, 3, 4, 5, 6, 7, 8, 9]

#### 2. The runs scored in a cricket match by 11 players are as follows:

#### 7, 16, 121, 51, 101, 81, 1, 16, 9, 11, 16

#### Find the mean, median, and mode of this data and form the frequency table.

#### 3. The weights in kg of 10 students are given below:

#### 39, 43, 36, 38, 46, 51, 33, 44, 44, 43

#### Find the mean, median, and mode of this data and form the frequency table.

#### 4. The mean of 8, 11, 6, 14, x and 13 is 66. Find the value of the observation x.

