In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Central Tendency

In [5]:
n = np.random.randint(7, 10, 20)  # 20 random integers from 7 to 9 (the upper bound 10 is excluded)
n

array([9, 7, 7, 8, 9, 7, 9, 8, 7, 8, 9, 8, 9, 8, 9, 7, 8, 8, 7, 8])

### Mean
The mean is the sum of the values of all observations in a dataset divided by the number of observations.

In [6]:
np.mean(n)

8.0
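Written as a formula, the mean of $n$ observations is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. A minimal plain-Python sketch of that definition, as a cross-check of `np.mean`; the helper name `mean` is hypothetical, and `data` holds the values of the array `n` printed above, copied by hand.

```python
# Same values as the array `n` printed above, copied by hand for this sketch.
data = [9, 7, 7, 8, 9, 7, 9, 8, 7, 8, 9, 8, 9, 8, 9, 7, 8, 8, 7, 8]

def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

print(mean(data))  # 8.0, i.e. 160 / 20, matching np.mean(n)
```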

### Median
The median is the middle value of a distribution when the values are arranged in ascending or descending order. For an even number of observations, it is the average of the two middle values.

In [7]:
np.median(n)

8.0
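A plain-Python sketch of the median definition, covering both the odd and the even case; the helper name `median` is hypothetical, and `data` repeats the values of `n` above.

```python
def median(values):
    """Middle value of the sorted data; the average of the two middle values when n is even."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return float(s[mid])            # odd n: a single middle value
    return (s[mid - 1] + s[mid]) / 2    # even n: average the two middle values

data = [9, 7, 7, 8, 9, 7, 9, 8, 7, 8, 9, 8, 9, 8, 9, 7, 8, 8, 7, 8]  # same values as `n` above
print(median(data))          # 8.0
print(median([3, 1, 2]))     # 2.0
print(median([4, 1, 3, 2]))  # 2.5
```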

### Mode
The mode is the most frequently occurring value in a distribution.

In [10]:
from statistics import mode

mode(n)

8
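Finding the mode amounts to counting how often each value occurs and keeping the most frequent one. A minimal sketch with `collections.Counter`; the helper `modes` is hypothetical and returns every value that ties for the highest count, since a distribution can have more than one mode.

```python
from collections import Counter

def modes(values):
    """All values that share the highest frequency, in order of first appearance."""
    counts = Counter(values)    # value -> number of occurrences
    top = max(counts.values())  # highest frequency
    return [v for v, c in counts.items() if c == top]

data = [9, 7, 7, 8, 9, 7, 9, 8, 7, 8, 9, 8, 9, 8, 9, 7, 8, 8, 7, 8]  # same values as `n` above
print(Counter(data)[8], Counter(data)[7], Counter(data)[9])  # 8 6 6
print(modes(data))             # [8]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]: two modes
```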

## Population and Sample
Let us first introduce some terminology and the related notation used in this book.

**The units on which we measure data, such as persons, cars, animals, or plants, are called observations. Each unit (observation) is denoted by the Greek symbol $\omega$.**

**The collection of all units is called the population and is denoted by $\Omega$.**  
When we write $\omega \in \Omega$, we mean a single unit out of all units, e.g. one person out of all persons of interest.

**If we consider a selection of observations $\omega_1, \omega_2, \ldots, \omega_n$, then these observations are called a sample. A sample is always a subset of the population: $\{\omega_1, \omega_2, \ldots, \omega_n\} \subseteq \Omega$.**
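This notation maps directly onto Python sets. In the sketch below the population is assumed to be 100 units labelled 0 to 99 (hypothetical stand-ins for persons, cars, and so on), and a sample of $n = 20$ distinct units is drawn without replacement with the standard-library `random.sample`.

```python
import random

# Hypothetical population Ω: 100 units labelled 0..99.
Omega = set(range(100))

random.seed(0)  # fixed seed so the sketch is repeatable
# Draw n = 20 distinct units ω_1, ..., ω_n (no unit is picked twice).
sample_units = random.sample(sorted(Omega), 20)

omega = sample_units[0]
print(omega in Omega)              # True: ω ∈ Ω
print(set(sample_units) <= Omega)  # True: {ω_1, ..., ω_n} ⊆ Ω
print(len(set(sample_units)))      # 20
```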

In [11]:
population = np.random.randint(10, 20, 100)  # 100 random integers from 10 to 19 (upper bound excluded)
population

array([12, 15, 18, 11, 15, 14, 12, 16, 18, 18, 16, 14, 10, 12, 14, 17, 18,
       15, 11, 10, 11, 19, 12, 16, 17, 17, 10, 14, 17, 17, 13, 12, 15, 19,
       19, 16, 19, 11, 19, 11, 18, 17, 15, 18, 17, 17, 14, 15, 19, 12, 12,
       17, 14, 15, 18, 16, 12, 15, 16, 12, 16, 19, 15, 10, 15, 16, 12, 13,
       17, 10, 15, 15, 17, 16, 10, 14, 18, 14, 18, 18, 10, 10, 10, 13, 17,
       17, 16, 17, 14, 11, 14, 16, 14, 11, 12, 19, 10, 17, 15, 19])

In [12]:
np.mean(population)

14.79

In [13]:
np.median(population)

15.0

In [14]:
from statistics import mode
mode(population)

17

In [16]:
sample = np.random.choice(population, 20)  # 20 values drawn from the population (with replacement by default)
sample

array([11, 14, 11, 10, 18, 12, 16, 19, 14, 19, 17, 15, 12, 19, 10, 10, 13,
       17, 15, 15])
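`np.random.choice` samples with replacement by default (`replace=True`), so the same unit can be drawn more than once, and the result is then not strictly a subset of units in the sense defined above. A sketch of drawing distinct units instead: it picks positions without replacement and then looks up their values. It uses NumPy's `Generator` API with an arbitrary seed and a stand-in population, both assumptions rather than the notebook's own data.

```python
import numpy as np

rng = np.random.default_rng(42)              # seeded generator (arbitrary seed)
population = rng.integers(10, 20, size=100)  # stand-in population: integers 10..19

# Choose 20 distinct positions (units), then read off their values.
idx = rng.choice(population.size, size=20, replace=False)
sample = population[idx]

print(len(np.unique(idx)))  # 20: no unit appears twice
print(np.mean(sample), np.median(sample))
```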

In [17]:
np.mean(sample)

14.35

In [18]:
np.median(sample)

14.5

In [19]:
mode(sample)

10
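The sample's mode is less clear-cut than it looks: in the 20 values printed above, 10, 15 and 19 each occur three times. Since Python 3.8, `statistics.mode` returns the first mode it encounters in the data, while `statistics.multimode` returns all of them in order of first appearance. The list below is the printed sample, copied by hand.

```python
from statistics import mode, multimode

# The sample printed above, copied by hand.
sample = [11, 14, 11, 10, 18, 12, 16, 19, 14, 19, 17, 15, 12, 19, 10, 10, 13, 17, 15, 15]

print(multimode(sample))  # [10, 19, 15]: three values tie with 3 occurrences each
print(mode(sample))       # 10: the tied value that appears first in the data
```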

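Comparing the population with our sample, we can see that their central tendencies are similar: the sample mean (14.35) and median (14.5) are close to the population mean (14.79) and median (15.0). The mode differs more (10 versus 17), since with only 20 values several of them tie for most frequent. Let's draw several more samples and compare their means with the population mean.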

In [28]:
sample1 = np.random.choice(population, 15)
sample2 = np.random.choice(population, 15)
sample3 = np.random.choice(population, 15)
sample4 = np.random.choice(population, 15)

In [29]:
all_samples = [sample, sample1, sample2, sample3, sample4]

# mean of each sample
sample_mean = [np.mean(s) for s in all_samples]
sample_mean

[14.35, 14.066666666666666, 13.466666666666667, 14.0, 14.333333333333334]

In [30]:
np.mean(sample_mean)

14.043333333333333

In [31]:
np.mean(population)

14.79
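With only five small samples, the average of the sample means (about 14.04) is still some distance from the population mean (14.79). The sample mean is an unbiased estimator of the population mean, so averaging over many more samples should bring the two closer. A sketch of that experiment follows; the seeded generator, the stand-in population and the helper `average_of_sample_means` are assumptions for illustration, and draws are made with replacement to match `np.random.choice` above.

```python
import numpy as np

rng = np.random.default_rng(0)               # seeded generator (arbitrary seed)
population = rng.integers(10, 20, size=100)  # stand-in population: integers 10..19

def average_of_sample_means(pop, n_samples, sample_size, rng):
    """Draw n_samples samples (with replacement) and return the mean of their means."""
    means = [rng.choice(pop, size=sample_size).mean() for _ in range(n_samples)]
    return float(np.mean(means))

print("population mean:", population.mean())
for k in (5, 100, 10_000):
    est = average_of_sample_means(population, k, 15, rng)
    print(f"{k:>6} samples of 15 -> average of sample means {est:.3f}")
```

As the number of samples grows, the printed averages should settle close to the population mean, although each run with a different seed gives slightly different numbers.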