**Point Estimation with Python - Examples**



**Recap**

From our suggested reading, we can summarize point estimation as estimating a population parameter from sample data.

For instance, if we were tasked with finding the average height of the adult population in Kenya, the direct approach would be to survey every registered adult in Kenya. But because the adult population is so large, surveying all of it would take a lot of time and resources. As an alternative, we could survey a random sample of the adult population and calculate the average height from it. The average height of the sample would be our point estimate of the average height of the population.
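
In symbols, the idea can be sketched as follows. The notation is our own assumption and is not taken from the reading: let $x_1, \dots, x_n$ be the heights of the $n$ sampled adults and let $\mu$ be the unknown population mean. The sample mean

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

is then the point estimate of $\mu$.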


Let's use the above example to explain the concept of point estimation using Python.

We are going to make two assumptions:


1.   The population follows a normal distribution.
2.   To understand point estimators better, we are going to arbitrarily assign a population mean so that we can see how accurate a point estimator is.



In [1]:
# Import the necessary libraries
import numpy as np
import pandas as pd
# SciPy's stats module provides statistical distributions and functions
import scipy.stats as stats
import random


In [2]:
# Seed NumPy's random number generator with 10 so the results are reproducible.
# To learn more about NumPy seeds, see https://www.sharpsightlabs.com/blog/numpy-random-seed/
np.random.seed(10)
# Use the stats module to draw random heights for our population. loc is our
# arbitrary mean height and scale is the standard deviation.
population_height = stats.norm.rvs(loc=165, scale=1, size=1500000)
print(population_height)

# Calculate the population mean
population_height.mean()

[166.3315865  165.71527897 163.45459971 ... 165.49730283 164.88786175
 164.21243322]


164.99887373021798

In [3]:
np.random.seed(6)
# Draw a random sample of 500,000 heights from the population
# (np.random.choice samples with replacement by default)
sample_height = np.random.choice(a=population_height,
                                 size=500000)

# Show the sample mean
print(sample_height.mean())

# Calculate how much the sample mean differs from the population mean
population_height.mean() - sample_height.mean()

164.999551452145


-0.0006777219270190926

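From this example we can see that, based on a sample of 500,000 adults, our estimator overestimates the true mean by about 0.0007: the population mean minus the sample mean is negative, so the sample mean is the larger of the two. A random subset of the population can therefore give a fairly accurate estimate of the population mean, although this particular sample, a third of the population, is not small.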
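
To see how the accuracy of the estimate depends on the sample size, here is a minimal sketch that repeats the experiment for a few sample sizes. It assumes the same setup as the example (normal heights with mean 165 and standard deviation 1), but it uses NumPy's newer `default_rng` generator, so the numbers will not match the outputs shown earlier. The sample sizes and the `results` list are our own illustrative choices.

```python
import numpy as np

# Assumed setup mirroring the example: normal heights, mean 165, sd 1
rng = np.random.default_rng(10)
population_height = rng.normal(loc=165, scale=1, size=1_500_000)
population_mean = population_height.mean()
population_sd = population_height.std()

results = []
for n in (100, 10_000, 500_000):
    # Sample with replacement (the default behaviour of choice)
    sample = rng.choice(population_height, size=n)
    # Positive error means the sample mean overestimates the population mean
    error = sample.mean() - population_mean
    # Standard error of the mean: the typical size of the error for n draws
    standard_error = population_sd / np.sqrt(n)
    results.append((n, error, standard_error))
    print(f"n={n:>7}  error={error:+.5f}  standard error={standard_error:.5f}")
```

The standard error shrinks in proportion to the square root of the sample size, so a sample 100 times larger gives an estimate that is typically about 10 times closer to the population mean.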

**Challenge**

Suppose you are tasked with finding the average age of registered voters in Kenya. Assume that the population follows a normal distribution. Find the best estimator for the population parameter.

In [0]:
# Your code goes here