## Statistical Inference with Confidence Intervals

Throughout week 2, we have explored the concept of confidence intervals: how to calculate and interpret them, and what "confidence" really means.

In this tutorial, we're going to review how to calculate confidence intervals of population proportions and means.

To begin, let's go over some of the material from this week and why confidence intervals are useful tools when deriving insights from data.

### Why Confidence Intervals?

A confidence interval is a range of plausible values for a population parameter, calculated around a sample statistic and supported mathematically with a stated level of confidence. For example, in the lecture, we estimated, with 95% confidence, that the population proportion of parents with a toddler who use a car seat for all travel with their toddler was somewhere between 82.2% and 87.7%.

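This is *__different__* from saying that there is a 95% probability that the true population proportion lies within our particular confidence interval. Once an interval has been calculated, it either contains the true proportion or it does not.

Instead, the 95% describes the procedure: if we were to repeat the sampling process many times and calculate an interval each time, about 95% of those confidence intervals would contain the true proportion.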
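The repeated-sampling interpretation can be checked with a quick simulation. The sketch below assumes a hypothetical "true" population proportion of 0.85 (in practice this value is unknown), repeatedly draws samples of 659 parents, builds a 95% interval from each sample, and reports the fraction of intervals that contain the true value.

```python
import numpy as np

# Hypothetical setup: in a real study the true proportion is unknown
p_true = 0.85      # assumed population proportion
n = 659            # sample size, as in the car seat example
reps = 10_000      # number of repeated samples
z = 1.96           # multiplier for 95% confidence

rng = np.random.default_rng(seed=0)

# Number of car seat users in each simulated sample
counts = rng.binomial(n, p_true, size=reps)
p_hat = counts / n

# Standard error and 95% interval for each simulated sample
se = np.sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - z * se
upper = p_hat + z * se

# Fraction of intervals that capture the true proportion
coverage = np.mean((lower <= p_true) & (p_true <= upper))
print(f"Coverage: {coverage:.3f}")
```

The printed coverage should land close to 0.95, even though each individual interval either contains 0.85 or misses it.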

### How are Confidence Intervals Calculated?

Our equation for calculating confidence intervals is as follows:

$$Best\ Estimate \pm Margin\ of\ Error$$

Where the *Best Estimate* is the **observed sample proportion or mean** and the *Margin of Error* is the **multiplier times the standard error** of that estimate.

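The t-multiplier depends on the degrees of freedom (n - 1 when estimating a single mean) and the desired confidence level. As the sample size grows, it approaches the standard normal (z) multiplier, which is 1.96 for 95% confidence, so for larger samples (a common rule of thumb is more than 30 observations) 1.96 is often used as an approximation. For proportions, the z-multiplier of 1.96 is used directly.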
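To see how the t-multiplier compares with 1.96, the sketch below uses SciPy's `stats.t.ppf` and `stats.norm.ppf` (SciPy is an extra dependency, not used elsewhere in this tutorial) to compute two-sided 95% multipliers for a few sample sizes, assuming n - 1 degrees of freedom as for a single mean.

```python
from scipy import stats

confidence = 0.95
# Two-sided interval: put (1 - confidence) / 2 in each tail
upper_tail = 1 - (1 - confidence) / 2   # 0.975

z_star = stats.norm.ppf(upper_tail)
print(f"z-multiplier: {z_star:.3f}")

# The t-multiplier depends on the degrees of freedom (n - 1 for a mean)
t_stars = {}
for n in [5, 10, 30, 100, 1000]:
    t_stars[n] = stats.t.ppf(upper_tail, df=n - 1)
    print(f"n = {n:>4}: t-multiplier = {t_stars[n]:.3f}")
```

The multiplier shrinks toward 1.96 as n grows; with 30 observations it is still about 2.045, so 1.96 is an approximation rather than an exact value for moderate samples.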

Substituting the margin of error, a 95% confidence interval can also be written as:

$$Sample\ Proportion\ or\ Mean\ \pm (Multiplier \times Standard\ Error)$$

Lastly, the Standard Error is calculated differently for a proportion and for a mean:

$$Standard\ Error\ for\ Proportion = \sqrt{\frac{Sample\ Proportion \times (1 - Sample\ Proportion)}{Number\ of\ Observations}}$$

$$Standard\ Error\ for\ Mean = \frac{Sample\ Standard\ Deviation}{\sqrt{Number\ of\ Observations}}$$
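These formulas translate directly into a few small helper functions. The function names below are illustrative and not part of any library; the proportion example reuses the lecture's car seat numbers (p = 0.85, n = 659), and the mean example uses a short list of hypothetical values.

```python
import numpy as np

def se_proportion(p_hat, n):
    """Standard error of a sample proportion."""
    return np.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(values):
    """Standard error of a sample mean, using the sample SD (ddof=1)."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / np.sqrt(len(values))

def confidence_interval(estimate, se, multiplier=1.96):
    """Best estimate +/- multiplier * standard error."""
    margin = multiplier * se
    return estimate - margin, estimate + margin

# Car seat example: proportion of parents who use a car seat
se_p = se_proportion(0.85, 659)
low, high = confidence_interval(0.85, se_p)
print(f"SE = {se_p:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
# prints: SE = 0.0139, 95% CI = (0.823, 0.877)

# Mean example with hypothetical values (not from the course data)
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"SE of mean = {se_mean(values):.3f}")
# prints: SE of mean = 0.756
```

Keeping the multiplier as a parameter makes it easy to swap 1.96 for a t-multiplier when working with a small sample.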

Let's replicate the car seat example from lecture:

```python
import numpy as np
import statsmodels.api as sm
import pandas as pd
```

```python
tstar = 1.96  # multiplier for 95% confidence
p = .85       # sample proportion of parents who use a car seat
n = 659       # number of parents in the sample

print("Manual Calculation of CI for proportion")
# Both counts should be reasonably large (a common rule is at least 10)
# for the normal approximation to be appropriate
print("Approx. number who use a car seat =", np.round(n*p, 2))
print("Approx. number who don't use a car seat =", np.round(n*(1-p), 2))

se = np.sqrt((p*(1-p))/n)  # standard error of a proportion
print("Standard Error =", se)
ci = (np.round(p - tstar*se, 2), np.round(p + tstar*se, 2))
print(f"Confidence interval: ({ci[0]} , {ci[1]})")
print("="*50)

print("Using statsmodels for proportion")
# Defaults: alpha=0.05 and method="normal" (estimate +/- z * SE)
ci = sm.stats.proportion_confint(n*p, n)
ci = (np.round(ci[0], 2), np.round(ci[1], 2))
print(f"Confidence interval: ({ci[0]} , {ci[1]})")
```

```
Manual Calculation of CI for proportion
Approx. number who use a car seat = 560.15
Approx. number who don't use a car seat = 98.85
Standard Error = 0.01390952774409444
Confidence interval: (0.82 , 0.88)
==================================================
Using statsmodels for proportion
Confidence interval: (0.82 , 0.88)
```
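The statsmodels call above passes the non-integer count `n*p` (560.15) because only the rounded proportion is available. `proportion_confint` defaults to the normal-approximation (Wald) interval, and its `method` argument also offers alternatives such as the Wilson score interval, which tends to behave better for small samples or proportions near 0 or 1. The sketch below compares the two, assuming a whole-number count of 560 car seat users (an assumption, since the exact count is not given here).

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

count = 560   # assumed whole-number count (0.85 * 659, rounded)
nobs = 659

# method="normal" is the estimate +/- 1.96 * SE interval used above
wald = proportion_confint(count, nobs, alpha=0.05, method="normal")
# Wilson score interval, an alternative offered by the same function
wilson = proportion_confint(count, nobs, alpha=0.05, method="wilson")

print("Normal (Wald):", np.round(wald, 3))
print("Wilson:       ", np.round(wilson, 3))
```

With 659 observations and a proportion of about 0.85, the two methods agree closely; the choice matters more for small samples.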


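```python
# Load the cartwheel dataset (assumes Cartwheeldata.csv is in the working directory)
df = pd.read_csv("Cartwheeldata.csv")
df.head()
```

| | ID | Age | Gender | GenderGroup | Glasses | GlassesGroup | Height | Wingspan | CWDistance | Complete | CompleteGroup | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 56 | F | 1 | Y | 1 | 62.0 | 61.0 | 79 | Y | 1 | 7 |
| 1 | 2 | 26 | F | 1 | Y | 1 | 62.0 | 60.0 | 70 | Y | 1 | 8 |
| 2 | 3 | 33 | F | 1 | Y | 1 | 66.0 | 64.0 | 85 | Y | 1 | 7 |
| 3 | 4 | 39 | F | 1 | N | 0 | 64.0 | 63.0 | 87 | Y | 1 | 10 |
| 4 | 5 | 27 | M | 2 | N | 0 | 73.0 | 75.0 | 72 | N | 0 | 4 |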


```python
mean = df["CWDistance"].mean()
sd = df["CWDistance"].std()    # sample standard deviation (ddof=1)
n = len(df)
se = sd/np.sqrt(n)             # standard error of the mean

print("Manual Calculation of CI for Mean")
print("Standard Error =", np.round(se, 2))
# Reuses the 1.96 multiplier (tstar) defined in the proportion cell
CI = (np.round(mean - tstar*se, 2), np.round(mean + tstar*se, 2))
print(f"Confidence interval: ({CI[0]} , {CI[1]})")
print("="*50)

print("Using statsmodels for Mean")
# zconfint_mean() uses the normal (z) multiplier, matching tstar = 1.96 above
CI = sm.stats.DescrStatsW(df["CWDistance"]).zconfint_mean()
CI = (np.round(CI[0], 2), np.round(CI[1], 2))
print(f"Confidence interval: ({CI[0]} , {CI[1]})")
```

```
Manual Calculation of CI for Mean
Standard Error = 3.01
Confidence interval: (76.58 , 88.38)
==================================================
Using statsmodels for Mean
Confidence interval: (76.58 , 88.38)
```
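Both intervals above use a normal multiplier of about 1.96: `tstar` in the manual calculation and `zconfint_mean()` in statsmodels. For a mean estimated from a modest sample, a t-based interval uses the t-multiplier with n - 1 degrees of freedom instead, and `DescrStatsW` also provides `tconfint_mean()` for this. The sketch below computes the t interval by hand with SciPy and checks it against `tconfint_mean()`; the values in `sample` are hypothetical stand-ins, not the cartwheel data, so swap in `df["CWDistance"]` to apply it to this dataset.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import DescrStatsW

# Hypothetical measurements for illustration (not the Cartwheeldata values)
sample = np.array([72, 85, 64, 90, 78, 81, 69, 88, 75, 83], dtype=float)

n = len(sample)
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)   # standard error of the mean

# t-multiplier for 95% confidence with n - 1 degrees of freedom
t_star = stats.t.ppf(0.975, df=n - 1)
manual_t = (mean - t_star * se, mean + t_star * se)

desc = DescrStatsW(sample)
sm_t = desc.tconfint_mean(alpha=0.05)   # t-based interval
sm_z = desc.zconfint_mean(alpha=0.05)   # z-based interval, as used above

print("Manual t interval:", np.round(manual_t, 2))
print("statsmodels t:    ", np.round(sm_t, 2))
print("statsmodels z:    ", np.round(sm_z, 2))
```

The t interval is wider than the z interval because the t-multiplier for 9 degrees of freedom (about 2.262) exceeds 1.96; the gap narrows as the sample size grows.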
