
# Assignment 10 

**Point Estimate and Interval Estimate (Confidence Interval)**

A random survey of enrollment at 35 community colleges across the United States yielded the following figures:

6,414; 1,550; 2,109; 9,350; 21,828; 4,300; 5,944; 5,722; 2,825; 2,044;

5,481; 5,200; 5,853; 2,750; 10,012; 6,357; 27,000; 9,414; 7,681; 3,200;

17,500; 9,200; 7,380; 18,314; 6,557; 13,713; 17,768; 7,493; 2,771; 2,861;

1,263; 7,285; 28,165; 5,080; 11,622

Compute a point estimate and a 95% confidence interval estimate of the mean enrollment using the t-distribution.

Since we don't know the population variance, we use the t-distribution instead of the normal distribution.
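
To see what this choice means in practice, the following minimal sketch compares the two-sided 95% critical value of the t-distribution with 34 degrees of freedom (n - 1 for this sample) against the standard normal value. The variable names are illustrative only.

```python
from scipy.stats import norm, t

confidence = 0.95
df = 35 - 1  # degrees of freedom for a sample of 35

# Upper-tail critical values for a two-sided 95% interval
upper_prob = 1 - (1 - confidence) / 2
z_crit = norm.ppf(upper_prob)
t_crit = t.ppf(upper_prob, df)

print('z critical value =', round(z_crit, 2))
print('t critical value =', round(t_crit, 2))
```

The t value (about 2.03) is larger than the normal value (about 1.96), so the t-based interval is slightly wider to account for estimating the standard deviation from the sample.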


In [24]:
# Import Python packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
from scipy.stats import t

## Step 0 - Data Preprocessing

Process the raw data into a list of integers. To calculate descriptive statistics, Python needs to work with a list of numbers.

In [3]:
# make each line of numbers a string object and then concatenate them together 
# The end result is one single string containing 35 numbers separated by ";"

data_1 = "6,414; 1,550; 2,109; 9,350; 21,828; 4,300; 5,944; 5,722; 2,825; 2,044;" 
data_2 = "5,481; 5,200; 5,853; 2,750; 10,012; 6,357; 27,000; 9,414; 7,681; 3,200; "
data_3 = "17,500; 9,200; 7,380; 18,314; 6,557; 13,713; 17,768; 7,493; 2,771; 2,861; "
data_4 = "1,263; 7,285; 28,165; 5,080; 11,622"
data = data_1 + data_2 + data_3 + data_4
print(data)

6,414; 1,550; 2,109; 9,350; 21,828; 4,300; 5,944; 5,722; 2,825; 2,044;5,481; 5,200; 5,853; 2,750; 10,012; 6,357; 27,000; 9,414; 7,681; 3,200; 17,500; 9,200; 7,380; 18,314; 6,557; 13,713; 17,768; 7,493; 2,771; 2,861; 1,263; 7,285; 28,165; 5,080; 11,622


In [4]:
# Convert the single string to a list of strings using the split() function
# Make sure to specify a delimiter (separator)

x = data.split(";")
print(x)

['6,414', ' 1,550', ' 2,109', ' 9,350', ' 21,828', ' 4,300', ' 5,944', ' 5,722', ' 2,825', ' 2,044', '5,481', ' 5,200', ' 5,853', ' 2,750', ' 10,012', ' 6,357', ' 27,000', ' 9,414', ' 7,681', ' 3,200', ' 17,500', ' 9,200', ' 7,380', ' 18,314', ' 6,557', ' 13,713', ' 17,768', ' 7,493', ' 2,771', ' 2,861', ' 1,263', ' 7,285', ' 28,165', ' 5,080', ' 11,622']


In [5]:
# List comprehension: remove the "," from each string, then convert it to an integer
e = [int(i.replace(",", "")) for i in x]
print(e)

[6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044, 5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200, 17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861, 1263, 7285, 28165, 5080, 11622]


In [6]:
# Equivalent explicit loop: create a list of integers from the list of strings
# Make sure to remove the "," first and then convert the strings to integers

e = []
for i in x:
    y = i.replace(",", "")
    y = int(y)
    e.append(y)
    
print(e)

[6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044, 5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200, 17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861, 1263, 7285, 28165, 5080, 11622]
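
The preprocessing steps above can also be wrapped in one small helper. This is only a sketch: `parse_enrollments` is a hypothetical name, not part of the assignment. It strips whitespace and skips empty pieces, so a trailing ";" does not cause `int('')` to raise an error.

```python
def parse_enrollments(raw):
    """Turn a ';'-separated string of comma-grouped numbers into a list of ints."""
    values = []
    for token in raw.split(';'):
        token = token.strip()
        if token:  # skip empty pieces, e.g. after a trailing ';'
            values.append(int(token.replace(',', '')))
    return values

sample = parse_enrollments('6,414; 1,550; 2,109;')
print(sample)
```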


## Step 1 - Calculate and Display the Sample Size and Sample Mean

In [7]:
# Calculate and display the sample size

print('The size of the sample =',len(e))

The size of the sample = 35


In [40]:
# Calculate and display the sample mean

sample_mean = round(np.mean(e))

print('Sample mean =', sample_mean)

Sample mean = 8629.0


The point estimate of the mean enrollment of US community colleges is 8629.

## Step 2 - Calculate and Display the Sample Standard Deviation & Sample Standard Error

Sample Standard Deviation $s=\displaystyle \sqrt{\frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1}}$

Sample Standard Error $\displaystyle =\frac{s}{\sqrt{n}}$ 

Note: NumPy's `std` function defaults to **Delta Degrees of Freedom (ddof)** = 0, which applies to population data. For sample data, we need to specify **ddof=1**.

For the enrollment data, we round the statistics to the nearest integer (no decimal places).
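
The effect of `ddof` can be checked on a small made-up list (not the enrollment data). The sketch below applies the sample formula by hand and compares it with NumPy; all variable names are illustrative.

```python
import math
import numpy as np

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers for illustration
n = len(values)
mean = sum(values) / n
squared_dev = sum((v - mean) ** 2 for v in values)

s_manual = math.sqrt(squared_dev / (n - 1))  # sample formula, divides by n - 1
s_numpy = np.std(values, ddof=1)             # same quantity via NumPy
sigma_numpy = np.std(values)                 # default ddof=0, divides by n
std_error = s_manual / math.sqrt(n)          # standard error of the mean

print('Sample std (manual)     =', round(s_manual, 3))
print('Sample std (ddof=1)     =', round(s_numpy, 3))
print('Population std (ddof=0) =', round(sigma_numpy, 3))
print('Standard error          =', round(std_error, 3))
```

The sample standard deviation is always at least as large as the population version because it divides by n - 1 instead of n.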

In [20]:
# Calculate and display the sample standard deviation using Numpy's std function.

StD = round(np.std(e, ddof=1))

print('Sample Standard Deviation =',StD)

Sample Standard Deviation = 6944.0


In [23]:
# Calculate and display the sample standard error

SSE = round(StD/math.sqrt(len(e)))

print('Sample Standard Error =', SSE)

Sample Standard Error = 1174.0


## Step 3 - Calculate t Critical Value using t-Distribution

$\alpha=1-\mbox{Confidence Level}=1-0.95=0.05$

$\displaystyle \frac{\alpha}{2}=0.025$

$n(\mbox{sample size})=35$

$df(\mbox{degree of freedom})=n-1=35-1=34$

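We will use the PPF (percent point function, the inverse of the CDF) of `scipy.stats.t` to calculate the t critical value. Because `t.ppf(0.025, 34)` returns the 2.5th percentile, the result is the negative value $-t_{0.025,\,34}$; the margin of error in Step 4 uses its absolute value.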

In [35]:
# Calculate and display the t critical value using the ppf function of scipy.stats.t

tcv = round(t.ppf(0.025,34),2)

print('t critical value =',tcv)

t critical value = -2.03
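
The negative sign in the output above comes from asking for the lower tail. Since the t-distribution is symmetric about zero, the upper-tail value has the same magnitude, as this short sketch shows (variable names are illustrative):

```python
from scipy.stats import t

df = 34
lower_tail = t.ppf(0.025, df)  # 2.5th percentile, negative
upper_tail = t.ppf(0.975, df)  # 97.5th percentile, positive

print('lower-tail value =', round(lower_tail, 2))
print('upper-tail value =', round(upper_tail, 2))
```

Either value can be used for the margin of error as long as its absolute value is taken.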


## Step 4 - Calculate the Margin of Error

Margin of Error = t Critical Value * Sample Standard Error = $\displaystyle t_{\alpha/2,\,n-1}\Big(\frac{s}{\sqrt{n}}\Big)$

In [39]:
# Calculate and display the margin of error

margin_error = round(abs(tcv*SSE))

print('Margin of Error =',margin_error)

Margin of Error = 2383.0


## Step 5 - Calculate Lower and Upper Limit of the Confidence Interval

Lower Limit = Sample Mean - Margin of Error

Upper Limit = Sample Mean + Margin of Error

In [42]:
# Calculate and display the lower limit

lower_limit = sample_mean - margin_error

print('Lower Limit =',lower_limit)

Lower Limit = 6246.0


In [43]:
# Calculate and display the upper limit

upper_limit = sample_mean + margin_error

print('Upper Limit =',upper_limit)

Upper Limit = 11012.0


## Step 6 - Now We Have the 95% Confidence Interval

Confidence Interval = $\displaystyle \bar{x}\pm t_{\alpha/2,\,n-1}\Big(\frac{s}{\sqrt{n}}\Big)$ = Sample_Mean $\pm$ Margin of Error
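
Substituting the rounded values from Steps 1 to 4 into this formula gives:

$\displaystyle 8629 \pm 2.03 \times 1174 \approx 8629 \pm 2383 = (6246,\ 11012)$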

In [45]:
# Print the 95% confidence interval in the form (lower limit, upper limit)

confidence_int = (lower_limit,upper_limit)

print('The 95% Confidence Interval =',confidence_int)

The 95% Confidence Interval = (6246.0, 11012.0)
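
As a cross-check, the same interval can be computed in one call with `scipy.stats.t.interval`, without rounding the intermediate values. This is a sketch rather than the assignment's required method; the list name `enrollment` is illustrative, and `scipy.stats.sem` is used for the standard error (it uses ddof=1 by default).

```python
import numpy as np
from scipy import stats

enrollment = [6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044,
              5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200,
              17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861,
              1263, 7285, 28165, 5080, 11622]

n = len(enrollment)
mean = np.mean(enrollment)
sem = stats.sem(enrollment)  # standard error of the mean (ddof=1)

# The confidence level is passed positionally; degrees of freedom = n - 1
lower, upper = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

print('95% CI =', (round(lower), round(upper)))
```

The result should land within a few units of the interval above; the small difference comes from rounding the mean, standard error, and t critical value at each step.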
