##### Confidence Interval for Population Mean Using SciPy

###### 1. Large Sample or Population Variance Known (using z-score)
When working with large samples (n >= 30), the central limit theorem lets us assume that the sampling distribution of the sample mean is approximately normal. In this case, we can calculate the confidence interval for the mean using the 'stats.norm.interval()' function in SciPy.

In [1]:
import numpy as np
from scipy import stats

In [3]:
# Define a sample dataset (exam grades for a class of 40 students)
np.random.seed(42)
grades = np.random.randint(100, size=40)
print(grades)

[51 92 14 71 60 20 82 86 74 74 87 99 23  2 21 52  1 87 29 37  1 63 59 20
 32 75 57 21 88 48 90 58 41 91 59 79 14 61 61 46]


In [4]:
# Calculate the sample mean
xbar = np.mean(grades)
xbar

53.15

In [5]:
# Calculate the standard error of the mean
n = len(grades)             # sample size
sigma = 28                  # population standard deviation (from past experience)
sem = sigma / np.sqrt(n)    # standard error of the mean
sem

4.427188724235731

In [6]:
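# Calculate the 95% confidence interval around the mean using scipy.stats
# The confidence level is passed positionally: newer SciPy releases renamed
# the keyword from `alpha` to `confidence`, so `alpha=0.95` fails there.
stats.norm.interval(0.95, loc=xbar, scale=sem)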

(44.47286954773614, 61.82713045226386)
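
To make clear what 'stats.norm.interval()' is doing, the same interval can be rebuilt by hand as `xbar ± z * sem`. The following is a minimal sketch, not part of the original notebook: it hard-codes the values shown above (`xbar = 53.15`, `sigma = 28`, `n = 40`), and the names `z_crit`, `margin` and `manual_ci` are illustrative.

```python
import numpy as np
from scipy import stats

# Values copied from the cells above (assumed, not recomputed here)
xbar = 53.15          # sample mean
sigma = 28            # population standard deviation, assumed known
n = 40                # sample size
confidence = 0.95     # two-sided confidence level

# Standard error of the mean
sem = sigma / np.sqrt(n)

# Critical z value: the quantile that leaves (1 - confidence) / 2 in the upper tail
z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)

# Margin of error and the resulting interval
margin = z_crit * sem
manual_ci = (xbar - margin, xbar + margin)
print(manual_ci)
```

The printed bounds should agree with the output of 'stats.norm.interval()' above, up to floating-point rounding.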

###### 2. Population Variance Unknown (using t-score)
When the population standard deviation 'σ' is unknown, the sample standard deviation 's' is used in its place. This introduces extra uncertainty, because 's' varies from sample to sample, so the t distribution is used instead of the normal distribution. In this case, we can calculate the confidence interval for the mean using the 'stats.t.interval()' function in SciPy.
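
The extra uncertainty shows up as a larger critical value. As a rough illustration (the degrees of freedom below are arbitrary choices, not taken from the notebook), this sketch compares the 95% critical values of the t distribution with the normal one:

```python
from scipy import stats

confidence = 0.95
upper_tail = 1 - (1 - confidence) / 2   # 0.975 for a two-sided 95% interval

# Critical value of the standard normal distribution
z_crit = stats.norm.ppf(upper_tail)

# Critical values of the t distribution for a few illustrative degrees of freedom
for df in (5, 19, 30, 100):
    t_crit = stats.t.ppf(upper_tail, df)
    print(f"df={df:>3}: t={t_crit:.3f}  z={z_crit:.3f}")
```

The t critical value is always larger than the z value and approaches it as the degrees of freedom grow, which is why t-based intervals are wider for small samples.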

In [7]:
# Define a sample dataset (exam grades for a class of 20 students)
np.random.seed(101)
grades = np.random.randint(100, size=20)
print(grades)

[95 11 81 70 63 87 75  9 77 40  4 63 40 60 92 64  5 12 93 40]


In [8]:
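# Calculate the 95% confidence interval around the mean using scipy.stats
# The confidence level is passed positionally: newer SciPy releases renamed
# the keyword from `alpha` to `confidence`, so `alpha=0.95` fails there.
stats.t.interval(0.95,                    # confidence level
                 df=np.size(grades)-1,    # degrees of freedom
                 loc=np.mean(grades),     # sample mean
                 scale=stats.sem(grades)) # standard error of the mean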

(39.205382746648176, 68.89461725335181)
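
For comparison, the t-based interval can also be assembled step by step as `xbar ± t * s / sqrt(n)`. This is a sketch under the assumption that the grades are exactly the 20 values printed above; they are typed in directly so the block runs on its own, and the names `t_crit`, `margin` and `manual_ci` are illustrative.

```python
import numpy as np
from scipy import stats

# The 20 grades printed above, entered by hand so the block is self-contained
grades = np.array([95, 11, 81, 70, 63, 87, 75, 9, 77, 40,
                   4, 63, 40, 60, 92, 64, 5, 12, 93, 40])

n = grades.size
xbar = grades.mean()              # sample mean
s = grades.std(ddof=1)            # sample standard deviation (n - 1 in the denominator)
sem = s / np.sqrt(n)              # estimated standard error, same as stats.sem(grades)

# Critical t value for a two-sided 95% interval with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)

margin = t_crit * sem
manual_ci = (xbar - margin, xbar + margin)
print(manual_ci)
```

Note the `ddof=1` argument: NumPy's `std` divides by `n` by default, whereas `stats.sem` uses the sample standard deviation with `n - 1`.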