This notebook examines:
- the relationship between confidence level and confidence intervals (their width and how often they contain the population proportion);
- the relationship between the population proportion and sample proportions under hypothesis testing.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
from tqdm import tqdm

In [1]:
import warnings

warnings.simplefilter("ignore", UserWarning)

# Confidence Level vs. Confidence Interval

In [1]:
population = np.random.rand(100000) < 0.83
n_samples = 10
sample_size = 100
samples = np.reshape(np.random.choice(population, size=sample_size * n_samples), (n_samples, -1))
samples.shape
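No random seed is set, so each run of these cells draws a different population and different samples. The specific counts reported later in this notebook will therefore vary between runs. Below is a minimal sketch of a reproducible version. It uses NumPy's `Generator` API (`np.random.default_rng`) with an arbitrary seed of 0. The seed, the helper name `draw_samples`, and its parameter names are choices made here, not part of the original notebook.

```python
import numpy as np

def draw_samples(seed, pop_size=100_000, pop_p=0.83, n_samples=10, sample_size=100):
    # Seeded generator: the same seed always reproduces the same population and samples
    rng = np.random.default_rng(seed)
    population = rng.random(pop_size) < pop_p
    # Draw with replacement, matching the default behaviour of np.random.choice above
    samples = rng.choice(population, size=(n_samples, sample_size), replace=True)
    return population, samples

population_a, samples_a = draw_samples(seed=0)
population_b, samples_b = draw_samples(seed=0)
print(samples_a.shape)                       # (10, 100)
print(np.array_equal(samples_a, samples_b))  # True
```

A different seed gives a different, equally valid set of samples. The point is only that a given seed can be rerun.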

In [1]:
def calculate_p(samples):
    # Sample proportion for each row: the fraction of True values in that sample
    return np.sum(samples, axis=1) / samples.shape[1]

In [1]:
def conf_interval(n, p, z):
    # `width` is the margin of error (half the full length of the interval)
    width = z * np.sqrt((p * (1 - p)) / n)
    interval = (p - width, p + width)

    return width, interval
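`conf_interval` implements the normal-approximation (Wald) interval for a proportion. Here $\hat{p}$ is the sample proportion, $n$ is the sample size, and $z$ is the critical value for the chosen confidence level. The returned `width` is the margin of error:

$$
\text{width} = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad \text{interval} = \left(\hat{p} - \text{width},\ \hat{p} + \text{width}\right)
$$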

In [1]:
def summary(width, interval, pop_p=0.83):
    """Print the mean margin of error and plot each sample's confidence interval."""
    print(f'Width mean: {np.mean(width)}')

    lower, upper = interval
    data_dict = {
        'lower': lower,
        'upper': upper,
        'category': [f'Sample {i}' for i in range(len(lower))],
        # The interval is symmetric around the sample proportion, so its midpoint is p
        # (this avoids relying on a global variable `p`)
        'mean': (lower + upper) / 2,
    }
    dataset = pd.DataFrame(data_dict)

    for y, (lo, hi, mean) in enumerate(zip(dataset['lower'], dataset['upper'], dataset['mean'])):
        plt.plot((lo, hi), (y, y), 'o-', color='orange')  # confidence interval
        plt.plot(mean, y, 'o', color='black')             # sample proportion

    # Population proportion, drawn once as a vertical line across all samples
    plt.axvline(pop_p, color='red')
    plt.yticks(range(len(dataset)), list(dataset['category']))

    print("Orange lines indicate confidence intervals; black dots indicate sample proportions; the red line indicates the population proportion.")

## Confidence Level = 0.90
For a 90% confidence level, the z-table gives z = 1.645.
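The critical value can also be computed from the standard normal quantile function instead of read from the z-table. Below is a minimal sketch using `scipy.stats.norm.ppf`, which this notebook already imports. `z_critical` is a hypothetical helper name introduced here.

```python
from scipy.stats import norm

def z_critical(confidence_level):
    # A two-sided interval leaves (1 - level) / 2 in each tail,
    # so take the standard normal quantile at 1 - (1 - level) / 2
    return norm.ppf(1 - (1 - confidence_level) / 2)

print(round(z_critical(0.90), 3))  # 1.645
print(round(z_critical(0.95), 3))  # 1.96
```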

In [1]:
p = calculate_p(samples)
width, interval = conf_interval(samples.shape[1], p, 1.645)

In [1]:
summary(width, interval)

## Confidence Level = 0.95
For a 95% confidence level, the z-table gives z = 1.96.

In [1]:
p = calculate_p(samples)
width, interval = conf_interval(samples.shape[1], p, 1.96)

In [1]:
summary(width, interval)

## Conclusion

From this part of the study, we see that a higher confidence level produces wider confidence intervals. At least in this study, the population proportion is always contained in the confidence interval. However, with only 10 samples this observation may be due to chance, so an expanded study is needed.
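The width increase is not a statistical tendency but a fixed scaling. For the same sample, the margin of error is proportional to $z$. Moving from 90% to 95% confidence therefore multiplies every margin of error by $1.96 / 1.645 \approx 1.19$. Below is a short check. The sample proportions in it are hypothetical values chosen for illustration, and `margin_of_error` is a hypothetical helper mirroring `conf_interval`.

```python
import numpy as np

def margin_of_error(n, p_hat, z):
    # Same formula as the `width` returned by conf_interval
    return z * np.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample proportions, for illustration only
p_hat = np.array([0.75, 0.80, 0.83, 0.88, 0.91])

ratio = margin_of_error(100, p_hat, 1.96) / margin_of_error(100, p_hat, 1.645)
print(ratio)  # every entry equals 1.96 / 1.645, about 1.19
```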

## Expanded Study

In this section, I increase the number of samples to 10,000 and examine the proportion of confidence intervals that contain the population proportion.

In [1]:
population = np.random.rand(100000) < 0.83
n_samples = 10000
sample_size = 100
samples = np.reshape(np.random.choice(population, size=sample_size * n_samples), (n_samples, -1))
samples.shape

In [1]:
# 90% Confidence Level
p = calculate_p(samples)
width, interval = conf_interval(samples.shape[1], p, 1.645)
print(f"With 90% confidence level, {np.logical_and(0.83 > interval[0], 0.83 < interval[1]).mean()} of confidence intervals include population proportion.")


# 95% Confidence Level
p = calculate_p(samples)
width, interval = conf_interval(samples.shape[1], p, 1.96)
print(f"With 95% confidence level, {np.logical_and(0.83 > interval[0], 0.83 < interval[1]).mean()} of confidence intervals include population proportion.")

In conclusion, the proportion of confidence intervals that contain the population proportion is approximately equal to the confidence level.
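The "approximately" has two sources:
- Simulation noise from using a finite number of samples.
- The normal-approximation interval's true coverage is not exactly the nominal level. The number of successes in a sample of 100 is discrete.

The second part can be computed exactly by summing binomial probabilities over every possible count, assuming the population proportion is exactly 0.83. Below is a sketch using `scipy.stats.binom`. `wald_coverage` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import binom

def wald_coverage(n, p, z):
    # Every possible number of successes in a sample of size n
    k = np.arange(n + 1)
    p_hat = k / n
    width = z * np.sqrt(p_hat * (1 - p_hat) / n)
    # Same strict inequalities as the coverage check in the cell above
    covers = (p_hat - width < p) & (p < p_hat + width)
    # Total probability of the counts whose interval contains p
    return binom.pmf(k, n, p)[covers].sum()

for level, z_val in [(0.90, 1.645), (0.95, 1.96)]:
    print(f"Nominal {level:.0%}: exact coverage {wald_coverage(100, 0.83, z_val):.4f}")
```

Because each 95% interval contains the corresponding 90% interval, the exact 95% coverage can never be lower than the 90% coverage.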

# Hypothesis Test

In [1]:
population = np.random.rand(100000) < 0.83
n_samples = 10
sample_size = 100
samples = np.reshape(np.random.choice(population, size=sample_size * n_samples), (n_samples, -1))
samples.shape

In [1]:
z = (calculate_p(samples) - 0.83) / np.sqrt(0.83 * (1 - 0.83) / sample_size)
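The cell above computes the one-sample z statistic for a proportion. Unlike the confidence interval, its standard error uses the hypothesized proportion $p_0 = 0.83$ rather than the sample proportion $\hat{p}$, with $n = 100$:

$$
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}
$$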

In [1]:
alpha = 0.05
cdf = norm.cdf(z)
# One-sided p-value in whichever direction the sample deviated from 0.83
P = np.minimum(cdf, 1 - cdf)
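`P` is the one-sided p-value in whichever direction the sample proportion deviated from 0.83. It equals the standard normal upper-tail area beyond $|z|$, and doubling it gives the usual two-sided p-value. Below is a small check of that identity using `scipy.stats.norm.sf` (the survival function, $1 - \Phi$). The z values are hypothetical, and the names end in `_demo` so they do not overwrite the notebook's `z` and `P`.

```python
import numpy as np
from scipy.stats import norm

z_demo = np.array([-2.5, -1.0, 0.0, 0.7, 1.9])  # hypothetical test statistics

cdf_demo = norm.cdf(z_demo)
P_demo = np.minimum(cdf_demo, 1 - cdf_demo)   # same construction as the cell above
one_sided = norm.sf(np.abs(z_demo))           # upper-tail area beyond |z|
two_sided = 2 * norm.sf(np.abs(z_demo))       # two-sided p-value

print(np.allclose(P_demo, one_sided))      # True
print(np.allclose(2 * P_demo, two_sided))  # True
```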

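## Single-Tail Test

Here the direction of the alternative hypothesis is chosen from the sign of z: Ha is p > 0.83 when the sample proportion is above 0.83, and p < 0.83 otherwise.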

In [1]:
for ind, (P_sample, z_sample) in enumerate(zip(P, z)):

    if z_sample > 0:
        if P_sample < alpha:
            print(f'We have convincing evidence to reject H0 (p = 0.83) in favor of Ha (p > 0.83) with sample {ind}.')
        else:
            print(f"We do not have convincing evidence to reject H0 (p = 0.83), so there is no support for Ha (p > 0.83) with sample {ind}.")

    else:
        if P_sample < alpha:
            print(f'We have convincing evidence to reject H0 (p = 0.83) in favor of Ha (p < 0.83) with sample {ind}.')
        else:
            print(f"We do not have convincing evidence to reject H0 (p = 0.83), so there is no support for Ha (p < 0.83) with sample {ind}.")

## Two-Tail Test

In [1]:
for ind, P_sample in enumerate(P):

    # Two-sided p-value is twice the one-sided p-value
    if 2 * P_sample < alpha:
        print(f'We have convincing evidence to reject H0 (p = 0.83) in favor of Ha (p ≠ 0.83) with sample {ind}.')
    else:
        print(f"We do not have convincing evidence to reject H0 (p = 0.83), so there is no support for Ha (p ≠ 0.83) with sample {ind}.")

## Summary

In this section of the study, only 1 of the single-tail tests rejected the null hypothesis p = 0.83, and none of the two-tail tests did. These results are therefore consistent with a population proportion of 83%, although failing to reject H0 does not prove that it is true.
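The samples are drawn from the population itself, so H0 is essentially true and every rejection is a false positive (Type I error). Below is a short calculation of how many rejections to expect among 10 independent tests. It assumes a per-test rejection probability of 0.05 for the two-tail test at α = 0.05. For the single-tail procedure above it assumes about 0.10. That procedure picks the direction of Ha from the sign of z, so it rejects whenever the one-sided p-value in either direction is below 0.05. Both rates come from the normal approximation and are only approximate.

```python
n_tests = 10

# Approximate per-test probability of rejecting a true H0
rates = {"two-tail (alpha = 0.05)": 0.05, "single-tail as implemented": 0.10}

for name, rate in rates.items():
    expected = n_tests * rate                 # mean of Binomial(n_tests, rate)
    at_least_one = 1 - (1 - rate) ** n_tests  # P(at least one false rejection)
    print(f"{name}: expected {expected:.1f} rejections, P(at least one) = {at_least_one:.3f}")
```

Under these rates, about 0.5 false rejections are expected for the two-tail test and about 1 for the single-tail procedure. The probabilities of at least one are roughly 0.40 and 0.65, so seeing 0 or 1 rejection among 10 tests is what chance alone would typically produce.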

## Expanded Study

In this section, I perform hypothesis tests on a much larger number of samples (50,000 per experiment). The goal is to explore the relationship between $\alpha$ and the proportion of tests that reject the null hypothesis p = 0.83, even though 83% is the true population proportion.

In [1]:
def hypothesis_test_exp(alpha):
    n_samples = 50000
    sample_size = 100
    # Draw samples directly from a Bernoulli(0.83) population, so H0 is true
    samples = np.random.rand(n_samples, sample_size) < 0.83

    z = (calculate_p(samples) - 0.83) / np.sqrt(0.83 * (1 - 0.83) / sample_size)

    cdf = norm.cdf(z)
    P = np.minimum(cdf, 1 - cdf)

    # Fraction of samples rejected by the single-tail and two-tail rules
    return np.mean(P < alpha), np.mean(P * 2 < alpha)

In [1]:
n_exps = 100

single_tail_all = []
two_tail_all = []


for alpha in tqdm(np.arange(0.01, 0.1, 0.01)):
    single_tail = []
    two_tail = []

    for i in range(n_exps):
        one, two = hypothesis_test_exp(alpha)
        single_tail.append(one)
        two_tail.append(two)
        
    single_tail_all.append(np.mean(single_tail))
    two_tail_all.append(np.mean(two_tail))

In [1]:
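plt.plot(np.arange(0.01, 0.1, 0.01), single_tail_all, '.')
plt.xlabel(r'$\alpha$')
plt.ylabel('Proportion of tests rejecting H0')
plt.title('Single-tail tests')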

In [1]:
plt.plot(np.arange(0.01, 0.1, 0.01), two_tail_all, '.')
plt.xlabel(r'$\alpha$')
plt.ylabel('Proportion of tests rejecting H0')
plt.title('Two-tail tests')

In both figures above, the x-axis represents $\alpha$ and the y-axis represents the proportion of tests that reject the null hypothesis. The first figure shows single-tail tests and the second shows two-tail tests.

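Both figures show a positive relationship between $\alpha$ and the rejection rate. Results from single-tail tests resemble $y = 2x$, whereas those from two-tail tests resemble $y = x$.

The factor of 2 arises because the single-tail test chooses the direction of Ha from the sign of z. It therefore rejects whenever the one-sided p-value is below $\alpha$ in either direction, which amounts to a two-tail test at level $2\alpha$.

The relationship is not perfectly linear and often rises in discrete steps across $\alpha$ values. With a sample size of 100, the sample proportion can take only 101 values, so the p-value is discrete. The rejection rate changes only when $\alpha$ crosses one of those possible p-values.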
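Both patterns can be reproduced exactly, without simulation, by summing binomial probabilities over the counts that lead to rejection. Below is a sketch that assumes the true proportion is exactly 0.83, as in `hypothesis_test_exp`. `rejection_rates` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import binom, norm

def rejection_rates(alpha, n=100, p0=0.83):
    # Every possible count of successes and its z statistic
    k = np.arange(n + 1)
    z = (k / n - p0) / np.sqrt(p0 * (1 - p0) / n)
    cdf = norm.cdf(z)
    P = np.minimum(cdf, 1 - cdf)     # same one-sided p-value as the notebook
    pmf = binom.pmf(k, n, p0)        # probability of each count under H0
    single = pmf[P < alpha].sum()    # single-tail rule: P < alpha
    two = pmf[2 * P < alpha].sum()   # two-tail rule: 2P < alpha
    return single, two

alphas = np.arange(0.01, 0.1, 0.01)
for a in alphas:
    single, two = rejection_rates(a)
    print(f"alpha = {a:.2f}: single-tail {single:.4f}, two-tail {two:.4f}")
```

Because `P < alpha` is the same condition as `2 * P < 2 * alpha`, the single-tail rate at $\alpha$ equals the two-tail rate at $2\alpha$. Both rates are step functions of $\alpha$, since only 101 distinct p-values exist.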