# Confidence interval through bootstrapping and Student's t assumption

In this program, we shall verify how robust is the confidence interval estimated from non-normal sample (exponentially distributed) through two methods:

(1) bootstrapping <br>
(2) Assuming a t distributed sample (kwoning that they are not)

## Libraries

In [173]:
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt

## Confidence interval by bootstrapping

Defining some functions

In [174]:
# Empirical Distribution Function
def EDF(x, sample):
    ret = 0
    
    for xi in sample:
        if xi <= x: ret += 1
            
    return ret / sample.size

def sample_quantile(p, sample):
    eps = (max(sample) - min(sample)) / 1000
    
    x = min(sample)
    while EDF(x, sample) < p:
        x += eps
        
    return x

def boot_conf_inter(sample, alpha = 0.05, n = 1000):
    mean = np.mean(sample)
    boot_mean = np.full(n, np.nan)
    
    for i in range(boot_mean.size):
        boot_mean[i] = np.mean(np.random.choice(sample, sample.size))
        
    return (2*mean - sample_quantile(1 - alpha/2, boot_mean), 2*mean - sample_quantile(alpha / 2, sample))

Perfoming the sampling <code>n_total</code> and testing if the populational mean is within the estimated confidence interval

In [183]:
n_in = 0 # Number of times that the population mean was inside the confidence interval
n_total = 100 # Number of experiments

for i in range(n_total):
    sample = stats.expon.rvs(size = 30)
    pop_mean = stats.expon.mean()
    conf_inter = boot_conf_inter(sample, n = 100)
    
    if (pop_mean >= conf_inter[0]) and (pop_mean <= conf_inter[1]): n_in += 1
        
print('The population mean was between the estimated confidence interval {:.2f}%'.format(100 * n_in / n_total))

The population mean was between the estimated confidence interval 99.00%


## Confidence interval by Student's t assumption

Let us assuming the sample normally distributed, even though we know they are not, and then let us evaluate the confidence interval using a Student's t distribution. 

In [182]:
n_in = 0 # Number of times that the population mean was inside the confidence interval
n_total = 1000 # Number of experiments

for i in range(n_total):
    sample = stats.expon.rvs(size = 30)
    pop_mean = stats.expon.mean()
    mean = np.mean(sample)
    se = np.std(sample) / np.sqrt(sample.size)
    alpha = 0.05
    z = stats.t.ppf(1 - alpha / 2, sample.size - 1)
    conf_inter = (mean - z*se, mean + z*se)
    
    if (pop_mean >= conf_inter[0]) and (pop_mean <= conf_inter[1]): n_in += 1
        
print('The population mean was between the estimated confidence interval {:.2f}%'.format(100 * n_in / n_total))

The population mean was between the estimated confidence interval 92.00%


## Conclusion

The bootstrapping approach always obtain a estimated confidence interval which includes the populational mean at least 95% of the time, whilst the Student's t approach rarely reproduces the desired result. If we run 