### Confidence Intervals

$$ \bar X \overset{\tiny approx.}{\sim} Normal\left(\mu_{\bar X} = E[X^{(i)}],\; \sigma^2_{\bar X} = \frac{Var[X^{(i)}]}{n}\right) $$

$$ \text{the "Z-score"}\quad Z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} \overset{\tiny approx.}{\sim}Normal\left(0, 1\right)\quad\text{has (approximately) a "standard normal" distribution} $$

$$Pr(-1.96 \leq Z \leq 1.96) = 0.95$$

$$Pr\left(-1.96 \leq \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} \leq 1.96\right) = 0.95$$

$$Pr\left(\bar X  +1.96\sigma_{\bar X} \geq \mu_{\bar X} \geq \bar X - 1.96\sigma_{\bar X}\right) = 0.95$$

$$Pr\left(\bar X  +.85\sigma_{\bar X} \geq \mu_{\bar X} \geq \bar X - .85\sigma_{\bar X}\right) \approx 0.60$$
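Generalizing the two cases above to any confidence level $1-\alpha$: the same rearrangement gives the interval below, where $z_{1-\alpha/2}$ is the standard-normal quantile (1.96 for 95%, about 0.84 for 60%) and, with $\sigma^2 = Var[X^{(i)}]$ as in the first formula, $\sigma_{\bar X} = \sigma/\sqrt{n}$. This is a sketch of the standard known-$\sigma$ form; the simulation later in this notebook plugs in the sample standard deviation instead.

$$Pr\left(\bar X - z_{1-\alpha/2}\,\sigma_{\bar X} \leq \mu_{\bar X} \leq \bar X + z_{1-\alpha/2}\,\sigma_{\bar X}\right) = 1-\alpha, \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$$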

In [1]:
from scipy import stats
1-2*stats.norm.cdf(-.85, loc=0, scale=1)

0.60467491375461524
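The cell above checks one multiplier by hand. A small sketch that goes the other way, from a confidence level to its two-sided multiplier via the inverse CDF (`stats.norm.ppf`), and then back to the covered probability as a check. The helper names `z_multiplier` and `coverage` are illustrative, not part of the original notebook. The exact 60% multiplier comes out near 0.84, slightly below the rounded .85 used above, which is why .85 gives about 0.605 rather than exactly 0.60.

```python
from scipy import stats

def z_multiplier(confidence_level):
    """Two-sided standard-normal multiplier z with Pr(-z <= Z <= z) = confidence_level."""
    # Put (1 - confidence_level) / 2 of the probability in each tail
    return stats.norm.ppf(1 - (1 - confidence_level) / 2)

def coverage(z):
    """Pr(-z <= Z <= z) for a standard normal Z."""
    return stats.norm.cdf(z) - stats.norm.cdf(-z)

for level in [0.60, 0.90, 0.95, 0.99]:
    z = z_multiplier(level)
    print(f"{level:.2f} -> z = {z:.4f}, check: {coverage(z):.4f}")
```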

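- The standardized quantity $Z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}}$ is called the pivot: it contains the unknown parameter $\mu_{\bar X}$, but its distribution (standard normal) does not depend on it, which is what lets us rearrange the inequality above
- The rearranged statement is still a probability statement about the random variable $Z$ (or, equivalently, $\bar X$): the endpoints $\bar X \pm 1.96\sigma_{\bar X}$ are random, $\mu_{\bar X}$ is not
- *Parameters* for frequentists are fixed constants
    - They don't have distributions
    - They're not random variables
    - They don't have uncertainty; they take a SINGLE fixed value (they are scalars)
    
- ERGO... (IF you're a frequentist) you NEVER (NEVER NEVER EVER) make a probability statement about a parameter: it is a fixed constant, so there is nothing random for a probability to describe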
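A quick simulation sketch of why $Z$ works as a pivot: for several different (fixed) values of $\mu$, the standardized $\bar X$ has mean about 0, standard deviation about 1, and lands in $[-1.96, 1.96]$ about 95% of the time. The values of `mu`, `sigma`, `n`, `reps`, and the seed are assumptions chosen only for illustration, and $\sigma$ is treated as known here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n, sigma, reps = 30, 2.0, 20_000  # assumed values, chosen only for illustration

results = {}
for mu in [-3.0, 0.0, 5.0]:
    # reps independent samples of size n from Normal(mu, sigma^2)
    samples = rng.normal(loc=mu, scale=sigma, size=(reps, n))
    xbar = samples.mean(axis=1)
    # The pivot: standardize xbar with its true mean mu and true standard error sigma/sqrt(n)
    z = (xbar - mu) / (sigma / np.sqrt(n))
    results[mu] = (z.mean(), z.std(), np.mean(np.abs(z) <= 1.96))
    print(f"mu = {mu:+.1f}: mean(Z) = {results[mu][0]:+.3f}, "
          f"sd(Z) = {results[mu][1]:.3f}, share with |Z| <= 1.96 = {results[mu][2]:.3f}")
```

The distribution of `z` is the same whichever `mu` generated the data; only the interval built around $\bar X$ moves.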

- define distribution
    - distribution (normal)
    - parameters (0 , 1)

- take samples (1000)

- choose confidence level (1%, 99%)
- Hypothetical Repetitions (20, 1000)

- sample size
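A compact, plot-free sketch of the steps outlined above. It assumes a Normal(0, 1) population, n = 30, 1000 repetitions, a 95% level, and a fixed seed; these are illustrative picks from the ranges listed, not values fixed by the notebook. Like `get_CI` further down, it uses the normal multiplier together with the sample standard deviation.

```python
import numpy as np
from scipy import stats

# 1. Define the distribution and its parameters: Normal(0, 1)
mu, sigma = 0.0, 1.0
# 2. Choose the sample size, number of hypothetical repetitions, and confidence level (assumed)
n, reps, confidence_level = 30, 1000, 0.95

# 3. Take the samples: one row per repetition
samples = stats.norm.rvs(loc=mu, scale=sigma, size=(reps, n), random_state=1)

# 4. Build one interval per repetition: xbar +/- z * s / sqrt(n)
z = stats.norm.ppf(1 - (1 - confidence_level) / 2)
xbar = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)
lower, upper = xbar - z * se, xbar + z * se

# 5. Count how many intervals capture the fixed parameter mu
captured = (lower <= mu) & (mu <= upper)
print(f"{captured.sum()} of {reps} intervals captured mu ({captured.mean():.1%})")
```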



In [2]:
stats.norm.ppf(.025,0,1)

-1.9599639845400545

In [3]:
from bokeh.io import output_notebook, show, push_notebook
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
output_notebook()  # render Bokeh plots inline in the notebook

from ipywidgets import interact  # sliders that call update() with new values

import numpy as np
from scipy import stats

In [4]:
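reps = 500  # number of hypothetical repetitions (samples)
n = 30      # sample size
# reps x n draws from the standard normal: each row is one sample, true mean 0
data = stats.norm.rvs(loc=0, scale=1, size=(reps, n))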

In [5]:
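def get_CI(dat, confidence_level):
    """Return a [lower, upper] z-based confidence interval for the mean of each row of dat."""
    # Two-sided multiplier, e.g. 1.96 for confidence_level = 0.95
    CI_multiplier = abs(stats.norm.ppf((1 - confidence_level) / 2))
    CIs = []
    for samp in dat:
        samp_xbar = np.mean(samp)
        samp_std = np.std(samp, ddof=1)     # sample standard deviation s stands in for sigma
        se = samp_std / np.sqrt(samp.size)  # estimated standard error of xbar
        CIs.append([samp_xbar - CI_multiplier * se, samp_xbar + CI_multiplier * se])
    return CIs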
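The loop in `get_CI` can also be written with NumPy array operations on the whole `(reps, n)` array at once. A sketch, assuming the data are a 2-D array with one sample per row; `get_CI_vectorized` is a hypothetical name, and it returns a `(reps, 2)` array rather than a list of pairs.

```python
import numpy as np
from scipy import stats

def get_CI_vectorized(dat, confidence_level):
    """Hypothetical vectorized variant of a row-by-row z-based CI.

    dat: array of shape (reps, n), one sample per row.
    Returns an array of shape (reps, 2) holding (lower, upper) for each row.
    """
    dat = np.asarray(dat)
    z = abs(stats.norm.ppf((1 - confidence_level) / 2))
    xbar = dat.mean(axis=1)
    # Same standard error as the loop: sample std (ddof=1) over sqrt(sample size)
    half_width = z * dat.std(axis=1, ddof=1) / np.sqrt(dat.shape[1])
    return np.column_stack([xbar - half_width, xbar + half_width])
```

For 500 rows the loop is already fast; the array form mainly pays off when the number of repetitions grows.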

In [6]:
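initial_reps = 20  # number of intervals visible before the slider is moved

p = figure(title="Simulated confidence intervals (true mean = 0)",
           height=300, width=600, y_range=(-1, 1), x_range=(0, initial_reps))
CIs = get_CI(data, .95)
# red: the interval misses the true mean 0; blue: it captures it
colr = ['red' if (0 < ci[0]) or (0 > ci[1]) else 'blue' for ci in CIs]
p.line(x=[0, reps], y=[0, 0], color='black', line_width=1)  # the true mean
# one vertical segment per repetition, from the lower to the upper endpoint
r = [p.line(x=[i, i], y=list(CIs[i]), color=colr[i], line_width=1) for i in range(reps)]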

In [7]:
def update(reps=50, conf_level=.95):
    """Recompute the intervals at conf_level and show the first `reps` of them."""
    p.x_range.end = reps  # zoom the x-axis to the first `reps` intervals
    CIs = get_CI(data, conf_level)
    colr = ['red' if (0 < ci[0]) or (0 > ci[1]) else 'blue' for ci in CIs]
    for i, l in enumerate(r):
        l.data_source.data['y'] = list(CIs[i])
        l.glyph.line_color = colr[i]

    push_notebook()  # send the changed properties to the displayed plot
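What the slider shows can also be checked numerically: the share of blue intervals should sit near the chosen confidence level. A sketch with assumed settings (20,000 repetitions, n = 30, a fixed seed); it uses separate `sim_*` names so running it does not overwrite the notebook's `data` and `reps`. Because the intervals use the normal multiplier with the sample standard deviation, the empirical rate typically lands a little below the nominal level at n = 30, which is what the Student-t item under Next Steps addresses.

```python
import numpy as np
from scipy import stats

# Separate names (sim_*) so this cell does not overwrite the notebook's `data` and `reps`
sim_reps, sim_n = 20_000, 30  # assumed: many more repetitions than the plot, so rates are stable
rng = np.random.default_rng(3)
sim_data = rng.normal(loc=0.0, scale=1.0, size=(sim_reps, sim_n))  # true mean is 0

xbar = sim_data.mean(axis=1)
se = sim_data.std(axis=1, ddof=1) / np.sqrt(sim_n)

empirical = {}
for conf_level in [0.50, 0.80, 0.90, 0.95, 0.975]:
    z = stats.norm.ppf(1 - (1 - conf_level) / 2)
    # "blue" in the plot: the interval contains the true mean 0
    empirical[conf_level] = np.mean((xbar - z * se <= 0) & (0 <= xbar + z * se))
    print(f"nominal {conf_level:.3f}   empirical {empirical[conf_level]:.3f}")
```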

- "I have X% confidence that this interval captures the actual population parameter value"

- "I have X% confidence that the population mean is in this interval" 
    - confidence language is okay either way: "confidence" describes the long-run success rate of the procedure, not a probability about the parameter

- "There is an X% chance that this interval captures the actual population parameter value"
    - Confidence intervals ARE random variables (their endpoints are functions of $\bar X$); therefore, before the sample is drawn, you CAN make probabilistic statements about the interval

- It does NOT mean "there's an X% chance that the population parameter is in this interval"

    - this says the population parameter is a random variable -- NO IT IS NOT
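A small sketch of the distinction above: once an interval has been computed, "does it contain $\mu$?" is just True or False, while the X% is the long-run share of intervals the procedure gets right. The value `mu = 0`, the normal population, n = 30, the seed, and the helper name `one_interval` are all assumptions for illustration.

```python
import numpy as np
from scipy import stats

mu = 0.0  # the parameter: a fixed constant, never redrawn
rng = np.random.default_rng(4)

def one_interval(n=30, confidence_level=0.95):
    """Hypothetical helper: draw one sample of size n from Normal(mu, 1), return its z-based (lower, upper)."""
    sample = rng.normal(loc=mu, scale=1.0, size=n)
    z = stats.norm.ppf(1 - (1 - confidence_level) / 2)
    half = z * sample.std(ddof=1) / np.sqrt(n)
    return sample.mean() - half, sample.mean() + half

lo, hi = one_interval()
# A realized interval either contains mu or it does not: the answer is True or False
print(f"[{lo:.3f}, {hi:.3f}] contains mu: {lo <= mu <= hi}")

# The X% describes the procedure: the long-run share of intervals that capture mu
rate = np.mean([a <= mu <= b for a, b in (one_interval() for _ in range(10_000))])
print(f"long-run capture rate: {rate:.3f}")
```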
    
    

In [8]:
show(p, notebook_handle = True)

In [9]:
interact(update, reps=(50, 500, 10), conf_level=(.025, .98, .025))  # or a fixed list: [.99, .95, .9, .75, .5]


### Next Steps
- Confidence interval for the population mean using Student's t
- Confidence interval for a population proportion (large sample)
- Determining the sample size
- Confidence interval for a population variance
- Confidence interval as a hypothesis test
- Confidence interval for the difference of two population means (e.g., comparing website changes)
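For the first item, a minimal sketch of a Student-t interval for the mean: it keeps the same structure as `get_CI` but replaces the normal multiplier with the t quantile on n − 1 degrees of freedom, which is wider for small samples. `t_interval` is a hypothetical helper name; the cross-check uses SciPy's `stats.t.interval` with the standard error from `stats.sem`. The sample and seed are illustrative.

```python
import numpy as np
from scipy import stats

def t_interval(sample, confidence_level=0.95):
    """Hypothetical helper: Student-t confidence interval for a population mean when sigma is unknown."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    se = sample.std(ddof=1) / np.sqrt(n)  # estimated standard error of xbar
    # The t quantile with n - 1 degrees of freedom replaces the normal multiplier
    t_mult = stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
    return sample.mean() - t_mult * se, sample.mean() + t_mult * se

sample = np.random.default_rng(5).normal(size=30)
print(t_interval(sample))
# Cross-check with SciPy's t distribution interval, centered at xbar and scaled by the standard error
print(stats.t.interval(0.95, len(sample) - 1, loc=sample.mean(), scale=stats.sem(sample)))
```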