# Ch. 7: Hypothesis and Inference
Notes on "Data Science from Scratch" by Joel Grus

In [34]:
import math
import random

import numpy as np

## Statistical Hypothesis Testing
Example hypotheses:
- "this coin is fair"
- "data scientists prefer Python to R"
- "people are more likely to navigate away from the page without ever reading the content if we pop up an irritating interstitial advertisement with a tiny, hard-to-find close button"

Let's define:
- $H_0$: **null hypothesis** that represents the "default" position
- $H_1$: **alternative hypothesis** that we'll compare $H_0$ with

We'll use statistics to decide whether we can reject the null hypothesis ($H_0$) or not.

## Example: Flipping a Coin
"1." **Identify a question or problem**: We have a coin and want to test whether it's fair.
    - $p$: probability that the coin lands on heads
    - $H_0$: the coin **is** fair ($\,p = 0.5$)
    - $H_1$: the coin **is not** fair ($\,p \ne 0.5$)
    
    
"2." **Collect relevant data on the topic**: We flip the coin some number $n$ times and count the number of heads $X$.
    - If we assume that each flip of the coin has only two possible outcomes, that the probability of getting heads is the same for each flip, and that the flips are independent of each other, then we can treat each coin flip as a *Bernoulli trial*.
    - Thus, $X$ is a binomial random variable.
    - Thus, for large $n$, we can approximate $X$ using the normal distribution.
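
As a rough check of the last point, the sketch below compares the exact binomial CDF with its normal approximation for $n = 1000$ and $p = 0.5$. The helper `binomial_cdf` is not part of the notes; it is a hypothetical name introduced here for illustration, and the `+ 0.5` shift is a continuity correction assumed for this comparison.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p) (hypothetical helper)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def normal_cdf(x, mu=0.0, sigma=1.0):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

n, p = 1000, 0.5
mu = n * p                           # mean of Binomial(n, p)
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of Binomial(n, p)

for k in (470, 500, 530):
    exact = binomial_cdf(k, n, p)
    approx = normal_cdf(k + 0.5, mu, sigma)   # continuity-corrected
    print(k, round(exact, 4), round(approx, 4))
```

For these values the two columns should agree closely, which is what justifies working with the normal distribution in the rest of the example.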

In [14]:
def normal_approximation_to_binomial(n, p):
    """finds mu and sigma corresponding to a Binomial(n, p)"""
    mu = p * n
    sigma = math.sqrt(p * (1 - p) * n)
    return mu, sigma

In [16]:
def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

# the normal cdf _is_ the probability the variable is below a threshold
normal_probability_below = normal_cdf

# it's above the threshold if it's not below the threshold
def normal_probability_above(lo, mu=0, sigma=1):
    return 1 - normal_cdf(lo, mu, sigma)

# it's between if it's less than hi, but not less than lo
def normal_probability_between(lo, hi, mu=0, sigma=1):
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# it's outside if it's not between
def normal_probability_outside(lo, hi, mu=0, sigma=1):
    return 1 - normal_probability_between(lo, hi, mu, sigma)

In [21]:
def inverse_normal_cdf(p, mu=0, sigma=1, tolerance=0.00001):
    """find approximate inverse using binary search"""
    
    # if not standard, compute standard and rescale
    if mu != 0 or sigma != 1:
        return mu + sigma * inverse_normal_cdf(p, tolerance=tolerance)
    
    low_z, low_p = -10.0, 0           # normal_cdf(-10) is (very close to) 0
    hi_z, hi_p   =  10.0, 1           # normal_cdf(10) is (very close to) 1
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2    # consider the midpoint
        mid_p = normal_cdf(mid_z)     # and the cdf's value there
        if mid_p < p:
            # midpoint is still too low, search above it
            low_z, low_p = mid_z, mid_p
        elif mid_p > p:
            # midpoint is still too high, search below it
            hi_z, hi_p = mid_z, mid_p
        else:
            break
            
    return mid_z

def normal_upper_bound(probability, mu=0, sigma=1):
    """returns the z for which P(Z <= z) = probability"""
    return inverse_normal_cdf(probability, mu, sigma)

def normal_lower_bound(probability, mu=0, sigma=1):
    """returns the z for which P(Z >= z) = probability"""
    return inverse_normal_cdf(1 - probability, mu, sigma)

def normal_two_sided_bounds(probability, mu=0, sigma=1):
    """returns the symmetric (about the mean) bounds
    that contain the specified probability"""
    tail_probability = (1 - probability) / 2
    
    # upper bound should have tail_probability above it
    upper_bound = normal_lower_bound(tail_probability, mu, sigma)
    
    # lower bound should have tail_probability below it
    lower_bound = normal_upper_bound(tail_probability, mu, sigma)
    
    return lower_bound, upper_bound

"3." **Analyze the data (perform tests)**

In [24]:
mu_0, sigma_0 = normal_approximation_to_binomial(1000, 0.5)
print("mu_0 =", mu_0)
print("sigma_0 =", sigma_0)

mu_0 = 500.0
sigma_0 = 15.811388300841896


In [23]:
normal_two_sided_bounds(0.95, mu_0, sigma_0)

(469.01026640487555, 530.9897335951244)
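
These bounds define the rejection region of the test: if $H_0$ is true, $X$ should fall outside them only about 5% of the time, and that 5% is the *significance* of the test (the probability of a type 1 error). The sketch below double-checks this with `statistics.NormalDist` from the standard library, which is swapped in here for the notebook's binary-search helpers.

```python
import math
from statistics import NormalDist

n, p = 1000, 0.5
null_dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# symmetric bounds with 2.5% of the probability in each tail
lo = null_dist.inv_cdf(0.025)
hi = null_dist.inv_cdf(0.975)

# probability of landing outside the bounds when H_0 is true
significance = 1 - (null_dist.cdf(hi) - null_dist.cdf(lo))

print(lo, hi, significance)
```

The bounds should match the ones above up to the tolerance of the binary search.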

In [25]:
# 95% bounds based on assumption p is 0.5
lo, hi = normal_two_sided_bounds(0.95, mu_0, sigma_0)

# actual mu and sigma based on p = 0.55
mu_1, sigma_1 = normal_approximation_to_binomial(1000, 0.55)

# a type 2 error means we fail to reject the null hypothesis
# which will happen when X is still in our original interval
type_2_probability = normal_probability_between(lo, hi, mu_1, sigma_1)
power = 1 - type_2_probability

power

0.8865480012953671

In [27]:
hi = normal_upper_bound(0.95, mu_0, sigma_0)
hi
# is < 531, since we need more probability in the upper tail

526.0073585242053

In [28]:
type_2_probability = normal_probability_below(hi, mu_1, sigma_1)
power = 1 - type_2_probability
power

0.9363794803307173
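
To see how the power depends on how unfair the coin really is, the two-sided calculation can be wrapped in a function and evaluated for several values of the true $p$. This is a minimal sketch: `two_sided_power` is a hypothetical name, and `statistics.NormalDist` stands in for the helper functions defined earlier.

```python
import math
from statistics import NormalDist

def two_sided_power(true_p, n=1000, null_p=0.5, significance=0.05):
    """Probability of rejecting H_0: p = null_p when the true
    heads probability is true_p (normal approximation)."""
    null = NormalDist(n * null_p, math.sqrt(n * null_p * (1 - null_p)))
    lo = null.inv_cdf(significance / 2)
    hi = null.inv_cdf(1 - significance / 2)

    actual = NormalDist(n * true_p, math.sqrt(n * true_p * (1 - true_p)))
    type_2_probability = actual.cdf(hi) - actual.cdf(lo)
    return 1 - type_2_probability

for true_p in (0.50, 0.52, 0.55, 0.60):
    print(true_p, round(two_sided_power(true_p), 4))
```

When `true_p` equals `null_p`, the "power" is just the significance level; it grows as the true $p$ moves away from 0.5.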

The *p-value* way: instead of choosing bounds based on some probability cutoff, we compute the probability, assuming $H_0$ is true, that we would see a value at least as extreme as the one actually observed.

In [31]:
def two_sided_p_value(x, mu=0, sigma=1):
    if x >= mu:
        # if x is greater than the mean, the tail is what's greater than x
        return 2 * normal_probability_above(x, mu, sigma)
    else:
        # if x is less than the mean, the tail is what's less than x
        return 2 * normal_probability_below(x, mu, sigma)
    
two_sided_p_value(529.5, mu_0, sigma_0)

0.06207721579598857

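**Note**: A *continuity correction* was used. We pass 529.5 rather than 530 because the normal probability between 529.5 and 530.5 is a better estimate of the probability of seeing exactly 530 heads than the probability between 530 and 531.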
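
The effect of the correction can be illustrated by comparing against the exact binomial probabilities. This is a sketch under the same assumptions as above (1000 flips of a fair coin); `binomial_pmf` is a hypothetical helper, not something defined in the notes.

```python
import math

def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p) (hypothetical helper)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_cdf(x, mu=0.0, sigma=1.0):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

n, p = 1000, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# probability of exactly 530 heads: exact vs. normal between 529.5 and 530.5
exact_530 = binomial_pmf(530, n, p)
approx_530 = normal_cdf(530.5, mu, sigma) - normal_cdf(529.5, mu, sigma)

# exact two-sided tail: at least as extreme as 530 heads in either direction
exact_tail = sum(binomial_pmf(k, n, p)
                 for k in range(n + 1) if k >= 530 or k <= 470)

print(exact_530, approx_530)
print(exact_tail)
```

The exact tail probability should land close to the corrected p-value computed above.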

Let's do a simulation to check that this is a sensible estimate:

In [35]:
extreme_value_count = 0
for _ in range(100000):
    num_heads = sum(1 if random.random() < 0.5 else 0   # count number of heads
                    for _ in range(1000))               # in 1000 flips
    
    if num_heads >= 530 or num_heads <= 470:            # and count how often
        extreme_value_count += 1                        # the number is 'extreme'

print(extreme_value_count / 100000)   

0.06133


"4." **Form a conclusion**: Should we reject the hypothesis that the coin is fair (that $p = 0.5$)?
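    - 530 heads, two-sided test (p-value ≈ 0.062 > 0.05): don't reject the null hypothesis
    - 532 heads, two-sided test (p-value ≈ 0.046 < 0.05): reject the null
    - 525 heads, one-sided test (p-value ≈ 0.061 > 0.05): don't reject the null
    - 527 heads, one-sided test (p-value ≈ 0.047 < 0.05): reject the null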

In [36]:
two_sided_p_value(531.5, mu_0, sigma_0)

0.046345287837786575

In [37]:
upper_p_value = normal_probability_above
lower_p_value = normal_probability_below

upper_p_value(524.5, mu_0, sigma_0)

0.06062885772582083

In [38]:
upper_p_value(526.5, mu_0, sigma_0)

0.04686839508859242

## Confidence Intervals
We formed a hypothesis about the value of the head probability $p$, which is a parameter of the unknown "heads" distribution, and tested it. Now we construct a *confidence interval* around the observed value of the parameter.

**Example**: If we observe 525 heads out of 1000 coin flips, we can estimate $p = 0.525$. *How confident can we be of this estimate?*

In [41]:
p_hat = 525 / 1000
mu = p_hat
sigma = math.sqrt(p_hat * (1 - p_hat) / 1000)
sigma

0.015791611697353755

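We don't know the true $p$, so we plug the estimate $\hat{p}$ into the formula for $\sigma$. As Grus puts it: "This is not entirely justified, but people seem to do it anyway."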

**Note**: If you were to repeat the experiment many times, 95% of the time the "true" parameter (which is the same every time) would lie within the observed confidence interval (which might be different every time).
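
The repeated-experiment reading of the interval can be checked with a small simulation. The sketch below assumes a fair coin as the "true" parameter and uses `statistics.NormalDist` for the normal quantile; `confidence_interval` and `simulate_coverage` are hypothetical names introduced for illustration.

```python
import math
import random
from statistics import NormalDist

def confidence_interval(num_heads, n, confidence=0.95):
    """Normal-approximation interval around p_hat = num_heads / n."""
    p_hat = num_heads / n
    sigma = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * sigma, p_hat + z * sigma

def simulate_coverage(true_p, n=1000, trials=1000, seed=0):
    """Fraction of simulated experiments whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        num_heads = sum(rng.random() < true_p for _ in range(n))
        lo, hi = confidence_interval(num_heads, n)
        if lo <= true_p <= hi:
            hits += 1
    return hits / trials

# interval for the 525-heads example
print(confidence_interval(525, 1000))

# share of intervals that contain the true p; should be close to 0.95
coverage = simulate_coverage(0.5)
print(coverage)
```

For 525 heads the interval includes 0.5, so this observation alone gives no grounds to conclude that the coin is unfair.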

In [42]:
p_hat = 540 / 1000
mu = p_hat
sigma = math.sqrt(p_hat * (1 - p_hat) / 1000)
normal_two_sided_bounds(0.95, mu, sigma)

(0.5091095927295919, 0.5708904072704082)

## P-hacking

In [10]:
def run_experiment():
    """flip a fair coin 1000 times, True = heads, False = tails"""
    return [np.random.random() < 0.5 for _ in range(1000)]

def reject_fairness(experiment):
    """using the 5% significance levels"""
    num_heads = len([flip for flip in experiment if flip])
    return num_heads < 469 or num_heads > 531

np.random.seed(0)
experiments = [run_experiment() for _ in range(1000)]
num_rejections = len([experiment
                     for experiment in experiments
                     if reject_fairness(experiment)])

print(num_rejections)

48
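
Even though every coin here is fair, a few dozen of the 1000 experiments reject fairness purely by chance. As a rough cross-check, the sketch below computes the exact probability that a single fair-coin experiment falls in the rejection region used by `reject_fairness`; `binomial_pmf` is a hypothetical helper introduced for this purpose.

```python
import math

def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p) (hypothetical helper)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 1000, 0.5

# same rejection rule as reject_fairness: fewer than 469 or more than 531 heads
false_positive_rate = sum(binomial_pmf(k, n, p)
                          for k in range(n + 1)
                          if k < 469 or k > 531)

# expected number of (false) rejections across 1000 independent experiments
expected_rejections = 1000 * false_positive_rate

print(false_positive_rate, expected_rejections)
```

The rate comes out slightly below 5% because the integer cutoffs sit just outside the continuous 95% bounds, which is consistent with the count observed above.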


## Example: Running an A/B Test

$$
\begin{align}
N_A &= \textrm{number of people who see ad A}\\
n_A &= \textrm{number of people who click on ad A}\\
p_A &= \textrm{probability that someone clicks ad A}
\end{align}
$$

**Assumptions**:
- $N_A$ is large $\rightarrow n_A / N_A$ is approximately a normal random variable with mean $p_A$ and standard deviation $\sigma_A = \sqrt{p_A(1 - p_A) / N_A}$
- normals are independent $\rightarrow$ their difference should also be normal with mean $p_B - p_A$ and standard deviation $\sqrt{\sigma_A^2 + \sigma_B^2}$

In [11]:
def estimated_parameters(N, n):
    p = n / N
    sigma = math.sqrt(p * (1 - p) / N)
    return p, sigma

In [None]:
def a_b_test_statistic(N_A, n_A, N_B, n_B):
    p_A, sigma_A = estimated_parameters(N_A, n_A)
    p_B, sigma_B = estimated_parameters(N_B, n_B)
    return (p_B - p_A) / math.sqrt(sigma_A ** 2 + sigma_B ** 2)

In [None]:
z = a_b_test_statistic(1000, 200, 1000, 180)
two_sided_p_value(z)

In [None]:
z = a_b_test_statistic(1000, 200, 1000, 150)
two_sided_p_value(z)
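
The cells above carry no recorded output, so here is a self-contained sketch of the same calculation that can be run on its own. It re-declares the helpers under the same names and uses a symmetric form of the two-sided p-value for a standard normal statistic; the approximate values in the comments are what the formulas give for these inputs.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def estimated_parameters(N, n):
    p = n / N
    sigma = math.sqrt(p * (1 - p) / N)
    return p, sigma

def a_b_test_statistic(N_A, n_A, N_B, n_B):
    p_A, sigma_A = estimated_parameters(N_A, n_A)
    p_B, sigma_B = estimated_parameters(N_B, n_B)
    return (p_B - p_A) / math.sqrt(sigma_A ** 2 + sigma_B ** 2)

def two_sided_p_value(z):
    """p-value for a standard normal test statistic."""
    return 2 * (1 - normal_cdf(abs(z)))

# 200 vs. 180 clicks out of 1000 views each
z_small = a_b_test_statistic(1000, 200, 1000, 180)
p_small = two_sided_p_value(z_small)
print(z_small, p_small)   # z is about -1.14, p-value about 0.25

# 200 vs. 150 clicks out of 1000 views each
z_large = a_b_test_statistic(1000, 200, 1000, 150)
p_large = two_sided_p_value(z_large)
print(z_large, p_large)   # z is about -2.95, p-value about 0.003
```

With 180 clicks the difference is well within what chance would produce if both ads were equally effective; with 150 clicks it is not.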

## Bayesian Inference
Until now we made probability statements about our *tests*.  
Now we treat the unknown *parameters* themselves as random variables and make probability statements about them.

In [13]:
def B(alpha, beta):
    """a normalizing constant so that the total probability is 1"""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x, alpha, beta):
    if x < 0 or x > 1:        # no weight outside of [0, 1]
        return 0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B(alpha, beta)
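
For coin flips, a Beta prior combines with the data in a simple way: starting from Beta($\alpha$, $\beta$) and observing $h$ heads and $t$ tails gives a Beta($\alpha + h$, $\beta + t$) posterior. The sketch below illustrates this update for a uniform prior and 3 heads in 10 flips; `posterior_parameters` and `beta_mean` are hypothetical helper names, and the numerical integral is only a sanity check that the density is normalized.

```python
import math

def B(alpha, beta):
    """a normalizing constant so that the total probability is 1"""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x, alpha, beta):
    if x < 0 or x > 1:        # no weight outside of [0, 1]
        return 0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B(alpha, beta)

def posterior_parameters(prior_alpha, prior_beta, heads, tails):
    """Beta prior + coin-flip data -> Beta posterior parameters."""
    return prior_alpha + heads, prior_beta + tails

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

# uniform prior Beta(1, 1), then observe 3 heads and 7 tails
alpha_post, beta_post = posterior_parameters(1, 1, heads=3, tails=7)
print(alpha_post, beta_post, beta_mean(alpha_post, beta_post))

# midpoint-rule check that the posterior density integrates to about 1
steps = 10000
total = sum(beta_pdf((i + 0.5) / steps, alpha_post, beta_post)
            for i in range(steps)) / steps
print(total)
```

The posterior is centered near 1/3, reflecting both the flat prior and the observed data.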

## For Further Exploration
- Free Intro Stats Textbooks:
    - [OpenIntro Statistics](https://www.openintro.org/stat/textbook.php?stat_book=os)
    - [OpenStax Introductory Statistics](https://openstax.org/details/introductory-statistics)
- [Data Analysis and Statistical Inference](https://www.coursera.org/course/statistics) on Coursera
- A classic reference, though there may be better books these days: [Statistical Inference](https://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126) by Casella and Berger