In [1]:
import sys
sys.path.append("./git/")

# Introduction

As scientists (and *data* scientists are scientists too), we usually start with some idea or intuition about the answer to a question, and we want to check it against evidence: we are trying to test a hypothesis.

Q: How does the language of scientific experiment translate to the realm of statistics?

A: Through the concept of **Hypothesis Testing**.

## Flipping a coin

Imagine we are tossing a coin, and we want to test whether or not the coin is fair.

We'll assume that the coin has some probability `p` of landing heads.

Our *null hypothesis* is that the coin is fair and `p = 0.5`.

We will test this against the alternative hypothesis that `p != 0.5`.

Specifically, we will toss the coin `n` times and count the number of heads, `X`. Each coin flip is a Bernoulli trial, which means that `X` is a Binomial(n,p) random variable, which (for large `n`) we can approximate with the normal distribution.
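To make the setup concrete, here is a minimal sketch that simulates the experiment. The helper names `bernoulli_trial` and `count_heads` are illustrative and not part of the notebook's imports; the seed is fixed only so the run is repeatable.

```python
import random

def bernoulli_trial(p: float) -> int:
    """Return 1 (heads) with probability p, otherwise 0 (tails)."""
    return 1 if random.random() < p else 0

def count_heads(n: int, p: float) -> int:
    """One realisation of X: the number of heads in n independent flips."""
    return sum(bernoulli_trial(p) for _ in range(n))

random.seed(0)          # fixed seed, purely for repeatability
n, p = 10_000, 0.5      # a fair coin, as under the null hypothesis
heads = count_heads(n, p)
print(f"{heads} heads in {n} flips (fraction {heads / n:.3f})")
```

Under the null hypothesis we would expect the fraction of heads to land close to 0.5, though not exactly on it. How close is "close enough" is the question the rest of this section works towards.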

In [1]:
from typing import Tuple
import math

def normal_approximation_to_binomial(n: int, 
                                     p: float) -> Tuple[float,float]:
    """Returns mu and sigma corresponding to a Binomial(n,p)"""
    mu = n * p
    sigma = math.sqrt(p * (1 - p) * n)
    return mu, sigma
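
As a sanity check on those two formulas, the sketch below computes the mean and standard deviation directly from the Binomial(n,p) probability mass function and compares them with `n * p` and `sqrt(p * (1 - p) * n)`. The helper `exact_binomial_moments` is a hypothetical name introduced only for this check, and `math.comb` assumes Python 3.8 or later.

```python
import math
from typing import Tuple

def exact_binomial_moments(n: int, p: float) -> Tuple[float, float]:
    """Mean and standard deviation computed term by term from the pmf."""
    pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, math.sqrt(variance)

n, p = 100, 0.5
mean, sd = exact_binomial_moments(n, p)

# The closed forms used by normal_approximation_to_binomial
mu = n * p
sigma = math.sqrt(p * (1 - p) * n)

print(f"from pmf:     mean={mean:.6f}, sd={sd:.6f}")
print(f"closed forms: mu={mu:.6f}, sigma={sigma:.6f}")
```

The two pairs should agree up to floating-point error. Note that this only confirms the mean and spread; how well the normal *shape* matches the binomial depends on `n` being reasonably large.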

Whenever a random variable follows a normal distribution, we can use `normal_cdf` to figure out the probability that its realised value lies within or outside a particular interval:

In [3]:
from scratch.probability import normal_cdf


In [5]:
normal_probability_below = normal_cdf

def normal_probability_above(lo: float,
                             mu: float = 0,
                             sigma: float = 1) -> float:
    return 1 - normal_cdf(lo,mu,sigma)

def normal_probability_between(lo: float,
                               hi: float,
                               mu: float = 0,
                               sigma: float = 1) -> float:
    return normal_cdf(hi,mu,sigma) - normal_cdf(lo,mu,sigma)

def normal_probability_outside(lo: float,
                               hi: float,
                               mu: float = 0,
                               sigma: float = 1) -> float:
    return 1 - normal_probability_between(lo,hi,mu,sigma)
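
Here is a short, self-contained illustration of how these four probabilities relate, using 1000 flips of a fair coin. Because `scratch.probability` may not be available to every reader, the sketch defines its own `normal_cdf` via `math.erf`; that definition is an assumption about what the imported function computes, and the cut-offs 469 and 531 are arbitrary values chosen for the example.

```python
import math

def normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    """Assumed stand-in for scratch.probability.normal_cdf (error-function form)."""
    return (1 + math.erf((x - mu) / (math.sqrt(2) * sigma))) / 2

# 1000 flips of a fair coin: mu = n * p, sigma = sqrt(p * (1 - p) * n)
mu = 1000 * 0.5
sigma = math.sqrt(0.5 * 0.5 * 1000)

lo, hi = 469, 531   # illustrative cut-offs, roughly two sigmas either side of mu

below = normal_cdf(lo, mu, sigma)                               # probability below lo
above = 1 - normal_cdf(hi, mu, sigma)                           # probability above hi
between = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma) # probability in [lo, hi]
outside = 1 - between                                           # probability outside [lo, hi]

print(f"below {lo}:  {below:.4f}")
print(f"above {hi}:  {above:.4f}")
print(f"between:    {between:.4f}")
print(f"outside:    {outside:.4f}")
```

The "outside" probability is just the two tails added together, which is why it can be written as one minus the "between" probability.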

# p-values