# 07 - Hypothesis and Inference

Code from chapter 7 of the book, _Data Science from Scratch_, 2nd edition.

In [None]:
import dsfs as scratch

## Statistical Hypothesis Testing

Often, as data scientists, we'll want to test whether a certain hypothesis is likely to be true. For our purposes, hypotheses are assertions like:

- "This coin is fair"
- "Data scientists prefer Python to R"

and so on, that can be translated into statistics about data.

Under various assumptions, these statistics can be thought of as observations of random variables from known distributions. This view allows us to make statements about how likely those assumptions are to hold.

In the classical (frequentist) setup, we have a _null hypothesis_, $H_{0}$, that represents a default position, and some alternative hypothesis, $H_{1}$, with which we'd like to compare it. We use statistics to decide whether or not we can reject $H_{0}$ as false.

## Example: Flipping a Coin

Imagine we have a coin and want to test whether it's fair. We'll assume that the coin has some probability, $p$, of landing heads. Our null hypothesis is that the coin **is** fair; that is, $p = 0.5$. We'll test this hypothesis against the alternative hypothesis, $p \neq 0.5$.

In particular, our test involves flipping the coin $n$ times and counting the number of heads, $X$. Each coin flip is an independent Bernoulli trial with success probability $p$; consequently, $X \sim \operatorname{Bin}(n, p)$.

Given a "large" number of trials, we can approximate $\operatorname{Bin}(n, p)$ by a normal distribution with mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1 - p)}$.
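The normal approximation to $\operatorname{Bin}(n, p)$ uses mean $np$ and standard deviation $\sqrt{np(1 - p)}$. The sketch below shows what a function like `normal_approximation_to_binomial` might compute. The `NormalParameters` named tuple is an assumption, inferred only from the `.mu` and `.sigma` attributes used later in this notebook; it is not taken from the `dsfs` source. The `binomial_pmf` and `normal_pdf` helpers are hypothetical, included only to compare the approximation with an exact binomial probability.

```python
import math
from typing import NamedTuple


class NormalParameters(NamedTuple):
    """Hypothetical container for (mu, sigma); dsfs may use a different type."""
    mu: float
    sigma: float


def normal_approximation_to_binomial(n: int, p: float) -> NormalParameters:
    """Return the mean and standard deviation of Bin(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalParameters(mu, sigma)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact P(X == k) for X ~ Bin(n, p) (hypothetical helper)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma) at x (hypothetical helper)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


params = normal_approximation_to_binomial(100, 0.5)
print(params)  # NormalParameters(mu=50.0, sigma=5.0)

# The exact probability of exactly 50 heads is close to the normal density at 50.
print(binomial_pmf(50, 100, 0.5), normal_pdf(50, params.mu, params.sigma))
```

`math.comb` requires Python 3.8 or later.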

In [None]:
# If n == 100
scratch.inference.normal_approximation_to_binomial(100, 0.5)

Whenever a random variable follows a normal distribution, we can use `normal_cdf` (and helpers built on it) to calculate the probability that its value lies within or outside a given interval.

In [None]:
# P(Z < 48)
scratch.inference.normal_probability_below(48, 50, 5)

In [None]:
# P(Z > 48)
scratch.inference.normal_probability_above(48, 50, 5)

In [None]:
# P(Z in [46, 48])
scratch.inference.normal_probability_between(46, 48, 50, 5)

In [None]:
# P(Z not in [46, 48])
scratch.inference.normal_probability_outside(46, 48, 50, 5)
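The four helpers called above can all be expressed in terms of the normal CDF. The following is a minimal sketch, assuming the CDF is computed from the standard library's `math.erf`; the names mirror the `scratch.inference` calls, but the bodies are illustrative and not the `dsfs` source.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma), computed with the error function."""
    return (1 + math.erf((x - mu) / (sigma * math.sqrt(2)))) / 2


# The probability that X lies below a threshold is just the CDF.
normal_probability_below = normal_cdf


def normal_probability_above(lo: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X > lo): everything that is not below lo."""
    return 1 - normal_cdf(lo, mu, sigma)


def normal_probability_between(lo: float, hi: float,
                               mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(lo < X < hi): the CDF at hi minus the CDF at lo."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


def normal_probability_outside(lo: float, hi: float,
                               mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X < lo or X > hi): everything that is not between lo and hi."""
    return 1 - normal_probability_between(lo, hi, mu, sigma)


print(round(normal_probability_below(48, 50, 5), 4))  # 0.3446
```

Because the normal distribution is continuous, it does not matter whether the endpoints are included (`<` versus `<=`).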

We can also do the "inverse" operation: find the value below (or above) which a specified "amount" of probability lies, or the interval, symmetric about the mean, that contains a specified amount of probability.

In [None]:
# Calculate the one-sided tail region containing the "left-hand"
# (or lower) 20% of the probability. That is, calculate the z for which
# P(Z <= z) == 20%.
scratch.inference.normal_upper_bound(0.20, 50, 5)

In [None]:
# Calculate the one-sided tail region containing the "right-hand"
# (or upper) 20% of the probability. That is, calculate the z for which
# P(Z >= z) == 20%.
scratch.inference.normal_lower_bound(0.20, 50, 5)

In [None]:
# Calculate the interval, symmetric about the mean, that contains 60%
# of the probability. That is, calculate the range, [lo, hi], such that
# P(lo < Z < hi) == 60%.
scratch.inference.normal_two_sided_bounds(0.60, 50, 5)
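These inverse operations all reduce to an inverse CDF. The sketch below swaps in the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8+) for the inverse CDF; the `dsfs` helpers may compute it differently (for example, numerically), and `inverse_normal_cdf` is a hypothetical name used only here.

```python
from statistics import NormalDist
from typing import Tuple


def inverse_normal_cdf(p: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Return x such that P(X <= x) == p for X ~ N(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(p)


def normal_upper_bound(probability: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Return z such that P(X <= z) == probability (lower tail)."""
    return inverse_normal_cdf(probability, mu, sigma)


def normal_lower_bound(probability: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Return z such that P(X >= z) == probability (upper tail)."""
    return inverse_normal_cdf(1 - probability, mu, sigma)


def normal_two_sided_bounds(probability: float, mu: float = 0.0,
                            sigma: float = 1.0) -> Tuple[float, float]:
    """Return (lo, hi), symmetric about mu, with P(lo < X < hi) == probability."""
    # Split the leftover probability equally between the two tails.
    tail_probability = (1 - probability) / 2
    lo = normal_upper_bound(tail_probability, mu, sigma)  # lower tail holds tail_probability
    hi = normal_lower_bound(tail_probability, mu, sigma)  # upper tail holds tail_probability
    return lo, hi


print(normal_two_sided_bounds(0.60, 50, 5))  # roughly (45.79, 54.21)
```

Note the naming: `normal_upper_bound` returns an *upper* bound on X for a lower-tail region, which is why the two-sided function uses it for `lo`.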

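Consider another test: we flip the coin 1000 times. If the coin is fair, $X$ is approximately normal with mean 500 and standard deviation $\sqrt{1000 \cdot 0.5 \cdot 0.5} \approx 15.8$.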

In [None]:
# Binom(1000, 0.5) ~= N(500, 15.8)
normal_parameters_0 = scratch.inference.normal_approximation_to_binomial(1000, 0.5)
normal_parameters_0

Before performing our analysis, we need to decide on a _significance_ level: how willing we are to make a _type 1 error_ (a "false positive"), that is, to reject the null hypothesis, $H_{0}$, even though it is true. For historical reasons, we often choose a significance level of 5% (or 1%).

In [None]:
# With a 5% significance level, we fail to reject H0 whenever X lies in
# the central interval that contains 95% of the probability of our
# normal approximation.
lo, hi = scratch.inference.normal_two_sided_bounds(0.95, normal_parameters_0.mu, normal_parameters_0.sigma)
lo, hi

In [None]:
# Converting to integral values
round(lo), round(hi)

If we assume our null hypothesis, $H_{0}: p = 0.5$, is true, then the probability that we observe a value outside the range [469, 531] is only about 5%.

To put it another way, if $H_{0}$ is true and we repeat our 1000-toss test 20 times, we expect, on average, 19 of the repetitions to produce a value in [469, 531] and only 1 to produce a value outside it.
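This "repeat the experiment" reading can be checked by simulation. The sketch below is an illustration using the standard library's `random` module; `count_heads` and `rejection_rate` are hypothetical helpers, not part of `dsfs`. Because the bounds [469, 531] come from a normal approximation to a discrete distribution, the simulated rejection rate for a fair coin should be close to, but not exactly, 5%.

```python
import random


def count_heads(n: int, p: float, rng: random.Random) -> int:
    """Flip a coin with P(heads) == p, n times, and return the number of heads."""
    return sum(rng.random() < p for _ in range(n))


def rejection_rate(repetitions: int, n: int = 1000, p: float = 0.5,
                   lo: int = 469, hi: int = 531, seed: int = 0) -> float:
    """Fraction of repeated n-toss experiments whose head count falls outside [lo, hi]."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    rejections = 0
    for _ in range(repetitions):
        x = count_heads(n, p, rng)
        if x < lo or x > hi:
            rejections += 1
    return rejections / repetitions


# For a fair coin, this should come out near 0.05.
print(rejection_rate(2000))
```

Passing a different `p` (for example, 0.55) to `rejection_rate` estimates how often the test rejects $H_0$ when it is false, which is the quantity discussed next.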

In addition to significance, we are often interested in the _power_ of a test. The _power_ is the probability of **not** making a _type 2 error_ (a "false negative"). A type 2 error means failing to reject the null hypothesis, $H_{0}$, when it is actually **false**.

In order to measure the power, we must define what "$H_{0}$ is false" _means_. Specifically, in our 1000-toss test, knowing merely that $p \neq 0.5$ does not give us much information about the distribution of $X$.

Let's examine the situation in which $p$ is actually 0.55.

In [None]:
# The bounds based on p == 0.5
lo, hi = scratch.inference.normal_two_sided_bounds(0.95, normal_parameters_0.mu, normal_parameters_0.sigma)

In [None]:
# Calculate the normal parameters assuming p == 0.55
normal_parameters_1 = scratch.inference.normal_approximation_to_binomial(1000, 0.55)

In [None]:
# A type 2 error occurs when we **fail** to reject the null hypothesis.
# That is, we fail to reject the null hypothesis when X is in our
# **original** interval.
type_2_probability = scratch.inference.normal_probability_between(lo, hi,
                                                                  normal_parameters_1.mu,
                                                                  normal_parameters_1.sigma)
type_2_probability

In [None]:
power = 1 - type_2_probability
power
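Since the power depends on which alternative value of $p$ we assume, it can be useful to view it as a function of $p$. The sketch below recomputes the two-sided test's power with the standard library's `statistics.NormalDist` instead of `dsfs`; `two_sided_power` is a hypothetical name, and the calculation relies on the same normal approximation used above.

```python
import math
from statistics import NormalDist


def two_sided_power(p: float, n: int = 1000, significance: float = 0.05) -> float:
    """Approximate power of the two-sided test of H0: p == 0.5 when the true probability is p."""
    # Acceptance region under H0, from the normal approximation to Bin(n, 0.5).
    null = NormalDist(n * 0.5, math.sqrt(n * 0.5 * 0.5))
    lo = null.inv_cdf(significance / 2)
    hi = null.inv_cdf(1 - significance / 2)
    # Distribution of X if the true probability of heads is p.
    alt = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    # Type 2 error: X lands in the acceptance region even though H0 is false.
    type_2_probability = alt.cdf(hi) - alt.cdf(lo)
    return 1 - type_2_probability


for p in (0.50, 0.52, 0.55, 0.60):
    print(p, round(two_sided_power(p), 3))
```

At $p = 0.5$ the "power" is just the 5% significance level, and it grows as the true $p$ moves away from 0.5; at $p = 0.55$ it matches the roughly 0.887 computed above.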

Imagine instead that our null hypothesis was that the coin was not biased toward heads. In symbols,

$$
    p \leq 0.5
$$

Because this null hypothesis is no longer a single point ($p = 0.5$) but a whole range of values on one side of 0.5, only unusually **large** values of $X$ count as evidence against it. Consequently, we want a _one-sided test_ that rejects the null hypothesis when $X$ is much larger than 500 (half of 1000 tosses) but **not** when $X$ is smaller than 500.

Thus, we want to "design" a test at the 5% significance level that uses only a single tail. In this specific case, we use `normal_upper_bound` to find the cutoff below which 95% of the probability lies.

In [None]:
# Calculate the number of heads in 1000 tosses (each with probability of
# heads of 0.5) such that 95% of all 1000-toss "repetitions" would have
# **fewer** heads.
hi = scratch.inference.normal_upper_bound(0.95, normal_parameters_0.mu, normal_parameters_0.sigma)
hi

Note that the "hi" value is now about 526. This value is less than the previous high value (531) because this test is one-sided: the entire 5% rejection probability sits in the upper tail instead of being split into 2.5% in each tail.

In [None]:
type_2_probability = scratch.inference.normal_probability_below(hi, normal_parameters_1.mu, normal_parameters_1.sigma)
type_2_probability

In [None]:
power = 1 - type_2_probability
power

This one-sided test is more powerful, since it no longer rejects $H_0$ when $X < 469$ (which is very unlikely to happen if $H_1$ is true). Instead, it also rejects $H_0$ when $X$ is in [526, 531] (which is somewhat likely if $H_1$ is true).
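The one-sided power can be sketched the same way. This version uses the standard library's `statistics.NormalDist` rather than `dsfs`, and `one_sided_power` is a hypothetical name. It also shows the other side of the trade-off: for a coin biased toward tails, the one-sided test essentially never rejects $H_0: p \leq 0.5$, which is exactly what that null hypothesis calls for.

```python
import math
from statistics import NormalDist


def one_sided_power(p: float, n: int = 1000, significance: float = 0.05) -> float:
    """Approximate power of the one-sided test of H0: p <= 0.5 when the true probability is p."""
    # Cutoff below which (1 - significance) of the probability lies when p == 0.5.
    null = NormalDist(n * 0.5, math.sqrt(n * 0.5 * 0.5))
    hi = null.inv_cdf(1 - significance)
    # Distribution of X if the true probability of heads is p.
    alt = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    # Type 2 error: X stays below the cutoff, so we fail to reject H0.
    type_2_probability = alt.cdf(hi)
    return 1 - type_2_probability


for p in (0.45, 0.50, 0.55):
    print(p, round(one_sided_power(p), 3))
```

At $p = 0.55$ this gives roughly 0.936, compared with roughly 0.887 for the two-sided test.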