<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Objectives" data-toc-modified-id="Objectives-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Objectives</a></span></li><li><span><a href="#Another-Statistical-Test" data-toc-modified-id="Another-Statistical-Test-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Another Statistical Test</a></span><ul class="toc-item"><li><span><a href="#A-New-Class:-Non-Parametric-Tests" data-toc-modified-id="A-New-Class:-Non-Parametric-Tests-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>A New Class: Non-Parametric Tests</a></span></li></ul></li><li><span><a href="#The-$\chi^2$-Test" data-toc-modified-id="The-$\chi^2$-Test-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>The $\chi^2$ Test</a></span></li><li><span><a href="#$\chi^2$-Goodness-of-Fit-Test" data-toc-modified-id="$\chi^2$-Goodness-of-Fit-Test-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>$\chi^2$ Goodness-of-Fit Test</a></span><ul class="toc-item"><li><span><a href="#Observations" data-toc-modified-id="Observations-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Observations</a></span></li><li><span><a href="#Expected-Values" data-toc-modified-id="Expected-Values-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Expected Values</a></span></li><li><span><a href="#No-Expected-Frequency-$\lt-5$" data-toc-modified-id="No-Expected-Frequency-$\lt-5$-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>No Expected Frequency $\lt 5$</a></span></li><li><span><a href="#Calculate-$\chi^2$-Statistic" data-toc-modified-id="Calculate-$\chi^2$-Statistic-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Calculate $\chi^2$ Statistic</a></span></li><li><span><a href="#Determine-p-value" data-toc-modified-id="Determine-p-value-4.5"><span class="toc-item-num">4.5&nbsp;&nbsp;</span>Determine p-value</a></span></li></ul></li><li><span><a href="#$\chi^2$-Test-for-Independence" data-toc-modified-id="$\chi^2$-Test-for-Independence-5"><span 
class="toc-item-num">5&nbsp;&nbsp;</span>$\chi^2$ Test for Independence</a></span></li><li><span><a href="#$\chi^2$-Test-of-Homogeneity" data-toc-modified-id="$\chi^2$-Test-of-Homogeneity-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>$\chi^2$ Test of Homogeneity</a></span></li></ul></div>

In [None]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline

# Objectives

# Another Statistical Test

We've seen from hypothesis tests that they generally follow this pattern:


$$ \huge \frac{\text{Observed difference} - \text{Expectation if } H_0 \text{ is true}}{\text{Average Variance}}$$

And we've seen we can use different statistical tests depending on the situation.

## A New Class: Non-Parametric Tests

So far with $z$-tests, $t$-tests, and $F$-tests (ANOVA) we've been using the mean $\mu$ and standard deviation $\sigma$ to address a question. These are all *parametric tests* (use parameters to describe the null hypothesis).

But imagine we asked 50 men and 50 women whether they preferred pizza (🍕) or pasta (🍝):

|     |  🍕 | 🍝  |
| --- | --- | --- |
|  ♀  | 31  | 19  |
|  ♂  | 28  | 22  |

We really couldn't say anything about the average favorite food.

Instead, we tend to talk about proportions or frequencies to describe the data. This is where *non-parametric tests* can come in handy.
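
As a small illustration of describing this kind of data with proportions, here is a minimal sketch that puts the survey counts into a DataFrame. The labels `women`, `men`, `pizza`, and `pasta` are our own names for the rows and columns of the table above.

```python
import pandas as pd

# Survey counts from the table above (rows: group, columns: preferred food)
counts = pd.DataFrame(
    {"pizza": [31, 28], "pasta": [19, 22]},
    index=["women", "men"],
)

# Proportion of each group that prefers each food (each row sums to 1)
proportions = counts.div(counts.sum(axis=1), axis=0)
print(proportions)
```

Each row now reads as "share of this group preferring each food", which is the kind of quantity a non-parametric test works with.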

# The $\chi^2$ Test

When we compare categorical variables against other categorical variables (rather than continuous variables), the $\chi^2$ test is a good fit.

There are a few different $\chi^2$ tests but they all center around the **$\chi^2$ statistic** and the **$\chi^2$ distribution**.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Chi-square_distributionPDF.png/640px-Chi-square_distributionPDF.png)

Going back to our pizza vs pasta example, let's imagine we ask 100 individuals about their preference:


|                  |  🍕 | 🍝  |
| ---------------- | --- | --- |
| **OBSERVATIONS** | 52  | 48  |


It's not necessarily obvious if there is a _statistically_ significant difference in preference.
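
As a quick preview, here is a hedged sketch of how SciPy could test these counts. It assumes the null hypothesis is "no preference" (a 50/50 split); that null is our assumption for illustration and is not stated above.

```python
from scipy import stats

observed = [52, 48]  # pizza, pasta

# When no expected frequencies are passed, chisquare assumes
# every category is equally likely (here: 50 and 50)
result = stats.chisquare(observed)
print(result.statistic, result.pvalue)
```

Under that assumption the statistic is small ($0.16$) and the p-value is well above $0.05$, so these counts alone would not suggest a preference.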

There are actually several $\chi^2$ hypothesis tests with different use cases, but they all revolve around counting how observations fall into different categories across different groups.

# $\chi^2$ Goodness-of-Fit Test

If we are looking to see if some observed proportion _matches_ an expected proportion in relation to one variable, we do a **$\chi^2$ goodness-of-fit test**.

The steps follow like this:

1. Start with your _observation_ frequencies/proportions for each group
2. State what your _expectations_ were for each group
3. Check your assumptions (no expected frequency $\lt 5$)
4. Calculate the $\chi^2$ statistic
5. Determine your p-value via your $\chi^2$ statistic and degrees of freedom using the $\chi^2$ distribution

Let's try out an example as we work out how this test works.

## Observations

Suppose a company has hired us on. The company has been running a website in the U.S. but is now expanding it to other countries, namely the U.K. They would like to know if the U.K. users are "typical" in comparison to U.S. users.

They tell us that at the beginning of signing up with the site, the users can choose one of four types of profiles: **A**, **B**, **C**, & **D**.

The company ran an experiment in which $400$ U.K. users were given early access to the platform. Their profile choices were the following:

|              |  A  |  B  |  C  |  D  |
| ------------ | --- | --- | --- | --- |
| **UK USERS** | 50  | 100 | 180 | 70  |

## Expected Values

Now to determine if these U.K. users are similar to U.S. users, we need to know what profile types the U.S. users choose.

Suppose we have historical data on U.S. users and know:

- **A** is chosen $15\%$ of the time
- **B** is chosen $20\%$ of the time
- **C** is chosen $45\%$ of the time
- **D** is chosen $20\%$ of the time

Under $H_0$ (there is no difference between U.K. & U.S. users), we would _expect_ the $400$ U.K. users to follow the same pattern.

Thus we get the following expectations:

|              |  A  |  B  |  C  |  D  |
| ------------ | --- | --- | --- | --- |
| **EXPECTED** | 60  | 80  | 180 | 80  |
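
The expected counts are just the U.S. proportions scaled to the U.K. sample size. A minimal sketch of that arithmetic (the variable names are ours):

```python
import numpy as np

n_uk_users = 400

# Historical U.S. proportions for profiles A, B, C, D
us_proportions = np.array([0.15, 0.20, 0.45, 0.20])

# Under H0 the U.K. users follow the U.S. proportions,
# so each expected count is proportion * sample size
expected = us_proportions * n_uk_users
print(expected)
```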

To make life easier for us, let's combine this into one table:

|              |  A  |  B  |  C  |  D  |
| ------------ | --- | --- | --- | --- |
| **UK USERS** | 50  | 100 | 180 | 70  |
| **EXPECTED** | 60  | 80  | 180 | 80  |

## No Expected Frequency $\lt 5$

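Quickly, we should note that if any expected frequency is less than $5$, the $\chi^2$ test can have some issues: the $\chi^2$ distribution only approximates the true distribution of the statistic, and that approximation gets poor when expected counts are small.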

Technically, this is arbitrary (like many of our limits in statistics) but is generally a good rule of thumb.

In this case, we see no expected frequency falls under $5$ so we're good to proceed! 👍🏼
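
This check is easy to automate. A small sketch, where the threshold of $5$ is the rule of thumb from above and the variable names are ours:

```python
import numpy as np

# Expected counts for profiles A, B, C, D
expected = np.array([60, 80, 180, 80])

min_expected = 5  # rule-of-thumb threshold, not a hard law

# True only if every expected frequency meets the threshold
assumption_ok = bool((expected >= min_expected).all())
print(assumption_ok)
```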

## Calculate $\chi^2$ Statistic

Now we want to determine our test statistic. Recall what we want in a statistic:

$$ \large \frac{\text{Observed difference} - \text{Expectation if } H_0 \text{ is true}}{\text{Average Variance}}$$

Remember, we really want to capture the observed difference from what we'd expect. But if we did this and summed these differences, we'd always get $0$, since the observed and expected counts share the same total. So instead we square the differences before adding them.

We still need to scale these squared differences, and the natural choice is the expected value for each group.

This gives us the $\chi^2$ statistic:


$$\large \chi^2 = \sum \frac{( Expected_i - Observed_i)^2}{Expected_i}$$

--------------

So back to our example, we'll use our table to organize the values:

|                     |  A  |  B  |  C  |  D  |
| :-----------------: | --- | --- | --- | --- |
| **UK USERS**        | 50  | 100 | 180 | 70  |
| **EXPECTED**        | 60  | 80  | 180 | 80  |
| $\frac{(E-O)^2}{E}$ | 1.67| 5.00| 0.00| 1.25|

This gives $\chi^2 \approx 1.67 + 5.00 + 0.00 + 1.25 = 7.92$
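
The same calculation can be sketched with NumPy, applying the formula term by term (variable names are ours):

```python
import numpy as np

observed = np.array([50, 100, 180, 70])  # U.K. users choosing A, B, C, D
expected = np.array([60, 80, 180, 80])   # counts expected under H0

# Per-category contribution: (E - O)^2 / E
contributions = (expected - observed) ** 2 / expected

# The chi-square statistic is the sum of the contributions
chisq_stat = contributions.sum()
print(contributions.round(2), round(chisq_stat, 2))
```

Note that profile **B** contributes most of the statistic, so it is the category where U.K. users deviate most from the U.S. pattern.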

## Determine p-value

Our last step is to determine the p-value via the $\chi^2$ distribution.

One consideration is the _degrees of freedom_ (think back to our $t$-distribution). But what are the degrees of freedom here?

Well, the **degrees of freedom** are determined by **how many categories/groups** we used (number of categories minus 1: $df = k-1$)

So in this case $df = 3$ and gives this distribution:

In [None]:
degrees_of_freedom = 3
x = np.linspace(
        stats.chi2.ppf(0.000001, degrees_of_freedom),
        stats.chi2.ppf(0.9999, degrees_of_freedom), 
        500
)
with plt.xkcd():    
    f, ax = plt.subplots()
    ax.set_title(rf'$\chi^2$ w/ $df={degrees_of_freedom}$')
    ax.plot(x, stats.chi2.pdf(x, degrees_of_freedom), 'r-', lw=5)
    plt.tight_layout()

Well, we also know our $\chi^2$ statistic is $7.92$, so let's plot that too so we can see how much area under the curve is more extreme than our statistic:

In [None]:
# SciPy computes the same statistic (and its p-value) directly
stats.chisquare(f_obs=[50, 100, 180, 70], f_exp=[60, 80, 180, 80])

In [None]:
chisq_stat = 7.92

with plt.xkcd():    
    f, ax = plt.subplots()
    ax.set_title(rf'$\chi^2$ w/ $df={degrees_of_freedom}$')
    ax.plot(x, stats.chi2.pdf(x, degrees_of_freedom), 'r-', lw=5)
    # Chi-square statistic
    ax.axvline(chisq_stat, ls='--', c='b', label=rf'$\chi^2={chisq_stat}$')
    ax.legend()
    plt.tight_layout()

This looks pretty small, but let's calculate the p-value to be sure:

In [None]:
# Note we subtract from 1 since we want the area to the right of the statistic
p = 1 - stats.chi2.cdf(chisq_stat, df=degrees_of_freedom)
p

The p-value comes out to roughly $0.048$. For a significance level of $\alpha=0.05$, we would say this is significantly different, though only narrowly!

So we can tell the company that, based on the data provided, there appears to be a statistically significant difference between U.S. and U.K. users.
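
To tie the steps together, here is a hedged sketch of the whole decision in one place. It uses `stats.chisquare` for the statistic, `stats.chi2.sf` (the survival function, equivalent to `1 - cdf`) for the p-value, and `stats.chi2.ppf` for the critical value; the variable names and the choice of $\alpha = 0.05$ follow the example above.

```python
from scipy import stats

observed = [50, 100, 180, 70]  # U.K. users choosing A, B, C, D
expected = [60, 80, 180, 80]   # counts expected under H0
alpha = 0.05

# Statistic and p-value straight from SciPy
result = stats.chisquare(f_obs=observed, f_exp=expected)

# Degrees of freedom: number of categories minus 1
df = len(observed) - 1

# Area to the right of the statistic under the chi-square curve
p_value = stats.chi2.sf(result.statistic, df)

# Smallest statistic that would be significant at this alpha
critical_value = stats.chi2.ppf(1 - alpha, df)

reject_h0 = p_value < alpha
print(result.statistic, critical_value, p_value, reject_h0)
```

Comparing the statistic to the critical value and comparing the p-value to $\alpha$ are two views of the same decision, so they always agree.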

# $\chi^2$ Test for Independence

# $\chi^2$ Test of Homogeneity