- title: Basic Hypothesis Tests
- tags: week6, statistics
- date: 2018-12-30

In this lesson, we're going to very quickly rip through the basic hypothesis tests, their uses, and how to achieve them in Python.  I won't spend a lot of time on this, because the mathematical details are covered in the assigned reading, and, at any rate, I think for practical purposes regression analysis is more important for lawyers.  Also, this is basically AP/undergrad stats material, so you've probably seen it somewhere already. 

Coverage:

- Chi-squared

- Fisher's exact test

- F-test/ANOVA

- t-test

- Basic nonparametric t-test alternative (Wilcoxon/Mann-Whitney)

## Chi-squared

- Use: test for a relationship between categorical variables, or test whether two samples were drawn from the same distribution.

- Legal example: The population of registered voters/driver's license holders/etc. is X% women and Y% men; the jury pool is A% women and B% men. Should we believe that the potential jurors were selected at random from the whole population? The p-value tells us how likely we would be to get a distribution at least as lopsided as the observed one if the jury pool really had been drawn at random from the underlying population.

- Basic idea: Take a crosstabulation (a.k.a. contingency table) of the sample, and calculate what the expected values of the cells in the table would be from the population.  See if they're crazy different.
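One way to read the legal example is as a goodness-of-fit test: compare the counts in the jury pool against the counts we would expect from the population percentages. Here is a minimal sketch using `scipy.stats.chisquare`. The numbers are hypothetical (a pool of 500 people, 150 of them women, and a population assumed to be split 50/50), and this one-sample framing is just one reasonable way to set the problem up.

```python
from scipy.stats import chisquare

# Hypothetical numbers: a jury pool of 500 people, 150 of them women,
# drawn from a population assumed to be 50% women and 50% men.
observed = [150, 350]
population_share = [0.5, 0.5]

# Expected counts if the pool mirrored the population exactly.
expected = [share * sum(observed) for share in population_share]

# Goodness-of-fit test: how far are the observed counts from the expected ones?
stat, p_gof = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_gof)
```

A tiny `p_gof` says that a pool this lopsided would be very unlikely under random selection from the assumed population.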

Sticking with our jury example, suppose we had the following data on our underlying population and our jury pool: 

|                  | Women | Men  |
|------------------|-------|------|
| Not in Jury Pool | 4850  | 4650 |
| In Jury Pool     | 150   | 350  |


Here's how we'd do this in Python:

```python
from scipy.stats import chi2_contingency
import numpy as np

# Rows: not in jury pool, in jury pool. Columns: women, men.
crosstab = np.array([[4850, 4650], [150, 350]])
chi2, p, dof, expected = chi2_contingency(crosstab)
print(p)
```

```
6.802875042267758e-20
```
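To make the "expected values" idea concrete, here is a sketch that builds the expected table by hand from the row and column totals and then computes the test statistic directly. The variable names are my own. One detail to be aware of: by default `chi2_contingency` applies Yates' continuity correction to 2x2 tables, so the hand calculation below is compared against a call with `correction=False`, and its p-value will differ slightly from the default call.

```python
import numpy as np
from scipy import stats

# Rows: not in jury pool, in jury pool. Columns: women, men.
observed = np.array([[4850, 4650], [150, 350]])

# Under independence, expected cell = row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected_by_hand = row_totals * col_totals / n
print(expected_by_hand)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
stat_by_hand = ((observed - expected_by_hand) ** 2 / expected_by_hand).sum()
dof_by_hand = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_by_hand = stats.chi2.sf(stat_by_hand, dof_by_hand)

# Same calculation via scipy, with the continuity correction switched off.
stat_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(
    observed, correction=False
)
print(stat_by_hand, stat_scipy)
```

If the pool were independent of gender, we would expect 250 women and 250 men in it; the observed 150 and 350 are a long way from that.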

Further reading on chi-squared: 

- [Khan Academy](https://www.khanacademy.org/math/statistics-probability/inference-categorical-data-chi-square-tests)

- [Scipy documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html)

- [A lengthier tutorial in Python](https://machinelearningmastery.com/chi-squared-test-for-machine-learning/)

- [And another](https://towardsdatascience.com/running-chi-square-tests-in-python-with-die-roll-data-b9903817c51b) 

- [A nice article describing some of the assumptions of chi-squared in more detail](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900058/)

In our example, I might feel comfortable rejecting the null hypothesis that the jury pool came from the underlying population with a p-value that low. Or, same idea different phrasing, I might reject the null hypothesis that gender and being called to jury service are independent. 

But *maybe not*, because chi-squared is known to be sensitive to sample size: with a big enough sample, even a small departure from independence produces a really big test statistic, and hence a really small p-value, whether or not the difference matters in practice.  So maybe we should think of a different test.
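The sample-size worry can be seen in a small experiment. The sketch below takes a made-up 2x2 table with a slight imbalance, multiplies every cell by 10 and then by 100, and reruns the test each time. It also reports the phi coefficient, `sqrt(chi2 / n)`, a common effect-size measure for 2x2 tables that I am adding purely for illustration. The continuity correction is turned off so that the statistic scales cleanly with the sample size.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table with a small imbalance (52 vs. 48 in each row).
base = np.array([[52, 48], [48, 52]])

results = []
for scale in (1, 10, 100):
    table = base * scale
    n_total = int(table.sum())
    # correction=False so the statistic grows in exact proportion to n.
    stat, p_value, _, _ = chi2_contingency(table, correction=False)
    # Phi coefficient: effect size for a 2x2 table.
    phi = float(np.sqrt(stat / n_total))
    results.append((n_total, stat, p_value, phi))
    print(f"n={n_total:>6}  chi2={stat:6.2f}  p={p_value:.2g}  phi={phi:.3f}")
```

The proportions, and so the effect size, are identical in all three tables; only the sample size changes, yet the p-value goes from unremarkable to tiny.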




## Fisher's exact test

Fisher's exact test is an alternative to chi-squared that was a lot less popular before computational power went through the roof.  As the name suggests, it computes an exact p-value rather than relying on the large-sample approximation behind chi-squared, but at the cost of computational complexity, so even today it's rarely used for big contingency tables (whereas you can happily run chi-squared on a 4x4 table, a 9x9 table, etc.).  Traditionally, Fisher's exact test is used for small samples that are inappropriate for chi-squared.

But it's never wrong to conduct an extra test!  (Well, it's wrong if you're doing research and you don't report them all.  We'll talk about this later.  Report every damn test you conduct.)  So let's try it on our big contingency table here.

```python
from scipy.stats import fisher_exact

oddsratio, p = fisher_exact(crosstab)
print(p)
```

```
2.2248886823458282e-20
```


Also significant!  
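For the traditional small-sample use case, here is a minimal sketch with a made-up 2x2 table. A common rule of thumb is that the chi-squared approximation gets shaky when expected cell counts drop below about 5, so the sketch first checks the expected counts with `scipy.stats.contingency.expected_freq` and then runs Fisher's exact test. The counts and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact
from scipy.stats.contingency import expected_freq

# Hypothetical small sample (made-up counts, not from the jury example).
small_table = np.array([[8, 2], [1, 5]])

# Expected counts under independence; several fall below 5 here,
# which is the usual warning sign for the chi-squared approximation.
small_expected = expected_freq(small_table)
print(small_expected)

# Fisher's exact test (two-sided by default) on the same table.
small_oddsratio, small_p = fisher_exact(small_table)
print(small_oddsratio, small_p)
```

The first value returned is the sample odds ratio, (8 × 5) / (2 × 1) = 20 for this table, which gives a sense of the size of the association alongside the p-value.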

Further reading on Fisher's exact test: 

- [Scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fisher_exact.html)

- [A biostats person explains chi-squared, binomial, and Fisher tests](http://davidquigley.com/talks/2015/biostatistics/module_07.1.html#fishers-exact-test)

