## Overview

We've already covered what a p-value is and how it applies to a null and an alternative hypothesis, but let's do a quick review.

When we perform a hypothesis test, we calculate a p-value. Using the significance level we decided on before performing our test, we then have enough information to either 1) reject or 2) fail to reject the null hypothesis.

* 1) p-value ≤ alpha: reject the null hypothesis
* 2) p-value > alpha: fail to reject the null hypothesis
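
The decision rule above can be sketched as a small helper. The function name `decide` is hypothetical, and treating a p-value exactly equal to alpha as a rejection is a convention assumed here.

```python
def decide(p_value, alpha):
    """Return the test decision for a given p-value and significance level."""
    # Assumption: a p-value exactly equal to alpha counts as a rejection
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.03, 0.05))  # reject the null hypothesis
print(decide(0.20, 0.05))  # fail to reject the null hypothesis
```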

### Example: Dice Roll

We can use a chi-square test on a collection of dice rolls to determine if the dice are fair or to determine if the random number generator we are using is actually random (well, as far as we can detect).

Using dice roll statistics as our data set, we're going to work through the whole process: stating the null hypothesis, deciding on the significance level, performing a chi-square test, determining the p-value, and then making a decision on the null hypothesis.

## Follow Along

We already know the expected frequency of each number when we roll a die. For a six-sided die, each number should occur 1/6, or about 16.67%, of the time. We can also estimate the expected frequency of each value empirically by using a random number generator.
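
As a rough sketch of that idea, the snippet below simulates a single die with NumPy's `default_rng` and reports how often each value appears. The seed and the number of rolls are arbitrary choices for illustration, not values used in this lesson.

```python
import numpy as np

# Assumption: seed and sample size are arbitrary illustrative values
rng = np.random.default_rng(seed=42)
n_rolls = 60_000

# integers() excludes the upper bound, so high=7 yields values 1 through 6
rolls = rng.integers(low=1, high=7, size=n_rolls)

# Count each face; bin 0 is never used, so drop it
counts = np.bincount(rolls, minlength=7)[1:]
frequencies = counts / n_rolls

for value, freq in enumerate(frequencies, start=1):
    print(f"{value}: {freq:.4f}")
```

With this many rolls, each relative frequency should land close to 1/6 ≈ 0.1667; smaller samples will scatter more widely around that value.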

Let's decide on the null hypothesis and the significance level. 

### Null Hypothesis

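For this situation, it makes sense to choose the null hypothesis to be simply "the dice are fair". The alternative hypothesis is then "the dice are not fair". The null hypothesis has to be the statement that tells us what counts to expect, and only fairness does that.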

### Generated Dice Rolls

We used the random number generator in Python to simulate the dice rolls. We "rolled" five dice (A through E) 50 times each. Here are the results, along with the total for each value from 1 to 6.

| Value | A  | B  | C  | D  | E  | Total |
|-------|----|----|----|----|----|-------|
| 1     | 13 | 7  | 10 | 5  | 13 | 48    |
| 2     | 5  | 7  | 4  | 12 | 9  | 37    |
| 3     | 5  | 9  | 14 | 0  | 10 | 38    |
| 4     | 12 | 13 | 8  | 7  | 7  | 47    |
| 5     | 7  | 10 | 9  | 13 | 6  | 45    |
| 6     | 8  | 4  | 5  | 13 | 5  | 35    |

Each value should come up 1/6 of the time; the total number of rolls is 250, and 250/6 ≈ 41.67. The totals range from 35 to 48, so they are all within about 7 of that number, with one the highest (48) and six the lowest (35).
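
The comparison against 41.67 can also be run as a formal goodness-of-fit test. The sketch below is not part of the original analysis: it assumes `scipy.stats.chisquare` applied to the six totals from the table above, with a uniform expected count for every value.

```python
import numpy as np
from scipy.stats import chisquare

# Totals for the values 1 through 6, taken from the table above
totals = np.array([48, 37, 38, 47, 45, 35])

# Under fairness every value is expected 250 / 6 times
expected = np.full(6, totals.sum() / 6)

# Goodness-of-fit test with 6 - 1 = 5 degrees of freedom
stat, p = chisquare(totals, f_exp=expected)

print(f"chi-square statistic: {stat:.3f}")  # 3.824
print(f"p-value: {p:.3f}")
```

This test looks only at the pooled totals, so it asks whether the values are equally likely overall. It says nothing about whether the individual dice differ from one another.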

Let's put the data in NumPy arrays and run a chi-square test on them.

```python
import numpy as np

# Create the array for each die value
a1 = [13, 7, 10, 5, 13]
a2 = [5, 7, 4, 12, 9]
a3 = [5, 9, 14, 0, 10]
a4 = [12, 13, 8, 7, 7]
a5 = [7, 10, 9, 13, 6]
a6 = [8, 4, 5, 13, 5]

# Combine them into a (6,5) array
dice = np.array([a1, a2, a3, a4, a5, a6])
```

```python
# Import the chi-square test for contingency tables
from scipy.stats import chi2_contingency

# Perform the chi-square test
stat, p, dof, expected = chi2_contingency(dice, correction=False)

# Print out the stats in a nice format
print('Expected values: \n ', expected.round(2))
print('The degrees of freedom: ', dof)
print(f'The chi square statistic is: {stat:.3f}')
print(f'The p value is: {p:.6f}')
```

```text
Expected values: 
  [[9.6 9.6 9.6 9.6 9.6]
 [7.4 7.4 7.4 7.4 7.4]
 [7.6 7.6 7.6 7.6 7.6]
 [9.4 9.4 9.4 9.4 9.4]
 [9.  9.  9.  9.  9. ]
 [7.  7.  7.  7.  7. ]]
The degrees of freedom:  20
The chi square statistic is: 40.375
The p value is: 0.004477
```
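
To see where these numbers come from, the sketch below recomputes them by hand with NumPy. Each expected count is (row total × column total) / grand total, the statistic is the sum of (observed − expected)² / expected over all cells, and the degrees of freedom are (rows − 1) × (columns − 1). The variable names are illustrative.

```python
import numpy as np

# Same simulated rolls as above: rows are values 1-6, columns are dice A-E
dice = np.array([
    [13, 7, 10, 5, 13],
    [5, 7, 4, 12, 9],
    [5, 9, 14, 0, 10],
    [12, 13, 8, 7, 7],
    [7, 10, 9, 13, 6],
    [8, 4, 5, 13, 5],
])

row_totals = dice.sum(axis=1, keepdims=True)  # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)  # shape (1, 5)

# Expected count per cell: row total * column total / grand total
expected = row_totals * col_totals / dice.sum()

# Chi-square statistic and degrees of freedom
stat = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)

print(expected[:, 0])          # expected counts for die A
print(f"{stat:.3f}", dof)      # 40.375 20
```

Because the expected counts are built from the row and column totals rather than from 1/6, this test compares the dice with one another: it checks whether all five dice share the same distribution of values.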


### Interpret the result - computer generated

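Now we need to use the [Table: Chi-Square Probabilities](https://people.richland.edu/james/lecture/m170/tbl-chi.html) and a significance level to interpret our result. Let's use an alpha level of 0.005. With 20 degrees of freedom, the critical value at that level is 39.997. Our calculated chi-square of 40.375 is greater than 39.997, so this is case 1: we reject the hypothesis that the observed counts match the expected counts. In other words, the simulated rolls differ from die to die by more than chance alone would typically produce. Die D, which produced no threes at all in 50 rolls, accounts for about half of the statistic.

We can also use the calculated p-value of 0.004477, which is less than 0.005, and come to the same conclusion. This does not prove that the computer is using a "rigged" die: if the dice really did behave alike, a sample at least this extreme would still turn up about 0.45% of the time.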
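
The table lookup can also be reproduced in code. This is a minimal sketch assuming `scipy.stats.chi2`: its `ppf` method gives the critical value for a chosen alpha, and its `sf` method (the survival function) gives the p-value for a given statistic.

```python
from scipy.stats import chi2

alpha = 0.005   # significance level used above
dof = 20        # degrees of freedom reported by the test
stat = 40.375   # chi-square statistic reported by the test

# Critical value: the point with probability alpha in the upper tail
critical = chi2.ppf(1 - alpha, dof)

# p-value: probability of a statistic at least this large under the null
p_value = chi2.sf(stat, dof)

print(f"critical value: {critical:.3f}")  # 39.997
print(f"p-value: {p_value:.4f}")          # 0.0045
print(stat > critical, p_value < alpha)   # True True
```

The two comparisons always agree, since both ask whether the statistic falls in the upper tail of the chi-square distribution that has area alpha.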

### Physical Dice

Let's look at the rolls from a random assortment of actual, physical dice. We set up the number of rolls and dice in the same way as for the random number generator. Here are the results of five dice being rolled 50 times each.

| Value | A  | B  | C  | D  | E  | Total |
|-------|----|----|----|----|----|-------|
| 1     | 4  | 3  | 5  | 11 | 4  | 27    |
| 2     | 9  | 15 | 10 | 4  | 11 | 49    |
| 3     | 7  | 10 | 8  | 6  | 8  | 39    |
| 4     | 13 | 6  | 8  | 9  | 12 | 48    |
| 5     | 9  | 9  | 7  | 11 | 6  | 42    |
| 6     | 8  | 7  | 12 | 9  | 9  | 45    |

```python
# Create the array for each die value
a1 = [4, 3, 5, 11, 4]
a2 = [9, 15, 10, 4, 11]
a3 = [7, 10, 8, 6, 8]
a4 = [13, 6, 8, 9, 12]
a5 = [9, 9, 7, 11, 6]
a6 = [8, 7, 12, 9, 9]

# Combine them into a (6,5) array
dice = np.array([a1, a2, a3, a4, a5, a6])
```

```python
# Perform the chi-square test
stat, p, dof, expected = chi2_contingency(dice, correction=False)

# Print out the stats in a nice format
print('Expected values: \n ', expected.round(2))
print(f'The chi square statistic is: {stat:.3f}')
print(f'The p value is: {p:.6f}')
```

```text
Expected values: 
  [[5.4 5.4 5.4 5.4 5.4]
 [9.8 9.8 9.8 9.8 9.8]
 [7.8 7.8 7.8 7.8 7.8]
 [9.6 9.6 9.6 9.6 9.6]
 [8.4 8.4 8.4 8.4 8.4]
 [9.  9.  9.  9.  9. ]]
The chi square statistic is: 21.989
The p value is: 0.341086
```


### Interpret the result - human generated

Again, we'll use the [Table: Chi-Square Probabilities](https://people.richland.edu/james/lecture/m170/tbl-chi.html) and a significance level to interpret our result. For this trial, we'll use an alpha level of 0.01, for which the critical value at 20 degrees of freedom is 37.566. Our calculated chi-square of 21.989 is less than 37.566, so this is case 2: we fail to reject the null hypothesis. As with the example above, we can also use the calculated p-value. In this case, our p-value of 0.34 is greater than our alpha of 0.01, so again we can't reject the null hypothesis.

Failing to reject does not prove anything about the dice; it only means that this sample shows no detectable difference between the observed and the expected counts. Some human error in rolling the dice, or small imperfections in the dice themselves, could still be present without showing up in 250 rolls. Note, for instance, that the value one came up only 27 times in total, noticeably below the 41.67 we would expect from a fair die.

## Challenge

This is an opportunity to generate your own dice-rolling data and see how your results compare to the computer-generated ones. You can use fewer dice (and roll more than one at a time) to collect your sample. Once you have some data, construct a contingency table and calculate your chi-square statistic. Then interpret your result using your preferred significance level, chosen before you look at the p-value. Are your dice fair?
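
If you record your rolls as plain lists, one possible way to turn them into a contingency table is sketched below. The dictionary `rolls_by_die` and its contents are made-up placeholder data, not real results; replace them with your own rolls.

```python
import numpy as np

# Hypothetical raw data: one list of recorded rolls per die
rolls_by_die = {
    "A": [3, 6, 1, 4, 4, 2, 5, 6, 3, 1],
    "B": [2, 2, 5, 6, 1, 3, 4, 4, 6, 5],
}

# Count how often each value 1-6 appears for every die (bin 0 is dropped)
columns = [
    np.bincount(rolls, minlength=7)[1:]
    for rolls in rolls_by_die.values()
]

# Rows are the values 1-6, columns are the dice, as in the tables above
table = np.column_stack(columns)
print(table)
```

The resulting array has the same layout as `dice` in the examples above, so it can be passed to `chi2_contingency` in the same way. With only ten rolls per die the expected counts are far too small for a chi-square test to be reliable (a common rule of thumb asks for expected counts of at least 5), so a real sample needs many more rolls.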

## Additional Resources

* [Chi-square Test of Independence](https://stattrek.com/chi-square-test/independence.aspx)
* [Table: Chi-Square Probabilities](https://people.richland.edu/james/lecture/m170/tbl-chi.html)
* [Two-way Tables and the Chi-square Test](http://www.stat.yale.edu/Courses/1997-98/101/chisq.htm)