In [28]:
from IPython.display import Image, display
import numpy as np
from scipy.stats import chisquare

The chi-square test compares the observed frequencies in a contingency table with the frequencies expected under a null hypothesis; it is based on the chi-square distribution, $\chi^2$.  A contingency table is a table in matrix format that displays the frequency distribution of the variables (https://en.wikipedia.org/wiki/Contingency_table).  The null hypothesis is that the observed frequencies equal a given set of expected frequencies.  If the expected frequencies are omitted (the `f_exp` argument of `chisquare`), all categories are assumed to be equally frequent; otherwise you are specifying a particular distribution.

In [29]:
display(Image(url='ChiSquare.png', width=150, height=150))

Where:<br/>
O : observed <br/>
E: expected <br/>
i: index of the category (cell)
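
Assuming the image shows the standard Pearson statistic, it can be written with the symbols above for a table with $k$ categories as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Under the null hypothesis this quantity approximately follows a chi-square distribution with $k - 1$ degrees of freedom.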

Let's suppose that we have the following contingency table and we want to check whether the frequencies are equal, that is, whether the observed differences are due to chance alone.

| 16| 18| 16| 14| 12| 12|
|---|---|---|---|---|---|


In [30]:
chisquare([16, 18, 16, 14, 12, 12])

Power_divergenceResult(statistic=2.0, pvalue=0.84914503608460956)
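
The statistic reported above can be reproduced by hand. The sketch below assumes, as `chisquare` does when `f_exp` is omitted, that every category has the same expected count (the mean of the observed counts); the variable names are illustrative.

```python
# Observed counts from the table above
observed = [16, 18, 16, 14, 12, 12]

# With no expected frequencies given, each category is expected
# to hold the mean of the observed counts
expected = sum(observed) / len(observed)

# Pearson statistic: sum of (O - E)^2 / E over all categories
statistic = sum((o - expected) ** 2 / expected for o in observed)

# Degrees of freedom: number of categories minus one
dof = len(observed) - 1

print(statistic, dof)
```

The statistic comes out at about 2.0 with 5 degrees of freedom, matching the output above; the large p-value (0.85) gives no reason to doubt that the frequencies are equal.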

In [31]:
chisquare([50, 35, 16, 14, 12, 5])

Power_divergenceResult(statistic=65.545454545454533, pvalue=8.6365403606903405e-13)

In [32]:
chisquare([16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8])

Power_divergenceResult(statistic=3.5, pvalue=0.62338762774958223)

In [33]:
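# Note: the totals differ (55 observed vs. 275 expected); recent SciPy
# releases raise a ValueError in this case instead of returning a result.
chisquare([10, 10, 10, 10, 10, 5], f_exp=[50, 50, 50, 50, 50, 25])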

Power_divergenceResult(statistic=176.0, pvalue=3.8243258342713894e-36)
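
In the last call the observed counts add up to 55 while the expected counts add up to 275, so the statistic of 176 reflects the difference in totals rather than a difference in shape. A minimal sketch, assuming the expected values are meant as relative weights, rescales them to the observed total first (the names `weights` and `expected` are illustrative):

```python
observed = [10, 10, 10, 10, 10, 5]
weights = [50, 50, 50, 50, 50, 25]  # treated here as relative weights (assumption)

# Rescale the weights so that the expected counts sum to the observed total
total = sum(observed)
expected = [w * total / sum(weights) for w in weights]

# Pearson statistic against the rescaled expectation
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected)
print(statistic)
```

After rescaling, the expected counts coincide with the observed ones and the statistic drops to zero: the observed proportions match the hypothesised ones exactly.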

Another example:

Handedness by gender:

| Gender | Right handed | Left handed | Total |
|--------|--------------|-------------|-------|
| Male   | 43           | 9           | 52    |
| Female | 44           | 4           | 48    |
| Total  | 87           | 13          | 100   |


What are the expected frequencies if handedness is independent of gender, that is, if the proportion of right-handed people is the same among males and females?

In [37]:
from scipy.stats import chi2_contingency
obs = np.array([[43,9], [44,4]])
chi2_contingency(obs)

(1.0724852071005921, 0.30038477039056599, 1, array([[ 45.24,   6.76],
        [ 41.76,   6.24]]))
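
The fourth element of the result is the table of expected frequencies under independence: each cell is its row total times its column total, divided by the grand total. The sketch below reproduces it with NumPy, together with the statistic and p-value. It assumes the default behaviour of `chi2_contingency`, which applies Yates' continuity correction to 2x2 tables, and uses the fact that for one degree of freedom the chi-square tail probability equals `erfc(sqrt(x / 2))`.

```python
import math
import numpy as np

obs = np.array([[43, 9], [44, 4]])

row_totals = obs.sum(axis=1)
col_totals = obs.sum(axis=0)
n = obs.sum()

# Expected count per cell under independence
expected = np.outer(row_totals, col_totals) / n

# Yates-corrected statistic: shrink each |O - E| by 0.5 before squaring
statistic = (((np.abs(obs - expected) - 0.5) ** 2) / expected).sum()

# Tail probability of a chi-square distribution with 1 degree of freedom
pvalue = math.erfc(math.sqrt(statistic / 2))

print(expected)
print(statistic, pvalue)
```

With a p-value of about 0.30, the data give no evidence that handedness depends on gender.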

In [38]:
obs = np.array([[43,9], [44,4]]).T
obs.shape

(2, 2)

By default (`axis=0`), `chisquare` applies the test to each column separately. Setting `axis=None` applies it to all the data in the array, which is equivalent to testing the flattened array.

In [39]:
chisquare(obs)
chisquare(obs, axis=None)

Power_divergenceResult(statistic=55.280000000000001, pvalue=5.9838129011748208e-12)
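
Only the second result is displayed above. To make the effect of `axis` visible, the sketch below computes the uniform-expectation statistic with NumPy for both cases; `uniform_chi2` is a hypothetical helper written for this illustration, not a SciPy function.

```python
import numpy as np

def uniform_chi2(counts, axis=0):
    """Pearson statistic against equal expected counts along `axis`."""
    counts = np.asarray(counts, dtype=float)
    if axis is None:
        # Treat the whole array as a single set of categories
        counts = counts.ravel()
        axis = 0
    expected = counts.mean(axis=axis, keepdims=True)
    return ((counts - expected) ** 2 / expected).sum(axis=axis)

# Rows: right/left handed, columns: male/female
obs = np.array([[43, 9], [44, 4]]).T

per_column = uniform_chi2(obs)             # one statistic per column
flattened = uniform_chi2(obs, axis=None)   # one statistic for all four cells

print(per_column)
print(flattened)
```

Per column this gives about 22.2 (male) and 33.3 (female), each testing whether right- and left-handed counts are equal within one gender; the flattened value of 55.28 matches the output above.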

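With p ≈ 6e-12, the four cell counts are clearly not equally frequent. Note that this tests a different hypothesis from `chi2_contingency` above, whose p-value of about 0.30 gives no evidence that handedness depends on gender.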

References:<br/>
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html<br/>
https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.chi2_contingency.html