## Seminar 3: $\chi^2$ Test of Pearson

The objective of this seminar is to study Pearson's $\chi^2$ goodness-of-fit test, which checks whether observed counts are consistent with a given discrete probability distribution.



### Example: Fair or unfair dice

We want to test whether a dice is fair at the significance level $\alpha=0.05$. In the language of this course, if $p_i$ denotes the probability of getting the value $i$ when we roll the dice, the hypotheses are:

$H_0: p_1=p_2=p_3=p_4=p_5=p_6=\frac{1}{6}$

$H_1:$ the dice is unfair

Suppose that we have observed the following counts for the values 1 to 6: 18, 22, 15, 25, 19, 11 (110 rolls in total). We will use the R command chisq.test to carry out the hypothesis test.

In [25]:
v_o <- c(18, 22, 15, 25, 19, 11)


# We perform the chi-squared test 
chisq.test(v_o)


	Chi-squared test for given probabilities

data:  v_o
X-squared = 6.7273, df = 5, p-value = 0.2417


#### Explanation

The test statistic, $\sum_{i=1}^6 \frac{(o_i-e_i)^2}{e_i}$, where $o_i$ and $e_i$ are the observed and expected counts for the value $i$, is equal to 6.7273, and the p-value is 0.2417. Since the p-value is larger than $\alpha=0.05$, we do not reject $H_0$: at a significance level of $5\%$, the data are compatible with a fair dice (i.e., we cannot say that the dice is unfair).
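The numbers reported by chisq.test can be reproduced by hand, which may help to see where they come from. The following is a minimal sketch, assuming the six observed counts from the example; the variable names n, e, x2, df, p_value and crit are illustrative and not part of the original seminar.

```r
# Observed counts for the faces 1 to 6 (same data as in the example)
v_o <- c(18, 22, 15, 25, 19, 11)

n <- sum(v_o)                          # total number of rolls
e <- rep(n / length(v_o), length(v_o)) # expected counts under H0 (probability 1/6 each)

# Pearson statistic: sum over the faces of (observed - expected)^2 / expected
x2 <- sum((v_o - e)^2 / e)

# Degrees of freedom: number of categories minus 1
df <- length(v_o) - 1

# p-value: upper tail of the chi-squared distribution with df degrees of freedom
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Critical value for alpha = 0.05: H0 is rejected when x2 exceeds it
crit <- qchisq(0.95, df = df)

x2
p_value
crit
```

Comparing x2 with crit is equivalent to comparing the p-value with $\alpha$: here the statistic stays below the critical value (about 11.07 for 5 degrees of freedom), which is the same decision as the one obtained from the p-value.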

We now change the hypothesis test to check whether the values obtained from the dice follow a different probability distribution:

$H_0: p_1=\frac{1}{2}, p_2=p_3=p_4=p_5=p_6=\frac{1}{10}$

$H_1:$ the values of the dice do not follow that distribution


We use the same observations as above: 18, 22, 15, 25, 19, 11. The chisq.test command only needs a small modification to handle this case: the probabilities stated in $H_0$ are passed through the argument p.

In [21]:
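# Probabilities under H0 for the faces 1 to 6 (they must sum to 1)
v_p <- c(1/2, 1/10, 1/10, 1/10, 1/10, 1/10)

# Chi-squared test against the probabilities in v_p
chisq.test(v_o, p = v_p)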


	Chi-squared test for given probabilities

data:  v_o
X-squared = 60.982, df = 5, p-value = 7.617e-12


#### Explanation

We observe that the statistic is equal to 60.982, which is much larger than in the previous case (remember that the value of the statistic measures the discrepancy between the observed counts and the expected ones), and that the p-value, $7.617\times 10^{-12}$, is far smaller than $\alpha=0.05$. As a consequence, we reject $H_0$ at a significance level of $5\%$: the data provide significant evidence that the dice does not follow the distribution given in $H_0$, namely $p_1=\frac{1}{2}$ and $p_2=p_3=p_4=p_5=p_6=\frac{1}{10}$.
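To see why the statistic is so large in this case, it can help to look at the expected counts and at the contribution of each face separately. The following is a minimal sketch under the same data; the names e, contrib, x2 and p_value are illustrative, and the check that every expected count is at least 5 is a common rule of thumb for the $\chi^2$ approximation rather than something stated in the seminar.

```r
# Observed counts and probabilities under H0 (same values as in the example)
v_o <- c(18, 22, 15, 25, 19, 11)
v_p <- c(1/2, rep(1/10, 5))

# Expected counts under H0: total number of rolls times each probability
e <- sum(v_o) * v_p

# Contribution of each face to the Pearson statistic
contrib <- (v_o - e)^2 / e

# Statistic and p-value, computed by hand
x2 <- sum(contrib)
p_value <- pchisq(x2, df = length(v_o) - 1, lower.tail = FALSE)

# Rule of thumb (an assumption here): all expected counts should be at least 5
all(e >= 5)

e
round(contrib, 3)
x2
p_value
```

The expected count for the value 1 is 55, while only 18 were observed, so this face alone accounts for a large part of the statistic. The same expected counts are also available from the object returned by the test, as chisq.test(v_o, p = v_p)$expected.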
