# <font color = green>NON-PARAMETRIC TESTS </font>
***

Working with small samples can make it impossible to rely on the central limit theorem or to make assumptions about the distribution of the variable under study. When this happens, non-parametric tests are needed. Non-parametric tests make no assumptions about the probability distribution from which the observations are drawn.

## <font color = 'red'> Problem </font>


Before each match of the national football championship, the coins used by the referees must be checked to make sure that they are not biased, that is, that they do not favour a particular result. To do this, a simple test is carried out before each match: the match coin is flipped **50 times** and the frequencies of **FACE** (heads) and **CROWN** (tails) are counted. The table below shows the result of the experiment:

| | FACE | CROWN |
| - | - | - |
| Observed | 17 | 33 |
| Expected | 25 | 25 |

At a **significance level of 5%**, can we say that the coin is not fair, that is, that it tends to favour one of its two sides?

## <font color = green> 4.1 Chi-Square Test ($\chi^2 $) </font>
***

Also known as the goodness-of-fit test, it takes its name from the standardized statistic it uses, denoted by the squared Greek letter chi ($\chi$). The table of standardized values, and how to build it, is shown below.

The $\chi^2$ test checks the null hypothesis that there is no difference between the observed frequencies of a given event and the frequencies expected for that event.

The test application steps are very similar to those seen for parametric tests.

![Acceptance Region](https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img017.png)

### Building the $\chi^2$ table
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi.html

In [2]:
import pandas as pd
from scipy.stats import chi

# Rows: degrees of freedom 1..30; columns: cumulative probabilities p
table_t_chi_2 = pd.DataFrame(
    [],
    index=list(range(1, 31)),
    columns=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975, 0.95, 0.99, 0.995]
)

# Squaring a quantile of the chi distribution gives the chi-square quantile for the same p
for index in table_t_chi_2.index:
    for column in table_t_chi_2.columns:
        table_t_chi_2.loc[index, column] = "{0:0.4f}".format(chi.ppf(float(column), index) ** 2)

table_t_chi_2.index.name = 'Degrees of freedom'
table_t_chi_2.rename_axis(['p'], axis=1, inplace=True)

# Show the first 10 rows
table_t_chi_2.head(10)

| Degrees of freedom \ p | 0.005 | 0.010 | 0.025 | 0.050 | 0.100 | 0.250 | 0.500 | 0.750 | 0.900 | 0.975 | 0.950 | 0.990 | 0.995 |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1 | 0.0000 | 0.0002 | 0.0010 | 0.0039 | 0.0158 | 0.1015 | 0.4549 | 1.3233 | 2.7055 | 5.0239 | 3.8415 | 6.6349 | 7.8794 |
| 2 | 0.0100 | 0.0201 | 0.0506 | 0.1026 | 0.2107 | 0.5754 | 1.3863 | 2.7726 | 4.6052 | 7.3778 | 5.9915 | 9.2103 | 10.5966 |
| 3 | 0.0717 | 0.1148 | 0.2158 | 0.3518 | 0.5844 | 1.2125 | 2.3660 | 4.1083 | 6.2514 | 9.3484 | 7.8147 | 11.3449 | 12.8382 |
| 4 | 0.2070 | 0.2971 | 0.4844 | 0.7107 | 1.0636 | 1.9226 | 3.3567 | 5.3853 | 7.7794 | 11.1433 | 9.4877 | 13.2767 | 14.8603 |
| 5 | 0.4117 | 0.5543 | 0.8312 | 1.1455 | 1.6103 | 2.6746 | 4.3515 | 6.6257 | 9.2364 | 12.8325 | 11.0705 | 15.0863 | 16.7496 |
| 6 | 0.6757 | 0.8721 | 1.2373 | 1.6354 | 2.2041 | 3.4546 | 5.3481 | 7.8408 | 10.6446 | 14.4494 | 12.5916 | 16.8119 | 18.5476 |
| 7 | 0.9893 | 1.2390 | 1.6899 | 2.1673 | 2.8331 | 4.2549 | 6.3458 | 9.0371 | 12.0170 | 16.0128 | 14.0671 | 18.4753 | 20.2777 |
| 8 | 1.3444 | 1.6465 | 2.1797 | 2.7326 | 3.4895 | 5.0706 | 7.3441 | 10.2189 | 13.3616 | 17.5345 | 15.5073 | 20.0902 | 21.9550 |
| 9 | 1.7349 | 2.0879 | 2.7004 | 3.3251 | 4.1682 | 5.8988 | 8.3428 | 11.3888 | 14.6837 | 19.0228 | 16.9190 | 21.6660 | 23.5894 |
| 10 | 2.1559 | 2.5582 | 3.2470 | 3.9403 | 4.8652 | 6.7372 | 9.3418 | 12.5489 | 15.9872 | 20.4832 | 18.3070 | 23.2093 | 25.1882 |


<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img016.png' width = '250px'>

Table with the values of $\chi_p^2$ as a function of the degrees of freedom $(n - 1)$ and $p = P(\chi^2 \leq \chi_p^2)$
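To make the meaning of a table entry concrete, the sketch below looks up two quantiles directly and checks that the cumulative distribution function maps them back to $p$. It assumes `scipy.stats.chi2` (the chi-square distribution) rather than the `chi` distribution used elsewhere in this notebook; the variable names are illustrative.

```python
from scipy.stats import chi2

# Each table entry is the value x such that P(chi2 <= x) = p
for df, p in [(1, 0.95), (2, 0.5)]:
    quantile = chi2.ppf(p, df)        # inverse of the CDF
    recovered_p = chi2.cdf(quantile, df)
    print(df, p, round(float(quantile), 4), round(float(recovered_p), 4))
```

The printed quantiles should agree with the corresponding cells of the table, since `ppf` and `cdf` are inverses of each other.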

### Problem data

In [3]:
observed_f = [17, 33]  # Observed frequencies: FACE, CROWN
expected_f = [25, 25]  # Expected frequencies for a fair coin
significance = 0.05
confidence = 1 - significance
k = 2  # Number of possible events
degrees_of_freedom = k - 1

### **Step 1** - formulation of hypotheses $H_0$ and $H_1$

#### <font color = 'red'> Remember, the null hypothesis always contains the equality claim </font>

### $H_0: F_{FACE} = F_{CROWN}$

### $H_1: F_{FACE} \neq F_{CROWN}$

---

### **Step 2** - fixing the test significance ($\alpha$)

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi.html

In [4]:
from scipy.stats import chi

In [5]:
table_t_chi_2[:3]

| Degrees of freedom \ p | 0.005 | 0.010 | 0.025 | 0.050 | 0.100 | 0.250 | 0.500 | 0.750 | 0.900 | 0.975 | 0.950 | 0.990 | 0.995 |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1 | 0.0000 | 0.0002 | 0.0010 | 0.0039 | 0.0158 | 0.1015 | 0.4549 | 1.3233 | 2.7055 | 5.0239 | 3.8415 | 6.6349 | 7.8794 |
| 2 | 0.0100 | 0.0201 | 0.0506 | 0.1026 | 0.2107 | 0.5754 | 1.3863 | 2.7726 | 4.6052 | 7.3778 | 5.9915 | 9.2103 | 10.5966 |
| 3 | 0.0717 | 0.1148 | 0.2158 | 0.3518 | 0.5844 | 1.2125 | 2.3660 | 4.1083 | 6.2514 | 9.3484 | 7.8147 | 11.3449 | 12.8382 |


### Get $\chi_{\alpha}^2$

In [6]:
chi_2_alpha = chi.ppf(confidence, degrees_of_freedom) ** 2
chi_2_alpha

3.8414588206941245
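The cell above squares a quantile of the chi distribution. As a cross-check, and assuming `scipy.stats.chi2` is available, the same critical value can be read directly from the chi-square distribution; the names `via_chi` and `via_chi2` are illustrative.

```python
import math
from scipy.stats import chi, chi2

confidence = 0.95
degrees_of_freedom = 1

# If X follows a chi distribution, X**2 follows a chi-square distribution,
# so squaring the chi quantile gives the chi-square quantile.
via_chi = chi.ppf(confidence, degrees_of_freedom) ** 2
via_chi2 = chi2.ppf(confidence, degrees_of_freedom)

print(via_chi, via_chi2)
print(math.isclose(via_chi, via_chi2))
```

Using `chi2` avoids the manual squaring step, which is easy to forget.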

![Acceptance Region](https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img018.png)

---

### **Step 3** - calculation of the test statistic and comparison of this value with the acceptance and rejection regions of the test

# $$\chi^2 = \sum_{i=1}^{k}{\frac{(F_{i}^{Obs} - F_{i}^{Exp})^2}{F_{i}^{Exp}}}$$


Where

$F_{i}^{Obs}$ = observed frequency for event $i$

$F_{i}^{Exp}$ = expected frequency for event $i$

$k$ = number of possible events





In [8]:
chi_2  = (((observed_f[0] - expected_f[0]) ** 2 ) / (expected_f[0])) + (((observed_f[1] - expected_f[1]) ** 2 ) / (expected_f[1]))
chi_2

5.12

In [11]:
chi_2 = 0
for i in range(k):
    chi_2 += ((observed_f[i] - expected_f[i]) ** 2 ) / (expected_f[i])

chi_2

5.12
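For longer frequency lists, the same sum can be written without an explicit loop. This is a minimal sketch assuming NumPy arrays; `observed`, `expected`, and `chi_2_vectorized` are illustrative names.

```python
import numpy as np

observed = np.array([17, 33])
expected = np.array([25, 25])

# Element-wise (O - E)^2 / E, then summed over the k events
chi_2_vectorized = float(np.sum((observed - expected) ** 2 / expected))
print(chi_2_vectorized)
```

Each event contributes $64 / 25 = 2.56$ here, so the result matches the value obtained with the loop.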

![Test statistic](https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img019.png)

---

### **Step 4** - acceptance or rejection of the null hypothesis

<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img020.png' width=80%>

### <font color = 'red'> Critical value criterion </font>

> ### Reject $H_0$ if $\chi_{test}^2 > \chi_{\alpha}^2$

In [12]:
chi_2 > chi_2_alpha

True

### <font color = 'green'> Conclusion: At a 95% confidence level, we reject the null hypothesis ($H_0$) and conclude that the observed and expected frequencies differ, that is, the coin is not fair and needs to be replaced. </font>

### <font color='red'> $ p $ value criterion </font>

> ### Reject $ H_0 $ if $p\leq\alpha$

In [13]:
chi_2

5.12

In [16]:
import numpy as np 

sqrt_chi_2 = np.sqrt(chi_2)
sqrt_chi_2

2.262741699796952

In [17]:
p_value = chi.sf(sqrt_chi_2, df=1)
p_value

0.023651616655355978
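With one degree of freedom, $\chi^2$ is the square of a standard normal variable, so $P(\chi^2 > x) = P(|Z| > \sqrt{x})$, which equals the complementary error function evaluated at $\sqrt{x/2}$. The sketch below uses only the standard library to cross-check the $p$ value; it applies to this one-degree-of-freedom case only, and `p_value_check` is an illustrative name.

```python
import math

chi_2 = 5.12  # test statistic computed earlier

# P(|Z| > sqrt(x)) = erfc(sqrt(x) / sqrt(2)) = erfc(sqrt(x / 2))
p_value_check = math.erfc(math.sqrt(chi_2 / 2))
print(p_value_check)
```

The result should agree with the value returned by `chi.sf` to within floating-point precision.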

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html

In [19]:
from scipy.stats import chisquare

In [21]:
chi_2, p_value = chisquare(observed_f, expected_f)
print(chi_2, p_value)

5.12 0.023651616655356
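When all categories are expected to be equally likely, as with a fair coin, `chisquare` can be called with the observed frequencies alone, because its `f_exp` argument defaults to equal expected frequencies. A short sketch of that call, reading the fields of the result by name:

```python
from scipy.stats import chisquare

# f_exp is omitted: the categories are assumed equally likely (25 and 25 here)
result = chisquare([17, 33])
print(result.statistic, result.pvalue)
```

Passing `expected_f` explicitly, as in the cell above, is still the clearer choice when the expected frequencies are not uniform.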


In [22]:
p_value <= significance

True
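The two criteria always lead to the same decision, because both compare the same statistic against the same distribution. The sketch below bundles the steps into a single function; `chi_square_decision` is a hypothetical helper written for illustration, not part of SciPy, and it assumes `scipy.stats.chi2` for the critical value and the $p$ value.

```python
from scipy.stats import chi2

def chi_square_decision(observed, expected, alpha=0.05):
    """Hypothetical helper: apply both rejection criteria to a goodness-of-fit test."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    critical = float(chi2.ppf(1 - alpha, df))  # critical value criterion
    p_value = float(chi2.sf(statistic, df))    # p value criterion
    return {
        "statistic": statistic,
        "critical": critical,
        "p_value": p_value,
        "reject_by_critical_value": statistic > critical,
        "reject_by_p_value": p_value <= alpha,
    }

decision = chi_square_decision([17, 33], [25, 25])
print(decision)
```

For the coin data, both flags come out `True`, in line with the conclusion above.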

---