Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

# Statistical Testing

*A hypothesis* is a statement about the (probability) distribution of an observed random variable. A hypothesis can be a statement about some parameter $\theta$ of that distribution, e.g., "the mean height of women is 175 cm".

*A simple hypothesis* $H: \theta = \theta_0$ states that the value of the parameter $\theta$ is $\theta_0$ (for some constant $\theta_0$). On the other hand, *a composite hypothesis* $H': \theta \in \Theta_0$ states that the value of $\theta$ lies in the set $\Theta_0$.

A hypothesis is tested using observed values $X_1,\dots,X_n$ of a random variable, that is, a random sample, where the values $X_1,\dots,X_n$ are independent and drawn from the same distribution.

Testing a hypothesis consists of finding a suitable function $T = T(X_1,\dots,X_n)$, called the test statistic, and a set $R$ of values of this function such that we reject the hypothesis if $T(X_1,\dots,X_n) \in R$. Our goal is to reject the hypothesis if it is not valid, but we cannot rule out errors. We would like to minimize the probabilities of the following two types of errors. The probability of a type I error is

$$ p_1 = P[\text{we reject } H \mid H \text{ is valid}] $$

and the probability of a type II error is

 $$ p_2 = P[\text{we do not reject } H \mid H \text{ is not valid}]. $$

Instead of minimizing both error probabilities simultaneously, we minimize the probability of a type II error while the probability of a type I error is bounded from above by a constant $\alpha$ (usually 0.05, alternatively 0.01 or 0.001). The constant $\alpha$ is called the significance level of the test.
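
The two error probabilities can be computed exactly for a small discrete test. The following is only a sketch under assumed values: the coin-flip setting, the sample size `n = 20`, the rejection set `{T <= 5 or T >= 15}`, and the alternative `p = 0.7` are hypothetical and do not come from this notebook.

```python
from scipy.stats import binom

# Hypothetical setup (not from the notebook): n coin flips,
# simple hypothesis H: p = 0.5, test statistic T = number of heads.
n = 20
p_null = 0.5

# Hypothetical rejection set R = {T <= 5 or T >= 15}.
lower, upper = 5, 15

def rejection_probability(p):
    """P[T in R] when the true probability of heads is p."""
    # binom.sf(k, n, p) is P[T > k], so sf(upper - 1) is P[T >= upper].
    return binom.cdf(lower, n, p) + binom.sf(upper - 1, n, p)

# Type I error: we reject H although H is valid (p = 0.5).
p1 = rejection_probability(p_null)

# Type II error for one assumed alternative (p = 0.7):
# we do not reject H although H is not valid.
p_alt = 0.7
p2 = 1.0 - rejection_probability(p_alt)

print("type I error probability:", p1)
print("type II error probability at p = 0.7:", p2)
```

With this choice of $R$ the type I error probability stays below $\alpha = 0.05$. Enlarging $R$ would lower the type II error probability but raise the type I error probability.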


## $\chi^2$-Test

**The "chi-squared" test of independence** determines whether there is a relationship between two categorical variables. Do the values of one categorical variable depend on the values of the other categorical variable? If the two variables are independent, knowing the value of one variable gives no information about the value of the other variable.
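
As a rough sketch of how such a test can be carried out, the snippet below uses a made-up $2\times 3$ table (the counts are hypothetical, not the hospital data). It computes the expected counts under independence, $E_{ij} = r_i c_j / n$, the statistic $\sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$, and the $(r-1)(c-1)$ degrees of freedom, then cross-checks the result against `scipy.stats.chi2_contingency`.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical contingency table: rows and columns are the categories
# of two categorical variables. These counts are made up.
observed = np.array([[10, 20, 30],
                     [20, 20, 20]], dtype=np.float64)

row_sums = observed.sum(axis=1)
col_sums = observed.sum(axis=0)
n = observed.sum()

# Expected counts under independence: E_ij = (row sum i) * (column sum j) / n.
expected = np.outer(row_sums, col_sums) / n

# Chi-squared statistic and degrees of freedom.
x2 = np.sum((observed - expected) ** 2 / expected)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: probability of a statistic at least this large under H0.
p_value = chi2.sf(x2, dof)

# Cross-check against SciPy's built-in test (continuity correction disabled).
x2_ref, p_ref, dof_ref, expected_ref = chi2_contingency(observed, correction=False)

print("statistic:", x2, "dof:", dof, "p-value:", p_value)
```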

Let us consider the following example. Patients in two hospital departments were observed:

[$A$] the department of psychiatry -- sample $A$, and 

[$B$] the department of internal medicine -- sample $B$.

The patients' income was surveyed and classified as follows:

* [I] a strongly below-average income,
* [II] a below-average income,
* [III] an average income,
* [IV] an above-average income,
* [V] a strongly above-average income.

The results are in the following table:

Income class | Sample $A$ | Sample $B$ 
:------------:|:----------:|:--------------:
  I          | 17         | 5          
  II         | 25         | 21         
  III        | 39         | 34         
  IV         | 42         | 49         
  V          | 32         | 25         
 Sums        | 155        | 134


Do both samples $A$ and $B$ have the same distribution? Test this using the $\chi^2$ test of independence at the significance level $\alpha=0.05$. Hence, the **null hypothesis $H_0$** is: 

> The variables (*Department* and *Income*) are independent; no relationship exists between them. In other words:
> * The differences between the counts in samples $A$ and $B$ are due to chance only.
> * The income does not depend on the type of department.

* What is the alternative hypothesis $H_A$?
* Compute the $\chi^2$ test statistic and decide whether we can reject the null hypothesis.

In [None]:
import numpy as np

In [None]:
table = np.array([[ 17,5],[25,21],[39,34],[42,49],[32,25]],np.float64)
table

In [None]:
row_sums = ...
# YOUR CODE HERE
raise NotImplementedError()
print("Row sums:", row_sums)
col_sums = ...
# YOUR CODE HERE
raise NotImplementedError()
print("Column sums:", col_sums)

In [None]:
expected_table = ...
# YOUR CODE HERE
raise NotImplementedError()
expected_table

In [None]:
x2stat = ...
# YOUR CODE HERE
raise NotImplementedError()
x2stat

In [None]:
import scipy
from scipy import stats
from matplotlib import pyplot as plt

df = ...
# YOUR CODE HERE
raise NotImplementedError()
print('Degrees of freedom:', df)
x = np.linspace(0, scipy.stats.chi2.ppf(0.999, df), 100)
pdf_chi2 = scipy.stats.chi2.pdf(x, df)
cdf_chi2 = scipy.stats.chi2.cdf(x, df)
plt.plot(x, pdf_chi2, x, cdf_chi2)

What are the functions `pdf`, `cdf`, `sf`, `ppf`, and `isf` in module `scipy.stats.chi2`?
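
One way to explore these functions is to check how they relate to each other at a few points. The sketch below uses arbitrary illustrative values (`df_demo`, `x`, and `q` are hypothetical and unrelated to the exercise).

```python
import numpy as np
from scipy.stats import chi2

df_demo = 3   # hypothetical degrees of freedom, for illustration only
x = 2.5       # an arbitrary point on the x-axis
q = 0.05      # an arbitrary tail probability

density = chi2.pdf(x, df_demo)     # probability density at x
left_tail = chi2.cdf(x, df_demo)   # cumulative distribution, P[X <= x]
right_tail = chi2.sf(x, df_demo)   # survival function, P[X > x] = 1 - cdf(x)

quantile = chi2.ppf(1 - q, df_demo)     # inverse of cdf: cdf(quantile) = 1 - q
upper_quantile = chi2.isf(q, df_demo)   # inverse of sf:  sf(upper_quantile) = q

print("cdf + sf =", left_tail + right_tail)
print("ppf(1 - q) =", quantile, " isf(q) =", upper_quantile)
```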

Answer the following questions using appropriate functions from `scipy.stats.chi2`.

For the significance level $\alpha=0.05$, the boundary (critical) value for the $\chi^2$ statistic is

In [None]:
alpha = 0.05
# YOUR CODE HERE
raise NotImplementedError()

Hence, we can reject the null hypothesis at the significance level $\alpha=0.05$ only if the value of the statistic is at least 9.4877.
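
The decision rule can be written either in terms of the boundary value or in terms of the $p$-value, and both give the same answer. The sketch below assumes $(5-1)(2-1) = 4$ degrees of freedom for the $5 \times 2$ table. The value `stat_demo` is a hypothetical statistic chosen for illustration, not the value computed from the table.

```python
from scipy.stats import chi2

alpha = 0.05
dof = (5 - 1) * (2 - 1)   # 5 income classes, 2 departments

# Boundary value: the point with right-tail probability alpha.
critical = chi2.isf(alpha, dof)   # equivalent to chi2.ppf(1 - alpha, dof)

stat_demo = 10.0   # hypothetical value of the statistic, for illustration only
p_value = chi2.sf(stat_demo, dof)

# Both formulations of the decision rule agree.
reject_by_statistic = stat_demo >= critical
reject_by_p_value = p_value <= alpha

print("boundary value:", round(critical, 4))
print("p-value of the hypothetical statistic:", p_value)
print("reject H0:", reject_by_statistic, reject_by_p_value)
```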

# $\chi^2$-test and Fisher's Test


Compare tires produced on an old and a new production line using both the $\chi^2$-test and Fisher's exact test. The counts are the numbers of tires that did or did not survive a test drive of 40000 km.

production line | survived | not survived | Sum
:---:|:---------:|:------------:|:----:
old  | 38 | 5 | 43 
new  | 20 | 9 | 29 
Sum | 58 | 14 | 72 

**The null hypothesis is:**

...

YOUR ANSWER HERE

In [None]:
tires = np.array([[38,5], [20,9]])
tires

In [None]:
row_sums = ...
# YOUR CODE HERE
raise NotImplementedError()
print("Row sums:", row_sums)
col_sums = ...
# YOUR CODE HERE
raise NotImplementedError()
print("Column sums:", col_sums)

In [None]:
expected_tires = ...
# YOUR CODE HERE
raise NotImplementedError()
expected_tires

## Using $\chi^2$-test

In [None]:
x2stat = np.sum((tires - expected_tires)**2 / expected_tires)
print(x2stat)

In [None]:
alpha = 0.05
boundary = ...
# YOUR CODE HERE
raise NotImplementedError()
print("Boundary value of the corresponding chi2 statistic is", boundary)
if x2stat >= boundary:
    print("We can reject the null hypothesis that the tires from both lines are of the same quality")
else:
    print("We cannot reject the null hypothesis that the tires from both lines are of the same quality")

In [None]:
print("The p-value (probability of a statistic at least this large if the null hypothesis holds) is",
      scipy.stats.chi2.sf(x2stat, df=1))

## Fisher's exact test

In [None]:
tires

We can consider two tests:
1. a one-tailed test, and
2. a two-tailed test.

Tasks:
* What are the null and alternative hypotheses corresponding to the one-tailed Fisher's test?
* What are the null and alternative hypotheses corresponding to the two-tailed Fisher's test?
* Which tables with the same row and column sums are more extreme than the observed one
  * for the one-tailed Fisher's test, and
  * for the two-tailed Fisher's test?
* Compute the $p$-value for both one-tailed and two-tailed tests.
* For both tests, answer the following question:
  > Is the difference in tire quality between the old and new production lines statistically significant?

In [None]:
from math import factorial

factorial(59)
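
The mechanics of Fisher's exact test can be sketched on a small made-up table (the counts below are hypothetical, not the tire data). With the row and column sums fixed, the top-left count follows a hypergeometric distribution. The sketch uses `math.comb` (Python 3.8+), which equals $n!/(k!\,(n-k)!)$, and cross-checks the result with `scipy.stats.fisher_exact`. The two-tailed rule used here, summing all tables at most as probable as the observed one, is one common convention and is assumed rather than taken from this notebook.

```python
from math import comb
from scipy.stats import fisher_exact

# Hypothetical 2x2 table (counts are made up, not the tire data).
table = [[8, 2],
         [1, 5]]

(a, b), (c, d) = table
row1, row2 = a + b, c + d
col1 = a + c
n = row1 + row2

def table_probability(k):
    """Hypergeometric probability of top-left count k with fixed margins."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

# All top-left counts compatible with the row and column sums.
k_min = max(0, col1 - row2)
k_max = min(row1, col1)
probs = {k: table_probability(k) for k in range(k_min, k_max + 1)}

# One-tailed test: tables with a top-left count at least as large as observed.
p_one_tailed = sum(p for k, p in probs.items() if k >= a)

# Two-tailed test: tables at most as probable as the observed one
# (a small relative tolerance guards against floating-point ties).
p_obs = probs[a]
p_two_tailed = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))

# Cross-check with SciPy.
_, p_one_ref = fisher_exact(table, alternative="greater")
_, p_two_ref = fisher_exact(table, alternative="two-sided")

print("one-tailed p-value:", p_one_tailed)
print("two-tailed p-value:", p_two_tailed)
```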
