Full name: Ζαμάγιας Μιχαήλ Ανάργυρος

Student ID: ΤΠ5000

# Import modules

In [1]:
import pandas as pd
from scipy.stats import chisquare, chi2

# Hypothesis test

Researcher A claims that the probability of a patient belonging to one of the five categories is equal for all categories.
Researcher B, though, claims that the probability of a patient belonging to one of the five categories is not equal for all categories.

$H_0$: Researcher A is right.

$H_1$: Researcher B is right.
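
The two claims can also be written formally. This is only a sketch: $p_i$ (the probability that a patient belongs to category $i$) is notation introduced here for illustration and does not appear elsewhere in the notebook.

$$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = \tfrac{1}{5} \qquad\qquad H_1: p_i \neq \tfrac{1}{5} \ \text{for at least one } i$$

Under $H_0$, a sample of $n$ patients has an expected count of $n / 5$ in every category.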

In [2]:
null_hypothesis = 'The probability of a patient belonging to one of the five categories is equal for all categories.'
alternative_hypothesis = 'The probability of a patient belonging to one of the five categories is not equal for all categories.'

## One way, using $\chi^2$ critical value (*Pearson's chi-squared test* method)\*
\*:  ["Sustain or reject the null hypothesis that the observed frequency distribution is the same as the theoretical distribution based on whether the test statistic exceeds the critical value of $\chi^2$. If the test statistic exceeds the critical value of $\chi^2$, the null hypothesis ($H_0$ = there is no difference between the distributions) can be rejected, and the alternative hypothesis ($H_1$ = there is a difference between the distributions) can be accepted, both with the selected level of confidence. If the test statistic falls below the threshold $\chi^2$ value, then no clear conclusion can be reached, and the null hypothesis is sustained (we failed to reject the null hypothesis), but not necessarily accepted."](https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Definition)

#### Declare initial values

In [3]:
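significance_level = 0.05  # alpha: probability of rejecting H0 when it is true
confidence_level = 1 - significance_level
population_size = 60  # total number of patients in the sample (sum of the observed counts)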
disease_data = pd.DataFrame(
    {'Observed Data': [15, 12, 9, 9, 15]},
    index=range(1, 6)
)
disease_data.index.names = ['Category']
print(disease_data)


          Observed Data
Category               
1                    15
2                    12
3                     9
4                     9
5                    15


#### Calculate $c$ degrees of freedom

In [4]:
degrees_of_freedom = len(disease_data) - 1
print(f'{degrees_of_freedom = }')

degrees_of_freedom = 4


#### Calculate the $\chi^2$ test statistic

In [5]:
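# With no expected frequencies given, chisquare assumes all categories
# are equally likely, which is exactly H0.
observed_chi2 = chisquare(
    disease_data['Observed Data']
).statistic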
print(f'{observed_chi2 = }')

observed_chi2 = 3.0
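
As a cross-check, the same statistic can be computed by hand from Pearson's formula $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ are the observed counts and $E_i$ the counts expected under $H_0$. The sketch below assumes equal expected counts, and the variable names are illustrative rather than taken from the cells above.

```python
# Observed counts for the five categories (same values as in the notebook).
observed = [15, 12, 9, 9, 15]

# Under H0 every category has the same expected count: 60 / 5 = 12.
expected_count = sum(observed) / len(observed)

# Pearson's statistic: sum of squared deviations scaled by the expected count.
chi2_statistic = sum(
    (count - expected_count) ** 2 / expected_count for count in observed
)
print(chi2_statistic)  # 3.0
```

The result agrees with the value returned by `chisquare`.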


#### Calculate the $\chi_c^2$ critical value

In [6]:
chi2_critical_value = chi2.ppf(confidence_level, df=degrees_of_freedom)
print(f'{chi2_critical_value = }')

chi2_critical_value = 9.487729036781154
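
The critical value is the point that leaves an upper-tail area equal to the significance level under the $\chi^2$ distribution. A minimal sketch of that relationship, using illustrative variable names and the same $\alpha = 0.05$ and 4 degrees of freedom as above:

```python
from scipy.stats import chi2

alpha = 0.05
df = 4

# ppf is the inverse CDF; isf is the inverse survival function.
# Both describe the same cut-off point on the chi-squared distribution.
critical_from_ppf = chi2.ppf(1 - alpha, df=df)
critical_from_isf = chi2.isf(alpha, df=df)

# The area to the right of the critical value should equal alpha.
upper_tail = chi2.sf(critical_from_ppf, df=df)

print(critical_from_ppf, critical_from_isf, upper_tail)
```

Any test statistic larger than this cut-off would fall in the rejection region.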


#### Hypothesis test conclusion

In [7]:
# A statistic below the critical value means H0 cannot be rejected;
# it does not prove that H0 is true.
if observed_chi2 < chi2_critical_value:
    print(f'Fail to reject H0: {null_hypothesis}')
else:
    print(f'Reject H0 in favor of H1: {alternative_hypothesis}')


Fail to reject H0: The probability of a patient belonging to one of the five categories is equal for all categories.


## Another way, using $p$ value ($p$*-value method*)

#### Declare expected data

In [8]:
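# Under H0 every category is equally likely, so each expected count is the
# total number of patients divided by the number of categories (60 / 5 = 12).
# Observed and expected counts must add up to the same total for the test to be valid.
disease_data['Expected Data'] = disease_data['Observed Data'].sum() // len(disease_data)
print(disease_data)

          Observed Data  Expected Data
Category                              
1                    15             12
2                    12             12
3                     9             12
4                     9             12
5                    15             12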


#### Calculate $p$ value

In [9]:
p_value = chisquare(
    disease_data['Observed Data'],
    disease_data['Expected Data']
)[1]
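
The $p$-value is the upper-tail probability of the $\chi^2$ distribution at the test statistic. A minimal sketch for the equal-probability null hypothesis (expected counts left at SciPy's uniform default; variable names are illustrative):

```python
from scipy.stats import chisquare, chi2

observed = [15, 12, 9, 9, 15]

# chisquare returns both the statistic and the p-value.
statistic, p_value = chisquare(observed)

# The same p-value from the survival function with k - 1 degrees of freedom.
p_from_sf = chi2.sf(statistic, df=len(observed) - 1)

print(f'{p_value:.4f} {p_from_sf:.4f}')
```

Both numbers should agree, at roughly 0.56, which is well above the 0.05 significance level.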

#### Hypothesis test conclusion

In [13]:
# A p-value at or above the significance level means H0 cannot be rejected;
# it does not prove that H0 is true.
if significance_level <= p_value:
    print(f'Fail to reject H0: {null_hypothesis}')
else:
    print(f'Reject H0 in favor of H1: {alternative_hypothesis}')


Fail to reject H0: The probability of a patient belonging to one of the five categories is equal for all categories.


# Hypothesis test conclusion

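Both methods lead to the same decision (as expected): at the 5% significance level we fail to reject $H_0$. The data give no evidence against Researcher A's claim that all five categories are equally likely, although failing to reject $H_0$ does not prove it.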
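
The agreement between the two methods is not a coincidence: the statistic exceeds the critical value exactly when the $p$-value falls below $\alpha$. A small sketch of this, using a hypothetical helper `goodness_of_fit_decisions` that is not part of the notebook and that assumes equal expected counts:

```python
from scipy.stats import chisquare, chi2


def goodness_of_fit_decisions(observed, alpha=0.05):
    """Return (reject by critical value, reject by p-value) for a uniform H0."""
    statistic, p_value = chisquare(observed)
    critical_value = chi2.ppf(1 - alpha, df=len(observed) - 1)
    reject_by_critical_value = bool(statistic > critical_value)
    reject_by_p_value = bool(p_value < alpha)
    return reject_by_critical_value, reject_by_p_value


# Observed counts from the notebook: neither rule rejects H0.
decisions = goodness_of_fit_decisions([15, 12, 9, 9, 15])
print(decisions)  # (False, False)
```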