# The Chi-square Goodness of Fit Test 

The chi-square goodness of fit test checks whether the observed distribution of a variable fits a given (expected) distribution. The null hypothesis is that the observed data follow the expected distribution. When the calculated p-value is less than the significance level, we reject the null hypothesis.

We import `chisquare` from `scipy.stats`.

In [22]:
from scipy.stats import chisquare
significance = 0.05
observed=[51,24,13,12]
expected=[48,25,15,12]
chi, pval = chisquare(observed, expected)

print('chi-square=%.6f, p-value=%.2f' % (chi, pval))

if pval < significance:
	print('At %.2f level of significance, we reject the null hypothesis and accept H1. The observed data do not fit the expected distribution.' % (significance))
else:
	print('At %.2f level of significance, we do not reject the null hypothesis. The observed data are consistent with the expected distribution.' % (significance))

chi-square=0.494167, p-value=0.92
At 0.05 level of significance, we do not reject the null hypothesis. The observed data are consistent with the expected distribution.
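To see what `chisquare` computes, the statistic can also be worked out by hand. The following is a minimal sketch, assuming the usual goodness of fit formula (the sum of squared differences between observed and expected counts, each divided by the expected count) and degrees of freedom equal to the number of categories minus one. The names `chi_manual`, `dof` and `pval_manual` are introduced here for illustration only.

```python
from scipy.stats import chi2

observed = [51, 24, 13, 12]
expected = [48, 25, 15, 12]

# Sum of (observed - expected)^2 / expected over all categories
chi_manual = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one
dof = len(observed) - 1

# p-value: upper-tail probability of the chi-square distribution
pval_manual = chi2.sf(chi_manual, dof)

print('chi-square=%.6f, df=%d, p-value=%.2f' % (chi_manual, dof, pval_manual))
```

The printed statistic and p-value should agree with the values returned by `chisquare(observed, expected)`.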




# The Pooled two-sample t-test 

In IB Diploma Mathematics: Applications and Interpretation, you are required to carry out the pooled two-sample t-test. This test compares the means of two independent sets of data sampled from normally distributed populations.

We are going to use one-tailed tests.

## One-tailed tests

\begin{align}
\text{If }\ H_0:\mu_1 \geq \mu_2, \text{ then } H_1:\mu_1<\mu_2 \tag{1}\\
\text{If }\ H_0:\mu_1 \leq \mu_2, \text{ then } H_1:\mu_1>\mu_2 \tag{2}
\end{align}


We are going to use the [statsmodels](https://www.statsmodels.org/dev/index.html) module to carry out the pooled two-sample t-test.

First, install it from your terminal.

If you are using Anaconda,

`conda install -c conda-forge statsmodels`

Otherwise, you can install it by using `pip`.

`pip install statsmodels`

Import necessary libraries.

We set the alternative hypothesis with the `alternative` option of `ttest_ind`.

- `'two-sided'` (default): the two means are not equal, $H_1:\mu_1\ne\mu_2$
- `'larger'`: the first mean is larger than the second, $H_1:\mu_1>\mu_2$
- `'smaller'`: the first mean is smaller than the second, $H_1:\mu_1<\mu_2$

In the following, we set it to `'smaller'`, which means
$$H_0:\mu_1\geq\mu_2$$
$$H_1:\mu_1<\mu_2$$

Since IB requires the pooled test, we set `usevar='pooled'`, which assumes that both populations have the same variance.
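For reference, the standard textbook form of the pooled test statistic is sketched below. The symbols $n_1, n_2$ (sample sizes), $\bar{x}_1, \bar{x}_2$ (sample means), $s_1^2, s_2^2$ (sample variances) and $s_p^2$ (pooled variance) are notation introduced here and do not come from the statsmodels documentation.

$$s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$
$$t=\frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$

Under $H_0$, this statistic follows a t-distribution with $n_1+n_2-2$ degrees of freedom.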

In [39]:
import statsmodels.stats.weightstats as sm

significance = 0.05
list1=[3,5,4,6,6,5,3,2,3,4,5,3,4]
list2=[4,6,6,7,6,4,4,4,3,6,5,4,5]
tstat, pvalue, df = sm.ttest_ind(
    list1,list2,
    alternative='smaller',
    usevar='pooled')

print("""Test statistic=%.2f, 
p-value=%.4f, 
degrees of freedom=%.0f\n""" % (tstat,pvalue,df))

if pvalue < significance:
	print("""At %.2f level of significance, 
we reject the null hypothesis. 
The mean 1 is less than the mean 2.""" % (significance))
else:
	print("""At %.2f level of significance, 
we do not reject the null hypothesis.  
There is not enough evidence that the mean 1 is less than the mean 2.""" % (significance))

Test statistic=-1.77, 
p-value=0.0451, 
degrees of freedom=24

At 0.05 level of significance, 
we reject the null hypothesis. 
The mean 1 is less than the mean 2.
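As a cross-check, the same result can be reproduced without statsmodels. The following is a minimal sketch, assuming the usual pooled-variance approach: the two sample variances are combined, weighted by their degrees of freedom, and the one-tailed p-value is the lower-tail probability of the t-distribution. The names `pooled_var`, `std_err`, `t_manual` and `p_manual` are introduced here for illustration only.

```python
from math import sqrt
from statistics import mean, variance
from scipy.stats import t

list1 = [3, 5, 4, 6, 6, 5, 3, 2, 3, 4, 5, 3, 4]
list2 = [4, 6, 6, 7, 6, 4, 4, 4, 3, 6, 5, 4, 5]
n1, n2 = len(list1), len(list2)

# Pooled variance: sample variances weighted by their degrees of freedom
dof = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(list1) + (n2 - 1) * variance(list2)) / dof

# Standard error of the difference in means, then the test statistic
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (mean(list1) - mean(list2)) / std_err

# One-tailed p-value for H1: mu_1 < mu_2 (lower tail of the t-distribution)
p_manual = t.cdf(t_manual, dof)

print('t=%.2f, p-value=%.4f, df=%d' % (t_manual, p_manual, dof))
```

The values should agree with the test statistic, p-value and degrees of freedom returned by `ttest_ind` with `alternative='smaller'` and `usevar='pooled'`.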


# Reference

- https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

- https://www.statsmodels.org/dev/generated/statsmodels.stats.weightstats.ttest_ind.html