# Hypothesis Testing with Python

* <a href="">Github Notebook link</a>
* <a href="mailto:joeganserlectures@gmail.com">By Joe Ganser</a>

### Exercise 1: t-test degrees of freedom

Using the formula below (the Welch-Satterthwaite equation, which applies to t-tests with unequal variances), create a Python function that performs this calculation.

$$
\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\left(\frac{s_1^2}{n_1}\right)^2 / (n_1 - 1) + \left(\frac{s_2^2}{n_2}\right)^2 / (n_2 - 1)}
$$

* s1, s2 = standard deviations of samples 1 and 2

* n1, n2 = sizes of samples 1 and 2

Calculate the degrees of freedom for:

* s1 = 5.32, n1 = 10
* s2 = 7.03, n2 = 12
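As a hand check of the formula, here are the values above substituted in (intermediate results rounded):

$$
\frac{s_1^2}{n_1} = \frac{5.32^2}{10} = 2.83024, \qquad \frac{s_2^2}{n_2} = \frac{7.03^2}{12} \approx 4.11841
$$

$$
\text{df} \approx \frac{(2.83024 + 4.11841)^2}{2.83024^2 / 9 + 4.11841^2 / 11} \approx \frac{48.2837}{0.89003 + 1.54194} \approx 19.85
$$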

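```python
def degrees_of_freedom_t_test(s1, n1, s2, n2):
    # Variance of each sample mean: s^2 / n
    s1n1 = (s1**2) / n1
    s2n2 = (s2**2) / n2
    # Welch-Satterthwaite numerator and denominator
    top = (s1n1 + s2n2)**2
    bottom = (s1n1**2) / (n1 - 1) + (s2n2**2) / (n2 - 1)
    return top / bottom

degrees_of_freedom_t_test(5.32, 10, 7.03, 12)
```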

19.853795244472487
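The function above works from summary statistics. If you start from raw observations instead, you must use the sample variances (divisor $n - 1$). Here is a minimal sketch based on Python's built-in `statistics` module. `welch_df_from_samples` is a hypothetical helper name and is not part of the exercise:

```python
import statistics

def welch_df_from_samples(sample1, sample2):
    """Hypothetical helper: Welch-Satterthwaite df computed from raw observations."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the n - 1 denominator (sample variance)
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Equal sizes and equal variances: the formula reduces to 2 * (n - 1)
print(welch_df_from_samples([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0]))  # 8.0
```

The example uses two samples with equal sizes and equal variances. In that case the formula simplifies to $2(n - 1) = 8$, which makes the output easy to check.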

### Exercise 2: Perform a two sample t-test

Create a Python function that performs a two-sample t-test with unequal variances (Welch's t-test). Use your function from Exercise 1 to calculate the degrees of freedom. The function arguments should be:

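* sample_mean1 (mean of sample 1)
* sample_mean2 (mean of sample 2)
* sample_std1 (standard deviation of sample 1)
* sample_std2 (standard deviation of sample 2)
* n1 (size of sample 1)
* n2 (size of sample 2)
* tails (the number of tails for the test, 1 or 2)
* alpha (the significance level)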

Make the function print:
* p_value
* test statistic
* critical value
* alpha

Make the function `return` one of two conclusions:
* `Reject Null Hypothesis` (if `p_value < alpha` and `|test_statistic| > |critical_value|`)
* Otherwise, `Fail to Reject Null Hypothesis`
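For a fixed `df`, the two rejection conditions above always agree. The reason is that `p_value < alpha` holds exactly when $|t|$ lies beyond the critical value that leaves `alpha / tails` in each tail. The numerical sketch below checks this on a grid of test statistics. It assumes NumPy and SciPy are installed, and `conditions_agree` is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import t

def conditions_agree(test_statistic, df, alpha=0.05, tails=2):
    """Hypothetical helper: do the p-value rule and the critical-value rule agree?"""
    p_value = tails * (1 - t.cdf(abs(test_statistic), df))
    critical_value = t.ppf(1 - alpha / tails, df)
    return (p_value < alpha) == (abs(test_statistic) > critical_value)

# Sweep test statistics from -4 to 4 in steps of 0.1, for one and two tails
grid = np.linspace(-4, 4, 81)
print(all(conditions_agree(x, df=20, tails=k) for x in grid for k in (1, 2)))  # True
```

The check only makes the agreement visible. Requiring both conditions in the function is harmless, as long as the comparison uses the absolute value of the test statistic.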

**RECALL**
We calculate the `test_statistic` as:
$$
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$

**HINTS**
* `p_value` can be calculated using `(1 - t.cdf(abs(test_statistic), df))` *FOR EACH TAIL*, i.e. multiply this one-tail probability by `tails`
* `critical_value` can be calculated using `t.ppf(1 - alpha / tails, df)` (with `from scipy.stats import t`)
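The hints map directly onto `scipy.stats.t` calls. The sketch below assumes SciPy is installed. It plugs in the rounded `df = 20` and the test statistic that `calculate_t_test` prints for the values listed next. It also shows `t.sf`, SciPy's survival function (`1 - cdf`). In the far tail, `1 - t.cdf(...)` can round to zero, while `t.sf` keeps its precision. Using `t.sf` is a suggestion here, not what the exercise asks for:

```python
from scipy.stats import t

df = 20                              # Welch df from Exercise 1, rounded
test_statistic = 2.2723574890239813  # value printed by calculate_t_test
tails, alpha = 2, 0.05

# One-tail probability, multiplied by the number of tails
p_value = tails * (1 - t.cdf(abs(test_statistic), df))
# Same quantity via the survival function
p_value_sf = tails * t.sf(abs(test_statistic), df)
# Critical value that leaves alpha / tails in each tail
critical_value = t.ppf(1 - alpha / tails, df)

print(p_value, p_value_sf, critical_value)

# Far in the tail, 1 - cdf loses precision while sf stays positive
print(1 - t.cdf(40, df), t.sf(40, df))
```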

Run this function for the following values (from lecture notes):

* sample_mean1 = 22.29
* sample_mean2 = 16.3
* sample_std1 = 5.32
* sample_std2 = 7.03
* n1 = 10
* n2 = 12
* tails = 2
* alpha = 0.05
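As a hand check, here are these values substituted into the test-statistic formula (intermediate values rounded):

$$
t = \frac{22.29 - 16.3}{\sqrt{\frac{5.32^2}{10} + \frac{7.03^2}{12}}} = \frac{5.99}{\sqrt{6.94865}} \approx \frac{5.99}{2.63603} \approx 2.2724
$$

If the degrees of freedom from Exercise 1 are rounded to 20, the two-tailed critical value at $\alpha = 0.05$ is about 2.086. Since $|t| \approx 2.2724$ exceeds it, the null hypothesis is rejected.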

```python
from scipy.stats import t

def calculate_t_test(sample_mean1, sample_mean2, sample_std1, sample_std2, n1, n2, tails, alpha):
    # Degrees of freedom from Exercise 1, rounded to the nearest integer
    df = round(degrees_of_freedom_t_test(sample_std1, n1, sample_std2, n2))

    # Calculate the t-statistic
    standard_error = ((sample_std1 ** 2) / n1 + (sample_std2 ** 2) / n2) ** 0.5
    test_statistic = (sample_mean1 - sample_mean2) / standard_error

    # Calculate the p-value (one-tail probability times the number of tails)
    p_value = tails * (1 - t.cdf(abs(test_statistic), df))

    # Calculate the critical value
    critical_value = t.ppf(1 - alpha / tails, df)

    print('p_value: {}'.format(p_value))
    print('critical_value: {}'.format(critical_value))
    print('test_statistic: {}'.format(test_statistic))
    print('Significance level alpha: {}'.format(alpha))

    # Compare magnitudes so that a negative test statistic is handled as well
    if p_value < alpha and abs(test_statistic) > abs(critical_value):
        return "Reject Null Hypothesis"
    return "Fail to Reject Null Hypothesis"

calculate_t_test(22.29, 16.3, 5.32, 7.03, 10, 12, tails=2, alpha=0.05)
```

p_value: 0.03425038740869879
critical_value: 2.0859634472658364
test_statistic: 2.2723574890239813
Significance level alpha: 0.05


'Reject Null Hypothesis'
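As a cross-check, SciPy provides `scipy.stats.ttest_ind_from_stats`. With `equal_var=False`, it runs a two-sided Welch test directly from summary statistics. It expects sample standard deviations (divisor $n - 1$). Unlike `calculate_t_test`, it does not round the degrees of freedom (it uses roughly 19.85 rather than 20). The test statistic should therefore match, while the p-value may differ slightly. This sketch assumes SciPy is installed:

```python
from scipy.stats import ttest_ind_from_stats

# Welch's t-test from summary statistics (unequal variances, fractional df)
result = ttest_ind_from_stats(
    mean1=22.29, std1=5.32, nobs1=10,
    mean2=16.3, std2=7.03, nobs2=12,
    equal_var=False,
)
print('test_statistic:', result.statistic)
print('p_value:', result.pvalue)
```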