# Unit 7.3: Basic code solution for a two-sample t-test application

### (Example: detect temperature changes in climate data)

As in the Unit 7.2 notebook, we have reduced the code here to its essential components.
The starting point is loading two data samples from two files.


#### H0: There is no difference between the means of the two data samples!
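Written formally, the two-sided test weighs this null hypothesis against its alternative. Here $\mu_1$ and $\mu_2$ are notation introduced for illustration and denote the (unknown) population means behind sample 1 and sample 2:

$$
H_0:\ \mu_1 = \mu_2 \qquad \text{versus} \qquad H_1:\ \mu_1 \neq \mu_2
$$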


### 1. Importing all packages

Then check what we imported and how the functions work.

```python
# Python code convention is that standard packages are imported first
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
```
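One way to do the check mentioned above is to print the versions of the imported packages and the first lines of the `ttest_ind` docstring. This small sketch is not part of the original notebook. `help(stats.ttest_ind)` would show the full documentation.

```python
import numpy as np
import pandas as pd
import matplotlib
import scipy
from scipy import stats

# print the version of each imported package
for name, module in [("numpy", np), ("pandas", pd),
                     ("matplotlib", matplotlib), ("scipy", scipy)]:
    print(f"{name:<11s} {module.__version__}")

# show the first lines of the docstring of the t-test function used below
doc_lines = stats.ttest_ind.__doc__.strip().splitlines()
print("\n".join(doc_lines[:3]))
```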

```python
monthlist = [1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12]
monthstr = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
            'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

m = 1  # choose your month (1 = Jan, ..., 12 = Dec)!
mon = monthstr[m - 1]
file1 = f"sample1_{mon}.csv"
df1 = pd.read_csv(file1)
file2 = f"sample2_{mon}.csv"
df2 = pd.read_csv(file2)
print(f"loaded sample data from files {file1} and {file2}")
```
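The CSV files are not part of this section. To try the code without them, this sketch writes two *synthetic* files and then reads them back. The files follow the same naming pattern (`sample1_{mon}.csv`, `sample2_{mon}.csv`) and use the same `avgt` column. The values are random numbers, not climate data. The means, spreads, sample sizes and the temporary directory are all assumptions for illustration. Running it replaces `df1` and `df2` with the synthetic data.

```python
import os
import tempfile

import numpy as np
import pandas as pd

mon = "Jan"
rng = np.random.default_rng(seed=42)  # fixed seed so the synthetic data are reproducible

# hypothetical synthetic samples: 30 made-up values each, in degrees F
synthetic1 = pd.DataFrame({"avgt": rng.normal(loc=30.0, scale=5.0, size=30)})
synthetic2 = pd.DataFrame({"avgt": rng.normal(loc=33.0, scale=5.0, size=30)})

# write both files into a fresh temporary directory so no real data are overwritten
tmpdir = tempfile.mkdtemp()
file1 = os.path.join(tmpdir, f"sample1_{mon}.csv")
file2 = os.path.join(tmpdir, f"sample2_{mon}.csv")
synthetic1.to_csv(file1, index=False)
synthetic2.to_csv(file2, index=False)

# load them back in the same way as the notebook cell
df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)
print(f"loaded sample data from files {file1} and {file2}")
print(df1.shape, df2.shape)
```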

```python
df1
```

```python
df2
```
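A quick visual comparison of the two samples can help before testing. `matplotlib.pyplot` is imported above but not used in this reduced notebook. This sketch draws side-by-side box plots of the `avgt` column. `plot_samples` is a hypothetical helper. The demo data frames contain made-up values, so the cell does not overwrite `df1` and `df2`. In the notebook you would call `plot_samples(df1, df2, mon)`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_samples(df_a, df_b, month, column="avgt"):
    """Hypothetical helper: side-by-side box plots of one column from two data frames."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # drop missing values so that they do not break the box plot
    data = [df_a[column].dropna().to_numpy(), df_b[column].dropna().to_numpy()]
    ax.boxplot(data)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["sample 1", "sample 2"])
    ax.set_ylabel(f"{column} (F)")
    ax.set_title(f"Two samples for month {month}")
    return fig, ax


# usage with made-up stand-in data (illustration only)
rng = np.random.default_rng(0)
demo1 = pd.DataFrame({"avgt": rng.normal(30.0, 5.0, 25)})
demo2 = pd.DataFrame({"avgt": rng.normal(33.0, 5.0, 25)})
fig, ax = plot_samples(demo1, demo2, "Jan")
```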

### 2. T-test calculation with the SciPy function `ttest_ind`

#### Two-sided t-test at significance level 5% (p < 0.05)
`stats.ttest_ind(sample1,sample2,equal_var=False)`
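With `equal_var=False`, `ttest_ind` performs Welch's t-test. Its test statistic, the approximate degrees of freedom $\nu$ and the two-sided p-value are shown below. The notation $\bar{x}_i$, $s_i^2$ and $n_i$ (the mean, sample variance and size of sample $i$) is introduced here for illustration. $T_\nu$ denotes a t-distributed random variable with $\nu$ degrees of freedom.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}},
\qquad
p = 2\,P\left(T_\nu \ge |t|\right)
$$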

### NOTE: We assume that the sample data have been cleaned of np.nan values

If there are NaN values, consider filtering them out first (e.g. with the pandas `dropna()` method).
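This minimal sketch uses small made-up series to show two options. You can drop the NaN values with `dropna()` before the test, or let SciPy skip them through the `nan_policy="omit"` parameter of `ttest_ind`. With the default `nan_policy="propagate"`, a single NaN makes the t-value and the p-value NaN.

```python
import numpy as np
import pandas as pd
from scipy import stats

# made-up temperature values with missing entries (illustration only)
s1 = pd.Series([31.2, 29.8, np.nan, 33.1, 30.5, 28.9])
s2 = pd.Series([34.0, np.nan, 32.7, 35.2, 33.9, 36.1])

# default nan_policy="propagate": the NaNs propagate into the result
res_nan = stats.ttest_ind(s1, s2, equal_var=False)
print(res_nan.pvalue)  # prints nan

# option 1: remove the NaN values with pandas before the test
res_drop = stats.ttest_ind(s1.dropna(), s2.dropna(), equal_var=False)

# option 2: let scipy ignore the NaN values
res_omit = stats.ttest_ind(s1, s2, equal_var=False, nan_policy="omit")

print(res_drop.statistic, res_drop.pvalue)
print(res_omit.statistic, res_omit.pvalue)
```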

```python
# extract the average temperature column 'avgt' from each data frame
sample1 = df1['avgt']
sample2 = df2['avgt']

# significance level for the two-sided t-test
alpha = 0.05
#########################################################
# the t-test function call (Welch's t-test: equal_var=False)
#########################################################
t, pvalue = stats.ttest_ind(sample1, sample2, equal_var=False)

print(f"Perform t-test for differences in the mean for month {mon}")
print(80*"-")
print(f"mean   of sample 1: {sample1.mean():.2f} F and mean  of sample 2: {sample2.mean():.2f} F")
print(f"stddev of sample 1: {sample1.std():.2f} F and stddev of sample 2: {sample2.std():.2f} F")
print(f"(size of  sample 1: {sample1.size} and size of sample 2: {sample2.size})")
# difference of the means (sample 2 minus sample 1); t has the opposite sign
# because ttest_ind uses mean(sample1) - mean(sample2) in the numerator
d = sample2.mean() - sample1.mean()
if pvalue < alpha:
    relation = '<'
    decision = '(significant)'
else:
    relation = '>='
    decision = '(not significant)'
print(f"difference = {d:.3f}, t-value={t:.3f}, p={pvalue:.6f} {relation} {alpha:.6f} {decision}")
```
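To show what `ttest_ind` computes with `equal_var=False`, this sketch evaluates Welch's t-statistic and degrees of freedom by hand with NumPy and compares the result with SciPy. The function name `welch_ttest` and the randomly generated samples are assumptions for illustration. The two-sided p-value is twice the survival function `stats.t.sf` of the t-distribution, evaluated at $|t|$.

```python
import numpy as np
from scipy import stats


def welch_ttest(a, b):
    """Hypothetical helper: two-sided Welch t-test computed by hand.

    Returns (t, dof, p) for the difference mean(a) - mean(b).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # squared standard errors of each mean (sample variance with ddof=1)
    se1 = a.var(ddof=1) / n1
    se2 = b.var(ddof=1) / n2
    t = (a.mean() - b.mean()) / np.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se1 + se2) ** 2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
    # two-sided p-value: probability of |T| >= |t| under H0
    p = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p


# made-up samples with unequal sizes and spreads (illustration only)
rng = np.random.default_rng(1)
x1 = rng.normal(30.0, 4.0, 20)
x2 = rng.normal(32.0, 6.0, 35)

t_manual, dof_manual, p_manual = welch_ttest(x1, x2)
t_scipy, p_scipy = stats.ttest_ind(x1, x2, equal_var=False)
print(f"manual: t={t_manual:.4f}, dof={dof_manual:.2f}, p={p_manual:.6f}")
print(f"scipy : t={t_scipy:.4f}, p={p_scipy:.6f}")
```

Doing the calculation by hand shows why Welch's form suits samples with different sizes or spreads. Each sample contributes its own variance term, and no pooled variance is assumed.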

---
### References

- [Function ttest_ind from scipy.stats](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)
- [Welch's form of the t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test) (`ttest_ind` computes this test statistic when the keyword parameter `equal_var=False` is set)