<a href="https://colab.research.google.com/github/lcnature/PSY291/blob/main/PSY291_Ch10_indsamp_ttest.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Chapter 10 t-test for independent measures design**

In this notebook, let's see how to perform an independent-measures t-test with Python.

We will use alpha = 0.05 as the significance level and perform a two-tailed test.

# loading data

We start by loading the data into Python.

A handy package for handling data organized in tables is pandas.

In [None]:
import pandas as pd
alpha = 0.05  # significance level for the test

url_a = 'https://raw.githubusercontent.com/lcnature/PSY291/main/sample_data/sample_a.csv'
url_b = 'https://raw.githubusercontent.com/lcnature/PSY291/main/sample_data/sample_b.csv'

# read_csv reads the data saved in a text (CSV) file into a well-organized table.
sample_a = pd.read_csv(url_a, index_col=0)
sample_b = pd.read_csv(url_b, index_col=0)

print('sample a')
print(sample_a)
print('sample b')
print(sample_b)

## perform hypothesis testing with a pre-built function.

To perform an independent-measures t-test, we use the function `scipy.stats.ttest_ind`.

This function takes two arrays of data corresponding to the two samples.
We can retrieve them with `sample_a['score']` and `sample_b['score']`.

There are only two additional arguments we need to pay attention to.

- The first is `equal_var`, which tells the function whether we assume the two populations have equal variance in the score. By default it is set to True (yes), so we don't need to specify it unless we believe the variances are unequal.

- The second is `alternative`, which specifies whether we are doing a two-tailed test ('two-sided') or a one-tailed test.

 - In the case of a one-tailed test, if our alternative hypothesis states that the mean of the first population is smaller than that of the second, we set `alternative` to 'less'. If it states that the first population has a larger mean, we set it to 'greater'.

 - For this example, since the default is 'two-sided' and we are indeed doing a two-tailed test, we don't actually need to set either of these two arguments. But we will set them anyway to show how it is done.

In [None]:
from scipy.stats import ttest_ind

result = ttest_ind(sample_a['score'], sample_b['score'],
                   equal_var=True, alternative='two-sided')

print(result)


## drawing conclusion

Reading from the result above, we see that the t statistic is -1.51 and the p-value is 0.14.

Since p is larger than our significance level of 0.05, we cannot reject the null hypothesis.
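
The comparison between p and the significance level can also be written in code. The cell below is a minimal sketch of that decision rule. The scores in it are hypothetical and only make the cell runnable on its own, and the names `scores_a`, `scores_b`, and `reject_null` are illustrative; in this notebook you would pass `sample_a['score']` and `sample_b['score']` instead.

```python
from scipy.stats import ttest_ind

alpha = 0.05  # significance level stated at the start of the notebook

# Hypothetical scores (an assumption, not the course data).
scores_a = [12, 15, 11, 14, 13, 16, 10, 12]
scores_b = [14, 17, 13, 15, 16, 18, 12, 15]

result = ttest_ind(scores_a, scores_b,
                   equal_var=True, alternative='two-sided')

# We reject the null hypothesis only when p is smaller than alpha.
reject_null = result.pvalue < alpha
print('reject the null hypothesis:', reject_null)
```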

Notice that the degrees of freedom reported by `ttest_ind` is 56. How did we get that?

It is equal to the sum of the degrees of freedom of the two samples, each being the sample size minus 1.

In [None]:
df1 = len(sample_a) - 1
df2 = len(sample_b) - 1
df = df1 + df2
print('degree of freedom:', df)
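
Under the equal-variance assumption, the same degrees of freedom also appear in the pooled variance that the t statistic is built on. The cell below is a minimal sketch that computes t by hand and compares it with `ttest_ind`. The scores are hypothetical and only make the cell self-contained, and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical scores (an assumption, not the course data).
scores_a = np.array([12, 15, 11, 14, 13, 16, 10, 12], dtype=float)
scores_b = np.array([14, 17, 13, 15, 16, 18, 12], dtype=float)

n1, n2 = len(scores_a), len(scores_b)
df = (n1 - 1) + (n2 - 1)

# Sum of squared deviations (SS) of each sample around its own mean.
ss1 = np.sum((scores_a - scores_a.mean()) ** 2)
ss2 = np.sum((scores_b - scores_b.mean()) ** 2)

# Pooled variance, then the estimated standard error of the mean difference.
pooled_var = (ss1 + ss2) / df
se = np.sqrt(pooled_var / n1 + pooled_var / n2)

t_manual = (scores_a.mean() - scores_b.mean()) / se
result = ttest_ind(scores_a, scores_b, equal_var=True)

print('degrees of freedom:', df)
print('t computed by hand:', t_manual)
print('t from ttest_ind:  ', result.statistic)
```

The two t values should agree up to floating-point rounding, which is a useful check that the pre-built function does what the formula describes.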

## practice

Try performing a one-tailed test, with the alternative hypothesis that the first population has either a smaller or a larger mean than the second population. How does the p-value change with this choice? Why do you think that occurs? Can you reject the null hypothesis in either case?

Hint: you can visualize the histograms of the two samples or calculate their means to get an intuition.

## Calculating effect size
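
The result of `ttest_ind` does not include an effect size. One common measure for an independent-measures design is Cohen's d, estimated as the mean difference divided by the pooled standard deviation. The cell below is a minimal sketch of that calculation, assuming equal population variances as in the test above. The scores are hypothetical and only make the cell runnable on its own, and the names `scores_a`, `scores_b`, and `cohens_d` are illustrative; in this notebook you would use `sample_a['score']` and `sample_b['score']`.

```python
import numpy as np

# Hypothetical scores (an assumption, not the course data).
scores_a = np.array([12, 15, 11, 14, 13, 16, 10, 12], dtype=float)
scores_b = np.array([14, 17, 13, 15, 16, 18, 12, 15], dtype=float)

n1, n2 = len(scores_a), len(scores_b)

# Sample variances (ddof=1 divides by n - 1).
var1 = scores_a.var(ddof=1)
var2 = scores_b.var(ddof=1)

# Pooled variance weights each sample variance by its degrees of freedom.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Estimated Cohen's d: mean difference in units of the pooled standard deviation.
cohens_d = (scores_a.mean() - scores_b.mean()) / np.sqrt(pooled_var)
print("estimated Cohen's d:", cohens_d)
```

The sign of d only reflects which sample has the larger mean; its magnitude is what describes the size of the effect.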


Finally, as a general tip, to see how a function in a package works, you can use the `help` function as below, or search online for the names of the function and the package; you will likely land on the documentation for that function.


In [None]:
help(ttest_ind)