# Working with t-tests

Below are five data sets. Each comes with a question that can be addressed using a t-test. For each data set:

1. Identify the research question
2. Choose the appropriate t-test
3. Do the t-test in Python (see chapter and/or slides for code)
4. Report your results using the APA format, just like in the examples in the book (e.g. With a mean grade of 72.3, the psychology students scored slightly higher than the average grade of 67.5 (𝑡(19)=2.25, 𝑝<.05))
5. Make a figure which illustrates the results
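For step 4, a small helper can keep the reporting consistent across the five data sets. The sketch below is only one possible way to do it: the function name `apa_t` is made up for this exercise, and it reports exact p-values (with "p < .001" for very small ones), which is a common APA convention but differs slightly from the "p<.05" style in the book example.

```python
def apa_t(t, df, p):
    """Format a t-test result roughly in APA style (hypothetical helper)."""
    # APA drops the leading zero of p and reports very small values as "< .001"
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    # Degrees of freedom can be fractional (e.g. Welch's test): show decimals only then
    df_text = f"{df:g}" if float(df).is_integer() else f"{df:.2f}"
    return f"t({df_text}) = {t:.2f}, {p_text}"


# The values from the book example, with an assumed exact p-value of .036
print(apa_t(2.25, 19, 0.036))
```

The returned string can be pasted into the parentheses at the end of your results sentence.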

NB: for some of these data sets, you may have to rearrange the data a little before you can do your analysis! If the data are in long format, with one column holding a grouping variable as in the example below, an easy way to get the data into a form that can be entered in a t-test is to make two new variables, like this:

| ID number | Group  | Measure |
| :-------: | :----: | :-----: |
|     1     | A      |   32    |
|     2     | B      |  43.2   |
|     3     | A      |  31.2   |
|     4     | B      |  22.1   |

Group1 = df.loc[df['Group'] == 'A', 'Measure']

Group2 = df.loc[df['Group'] == 'B', 'Measure']

Now the t-test can be done with Group1 and Group2.
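As a self-contained illustration of the long-format split, the sketch below rebuilds the four-row example table as a DataFrame and runs an independent-samples test on the two groups. The choice of Welch's test (`equal_var=False`) is an assumption made for the example, and four data points are far too few for a meaningful result; the point is only to show the mechanics.

```python
import pandas as pd
from scipy import stats

# Toy long-format data copied from the example table
df = pd.DataFrame({
    "ID number": [1, 2, 3, 4],
    "Group": ["A", "B", "A", "B"],
    "Measure": [32.0, 43.2, 31.2, 22.1],
})

# Boolean mask on the grouping column, then select the measurement column
group1 = df.loc[df["Group"] == "A", "Measure"]
group2 = df.loc[df["Group"] == "B", "Measure"]

# Welch's independent-samples t-test on the two groups
t, p = stats.ttest_ind(group1, group2, equal_var=False)
print(group1.tolist(), group2.tolist())
print("t =", t, "p =", p)
```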

## Dataset 1: "Moon and Aggression"

Description:

This data set, "Moon & Aggression", provides the number of disruptive behaviors by dementia patients during two different phases of the lunar cycle (Moore et al., 2012, p. 410). Each row corresponds to one participant.

Variables:

Moon - The average number of disruptive behaviors during full moon days.
Other - The average number of disruptive behaviors during other days.

Assignment:
Test the null hypothesis that the average number of disruptive behaviors among patients with dementia does not differ between full moon days and other days. Calculate an appropriate test statistic, and make a figure illustrating the comparison. Write a sentence reporting your results in the same way they are reported in the book (APA format):

> With a mean grade of 72.3, the psychology students scored slightly higher than the average grade of 67.5 (𝑡(19)=2.25, 𝑝<.05)

References:

Moore, D. S., McCabe, G. P., and Craig, B. A. (2012). Introduction to the Practice of Statistics (7th ed.). New York: Freeman.

"These data were collected as part of a larger study of dementia patients conducted by Nancy Edwards and Alan Beck, Purdue University." (Moore et al., 2012, p. N-8).

In [17]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/ethanweed/datasets-for-teaching/main/JASP_data_library/2.%20T-Tests/Moon%20and%20Aggression.csv')
df.head()

Unnamed: 0,Moon,Other
0,3.33,0.27
1,3.67,0.59
2,2.67,0.32
3,3.33,0.19
4,3.33,1.26


In [31]:
import statistics
from scipy.stats import ttest_rel  # paired-samples t-test used below


mean_fullmoon = statistics.mean(df['Moon'])
mean_other = statistics.mean(df['Other'])
N = df.shape[0]
degrees_of_freedom = N-1

print("Mean full moon =", mean_fullmoon)
print("Mean other periods =", mean_other)
print("N =", N)
print("df =", degrees_of_freedom)


t, p = ttest_rel(a = df['Moon'], b = df['Other'], alternative = 'two-sided')

print("t =", t)
print("p =", p)


Mean full moon = 3.022
Mean other periods = 0.5893333333333334
N = 15
df = 14
t = 6.451788554357532
p = 1.5181521009727053e-05


In [33]:
import seaborn as sns
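The figure cell above is left unfinished. One possible way to complete it is sketched below: reshape the two columns into long format, then let seaborn draw the condition means with confidence intervals. The helper names `to_long` and `plot_conditions`, the column labels "Condition" and "Behaviors", and the choice of a point plot are all assumptions for illustration; other plot types would work just as well.

```python
import pandas as pd


def to_long(wide, value_cols, var_name="Condition", value_name="Behaviors"):
    """Reshape one-column-per-condition data into long format for plotting."""
    return wide.melt(value_vars=value_cols, var_name=var_name, value_name=value_name)


def plot_conditions(long_df, x="Condition", y="Behaviors"):
    """Draw condition means with seaborn's default 95% confidence intervals."""
    import seaborn as sns  # imported here so the reshaping works without seaborn
    ax = sns.pointplot(data=long_df, x=x, y=y)
    ax.set_ylabel("Mean disruptive behaviors")
    return ax


# First three rows of the Moon and Aggression data shown in the output above
wide = pd.DataFrame({"Moon": [3.33, 3.67, 2.67], "Other": [0.27, 0.59, 0.32]})
long_df = to_long(wide, ["Moon", "Other"])
print(long_df)
# plot_conditions(long_df)  # uncomment in a notebook to draw the figure
```

With the real data, pass the full `df` to `to_long` instead of the three-row excerpt.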



## Dataset 2: "Horizontal Eye Movements"

Description:

This data set, "Horizontal Eye Movements", provides the number of recalled words by two groups of participants. During the retention interval, one group was instructed to fixate on a centrally presented dot; the other group was instructed to execute horizontal saccades. Specifically,

"Participants were presented with a list of neutral study words for a subsequent free recall test. Prior to recall, participants were requested to perform - depending on the experimental condition - either horizontal, vertical, or no eye movements (i.e., looking at a central fixation point). The type of eye movement was thus manipulated between subjects. As the effect of eye movement on episodic memory has been reported to be influenced by handedness, we tested only strong right-handed individuals. The dependent variable of interest was the number of correctly recalled words." (Matzke et al., 2015, p. 3)

This data set contains only data from participants assigned to the horizontal and no eye movements condition. Calculate an appropriate test statistic, and make a figure illustrating the comparison.

Variables:

ParticipantNumber - Participant's identification number.
Condition - Experimental condition (Fixed = fixed gaze, Horizontal = horizontal eye movements).
CriticalRecall - The number of recalled words after the memory retrieval task.

Assignment:
Examine whether the data are more likely to occur if horizontal eye movements do not help memory retrieval (null hypothesis), or if they have a positive effect on the memory retrieval (one-sided alternative hypothesis). Calculate an appropriate test statistic, and make a figure illustrating the comparison. Write a sentence reporting your results in the same way they are reported in the book (APA format): 

> With a mean grade of 72.3, the psychology students scored slightly higher than the average grade of 67.5 (𝑡(19)=2.25, 𝑝<.05)

Reference:

Matzke, D., Nieuwenhuis, S., van Rijn, H., Slagter, H. A., van der Molen, M. W., and Wagenmakers, E.-J. (2015). The effect of horizontal eye movements on free recall: A preregistered adversarial collaboration. Journal of Experimental Psychology: General, 144, e1-e15.

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/ethanweed/datasets-for-teaching/main/JASP_data_library/2.%20T-Tests/Eye%20Movements.csv')
df.head()

In [None]:
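# Pull out the recall scores for each condition as Series
Horizontal = df.loc[df['Condition'] == 'Horizontal', 'CriticalRecall']
Fixation = df.loc[df['Condition'] == 'Fixation', 'CriticalRecall']

from scipy import stats
# Welch's t-test; alternative = 'greater' tests Horizontal > Fixation,
# matching the one-sided hypothesis in the assignment (needs SciPy 1.6 or newer)
t, p = stats.ttest_ind(Horizontal, Fixation, equal_var = False, alternative = 'greater')
t, p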
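To report this result in APA format you also need the degrees of freedom. When `equal_var = False`, the test is Welch's t-test, and its degrees of freedom are not simply N - 2 but are given by the Welch-Satterthwaite approximation. The sketch below computes them by hand; the function name `welch_df` and the toy numbers are made up for illustration.

```python
import numpy as np


def welch_df(a, b):
    """Welch-Satterthwaite degrees of freedom for two independent samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each sample mean (sample variance / n)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    return (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))


# Toy samples with equal sizes and equal variances
toy_a = [1, 2, 3, 4]
toy_b = [2, 3, 4, 5]
print(welch_df(toy_a, toy_b))
```

When the two groups have the same size and the same variance, as in the toy samples, the result matches the classical n1 + n2 - 2; the more the variances or sample sizes differ, the smaller it becomes.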

## Dataset 3: "Laser Blue Jeans"

Description: 

Experiment comparing tensile strength and extension of blue jeans that were designed using two methods: manually and with laser beams. Designers of blue jeans often want to treat areas of the jeans to give them a faded look by blasting them with quartz sand under high pressure. The areas to be treated with sand need to be marked out; this is traditionally done by hand with a pen, but this is very time-consuming. A quicker way would be to use lasers to mark the areas, but how does this affect the strength and stretchability of the material? To test this, the authors treated 20 pairs of jeans with each design method, then took 3 samples from different parts of each pair of jeans, giving a total sample size of N = 2(20)(3) = 120. The samples were tested for tensile strength and extension by pulling the fabric samples between two hooks until they ripped.

<img src="/Users/ethan/Documents/GitHub/ExPsyLing/2021/Slides/Images/BlueJeans.png" width=""/>

Assignment:

Determine whether the tensile strength and extension of the blue jeans were significantly different when the time-saving laser technique was used. Calculate an appropriate test statistic, and make a figure illustrating the comparison. Write a sentence reporting your results in the same way they are reported in the book (APA format):

> With a mean grade of 72.3, the psychology students scored slightly higher than the average grade of 67.5 (𝑡(19)=2.25, 𝑝<.05)


Variables:

methid: 1 = manual, 2 = laser
jeanid
sampleid
strength (Newtons)
extension (Newtons)

Reference: 

Ondogan, Z., Pamuk, O., Ondogan, E. N., and Ozguney, A. (2005). Improving the appearance of all textile products from clothing to home textile using laser technology. Optics and Laser Technology, 37, 631-637.




In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/ethanweed/datasets-for-teaching/main/university_of_florida/bluejeans_laser.csv')
df.head()

In [None]:
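# Tensile strength for each marking method (1 = manual, 2 = laser)
Manual = df.loc[df['method'] == 1, 'strength']
Laser = df.loc[df['method'] == 2, 'strength']

from scipy import stats
# Welch's t-test (does not assume equal variances in the two groups)
t, p = stats.ttest_ind(Manual, Laser, equal_var = False)
t, p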


## Dataset 4: "Southern Crime"

Description:

This dataset contains a variety of data on crime rates in different states in the United States. 

| Variable           | Description                                                        | Type       |
|--------------------|--------------------------------------------------------------------|------------|
| CrimeRate          | Crime rate (number of offences per million population)             | Continuous |
| Youth              | Young males (number of males aged 18-24 per 1000)                  | Discrete   |
| Southern           | Southern state 1 = yes, 0 = no                                     | Binary     |
| Education          | Education time (average number of years schooling up to 25)        | Discrete   |
| ExpenditureYear0   | Expenditure (per capita expenditure on police), skewed             | Continuous |
| LabourForce        | Youth labour force (males employed 18-24 per 1000)                 | Discrete   |
| Males              | Males (per 1000 females)                                           | Discrete   |
| MoreMales          | More males identified per 1000 females 1 = yes, 0 = no             | Binary     |
| StateSize          | State size (in hundred thousands)                                  | Discrete   |
| YouthUnemployment  | Youth Unemployment (number of males aged 18-24 per 1000), skewed   | Discrete   |
| MatureUnemployment | Mature Unemployment (number of males aged 35-39 per 1000)          | Discrete   |
| HighYouthUnemploy  | High Youth Unemployment 1 = yes, 0 = no (high if Youth >3*Mature ) | Binary     |
| Wage               | Wage (median weekly wage)                                          | Continuous |
| BelowWage          | Below Wage (number of families below half wage per 1000)           | Discrete   |


Assignment:
Examine whether there is a significant difference in crime rates between southern and non-southern states (the Southern variable only distinguishes southern states from all others). Calculate an appropriate test statistic, and make a figure illustrating the comparison. Write a sentence reporting your results in the same way they are reported in the book (APA format):

> With a mean grade of 72.3, the psychology students scored slightly higher than the average grade of 67.5 (𝑡(19)=2.25, 𝑝<.05)

Source:
I'm still tracking down details on this one. But let's assume it is correct.

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/ethanweed/datasets-for-teaching/main/sheffield_MASH/crime.csv')
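Before running the test, it helps to look at the group means, standard deviations, and sample sizes, since you will need the means for your results sentence. The sketch below shows the pattern on a tiny made-up data frame that only borrows the column names `Southern` and `CrimeRate` from the table above; the numbers are invented and are not from crime.csv.

```python
import pandas as pd

# Made-up numbers for illustration only; they are NOT from crime.csv
toy = pd.DataFrame({
    "Southern": [1, 0, 1, 0, 0, 1],
    "CrimeRate": [90.0, 110.0, 80.0, 100.0, 120.0, 85.0],
})

# Descriptive statistics per group (0 = not southern, 1 = southern)
summary = toy.groupby("Southern")["CrimeRate"].agg(["mean", "std", "count"])
print(summary)

# The same split as two separate variables, ready to pass to a t-test
southern = toy.loc[toy["Southern"] == 1, "CrimeRate"]
other = toy.loc[toy["Southern"] == 0, "CrimeRate"]
```

Replacing `toy` with the `df` loaded above should give the corresponding summary for the real data.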

## Dataset 5: "Weight Gain"

Description:

This data set, "Weight Gain", provides weights of 16 participants before and after an eight-week period of excessive calorie intake (Moore et al., 2012, p. 425).

Variables:

Weight Before - Weight in pounds (lb) measured before eight weeks of excessive calorie intake.
Weight After - Weight in pounds (lb) measured after eight weeks of excessive calorie intake.
Difference - Weight After minus Weight Before.

Assignment:

Test the hypothesis that an excess intake of 1000 calories per day over 8 weeks results in a weight increase of 16 pounds (approximately 7.2 kilograms). Calculate an appropriate test statistic, and make a figure illustrating the comparison. Write a sentence reporting your results in the same way they are reported in the book (APA format):

> With a mean grade of 72.3, the psychology students scored slightly higher than the average grade of 67.5 (𝑡(19)=2.25, 𝑝<.05) 
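Comparing a single column of differences against a fixed value calls for a one-sample t-test. The sketch below shows the mechanics with `scipy.stats.ttest_1samp` on made-up numbers; the helper name `one_sample_t` and the toy gains are assumptions for illustration. With the real data, check whether the Difference column is in pounds or kilograms before choosing the comparison value.

```python
from scipy import stats


def one_sample_t(values, popmean):
    """Return (t, degrees of freedom, p) for a two-sided one-sample t-test."""
    result = stats.ttest_1samp(values, popmean=popmean)
    return result.statistic, len(values) - 1, result.pvalue


# Made-up weight gains in pounds, for illustration only
toy_gain = [10.0, 12.0, 14.0, 16.0]
t, dof, p = one_sample_t(toy_gain, popmean=16)
print("t =", t, "df =", dof, "p =", p)
```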

References:

Moore, D. S., McCabe, G. P., and Craig, B. A. (2012). Introduction to the Practice of Statistics (7th ed.). New York: Freeman.

Levine, J. A., Eberhardt, N. L., and Jensen, M. D. (1999). Role of nonexercise activity thermogenesis in resistance to fat gain in humans. Science, 283, 212-214.