# T-test assignment



**Your name (unique name)**:

**Your U-M ID**:

**BEFORE YOU START**:

Copy this template to your own drive, **OUTSIDE** of the course folder, and edit your own copy. **DO NOT** edit this template file directly.

If you are using Google Colab to finish your assignment, make a copy to your own drive and edit it. If you are running on your own local machine and run into package issues such as `ModuleNotFoundError`, you can install the `colab` extension in VSCode and run the notebook with the Colab kernel. This will require you to log in with your UMICH Google account.


## Introduction

<center> <h3> Do spoken or written words better express intelligence? </h3></center>  
This assignment uses the open data from Experiment 4 of Schroeder and Epley (2015) to teach independent
samples <em>t</em>-tests. Results of the activity provided below should exactly reproduce the results described in the paper.

**CITATION**  
Schroeder, J., & Epley, N. (2015). The sound of intellect: Speech reveals a thoughtful mind, increasing a
job candidate’s appeal. Psychological Science, 26, 877-891.

**LEARNING OBJECTIVES**  
* Conduct independent samples t-tests with Python
* Interpret the t-test results in the context of the research question
* Generate a t-test figure that follows APA guidelines
* Review the use of Python functions and packages including `pandas`, `seaborn` and `matplotlib`

**STUDY DESCRIPTION**  
Imagine you were a job candidate trying to pitch your skills to a potential employer. Would you be more
likely to get the job after giving a short speech describing your skills, or after writing a short speech and
having a potential employer read those words? That was the question raised by Schroeder and Epley
(2015). The authors predicted that a person’s speech (i.e., vocal tone, cadence, and pitch) communicates
information about their intellect better than their written words (even if they are the same words as in
the speech).

To examine this possibility, the authors randomly assigned 39 professional recruiters for *Fortune 500*
companies to one of two conditions. In the audio condition, participants listened to audio recordings of a
job candidate’s spoken job pitch. In the transcript condition, participants read a transcription of the job
candidate’s pitch. After hearing or reading the pitch, the participants rated the job candidates on three
dimensions: intelligence, competence, and thoughtfulness. These ratings were then averaged to create a
single measure of the job candidate’s intellect, with higher scores indicating the recruiters rated the
candidates as higher in intellect. The participants also rated their overall impression of the job candidate
(a composite of two items measuring positive and negative impressions). Finally, the participants
indicated how likely they would be to recommend hiring the job candidate (0 - not at all likely, 10 -
extremely likely).

**Note about programming**: Part of the assignment is designed to help you review or get familiar with Python functions and common packages. If you get stuck, that is totally fine! Try searching for the error message online, check the documentation, and ask the teaching team for help.

In [None]:
# Load libraries
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Uncomment and set the directory if you are using a dataset from your own local machine
# data_dir = '<insert path to directory where dataset is located>'
# os.chdir(data_dir)


# Load dataset into dataframe
file_id = '0Bz-rhZ21ShvOei1MM24xNndnQ00'
resource_key = '0-gBQiGhF6zp2cH8g20zifJg'

# Construct a direct download link
direct_link = f'https://drive.google.com/uc?export=download&id={file_id}&resourcekey={resource_key}'
df = pd.read_csv(direct_link) 


## Analyses

**Question 1**: Open the data file (called Schroeder and Epley 2015 Experiment 4 data). Explore the data file.
Note, you will not analyze all of these variables. *Try to find the variables that are relevant to the
study description above.*
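
A minimal sketch of the kind of exploration this question asks for, run on a tiny made-up DataFrame rather than the assignment data. The column names `group` and `score` and all values are hypothetical. The real file has its own column names, which you will need to find yourself.

```python
import pandas as pd

# Hypothetical toy data; NOT the assignment dataset
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "score": [4.0, 6.0, 5.0, 2.0, 3.0],
})

print(toy.head())       # first rows of the table
print(toy.columns)      # available column names
print(toy.describe())   # summary statistics for numeric columns
print(toy.shape)        # (number of rows, number of columns)

# Boolean indexing keeps only the rows where the condition is True
group_a = toy[toy["group"] == "a"]
group_b = toy[toy["group"] == "b"]
print(group_a.shape, group_b.shape)
```

The same pattern (inspect, then filter by the value of one column) should carry over to the real dataset once you have identified the column that stores the condition.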

In [None]:
# Start by describing the dataset
# Tips: You can use similar code from the correlation assignment to describe the dataset
... # TODO

In [None]:
# Analyse shape of the dataset and the available columns (.shape may be helpful!)
... # TODO


# Split the dataset into two conditions
... # TODO

# Then analyze the shape of the dataset for each condition
... # TODO


**Question 2**: You first want to compare participants in the audio condition to participants in the transcript
condition on the `Intellect_Rating` variable. Which type of analysis is appropriate, given the design
described above?




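**Question 2a**: We start by writing a general t-test function for the two conditions. Writing a function allows us to reuse the same code for each outcome variable. Here we provide the starter code.

Here we are assuming the two conditions have different variances (this version is known as Welch's t-test). For your reference, this is the t-test math formula, where $\bar{x}$, $s$ and $n$ are the mean, standard deviation and number of participants in each condition: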
$$
se = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
$$

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{se}
$$

It could be a bit overwhelming if you are just looking at the formula. But don't worry, follow the comments in the code and you will get it!
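
As a worked illustration of the formula, here is a minimal sketch that applies it to two small made-up arrays. The names `group_1` and `group_2` and their values are hypothetical and unrelated to the assignment data. The threshold of 1.96 is the large-sample, two-tailed critical value for p < 0.05.

```python
import numpy as np

# Hypothetical scores for two independent groups; NOT the assignment data
group_1 = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
group_2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Sample means and their difference
mean_diff = group_1.mean() - group_2.mean()

# Sample variances: ddof=1 divides by n - 1 (NumPy divides by n by default)
var_1 = group_1.var(ddof=1)
var_2 = group_2.var(ddof=1)

# Standard error of the difference in means, then the t-statistic
se = np.sqrt(var_1 / len(group_1) + var_2 / len(group_2))
t_stat = mean_diff / se

# Two-tailed check against the large-sample critical value
significant = abs(t_stat) > 1.96

print(mean_diff, se, t_stat, significant)  # 4.0 1.0 4.0 True
```

Note that the pandas `.std()` method already divides by n - 1 by default, while NumPy's `np.std()` and `np.var()` divide by n unless you pass `ddof=1`. With very small groups like these, the exact critical value is larger than 1.96, so the threshold here only demonstrates the mechanics.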


In [None]:
# General t-test function
def t_test(column):
    """
    Our own two-sample t-test function!

    Args:
        column (str): The column to test

    Returns:
        None

    """

    # You first need to calculate the mean of the two conditions, and the difference between the means
    # .mean() method is useful here
    audio_mean = ...
    transcript_mean = ...
    mean_diff = ...

    # Then you need to calculate the standard deviation of the two conditions, you can use the .std() method
    audio_sd = ...
    transcript_sd = ...

    # Then you need to calculate the number of participants in the two conditions
    # There are many ways to do this, one way is based on the shape of the dataframe
    audio_n = ...
    transcript_n = ...

    # Then you need to calculate the standard error of the difference in means, you can use the np.sqrt() function
    # Follow the formula above! The t-statistic is the mean difference divided by this standard error
    se = ...
    t_stat = ...
    
    # Now, use the t-stat to determine if the difference is significant. What threshold would you use?
    # Tip: for the general case, you can use the two-tailed p < 0.05 critical value for large degrees of freedom,
    # and compare it with the absolute value of the t-statistic (t can be negative).
    # Tip: `significant` is a boolean (True or False) variable
    significant = ...

    # Finally, print the results
    print(f'{column} difference in means: {mean_diff}, t-statistic: {t_stat} (critical value: +/-1.96), is there a difference? {"Yes" if significant else "No"}')
    
# If you think the function is working, you can test it with the Intellect_Rating variable
# t-test based on Intellect Rating
t_test('Intellect_Rating')


**Question 3**: Next compare participants in the audio condition to participants in the transcript condition on the
`Impression_Rating` variable.

In [None]:
# t-test based on Impression Rating, you can use what you just built!
... # TODO


**Question 4**: Finally, compare participants in the audio condition to participants in the transcript condition on
the `Hire_Rating` variable.

In [None]:
# t-test based on Hire Rating
... # TODO

**Question 5**: Now it is time to review the results from above! Prepare an APA-style results paragraph briefly describing the results of the analyses performed above. Remember to interpret the results in the context of the research question.


(Your response goes here)

**Question 6**: There are many packages that can perform t-tests in Python. Here we are using the `scipy.stats` package. SciPy supports many different types of t-tests. We are assuming both groups are independent samples, which allows us to use the `ttest_ind` function. You can find the documentation [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html). One benefit of using `scipy.stats.ttest_ind()` is that its result directly provides the $t$-statistic, the $p$-value and (in recent SciPy versions) the degrees of freedom, so you do not need to calculate the means and standard deviations on your own.
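
To show how the SciPy result relates to the formula from Question 2a, here is a minimal sketch on made-up arrays (the names `group_1`, `group_2` and their values are hypothetical). It computes the degrees of freedom by hand with the Welch-Satterthwaite approximation, which is the standard choice for the unequal-variance test, and compares the hand-computed numbers with the `statistic` and `pvalue` attributes of the SciPy result.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent groups; NOT the assignment data
group_1 = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
group_2 = np.array([1.0, 2.0, 3.0, 4.0, 6.0])

# Welch's t-test via SciPy (equal_var=False)
result = stats.ttest_ind(group_1, group_2, equal_var=False)

# The same t-statistic by hand: each v is a squared standard error, s^2 / n
v1 = group_1.var(ddof=1) / len(group_1)
v2 = group_2.var(ddof=1) / len(group_2)
t_manual = (group_1.mean() - group_2.mean()) / np.sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
dof = (v1 + v2) ** 2 / (v1 ** 2 / (len(group_1) - 1) + v2 ** 2 / (len(group_2) - 1))

# Two-tailed p-value from the t distribution with those degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

print(result.statistic, t_manual)  # the two t-statistics should agree
print(result.pvalue, p_manual)     # the two p-values should agree
```

Unlike the fixed 1.96 threshold, the p-value here depends on the degrees of freedom, which matters when the groups are small.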

Let's try it now!

In [None]:
# To begin with, import the ttest_ind function from the scipy.stats package
from scipy.stats import ttest_ind

# Now let's use the ttest_ind function to perform a t-test on the `Intellect_Rating` variable
# First, like we did above, we need to split the dataset into two conditions
audio_condition = ...
transcript_condition = ...

# Now we can perform the t-test
# equal_var=False: here we are assuming the two groups have different variances
result = ttest_ind(audio_condition, transcript_condition, equal_var=False)
t, p = result.statistic, result.pvalue
# Degrees of freedom (the `df` attribute is available in recent SciPy versions)
# We store it as `dof` so that the `df` dataframe loaded above is not overwritten
dof = result.df

# Scipy ttest_ind() does not return the mean difference, we need to calculate it manually
mean_diff = audio_condition.mean() - transcript_condition.mean()

# You will see the t-statistic, p-value and degrees of freedom from the output.
print(f"Mean difference: {mean_diff}, t-statistic: {t}, p-value: {p}, degrees of freedom: {dof}, is there a difference? {('Yes' if p < 0.05 else 'No')}")

**Question 7**: Generate a figure to depict the results of the analyses performed above. Make sure to follow
APA guidelines, and include error bars representing +/- 1 standard error of the mean. 

This question is designed to help you get familiar with `matplotlib` and `seaborn`, two packages you will use all the time in Python. Try to follow the comments, and refer to the package documentation when needed, e.g. the [`seaborn.barplot` documentation](https://seaborn.pydata.org/generated/seaborn.barplot.html).
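
To make concrete what an error bar of +/- 1 standard error of the mean represents, here is a minimal sketch that computes it per group on made-up data. The column names `group` and `score` and all values are hypothetical. It only computes the numbers a bar and its error bar would depict, and does not draw the figure for you.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NOT the assignment dataset
toy = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "score": [4.0, 6.0, 5.0, 7.0, 2.0, 3.0, 2.0, 5.0],
})

# Per-group mean, standard error of the mean (SEM), standard deviation and n
summary = toy.groupby("group")["score"].agg(["mean", "sem", "std", "count"])

# SEM is the sample standard deviation divided by sqrt(n)
summary["sem_by_hand"] = summary["std"] / np.sqrt(summary["count"])

# Lower and upper ends of a +/- 1 SEM error bar around each mean
summary["lower"] = summary["mean"] - summary["sem"]
summary["upper"] = summary["mean"] + summary["sem"]

print(summary)
```

Checking your finished plot against a table like this is one way to confirm that the error bars show standard errors rather than, say, confidence intervals or standard deviations.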

In [None]:
# Intellect_Rating Means
plt.figure(figsize=(8,5)) # Set up the size of the figure
# We will create a bar plot using `seaborn`. 
# It takes in the data, the variable for the x-axis, and the variable for the y-axis.
# For +/- 1 standard error bars, check the `errorbar` parameter in the documentation.
sns.barplot(data=..., x=..., y=..., errorbar=...) # TODO: fill in the code blocks

# Every plot should be clearly labelled, with a title and axis labels.
# Set up the title, x-axis label, y-axis label for your bar plot.
plt.title('...') # TODO: fill in the code blocks
plt.xlabel('...') # TODO: fill in the code blocks
plt.ylabel('...') # TODO: fill in the code blocks

# Finally, show the plot
plt.show()



# Do the same for the other two variables
# Impression_Rating Means
... # TODO

# Hire_Rating Means
... # TODO


## Submission

This is the end of the assignment :) Please remember to save your work and submit your assignment as a notebook (`.ipynb`) on Canvas.