Preamble and Experimental Design
======

In this variation of the Stroop Task, the independent variable is the condition of the words, congruent or incongruent: whether or not the ink color in which a word is printed matches the color that the word names. The dependent variable is the time it takes to name the ink colors. The obvious question to ask is: does the incongruent condition change the time needed to name the ink colors? Because each participant completes both conditions, the experiment yields dependent, paired samples, one for each condition. A paired-samples (dependent) Student's t-test, which works on each participant's difference between the two times and assumes those differences are approximately normally distributed, should therefore be sufficient to determine the condition's effect. Our hypotheses and experimental setup are as follows, where $ \mu_c $ is the mean naming time in the congruent condition and $ \mu_i $ the mean naming time in the incongruent condition:

$ H_0: \mu_c=\mu_i $

$ H_A: \mu_c \ne \mu_i $

$ \alpha=0.05, n=24 $
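For paired samples the test statistic is built from each participant's difference $ d = x_c - x_i $ between the congruent and incongruent times: $ t = \bar{d} / (s_d / \sqrt{n}) $, where $ \bar{d} $ and $ s_d $ are the mean and sample standard deviation of the differences, with $ n - 1 = 23 $ degrees of freedom. The sketch below illustrates that formula using only the standard library; the helper name `paired_t` and the toy numbers are hypothetical and are not the Stroop measurements.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(congruent, incongruent):
    """Hypothetical helper: paired t-statistic and degrees of freedom.

    Works on the per-pair differences d = congruent - incongruent,
    the same sign convention as ttest_rel(Congruent, Incongruent).
    """
    if len(congruent) != len(incongruent):
        raise ValueError("paired samples must have the same length")
    diffs = [c - i for c, i in zip(congruent, incongruent)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Toy numbers for illustration only (not the Stroop data)
t_stat, df = paired_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(t_stat, df)  # t is about -3.464 (that is, -2 * sqrt(3)), df = 2
```

With $ \alpha = 0.05 $ and the two-sided $ H_A $ above, $ H_0 $ is rejected when $ |t| $ exceeds the 97.5th percentile of the t-distribution with 23 degrees of freedom, roughly 2.069.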

In [None]:
import pandas as pd
import numpy as np

# https://chrisalbon.com/python/pandas_dataframe_importing_csv.html
stroop_data = pd.read_csv('stroopdata.csv')
stroop_data

In [None]:
# http://bconnelly.net/2013/10/summarizing-data-in-python-with-pandas/
# https://chrisalbon.com/python/pandas_dataframe_descriptive_stats.html
stroop_data.describe()
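Since the paired test operates on per-participant differences rather than on the two columns separately, it can help to summarize those differences too. This is a minimal sketch, assuming the CSV has the `Congruent` and `Incongruent` columns used elsewhere in this notebook; the helper `describe_differences` and the toy frame are hypothetical.

```python
import pandas as pd


def describe_differences(df):
    """Hypothetical helper: summary statistics of Congruent - Incongruent."""
    # Column subtraction aligns on the row index, so each participant's pair stays together
    diffs = df['Congruent'] - df['Incongruent']
    return diffs.describe()


# Toy frame for illustration only; in the notebook, pass stroop_data instead
toy = pd.DataFrame({'Congruent': [1.0, 2.0, 3.0], 'Incongruent': [2.0, 4.0, 6.0]})
summary = describe_differences(toy)
print(summary)
```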

In [None]:
# http://stackoverflow.com/questions/10511024/in-ipython-notebook-pandas-is-not-displying-the-graph-i-try-to-plot
%matplotlib inline

# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html#interactive
from ipywidgets import interact

# http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-hist
import matplotlib.pyplot as plt
plt.style.use('ggplot')

# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html
# http://stackoverflow.com/questions/28654003/how-to-plot-histograms-from-dataframes-in-pandas
# http://stackoverflow.com/questions/12125880/changing-default-x-range-in-histogram-matplotlib
# http://stackoverflow.com/questions/24571005/return-max-value-from-panda-dataframe-as-a-whole-not-based-on-column-or-rows
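def side_by_side(bin_size):
    # Plot only the two measured conditions, on a shared x-range
    times = stroop_data[['Congruent', 'Incongruent']]
    times.hist(bins=bin_size, layout=(1, 2), figsize=(12, 4), range=(0, times.values.max()))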

interact(side_by_side, bin_size=(1,len(stroop_data),1))

In [None]:
# http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-hist

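def overlay(bin_size):
    # Overlay only the two measured conditions, on a shared x-range
    times = stroop_data[['Congruent', 'Incongruent']]
    times.plot.hist(bins=bin_size, alpha=0.5, figsize=(8, 4), range=(0, times.values.max()))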

interact(overlay, bin_size=(1,len(stroop_data),1))

In [None]:
# http://stackoverflow.com/questions/13404468/t-test-in-pandas-python
# http://stackoverflow.com/questions/15984221/how-to-perform-two-sample-one-tailed-t-test-with-numpy-scipy
# https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.ttest_rel.html
from scipy.stats import ttest_rel

ttest_rel(stroop_data['Congruent'], stroop_data['Incongruent'])
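One way to see what `ttest_rel` computes: it gives the same statistic and p-value as `ttest_1samp` applied to the differences with a population mean of 0. The sketch below checks this on randomly generated placeholder data (seeded for reproducibility), not on the Stroop measurements.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Placeholder paired samples; seeded so the check is reproducible
rng = np.random.default_rng(0)
congruent = rng.normal(size=24)
incongruent = congruent + rng.normal(loc=1.0, size=24)

paired = ttest_rel(congruent, incongruent)
one_sample = ttest_1samp(congruent - incongruent, popmean=0.0)

# Both tests work on the same differences, so statistic and p-value agree
print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```

The sign of the statistic follows the argument order: with `Congruent` first, a negative $ t $ means the incongruent times are larger on average.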

In [None]:
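# Double-check the ttest_rel result by computing the paired t-statistic by hand

# Per-participant difference (Congruent - Incongruent) and its squared deviation from the mean difference
stroop_data['dlt'] = stroop_data['Congruent'] - stroop_data['Incongruent']
stroop_data['diff_dev'] = (stroop_data['dlt'] - stroop_data['dlt'].mean()) ** 2
print(stroop_data)

# The mean of the differences should equal the difference of the means
mean_diff = stroop_data['Congruent'].mean() - stroop_data['Incongruent'].mean()
print("DF mean equals point estimate: ", np.isclose(stroop_data['dlt'].mean(), mean_diff))

# Sample standard deviation of the differences (n - 1 denominator, as in pandas' default ddof=1)
n = len(stroop_data)
calc_std = (stroop_data['diff_dev'].sum() / (n - 1)) ** 0.5
print("DF diff std equals calculated std: ", np.isclose(stroop_data['dlt'].std(), calc_std))

# Paired t-statistic: mean difference over its standard error; should match ttest_rel
t = stroop_data['dlt'].mean() / (stroop_data['dlt'].std() / n ** 0.5)
print("t-statistic: ", t)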
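To close the loop, a hand-computed $ t $ can be turned into a two-sided p-value with $ n - 1 $ degrees of freedom and compared with $ \alpha = 0.05 $; a confidence interval for the mean difference follows from the same standard error. This is a minimal sketch using `scipy.stats.t`; the helper name `paired_summary` and the toy differences are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t as t_dist  # aliased so it does not clobber the notebook's `t`


def paired_summary(diffs, alpha=0.05):
    """Hypothetical helper: t, df, two-sided p-value and CI from paired differences."""
    n = len(diffs)
    df = n - 1
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    t_stat = d_bar / se
    p_value = 2 * t_dist.sf(abs(t_stat), df)  # two-sided, matching H_A: mu_c != mu_i
    t_crit = t_dist.ppf(1 - alpha / 2, df)
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return t_stat, df, p_value, ci


# Toy differences for illustration only (not the Stroop data)
t_stat, df, p_value, ci = paired_summary([-1.0, -2.0, -3.0])
print(t_stat, df, p_value, ci)
```

In the notebook, `paired_summary(stroop_data['dlt'].tolist())` would apply the same steps to the real differences; $ H_0 $ is rejected if the p-value falls below 0.05.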