#### Statistics: The Science of Decisions Project Instructions

Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.
Questions For Investigation

<b>As a general note, be sure to keep a record of any resources that you use or refer to in the creation of your project. You will need to report your sources as part of the project submission.</b>
**1.** What is our **independent variable**? What is our **dependent variable**?

**2.** What is an **appropriate set of hypotheses** for this task? What **kind of statistical test** do you expect to perform? Justify your choices.

Now it’s your chance to try out the Stroop task for yourself. Go to this link, which has a Java-based applet for performing the Stroop task. Record the times that you received on the task (you do not need to submit your times to the site.) Now, download this dataset which contains results from a number of participants in the task. Each row of the dataset contains the performance for one participant, with the first number their results on the congruent task and the second number their performance on the incongruent task.

**3.** **Report some descriptive statistics** regarding this dataset. Include at least one **measure of central tendency** and at least one **measure of variability**.


**4.** Provide one or two **visualizations that show the distribution of the sample data**. Write one or two sentences noting what you observe about the plot or plots.

**5.** Now, **perform the statistical test and report your results**. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?

**6.** Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!

**_________**



**1a) Independent variable**

In a Stroop task experiment, the independent variable is whether or not the ink color of the listed words matches the written color name. When the color of the ink and the text of the word match, we call it the congruent words condition; when the ink color and word text differ, we call it the incongruent words condition. We can call this binary variable Ink Match (where the only possible values are Congruent and Incongruent).

**1b) Dependent variable**

The dependent variable is the time it takes to name the ink colors in equally sized lists. Each participant goes through the experiment and records a time for both conditions, starting with the congruent condition.

**2a) Hypotheses**

The null hypothesis for this project is that Ink Match has no effect on the mean time it takes the participants to solve the tasks.

The alternative hypothesis is that a positive Ink Match (the congruent condition) lowers the mean time it takes the participants to solve the tasks.

Mathematically, we can display the hypotheses like this:

$\mu_1 =$ population mean time in the congruent condition (estimated by the sample mean)

$\mu_2 =$ population mean time in the incongruent condition (estimated by the sample mean)

__$H_0:  \mu_1 = \mu_2$__

__$H_A:  \mu_1 < \mu_2$__


An alpha level of 0.05 will be used when testing for statistical significance.

**2b) Statistical test**

A dependent (paired) two-sample t-test will be used for this task, as the task has a repeated measures design. This means that the same subjects take both tests, and their results are recorded for comparison. Although the alternative hypothesis is directional (mean time increases in the Incongruent condition), I will still perform a two-tailed t-test, as it is technically possible for the mean time to decrease instead.

Another option would be to use repeated measures ANOVA. As the independent variable only has two levels in the Stroop task, I have decided to use the paired t-test instead.
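Because the design is paired, the test reduces to a one-sample t-test on the per-participant differences. The following is a minimal sketch of that equivalence. It uses made-up times (not the project dataset) and assumes `scipy` and `numpy` are installed; the variable names are only illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical times for five participants (illustrative only,
# not taken from stroopdata.csv)
congruent = np.array([12.1, 15.3, 9.8, 14.0, 11.5])
incongruent = np.array([19.4, 21.0, 17.2, 23.5, 18.1])

# Paired (dependent) t-test on the two conditions
t_paired, p_paired = stats.ttest_rel(congruent, incongruent)

# Equivalent one-sample t-test on the differences against a mean of 0
t_diff, p_diff = stats.ttest_1samp(congruent - incongruent, 0.0)

print(t_paired, t_diff)
```

Both calls should report the same t statistic and p-value, which is why only the differences matter for the test.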


**3a) Descriptive statistics**

In [5]:
import pandas as pd
df = pd.read_csv('data_files/stroopdata.csv')
des_stats = df.describe()
des_stats

| Statistic | Congruent | Incongruent |
|---|---|---|
| count | 24.0 | 24.0 |
| mean | 14.051125 | 22.015917 |
| std | 3.559358 | 4.797057 |
| min | 8.63 | 15.687 |
| 25% | 11.89525 | 18.71675 |
| 50% | 14.3565 | 21.0175 |
| 75% | 16.20075 | 24.0515 |
| max | 22.328 | 35.255 |
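The standard deviation is one measure of variability; the interquartile range (IQR) is a second one that is less sensitive to extreme values. As a small sketch, it can be derived directly from the quartiles reported by `describe()`. The numbers below are copied from that output, and the dictionary layout is only an illustrative choice.

```python
# Quartiles copied from the describe() output for each condition
quartiles = {
    "Congruent": {"25%": 11.89525, "75%": 16.20075},
    "Incongruent": {"25%": 18.71675, "75%": 24.0515},
}

# Interquartile range = 75th percentile - 25th percentile
iqr = {name: q["75%"] - q["25%"] for name, q in quartiles.items()}

for name, value in iqr.items():
    print("{0}: IQR = {1:.5f}".format(name, value))
```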


**4) Visualization: plot**

In [6]:
%pylab inline
import matplotlib.pyplot as plt

df[['Congruent','Incongruent']].cumsum().plot(figsize=(10, 8))

ImportError: No module named 'matplotlib'

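This plot shows the cumulative sums of the Congruent and Incongruent test results; the gap between the two lines is the accumulated difference between the conditions. As the number of test results increases, the gap increases as well, which is expected because the Incongruent time is larger for every participant in the dataset.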

In [None]:
df['diff'] = df['Incongruent']-df['Congruent']
#Adding a new column with percentage difference between Incongruent and Congruent results
df['%_faster'] = (df['diff']/df['Incongruent'])*100

#density=True replaces the normed=True argument, which newer matplotlib versions no longer accept
df['%_faster'].plot.hist(alpha=0.5,range=[0,100],figsize=(8,7),density=True)

This histogram visualizes how much faster the Congruent test results are, grouped into 10% bins. We see that 37% of the test subjects were between 40 and 50% faster on the Congruent test than on the Incongruent test. The distribution of differences looks fairly normal, with a slight bimodal trend, potentially due to the relatively small sample size.

**5a) Statistical tests**

As mentioned earlier, I will perform a paired t-test to test for statistical significance of the samples. I will set the alpha level at 0.05. If significant, I will also measure the effect size using Cohen's d.

In [8]:
#t-critical value, from t-table (https://s3.amazonaws.com/udacity-hosted-downloads/t-table.jpg)
deg_free=len(df.index)-1
al_lv = 0.05
t_crit_val = 2.069

print("The degrees of freedom is {0} and the alpha level is {1}. This gives a t-critical value of {2}".format(deg_free,al_lv,t_crit_val))

The degrees of freedom is 23 and the alpha level is 0.05. This gives a t-critical value of 2.069
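As a cross-check on the table lookup, the critical value can also be computed from the t distribution. This sketch assumes `scipy` is available and uses `scipy.stats.t.ppf` (the inverse CDF); the variable names are illustrative and the one-tailed value is shown only for comparison.

```python
from scipy import stats

alpha = 0.05
deg_free = 23  # n - 1 for the 24 participants

# Two-tailed test: alpha is split across both tails
t_crit_two_tailed = stats.t.ppf(1 - alpha / 2, deg_free)

# One-tailed test: all of alpha sits in a single tail
t_crit_one_tailed = stats.t.ppf(1 - alpha, deg_free)

print(round(t_crit_two_tailed, 3), round(t_crit_one_tailed, 3))
```

The two-tailed value rounds to the 2.069 taken from the t-table.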


In [10]:
from scipy import stats
import numpy as np
#paired t-test on the two conditions (same participants measured twice)
t_statistic,p_value = stats.ttest_rel(df['Congruent'],df['Incongruent'])

#ttest_rel returns a two-sided p-value; for converting 2-sided to 1-sided, more info here:
#http://stats.stackexchange.com/questions/31361/some-questions-about-two-sample-comparisons 
#(see links in comment section)
print("t statistic: {0}".format(t_statistic))
#printing the p-value in fixed-point notation for clarity
print("p-value: {0:.10f}".format(p_value))

ImportError: No module named 'scipy'

In [111]:
#getting descriptive statistics for the difference
des_stats = df.describe()
#calculating Cohen's d
cohen_d = des_stats.loc['mean','diff']/des_stats.loc['std','diff']

print(cohen_d)

1.63721994912


According to the Cohen's d value, the difference between the two means (the effect) is 1.64 standard deviations. This is considered a large effect size.
http://mandeblog.blogspot.no/2011/05/cohens-d-and-effect-size.html

In [112]:
# #Will most likely not include
# slope, intercept, r_value, p_value, std_err = stats.linregress(df['Congruent'],df['Incongruent'])

# print(r_value,p_value)
# r_squared = r_value ** 2
# print(r_squared)

# r_squared2 = t_statistic ** 2 / (t_statistic ** 2 + deg_free)

# print(r_squared2)

In [113]:
df['diff']
df.axes
des_stats

| Statistic | Congruent | Incongruent | diff | %_faster |
|---|---|---|---|---|
| count | 24.0 | 24.0 | 24.0 | 24.0 |
| mean | 14.051125 | 22.015917 | 7.964792 | 34.950316 |
| std | 3.559358 | 4.797057 | 4.864827 | 16.118201 |
| min | 8.63 | 15.687 | 1.95 | 8.954494 |
| 25% | 11.89525 | 18.71675 | 3.6455 | 20.050451 |
| 50% | 14.3565 | 21.0175 | 7.6665 | 38.775726 |
| 75% | 16.20075 | 24.0515 | 10.2585 | 46.351071 |
| max | 22.328 | 35.255 | 21.919 | 63.926155 |


In [156]:
#Updating the descriptive statistics data frame to include diff
des_stats = df.describe()

#standard error of the mean difference, computed two equivalent ways
se1 = stats.sem(df['diff'])
se2 = des_stats.loc['std','diff'] / np.sqrt(len(df.index))

print(se2)

#margin of error for the 95% confidence interval
margin_of_error = t_crit_val * se1

ci_low = des_stats.loc['mean','diff'] - margin_of_error
ci_high = des_stats.loc['mean','diff'] + margin_of_error

print("Confidence interval for sample difference: ({0}, {1})".format(ci_low,ci_high))

"""
Alternative way of calculating Confidence Interval (gives same result, but need to understand better before using)
#import math
#CI = stats.t.interval(0.95,len(df.index)-1,loc=des_stats.loc['mean','diff'],scale=des_stats.loc['std','diff']/math.sqrt(len(df.index)))
#print(CI)
"""

0.993028634778
Confidence interval for sample difference: (5.91021542131028, 10.019367912023053)
(5.9105554239684226, 10.019027909364912)
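The t statistic for the paired test can also be recomputed by hand from the summary values of the `diff` column. The sketch below copies the mean, standard deviation, and count from the `describe()` output and assumes `scipy` is installed. Since `diff` is Incongruent minus Congruent, the sign is positive here, whereas `stats.ttest_rel(df['Congruent'], df['Incongruent'])` would report the same magnitude with a negative sign.

```python
import math
from scipy import stats

# Summary values for the 'diff' column, copied from the describe() output
n = 24
mean_diff = 7.964792
std_diff = 4.864827

# Standard error of the mean difference
std_err = std_diff / math.sqrt(n)

# t statistic for H0: mean difference = 0
t_stat = mean_diff / std_err

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two_sided = 2 * stats.t.sf(abs(t_stat), n - 1)

print("standard error: {0:.4f}".format(std_err))
print("t statistic: {0:.4f}".format(t_stat))
print("two-sided p-value: {0:.2e}".format(p_two_sided))
```

The standard error matches the 0.993 printed by the confidence interval cell, and the resulting t statistic can be compared directly against the t-critical value of 2.069.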



paired t-test
cohen's d
#r^2

CI

**5b) Statistical test observations**

6) Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? **Some research about the problem will be helpful for thinking about these two questions!**

In my own experience, the text of the word seems to be a larger factor when naming the color than whether the ink and word text match. In other words, I read words faster than I find words for colors. This seems to match the findings of John Ridley Stroop's original study (_Studies of interference in serial verbal reactions_, 1935, 
http://psychclassics.yorku.ca/Stroop/). He explained this by the automation of reading, ... 

In [None]:
Emotional Stroop test


**Sources:**

Stroop, John Ridley, 1935. _Studies of interference in serial verbal reactions_. http://psychclassics.yorku.ca/Stroop/

Python documentation, book:  
McKinney, Wes, 2012. _Python for Data Analysis_. O'Reilly Media

Python documentation, websites:  
http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.stats.t.html  
http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.stats.ttest_rel.html  
http://pandas.pydata.org/pandas-docs/stable/visualization.html  
https://drive.google.com/folderview?id=0ByIrJAE4KMTtaGhRcXkxNHhmY2M (Pandas DataFrame Notes.pdf)  

Wikipedia articles:  
https://en.wikipedia.org/wiki/Stroop_effect  
https://en.wikipedia.org/wiki/Emotional_Stroop_test
https://en.wikipedia.org/wiki/Effect_size
https://en.wikipedia.org/wiki/Repeated_measures_design
https://en.wikipedia.org/wiki/Analysis_of_variance
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
https://en.wikipedia.org/wiki/Covariance
https://en.wikipedia.org/wiki/Skewness
https://en.wikipedia.org/wiki/Operational_definition

Statistical methods and terminology are largely based on teachings from the Descriptive Statistics and Inferential Statistics Udacity courses.