### Analyzing the Stroop Effect
Perform the analysis in the space below. Remember to follow [the instructions](https://docs.google.com/document/d/1-OkpZLjG_kX9J6LIQ5IltsqMzVWjh36QpnP2RYpVdPU/pub?embedded=True) and review the [project rubric](https://review.udacity.com/#!/rubrics/71/view) before submitting. Once you've completed the analysis and write up, download this file as a PDF or HTML file and submit in the next section.


(1) What is the independent variable? What is the dependent variable?

The independent variable is whether the word and its color are the same (congruent) or different (incongruent).
The dependent variable is the length of time needed to read the word lists aloud.

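H0: μ_congruent = μ_incongruent

Ha: μ_congruent < μ_incongruent

Here μ_congruent and μ_incongruent are the population mean reading times, in seconds, for the congruent and incongruent word lists.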

My null hypothesis is that, within the entire population, the mean number of seconds it takes to read a list of congruent words is equal to the mean number of seconds it takes to read a list of the same number of incongruent words.

My alternative hypothesis is that, again within the entire population, the mean number of seconds it takes to read a list of congruent words is less than the mean number of seconds it takes to read a list of the same number of incongruent words.

I expect to perform a 1-tailed, 2-sample, dependent (paired) t-test. The samples are the same size.

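I chose a t-test because I'm comparing the means of only two sets of data, the sample size is relatively small, the population standard deviation is unknown, and I am checking whether the two samples are significantly different. The test assumes that the differences between the paired times are approximately normally distributed; under the null hypothesis, the test statistic then follows a t-distribution.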

Because my alternative hypothesis only looks for a change in one direction (a shorter time to read the congruent list than the incongruent list), I will perform a 1-tailed test. I specifically expect the incongruent list to take longer, based on my experience taking the test and on previous research done on the Stroop Effect. There are two sets of times being compared, so it will be a 2-sample test. Because each participant read both lists, the two measurements are paired, so it will be a dependent test. (In other words, the same person is performing under two different conditions.)
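To make the paired setup concrete, here is a minimal sketch of how a dependent t-statistic can be computed from per-participant differences: the mean difference divided by its standard error. The function name `paired_t_statistic` and the sample times are hypothetical, chosen only for illustration; they are not taken from 'stroopdata.csv'.

```python
import math

def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for a paired t-test on first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator)
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    # Standard error of the mean difference
    std_error = math.sqrt(var_diff / n)
    return mean_diff / std_error, n - 1

# Hypothetical reading times in seconds, for illustration only
congruent_demo = [12.0, 15.0, 11.0, 14.0]
incongruent_demo = [19.0, 21.0, 20.0, 22.0]
t_stat, dof = paired_t_statistic(congruent_demo, incongruent_demo)
print(t_stat, dof)
```

A negative t-statistic here means the first condition was faster on average, which is the direction the alternative hypothesis predicts.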

(3) Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability. The name of the data file is 'stroopdata.csv'.

In [9]:
import pandas as pd

stroopdata = pd.read_csv('stroopdata.csv')
stroopdata.describe()

| | Congruent | Incongruent |
|---|---|---|
| count | 24.0 | 24.0 |
| mean | 14.051125 | 22.015917 |
| std | 3.559358 | 4.797057 |
| min | 8.63 | 15.687 |
| 25% | 11.89525 | 18.71675 |
| 50% | 14.3565 | 21.0175 |
| 75% | 16.20075 | 24.0515 |
| max | 22.328 | 35.255 |


See the summary table above.

Measures of central tendency: the mean is 14.05 seconds for the congruent column and 22.02 seconds for the incongruent column.
Measures of variability: the standard deviation is 3.56 seconds for the congruent column and 4.80 seconds for the incongruent column.
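The `describe()` output also supports a few more robust measures: the medians are about 14.36 and 21.02 seconds, and the interquartile ranges (75% minus 25%) are about 4.31 and 5.33 seconds. As a rough sketch, these could be gathered in one place with a small helper. The name `extra_summary` and the demo values are hypothetical; in the notebook the function would be called on `stroopdata` instead.

```python
import pandas as pd

def extra_summary(df):
    """Return median, range, IQR, and variance for each numeric column."""
    return pd.DataFrame({
        "median": df.median(),
        "range": df.max() - df.min(),
        "iqr": df.quantile(0.75) - df.quantile(0.25),
        "variance": df.var(),  # sample variance (ddof=1), consistent with describe()'s std
    })

# Hypothetical values for illustration only; not taken from stroopdata.csv
demo = pd.DataFrame({"Congruent": [10.0, 12.0, 14.0, 16.0],
                     "Incongruent": [18.0, 20.0, 24.0, 30.0]})
summary = extra_summary(demo)
print(summary)
```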

In [13]:
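import seaborn as sns
import matplotlib.pyplot as plt

# distplot is deprecated in recent seaborn releases;
# histplot with kde=True draws a comparable histogram plus density curve
con = stroopdata['Congruent']
sns.histplot(con, kde=True, stat="density", label="Congruent");

incon = stroopdata['Incongruent']
sns.histplot(incon, kde=True, stat="density", label="Incongruent");

plt.xlabel("");  # hide the axis label, as axlabel=False did
plt.legend();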

The peaks of the two distributions are noticeably different, and the lower end of the incongruent set (minimum 15.69 seconds) barely overlaps the peak of the congruent set. I'm also interested in the small bump at the upper end of the incongruent set (its maximum is 35.26 seconds); the congruent set doesn't have an additional bump like that.

(5) Now, perform the statistical test and report the results. What is the confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?

In [11]:
import scipy.stats

scipy.stats.ttest_rel(stroopdata['Congruent'], stroopdata['Incongruent'])

Ttest_relResult(statistic=-8.020706944109957, pvalue=4.1030005857111781e-08)

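As indicated by the test above, the t-statistic is -8.0207 (rounded), and the p-value reported by `ttest_rel` is 4.103e-8. That p-value is two-tailed; halving it gives a one-tailed p-value of about 2.05e-8. The t-critical value for a one-tailed test is -1.714 (using an alpha of 5%, i.e. a 95% confidence level, and 23 degrees of freedom). Because the t-statistic falls within the critical region and the p-value is less than .05, the result is statistically significant and I reject the null hypothesis: on average, reading the congruent list takes less time than reading the incongruent list. This matches my expectations.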
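For a one-tailed test, the critical value and p-value can be read directly from the t-distribution. The following is a minimal sketch, assuming an alpha of 0.05, 23 degrees of freedom, and the statistic reported by `ttest_rel` above; the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05
dof = 23  # 24 paired observations minus 1
t_stat = -8.020706944109957  # statistic reported by ttest_rel above

# Lower-tail critical value for a one-tailed test
# (Ha: congruent mean < incongruent mean)
t_critical = stats.t.ppf(alpha, dof)

# One-tailed p-value: probability of a t value at or below the observed one
p_one_tailed = stats.t.cdf(t_stat, dof)

# Reject the null hypothesis when the statistic lies in the lower critical region
reject_null = t_stat < t_critical
print(t_critical, p_one_tailed, reject_null)
```

Recent SciPy versions also accept `alternative='less'` in `ttest_rel`, which returns the one-tailed p-value directly.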