# Influence of Color on Working Memory

## Background and Overview

This project was conducted as undergraduate research on the effect of color on working memory. The original project was developed using Qualtrics, SPSS, and JASP. Python was later applied to the same dataset as a way of learning pandas, NumPy, SciPy, and related libraries. The data was exported from Qualtrics as an Excel file, then cleaned and analyzed in Python.

## Importing required packages

In [1]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# Working directory; the Excel export is expected to be in this folder
pwd = os.getcwd()

## Data Cleaning

In [None]:
# Import dataset as infcol (os.path.join avoids the invalid "\i" escape and works on any OS)
infcol = pd.read_excel(os.path.join(pwd, "infcol_fixed_column_name.xlsx"))

# Drop unnecessary row
infcol.drop(index = 0, inplace = True)

# Drop unnecessary columns
infcol1 = infcol.drop(columns= ["StartDate", "EndDate", "IPAddress", "RecordedDate", "ResponseId", "Progress", "RecipientLastName",
"RecipientFirstName", "RecipientEmail", "ExternalReference", "LocationLatitude", "LocationLongitude",
"DistributionChannel",  "UserLanguage", "Q_RecaptchaScore", "Informed Consent", "Status"])

# Drop unfinished and colorblind responses, then reset the index and drop the filter columns
index_finished = infcol1[infcol1["Finished"] == "False"].index
infcol1 = infcol1.drop(index_finished)
index_colorblind = infcol1[infcol1["Colorblind"] == "Yes"].index
infcol1 = infcol1.drop(index_colorblind)
infcol1.reset_index(drop = True, inplace = True)
infcol1.drop(columns = ["Finished", "Colorblind"], inplace = True)

# Manually correct three individual entries in the column at position 2
infcol1.iloc[9,2] = "22"
infcol1.iloc[30,2] = "22"
infcol1.iloc[44,2] = "20"

# Wordlist
Wordlist = ["unit", "golf", "solo", "slam", "fate", "iron", "rear", "grip", "rage", "room", "tone",
          "pour", "snap", "lily","easy","good","band","fame","lump","mile","part","mole","snub",
          "case","club","dance","solve","green", "utter", "terms","spare","creed","blank","choke",
          "noble","place","trial","dough","ridge","obese","elite","sweep","lover","feign", "truth",
          "seize", "smart","aware","grind","clean","carpet","tender","wonder","ballot","manual",
          "empire","critic","reject","reader","sleeve","cheese","chorus","galaxy","listen","infect",
          "makeup","barrel","banish","bronze","stroke","action","doctor","exotic","deputy","gutter"]
          
# Convert responses to lowercase and strip surrounding whitespace
infcol1.iloc[:,4:] = infcol1.iloc[:,4:].apply(lambda col: col.astype(str).str.lower().str.strip())

# Count the correct words given by each participant
def is_correct(col):
    # True where the response appears in Wordlist
    return col.isin(Wordlist)
infcol1.iloc[:,4:-2] = infcol1.iloc[:,4:-2].apply(is_correct)
infcol1["Count"] = infcol1.iloc[:,4:-2].sum(axis = 1)

# Drop columns that are no longer needed
# (Count is now the last column, so 4:-2 covers the word columns plus the one column after them)
infcol1 = infcol1.drop(columns = infcol1.columns[4:-2])

# Rename columns and recode categorical values
Race_dict = {"White": "W", "Black or African American": "B", "Asian": "A", "Latino": "L", "Other": "O"}
infcol1.rename(columns= {"FL_18_DO" : "Color", "Duration (in seconds)": "Duration(s)"}, inplace = True)
Color_dict = {"block4" : "Red", "block5": "Blue", "block6": "Black"}
Gender_dict = {"Male": "M", "Female": "F"}
infcol1.replace({"Race": Race_dict, "Color": Color_dict, "Gender" : Gender_dict}, inplace = True)

# Changing column datatypes
infcol1["Count"] = infcol1["Count"].astype(int)
infcol1["Duration(s)"] = infcol1["Duration(s)"].astype(int)
infcol1["Age"] = infcol1["Age"].astype(int)
infcol1["Color"] = infcol1["Color"].astype("category")
infcol1["Gender"] = infcol1["Gender"].astype("category")
infcol1["Race"] = infcol1["Race"].astype("category")

infcol1
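
The counting step above can be sketched on a small example. This is a variant, not the notebook's exact method: it counts each distinct correct word once per participant, whereas the cell above counts a repeated correct word every time it appears. The word list, column names, and responses here are hypothetical stand-ins rather than study data.

```python
import pandas as pd

# Hypothetical miniature word list and responses (not the study data).
wordlist = {"unit", "golf", "solo", "dance"}

responses = pd.DataFrame(
    {
        "Word1": [" Unit", "golf", None],
        "Word2": ["GOLF ", "golf", "tree"],
        "Word3": ["piano", "Dance", "solo"],
    }
)

def count_correct(row: pd.Series) -> int:
    """Count the distinct responses in a row that appear in the word list."""
    # Normalize case and whitespace, skipping empty answers.
    cleaned = {str(word).strip().lower() for word in row.dropna()}
    return len(cleaned & wordlist)

correct_counts = responses.apply(count_correct, axis=1)
print(correct_counts.tolist())  # [2, 2, 1]
```

The second participant typed "golf" twice; the set intersection counts it once, so the row scores 2 rather than 3.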

## Data Analysis

### Descriptive Statistics

In [None]:
print(infcol1.select_dtypes(include = "number").describe())

### Descriptives of Count split by Color

In [None]:
print("RED\n", infcol1["Count"][infcol1["Color"] == "Red"].describe())
print("BLACK\n", infcol1["Count"][infcol1["Color"] == "Black"].describe())
print("BLUE\n", infcol1["Count"][infcol1["Color"] == "Blue"].describe())
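
As an alternative to three separate calls, the same per-color descriptives can be produced in one table with `groupby`. The sketch below uses a hypothetical toy frame with the same column names as `infcol1`; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with the same column names as infcol1.
toy = pd.DataFrame(
    {
        "Color": ["Red", "Red", "Blue", "Blue", "Black", "Black"],
        "Count": [10, 14, 12, 16, 9, 11],
    }
)

# One row of descriptives per color; observed=True only matters when
# Color is a categorical column, as it is in the notebook.
summary = toy.groupby("Color", observed=True)["Count"].describe()
print(summary[["count", "mean", "std", "min", "max"]])
```

The result is a single DataFrame indexed by color, which is easier to compare across conditions than three printed Series.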

### One-Way ANOVA

Count ~ Color

In [None]:
import pingouin as pg

In [None]:
aovIC = pg.anova(data = infcol1, dv = "Count", between = "Color", detailed = True)
print(aovIC)

It was found that color accounts for less than 1% of the variance in the number of words the participants remembered.
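
The "proportion of variance" reading corresponds to eta-squared, the between-group sum of squares divided by the total sum of squares. For a one-way design this equals the partial eta-squared that pingouin reports in its `np2` column. The sketch below computes it by hand on hypothetical scores (not the study data) and cross-checks the F statistic against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical recall scores per color condition (not the study data).
groups = {
    "Red": np.array([10, 12, 14]),
    "Blue": np.array([11, 13, 15]),
    "Black": np.array([9, 12, 15]),
}

all_scores = np.concatenate(list(groups.values()))
grand_mean = all_scores.mean()

# Between-group and total sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_total = ((all_scores - grand_mean) ** 2).sum()

# Proportion of variance in the scores accounted for by group membership.
eta_squared = ss_between / ss_total

# Cross-check: F from the same sums of squares matches scipy's one-way ANOVA.
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_manual = (ss_between / df_between) / ((ss_total - ss_between) / df_within)
f_scipy, p_value = stats.f_oneway(*groups.values())

print(f"eta squared = {eta_squared:.4f}, F = {f_scipy:.4f}, p = {p_value:.4f}")
```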

### Two-Way ANOVA

Count ~ Color, Gender, Color * Gender

In [None]:
twoway_aovIC = pg.anova(data = infcol1, dv = "Count", between = ["Color", "Gender"], detailed = True, ss_type = 3)
print(twoway_aovIC)

Gender does not have a main effect on the number of words remembered, nor is there an interaction between gender and color.

### Correlations

In [None]:
print("The Spearman correlation coefficient between Duration(s) and the number of words remembered is rho =",
infcol1["Duration(s)"].corr(infcol1["Count"], method = "spearman"))

In [None]:
print("The Spearman correlation coefficient between Age and the number of words remembered is rho =",
infcol1["Age"].corr(infcol1["Count"], method = "spearman"))
print("The Spearman correlation coefficient between Duration(s) and Age is rho =",
infcol1["Duration(s)"].corr(infcol1["Age"], method = "spearman"))
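
`Series.corr` returns only the coefficient. If a p-value is also wanted, `scipy.stats.spearmanr` returns both. The sketch below uses hypothetical duration and count values (not the study data) to show the call.

```python
from scipy import stats

# Hypothetical values (not the study data).
duration = [310, 295, 420, 380, 350, 500]
count = [12, 15, 9, 11, 14, 8]

# spearmanr returns the rank correlation and a two-sided p-value.
rho, p_value = stats.spearmanr(duration, count)
print(f"rho = {rho:.3f}, p = {p_value:.3f}")
```

With the notebook's data, the same call would take `infcol1["Duration(s)"]` and `infcol1["Count"]` as its two arguments.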

## Data Visualization

### Boxplot Breakdown of Count by Color

In [None]:
infcol1.boxplot("Count", by = "Color", figsize = (10,7))

### Histograms Showing Count Split by Color

In [None]:
distRed = infcol1["Count"][infcol1["Color"] == "Red"]
distBlack = infcol1["Count"][infcol1["Color"] == "Black"]
distBlue = infcol1["Count"][infcol1["Color"] == "Blue"]

fig, axs = plt.subplots(1,3,sharey = True, tight_layout = True)

axs[0].hist(distRed, bins = 10)
axs[1].hist(distBlack, bins = 10)
axs[2].hist(distBlue, bins = 10)

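# The x-axis of each panel is the recall count; the shared y-axis is the number of participants
axs[0].set_xlabel("Count (Red)")
axs[1].set_xlabel("Count (Black)")
axs[2].set_xlabel("Count (Blue)")

axs[0].set_ylabel("Number of participants")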
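
With `bins = 10` passed separately to each panel, every group gets its own bin edges, so bars in different panels do not cover the same count ranges. One possible refinement, sketched here on hypothetical values (not the study data), is to compute a single set of edges from the pooled data and reuse it for every group.

```python
import numpy as np

# Hypothetical recall counts per color condition (not the study data).
toy_counts = {
    "Red": np.array([8, 12, 15, 20, 22]),
    "Black": np.array([5, 11, 14, 18, 30]),
    "Blue": np.array([10, 13, 16, 19, 25]),
}

# One set of 10 equal-width bins spanning the pooled range of all groups.
pooled = np.concatenate(list(toy_counts.values()))
edges = np.histogram_bin_edges(pooled, bins=10)

# Frequencies per group, all computed against the same shared edges.
frequencies = {
    color: np.histogram(values, bins=edges)[0]
    for color, values in toy_counts.items()
}

for color, freq in frequencies.items():
    print(color, freq)
```

In the plotting cell, the same `edges` array could be passed as `bins = edges` to each `hist` call so the three panels line up bin for bin.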

### Regression Line of Duration(s) ~ Count

In [None]:
count = infcol1["Count"]
duration = infcol1["Duration(s)"]

fig, ax = plt.subplots(figsize = (9,9))

ax.scatter(count, duration, s = 60, alpha = 0.7, edgecolors = "k")

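# np.polyfit returns coefficients from the highest degree down: slope b, then intercept a
b, a = np.polyfit(count, duration, deg = 1)

# Evaluate the fitted line over the plotted range of Count
xseq = np.linspace(0,60, num = 100)
ax.plot(xseq, a + b * xseq, color = "k", lw = 2.5)

ax.set_ylim(0,1000)
ax.set_xlabel("Count", fontsize = 20)
ax.set_ylabel("Duration(s)", fontsize = 20)
ax.set_title("Duration(s) ~ Count Regression Line", fontsize = 25)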
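
`np.polyfit` gives only the slope and intercept. If the strength and significance of the linear fit are also of interest, `scipy.stats.linregress` returns them alongside the same coefficients. The sketch below uses hypothetical values (not the study data) and confirms that both functions agree on the line.

```python
import numpy as np
from scipy import stats

# Hypothetical values (not the study data).
count = np.array([5, 10, 15, 20, 25])
duration = np.array([200, 260, 290, 350, 400])

# Least-squares fit of duration on count, with r and p-value included.
fit = stats.linregress(count, duration)

# The same line from np.polyfit, as used in the plotting cell.
slope_np, intercept_np = np.polyfit(count, duration, deg=1)

print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}")
print(f"r = {fit.rvalue:.3f}, p = {fit.pvalue:.4f}")
```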