## PS3 Discussion: PO Framework in Experiments

In this notebook, we will analyze data on the psychological phenomenon of "anchoring." Anchoring is a cognitive bias in which an individual's decisions are influenced by a particular reference point, or anchor.

In this case, students guessed the maximum speed of a cat in miles per hour. In one condition, the "high" condition, students were prompted with the message 

"Is the maximum speed of an average house cat FASTER or SLOWER than **40** miles per hour?"

In the other condition, the "low" condition, students were prompted with

"Is the maximum speed of an average house cat FASTER or SLOWER than **3** miles per hour?"

The student then had to provide an estimate for the speed of a house cat. We want to learn: did the anchoring have an effect on students' estimates? Let's assume that students were randomly assigned to each group.


-------

**Data Dictionary/Codebook**

`cat`: estimated speed in mph

`before`: 1 = student knew the exercise beforehand, 0 = student did not know

`live.cats`: 1 = student lives with cats, 0 = student does not live with cats

`cond`: 1 = student was exposed to the "high" condition (40 mph), 0 = student was exposed to the "low" condition (3 mph)

In [5]:
# Run this cell to load data
anchor <- read.csv("anchoring_data_FA20.csv")
head(anchor)

| | cat | before | live.cats | cond |
|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 2 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 |
| 3 | 30 | 0 | 0 | 1 |
| 4 | 25 | 0 | 0 | 1 |
| 5 | 17 | 0 | 0 | 0 |
| 6 | 50 | 0 | 0 | 1 |


**Quick Check 1**

What are the potential outcomes for this experiment? Talk with your neighbors for a few minutes about this.

(**Note:** This study does NOT have a true control group. In an ideal study, we would ALSO have asked a third group of people "What is the maximum speed of an average house cat?" without any anchor, but we don't have that data in this case.)
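
One possible way to write the potential outcomes down is sketched here. The symbols $Y_i(1)$, $Y_i(0)$, $D_i$, and $Y_i$ are notation assumed for this sketch, not something defined in the dataset; $D_i$ plays the role of the `cond` column and $Y_i$ the role of the `cat` column.

```latex
\begin{aligned}
Y_i(1) &= \text{guess student } i \text{ would give after seeing the high (40 mph) anchor} \\
Y_i(0) &= \text{guess student } i \text{ would give after seeing the low (3 mph) anchor} \\
Y_i &= D_i \, Y_i(1) + (1 - D_i) \, Y_i(0) \\
\text{ATE} &= E\left[\, Y_i(1) - Y_i(0) \,\right]
\end{aligned}
```

Each student sees only one prompt, so only one of $Y_i(1)$ and $Y_i(0)$ is ever observed for a given student. Here the comparison is high anchor versus low anchor, not anchor versus no anchor.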




**Quick Check 2**

Now, before we begin, let's "clean" the dataset. In this case, we want to remove some potentially problematic observations: we do NOT want to study people who (1) live with a cat or (2) knew about the exercise beforehand, since they are likely to be more knowledgeable and give different answers. Use the `subset` function, and save the result to the new variable `focus_anchor`.


In [6]:
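# Step 1: keep only students who do not live with cats
no_cat <- subset(anchor, live.cats == 0)
# Step 2: of those, keep only students who did not know the exercise beforehand
focus_anchor <- subset(no_cat, before == 0)

# Equivalently, as a one-liner: focus_anchor <- subset(anchor, live.cats == 0 & before == 0)

head(focus_anchor)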

| | cat | before | live.cats | cond |
|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 2 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 |
| 3 | 30 | 0 | 0 | 1 |
| 4 | 25 | 0 | 0 | 1 |
| 5 | 17 | 0 | 0 | 0 |
| 6 | 50 | 0 | 0 | 1 |
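
To see that filtering in two steps and filtering with a combined condition give the same rows, here is a minimal sketch on a toy data frame. The data frame `toy` and all of its values are made up for illustration; only the column names come from the codebook.

```r
# Toy data frame with made-up values; column names follow the codebook.
toy <- data.frame(
  cat       = c(10, 35, 20, 5, 28),
  before    = c(0, 0, 1, 0, 0),
  live.cats = c(0, 1, 0, 0, 0),
  cond      = c(0, 1, 1, 0, 1)
)

# Two-step filter: drop students who live with cats, then drop those who knew the exercise.
no_cat_toy <- subset(toy, live.cats == 0)
two_step <- subset(no_cat_toy, before == 0)

# One-step filter: both conditions combined with `&`.
one_step <- subset(toy, live.cats == 0 & before == 0)

two_step
isTRUE(all.equal(two_step, one_step))  # compares the two results
```

Note that `subset` keeps the original row names, so the surviving rows can be traced back to their positions in the unfiltered data.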


Now, let's compare the results!

In [7]:
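# What is the code below doing? Explain to your neighbor.

high_focus <- subset(focus_anchor, cond == 1)
low_focus <- subset(focus_anchor, cond == 0)

mean(high_focus$cat) # What does this number mean?
# It is the average guess among students shown the high anchor (40 mph),
# which estimates the average potential outcome under the high condition.
mean(low_focus$cat) # This one?
# It is the average guess among students shown the low anchor (3 mph),
# which estimates the average potential outcome under the low condition.

mean(high_focus$cat) - mean(low_focus$cat) # How about this?
# It is the difference in means (high minus low, since cond == 1 is the high condition),
# which is our estimate of the average treatment effect.

# Bonus point: what condition is needed to make this step valid?
# Random assignment. Because the treatment conditions are randomly assigned, the two groups
# are comparable on average, so the difference in means estimates the average treatment effect.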
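
A small simulation can make the role of random assignment concrete. This is only a sketch under made-up assumptions: the potential outcomes `y0` and `y1`, the constant 10 mph effect, the sample size, and the seed are all hypothetical and have nothing to do with the real data.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 10000

# Hypothetical potential outcomes: guess under the low anchor (y0) and
# under the high anchor (y1), with an assumed constant 10 mph effect.
y0 <- rnorm(n, mean = 15, sd = 5)
y1 <- y0 + 10
true_ate <- mean(y1 - y0)

# Random assignment: each simulated student has a 50% chance of the high anchor.
cond <- rbinom(n, size = 1, prob = 0.5)

# Only one potential outcome is observed per student.
observed <- ifelse(cond == 1, y1, y0)

# Difference in observed means, high minus low.
diff_in_means <- mean(observed[cond == 1]) - mean(observed[cond == 0])

true_ate
diff_in_means
```

In a simulation we can compute `true_ate` directly because both potential outcomes are known; with real data only `diff_in_means` is available, which is why random assignment matters.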

**Quick Check 3**

If the treatment conditions are not randomly assigned, name a potential omitted variable that can bias the results. 

One way to answer this: maybe those assigned to the high group are more likely to live with cats. At the same time, living with cats makes it more likely that a student gives an answer close to the truth (i.e. a lower answer than other students in the high group, and a higher answer than other students in the low group). The difference in means then mixes the anchoring effect with the effect of living with cats, so it is a biased estimate of the average treatment effect. In the high group, for example, the extra cat owners pull the average guess down.
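
The simulation below is one hypothetical illustration of this kind of bias. Every number in it is made up: the share of cat owners, the assignment probabilities, and the potential outcomes. The direction and size of the bias depend on those choices, so treat it as a sketch of the mechanism and not as a statement about the real data.

```r
set.seed(2)  # arbitrary seed, for reproducibility only
n <- 20000

# Hypothetical population: about half of the students live with cats.
live_cats <- rbinom(n, size = 1, prob = 0.5)

# Made-up potential outcomes (mph). Cat owners are assumed to be
# less swayed by the anchor than non-owners.
noise <- rnorm(n, mean = 0, sd = 3)
y0 <- ifelse(live_cats == 1, 15, 10) + noise  # guess under the low anchor
y1 <- ifelse(live_cats == 1, 20, 40) + noise  # guess under the high anchor
true_ate <- mean(y1 - y0)

# Hypothetical helper: difference in observed means, high minus low.
diff_in_means <- function(cond) {
  observed <- ifelse(cond == 1, y1, y0)
  mean(observed[cond == 1]) - mean(observed[cond == 0])
}

# Random assignment: everyone has the same 50% chance of the high anchor.
cond_random <- rbinom(n, size = 1, prob = 0.5)

# Non-random assignment: cat owners are far more likely to get the high anchor.
cond_confounded <- rbinom(n, size = 1, prob = ifelse(live_cats == 1, 0.8, 0.2))

true_ate
diff_in_means(cond_random)      # should land close to true_ate
diff_in_means(cond_confounded)  # pulled away from true_ate by the omitted variable
```

With these particular assumed numbers, the confounded estimate comes out smaller than the true effect; other assumed outcomes could push it the other way.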
