Pre-registered analyses for the in-lab version of study 3c (reported as study 3c in the appendix accompanying the submitted manuscript).

Pre-registration: https://osf.io/de935

In [13]:
from __future__ import division
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pickle
import re
from scipy import stats
import statsmodels.stats.api as sms
from pyspan.utils import *
from pyspan.plurals.analysis import *
assert not mturk
from pyspan.plurals.preprocess import *
from pyspan.plurals.utils import *

In [2]:
cl_raw = pd.read_csv("{}in-lab/Construal_level.csv".format(BASE_DIR))
len(cl_raw), len(cl)

(189, 152)

## Logistic regression

Selection of plural ~ Condition + Order, where Order is a dummy indicating whether or not the construal level survey was the first survey the participant took (the model also includes participant-level effects).

If the order dummy has a non-zero coefficient, we commit to discarding all construal level survey data from participants who did not take that survey first.
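
To make the model specification concrete, here is a minimal sketch of how a design matrix for it could be laid out: one row per item response, with a condition dummy, an order dummy, and one indicator column per participant. The data, the field names (`first_survey`, `choices`) and the function `build_design` are hypothetical and chosen only for illustration; this is not necessarily how `dummy` or `df_to_matrix` build their matrices.

```python
# Hypothetical wide-format responses: one entry per participant,
# where 1 means the participant chose the plural form of an item.
participants = [
    {"id": "p1", "condition": "ABSTRACT", "first_survey": True,
     "choices": [1, 1, 0]},
    {"id": "p2", "condition": "CONCRETE", "first_survey": False,
     "choices": [0, 1, 0]},
]


def build_design(participants):
    """Return (X, y) with one row per item response.

    Columns of X: condition dummy, order dummy, then one indicator
    column per participant (the participant-level effects).
    """
    ids = [p["id"] for p in participants]
    X, y = [], []
    for p in participants:
        condition = 1 if p["condition"] == "ABSTRACT" else 0
        order = 1 if p["first_survey"] else 0
        indicator = [1 if p["id"] == i else 0 for i in ids]
        for choice in p["choices"]:
            X.append([condition, order] + indicator)
            y.append(choice)
    return X, y


X, y = build_design(participants)
for row, outcome in zip(X, y):
    print("%s -> %d" % (row, outcome))
```

Because condition and order vary only between participants, each of those two columns is a sum of participant indicator columns, so a model laid out like this needs some form of regularization or constraint to be identifiable.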

In [3]:
cdummied, Y = dummy(cl, sets = np.stack((words["large"].values,
                                         words["small"].values)),
                    classes = [ "ABSTRACT", "CONCRETE" ])
X, Y = df_to_matrix(cdummied, Y, 
                    columns = { 0: "condition", 
                                1: "order" })

In [4]:
logit = SparseLR(Y, X)
print(logit.coef[:2])
logit.auc



[1.03187556 0.22642181]


0.7877222897176389

Because $order$ has a non-zero coefficient (0.226), we discard all data from participants who did not complete the construal level survey first.

In [5]:
cl = cl.loc[cl["order"] == 1]

## Demographic info

In [6]:
len(cl)

52

In [7]:
demographic_info(cl)

Age: 22.8653846154 (SE = 0.977124805481)
Gender: [('Female', 38), ('Male', 14)]


## t-tests

We predict that participants in the ABSTRACT condition are more likely to select the pluralized form of the item than participants in the CONCRETE condition.

In [8]:
# Copy the column so that adding "ppl" below does not write to a view of cl
# (this avoids pandas' SettingWithCopyWarning).
csummary = cl[["Condition"]].copy()
dat = cl[ixs].values
# get_prop is applied along axis 1, i.e. once per row of dat.
props = np.apply_along_axis(get_prop, 1, dat, 
                            words["large"], 
                            words["small"])
csummary["ppl"] = props
assert csummary.values.shape == (len(cl), 2)


In [9]:
a = csummary.loc[csummary["Condition"] == "ABSTRACT", "ppl"].values
b = csummary.loc[csummary["Condition"] == "CONCRETE", "ppl"].values

In [10]:
np.mean(a), stats.sem(a)

(0.5156313131313132, 0.060983654913058676)

In [11]:
np.mean(b), stats.sem(b)

(0.29541925465838503, 0.044453494829764725)

In [14]:
dsw_a = sms.DescrStatsW(a)
dsw_b = sms.DescrStatsW(b)
cm = sms.CompareMeans(dsw_a, dsw_b)
cm.ttest_ind(usevar="unequal", alternative = "larger")

(2.918029519061532, 0.0027785803899573126, 43.47897343860534)
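
For reference, the three values returned are the t statistic, the p-value, and the degrees of freedom. With `usevar="unequal"` this is Welch's t-test, and `alternative = "larger"` makes the p-value one-sided for the hypothesis that the ABSTRACT mean exceeds the CONCRETE mean. The symbols below are notation introduced here, not taken from the pre-registration: $\bar{x}_a, \bar{x}_b$ are the group means, $s_a^2, s_b^2$ the sample variances, and $n_a, n_b$ the group sizes.

$$
t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}{\dfrac{\left(s_a^2 / n_a\right)^2}{n_a - 1} + \dfrac{\left(s_b^2 / n_b\right)^2}{n_b - 1}}
$$

Here $\nu$ is the Welch-Satterthwaite approximation to the degrees of freedom, which is why the reported value (43.48) is not an integer.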

Compute the difference in means and an interval of ±2 standard errors around it (roughly a 95% confidence interval).

In [15]:
delta = dsw_a.mean - dsw_b.mean
se_delta = cm.std_meandiff_separatevar
print(delta, delta - 2*se_delta, delta + 2*se_delta)

(0.22021205847292813, 0.06928002914667886, 0.3711440877991774)
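
As a cross-check, the interval and the t statistic can be recomputed by hand from the group means and standard errors printed above, since the unpooled standard error of the difference is the square root of the sum of the squared standard errors. This is a minimal sketch using only the summary values reported in this notebook; the variable names are chosen here for illustration.

```python
import math

# Summary statistics copied from the outputs of np.mean and stats.sem above.
mean_a, sem_a = 0.5156313131313132, 0.060983654913058676   # ABSTRACT
mean_b, sem_b = 0.29541925465838503, 0.044453494829764725  # CONCRETE

# Difference in means and its unpooled (separate-variance) standard error.
delta = mean_a - mean_b
se_delta = math.sqrt(sem_a ** 2 + sem_b ** 2)

# Welch t statistic and the +-2 standard error interval.
t_stat = delta / se_delta
lower = delta - 2 * se_delta
upper = delta + 2 * se_delta

print("delta = %.4f, interval = [%.4f, %.4f], t = %.3f"
      % (delta, lower, upper, t_stat))
```

The recomputed values agree with the statsmodels results above up to floating-point rounding.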
