Analyses pre-registered for the in-lab version of study 3c (reported as study 3c in the appendix accompanying the submitted manuscript).

Pre-registration: https://osf.io/de935

In [1]:
from __future__ import division
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pickle
import re
from scipy import stats
from pyspan.utils import *
from pyspan.plurals.analysis import *
assert not mturk
from pyspan.plurals.preprocess import *
from pyspan.plurals.utils import *

In [3]:
cl_raw = pd.read_csv("{}in-lab/Construal_level.csv".format(BASE_DIR))
len(cl_raw), len(cl)

(189, 152)

## Logistic regression

Selection of plural ~ Condition + Order, where Order is a dummy indicating whether this was the first survey the participant took (the model includes participant-level effects).

We commit to discarding all construal level survey data from participants who did not take this survey first if the order dummy has a non-zero coefficient.

In [4]:
cdummied, Y = dummy(cl, sets = np.stack((words["large"].values,
                                         words["small"].values)),
                    classes = [ "ABSTRACT", "CONCRETE" ])
X, Y = df_to_matrix(cdummied, Y, 
                    columns = { 0: "condition", 
                                1: "order" })

In [5]:
logit = SparseLR(Y, X)
print logit.coef[:2]
logit.auc

[1.03186985 0.22637393]


0.7877222897176389

Because $order$ has a non-zero coefficient (approximately 0.226, the second value printed above), discard all data from participants who did not complete the construal level survey first.

In [6]:
cl = cl.loc[cl["order"] == 1]
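
The pre-registered exclusion rule applied above can be written as a small function. This is only a sketch: `apply_order_exclusion` and the toy rows are hypothetical and not part of `pyspan`. It assumes, as in the cell above, that `order == 1` marks participants who took the construal level survey first.

```python
def apply_order_exclusion(rows, order_coef):
    """Keep only first-survey participants if the order coefficient is non-zero.

    rows: list of dicts, each with an "order" key (1 = took this survey first).
    order_coef: fitted coefficient on the order dummy.
    """
    if order_coef != 0:
        return [row for row in rows if row["order"] == 1]
    return list(rows)


# Toy data (not the study data): three participants, one of whom went first.
toy_rows = [{"id": 1, "order": 1}, {"id": 2, "order": 0}, {"id": 3, "order": 0}]
kept = apply_order_exclusion(toy_rows, order_coef=0.22637393)
```

With a coefficient of exactly zero the function returns every row unchanged, which mirrors the "otherwise keep all data" branch of the rule.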

## Demographic info

In [7]:
len(cl)

52

In [8]:
demographic_info(cl)

Age: 22.8653846154 (SE = 0.977124805481)
Gender: [('Female', 38), ('Male', 14)]


## t-tests

We predict that participants in the ABSTRACT condition are more likely to select the pluralized form of the item than participants in the CONCRETE condition.
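
The measure compared between conditions is a per-participant proportion of plural selections. The sketch below shows one way such a proportion could be computed. `proportion_plural` and the example words are hypothetical, and the `get_prop` helper used in this notebook may treat missing or unmatched responses differently.

```python
def proportion_plural(responses, plural_forms):
    """Fraction of a participant's responses that are plural forms."""
    plural_forms = set(plural_forms)
    if not responses:
        raise ValueError("participant has no responses")
    n_plural = sum(1 for r in responses if r in plural_forms)
    return n_plural / float(len(responses))


# Toy example (not study data): 2 of 4 selections are plural.
example = proportion_plural(["dogs", "cat", "trees", "car"],
                            ["dogs", "cats", "trees", "cars"])
```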

In [9]:
# Copy the column so that adding "ppl" does not write to a slice of cl
csummary = cl[["Condition"]].copy()
dat = cl[ixs].values
props = np.apply_along_axis(get_prop, 1, dat, 
                            words["large"], 
                            words["small"])
csummary["ppl"] = props
assert csummary.values.shape == (len(cl), 2)

In [10]:
a = csummary.loc[csummary["Condition"] == "ABSTRACT"]["ppl"].values
b = csummary.loc[csummary["Condition"] == "CONCRETE"]["ppl"].values

In [11]:
np.mean(a), stats.sem(a)

(0.5156313131313132, 0.060983654913058676)

In [12]:
np.mean(b), stats.sem(b)

(0.29541925465838503, 0.044453494829764725)

In [13]:
stats.ttest_ind(a, b, equal_var = False)

Ttest_indResult(statistic=2.918029519061532, pvalue=0.005557160779914619)
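
As a consistency check, Welch's $t$ statistic can be recovered from the group means and standard errors printed above, since `stats.sem` uses `ddof = 1` by default and the unequal-variance test divides the mean difference by the square root of the summed squared standard errors. The helper name `welch_t` is introduced here for illustration only.

```python
import math


def welch_t(mean_a, sem_a, mean_b, sem_b):
    """Welch's t statistic from group means and standard errors of the mean."""
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


# Means and standard errors printed above for ABSTRACT (a) and CONCRETE (b).
t_stat = welch_t(0.5156313131313132, 0.060983654913058676,
                 0.29541925465838503, 0.044453494829764725)
```

The result agrees with the `statistic` value returned by `stats.ttest_ind` to within floating-point error.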

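`stats.ttest_ind` returns a two-sided $p$-value. Because the prediction is directional and the $t$ statistic is in the predicted direction, the $p$-value reported in the paper is the one-sided value, obtained by dividing the two-sided $p$-value in half.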

In [14]:
stats.ttest_ind(a, b, equal_var = False).pvalue / 2

0.0027785803899573096
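
Halving is only valid when the observed difference is in the predicted direction. A minimal sketch of the general conversion follows; `one_sided_p` and its `predicted_sign` argument are hypothetical names, not part of `scipy` or `pyspan`.

```python
def one_sided_p(t_stat, p_two_sided, predicted_sign=1):
    """Convert a two-sided p-value to a one-sided one for a directional prediction."""
    half = p_two_sided / 2.0
    if t_stat * predicted_sign > 0:
        # Effect is in the predicted direction: take the matching tail.
        return half
    # Effect is in the opposite direction: take the complementary tail.
    return 1.0 - half


# Values returned by stats.ttest_ind above; ABSTRACT > CONCRETE was predicted.
p_one = one_sided_p(2.918029519061532, 0.005557160779914619)
```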

Calculate the degrees of freedom for Welch's $t$-test using the Welch-Satterthwaite approximation.

In [15]:
var_a = np.var(a, ddof = 1) / len(a)
var_b = np.var(b, ddof = 1) / len(b)
num = (var_a + var_b)**2
denom = (var_a**2 / (len(a) - 1)) + (var_b**2 / (len(b) - 1))
num / denom

43.478973438605315
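
For reference, the quantity computed in the cell above corresponds to the following expression, where $s_a^2$ and $s_b^2$ are the sample variances (with `ddof = 1`) and $n_a$ and $n_b$ are the group sizes. The symbols are introduced here for illustration; in the code, `var_a` and `var_b` are $s_a^2 / n_a$ and $s_b^2 / n_b$.

$$
\nu = \frac{\left(\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}\right)^2}{\frac{\left(s_a^2 / n_a\right)^2}{n_a - 1} + \frac{\left(s_b^2 / n_b\right)^2}{n_b - 1}}
$$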

In [16]:
cohensd(a, b)

0.8268100166882769
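
The definition of `cohensd` lives in `pyspan` and is not shown here. One common definition, sketched below under the assumption of a pooled sample standard deviation, divides the mean difference by that pooled value; the actual helper may use a different denominator. The name `cohens_d_pooled` is hypothetical.

```python
import math


def cohens_d_pooled(x, y):
    """Cohen's d using the pooled sample standard deviation (ddof = 1)."""
    n_x, n_y = len(x), len(y)
    mean_x = sum(x) / float(n_x)
    mean_y = sum(y) / float(n_y)
    # Sums of squared deviations from each group's own mean.
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    pooled_sd = math.sqrt((ss_x + ss_y) / float(n_x + n_y - 2))
    return (mean_x - mean_y) / pooled_sd


# Toy data (not study data): means differ by 1 and the pooled SD is 1.
d = cohens_d_pooled([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```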