Analysis of data from the Construal Survey, which tests whether plural words are perceived as more abstract than their singular counterparts.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from __future__ import division
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.stats.api as sms
from pyspan.config import *
from pyspan.plurals.analysis import *
assert mturk

In [3]:
# Comment out the line below to include participants who failed the attention check
cl = cl.loc[cl.atc_passed]

In [4]:
cl.Condition.value_counts()

CONCRETE    149
ABSTRACT    131
Name: Condition, dtype: int64

## Logistic regression

Model: selection of the plural form ~ Condition + a dummy indicating whether this was the first survey the participant took, with participant-level effects included.

We commit in advance to the following rule: if the order dummy has a non-zero coefficient, we discard all construal level survey data from participants who did not take this survey first.

In [5]:
cdummied, Y = dummy(cl, sets = np.stack((words["large"].values,
                                         words["small"].values)),
                    classes = [ "ABSTRACT", "CONCRETE" ])
X, Y = df_to_matrix(cdummied, Y, 
                    columns = { 0: "condition", 
                                1: "order" })

In [6]:
logit = SparseLR(Y, X)
# First two coefficients: condition, order
print(logit.coef[:2])
logit.auc

[0.55565445 0.        ]


0.6123788019308914
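
The exclusion rule committed to above can be written as a small check against the fitted coefficients. This is only a sketch: `exclude_non_first_takers` and its tolerance are hypothetical and not part of `pyspan`, and it assumes, following the `columns` mapping passed to `df_to_matrix`, that the first two coefficients are the condition and order terms in that order.

```python
def exclude_non_first_takers(order_coef, tol=1e-12):
    """Return True if the order dummy's coefficient is non-zero (hypothetical helper)."""
    return abs(order_coef) > tol

# First two coefficients as printed above: [condition, order] (ordering assumed)
condition_coef, order_coef = 0.55565445, 0.0
decision = exclude_non_first_takers(order_coef)
print(decision)
```

With an order coefficient of exactly zero the check comes back `False`, so under this rule no participants would be dropped on the basis of survey order.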

## t-tests

We predict that participants in the ABSTRACT condition are more likely to select the pluralized form of the item than participants in the CONCRETE condition.

In [7]:
# Copy so that adding the "ppl" column below does not write to a slice of cl
csummary = cl[["Condition"]].copy()
ixs = (words.index[:30] + 100).values
dat = cl[ixs].values
props = np.apply_along_axis(get_prop, 1, dat, 
                            words["large"], 
                            words["small"])
csummary["ppl"] = props
assert csummary.values.shape == (len(cl), 2)

In [8]:
# Per-participant proportions, split by condition
a = csummary.loc[csummary["Condition"] == "ABSTRACT", "ppl"].values
b = csummary.loc[csummary["Condition"] == "CONCRETE", "ppl"].values

In [9]:
np.mean(a), stats.sem(a)

(0.6653812406773713, 0.030338676631516532)

In [10]:
np.mean(b), stats.sem(b)

(0.40941942206695736, 0.03027613208203545)

In [11]:
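dsw_a = sms.DescrStatsW(a)
dsw_b = sms.DescrStatsW(b)
cm = sms.CompareMeans(dsw_a, dsw_b)
# Welch's t-test (unequal variances), one-sided: H1 is mean(a) > mean(b)
# Returns (t statistic, p-value, degrees of freedom)
cm.ttest_ind(usevar="unequal", alternative="larger")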

(5.971881959396195, 3.5894357607054466e-09, 276.75939314447265)
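
As a sanity check, the t statistic and degrees of freedom above can be recomputed from the group means, standard errors, and group sizes already reported in this notebook. The sketch below uses only the standard library and assumes `a` holds the 131 ABSTRACT participants and `b` the 149 CONCRETE participants; the variable names are illustrative. It does not compute the p-value, which needs the t distribution's CDF.

```python
import math

# Summary statistics copied from the outputs above
mean_a, sem_a, n_a = 0.6653812406773713, 0.030338676631516532, 131  # ABSTRACT
mean_b, sem_b, n_b = 0.40941942206695736, 0.03027613208203545, 149  # CONCRETE

# Welch's t statistic: difference in means over its unpooled standard error
se_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
t_stat = (mean_a - mean_b) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df = (sem_a ** 2 + sem_b ** 2) ** 2 / (
    sem_a ** 4 / (n_a - 1) + sem_b ** 4 / (n_b - 1)
)

print(round(t_stat, 3), round(df, 1))
```

Both values should agree with the first and third entries of the tuple returned by `ttest_ind`, up to rounding.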

In [12]:
# Approximate 95% confidence interval for the difference in means (delta +/- 2 SE)
delta = dsw_a.mean - dsw_b.mean
se_delta = cm.std_meandiff_separatevar
print(delta, delta - 2*se_delta, delta + 2*se_delta)

(0.2559618186104137, 0.17023948841342051, 0.34168414880740694)
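
The interval above is the difference in means plus or minus two standard errors, a common approximation to a 95% confidence interval (the exact multiplier from a t distribution with roughly 277 degrees of freedom is slightly below 2). A minimal sketch of the arithmetic, using the reported means and standard errors as inputs rather than the raw data:

```python
import math

# Group means and standard errors copied from the outputs above
mean_a, sem_a = 0.6653812406773713, 0.030338676631516532  # ABSTRACT
mean_b, sem_b = 0.40941942206695736, 0.03027613208203545  # CONCRETE

delta = mean_a - mean_b
se_delta = math.sqrt(sem_a ** 2 + sem_b ** 2)  # unpooled standard error
lower, upper = delta - 2 * se_delta, delta + 2 * se_delta
print(round(lower, 3), round(upper, 3))
```

The lower bound stays well above zero, which is consistent with the one-sided test result.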
