Pre-registered analyses for the MTurk version of Study 3a.

Pre-registration: https://osf.io/de935

In [1]:
from __future__ import division
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pyspan.config import *
from pyspan.plurals.analysis import *
assert mturk


## Logistic regression

Model: Selection of plural ~ Valence of item + Condition + Valence of item * Condition + Order, where Order is a dummy indicating whether this was the first survey the participant took. The model includes participant-level effects.

We hypothesize that the coefficient on Valence of item * Condition will be positive.

In [2]:
vdummied, Y = dummy(valence, classes = [ "POSITIVE", "NEGATIVE" ],
                    sets = np.stack((words["large"], words["small"])),
                    ixs = ixs)

In [3]:
def valence_condition_interaction(v, c):
    c = c if c == 1 else -1
    return v*c
valence_condition_interaction = np.vectorize(valence_condition_interaction)
X, Y = df_to_matrix(vdummied, Y, ixs = ixs,
                    columns = { 0: "valence", 1: "condition", 
                                2: (0,1,valence_condition_interaction), 
                                3: "order" })
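
To make the coding of the interaction term concrete, here is a minimal sketch on made-up values. It assumes that valence and condition arrive as 0/1 dummies and, as in the cell above, that condition is recoded to -1/+1 before multiplying. The name `interaction` and the toy inputs are illustrative only and are not part of `pyspan`.

```python
def interaction(v, c):
    """Multiply valence by condition after recoding condition 0 -> -1."""
    c = c if c == 1 else -1
    return v * c

# Hypothetical (valence, condition) pairs, assuming 0/1 dummy coding.
pairs = [(1, 1), (1, 0), (0, 1), (0, 0)]
terms = [interaction(v, c) for v, c in pairs]
```

Under this assumed coding the term is nonzero only when valence is 1, and its sign follows the condition.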

In [4]:
logit = SparseLR(Y, X)
print(logit.coef[:4])  # first four coefficients
logit.auc



[0.08512446 0.44789681 0.53328298 0.        ]


0.7768210089717392

## t-tests

For each participant, compute the proportion of positive, neutral, and negative items for which the participant chose the pluralized form of the word.
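
The per-participant summary described above can be sketched on toy data as follows. This is only an illustration: `prop_plural` and the `participant` dictionary are hypothetical stand-ins, not the `get_prop` helper or the data layout used in the notebook.

```python
def prop_plural(chose_plural):
    """Share of items for which the pluralized form was chosen."""
    return sum(chose_plural) / float(len(chose_plural))

# Hypothetical participant: one flag per item, True if the plural was chosen.
participant = {
    "positive": [True, True, False, True],
    "neutral": [True, False, False, True],
    "negative": [False, False, True, False],
}
props = {k: prop_plural(v) for k, v in participant.items()}
```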

In [5]:
# Copy so the new columns are added to an independent DataFrame rather than
# to a slice of `valence` (avoids pandas' SettingWithCopyWarning).
vsummary = valence[["Condition"]].copy()
dat = valence[ixs].values
vsummary["pos_lg"] = np.apply_along_axis(get_prop, 1, dat,
                                         pos_lg, pos_sm)
vsummary["neu_lg"] = np.apply_along_axis(get_prop, 1, dat,
                                         neu_lg, neu_sm)
vsummary["neg_lg"] = np.apply_along_axis(get_prop, 1, dat,
                                         neg_lg, neg_sm)
assert vsummary.values.shape == (len(valence), 4)


### Positive condition

Hypothesis: mean(% pluralized positive items chosen) - mean(% pluralized neutral items chosen) > 0

In [6]:
a = vsummary.loc[vsummary["Condition"] == "POSITIVE"]["pos_lg"].values
b = vsummary.loc[vsummary["Condition"] == "POSITIVE"]["neu_lg"].values
stats.ttest_rel(a, b)

Ttest_relResult(statistic=8.896458848254882, pvalue=9.058927912786137e-15)
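
For reference, the statistic reported by a paired t-test is the mean of the within-participant differences divided by its standard error. The sketch below computes it by hand on made-up proportions. `paired_t` and the toy lists are hypothetical and are not the study's data. Note that `stats.ttest_rel` (presumably from `scipy.stats`) reports a two-sided p-value by default. For a directional hypothesis, when the sign of the statistic matches the hypothesized direction, the one-sided p-value is half the two-sided value.

```python
import math

def paired_t(a, b):
    """Paired t statistic: mean difference over its standard error."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / float(n)
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical per-participant proportions (not the study's data).
pos = [0.8, 0.6, 0.7, 0.9]
neu = [0.5, 0.5, 0.6, 0.6]
t = paired_t(pos, neu)
```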

Hypothesis: mean(% pluralized negative items chosen) - mean(% pluralized neutral items chosen) < 0

In [7]:
a = vsummary.loc[vsummary["Condition"] == "POSITIVE"]["neg_lg"].values
b = vsummary.loc[vsummary["Condition"] == "POSITIVE"]["neu_lg"].values
stats.ttest_rel(a, b)

Ttest_relResult(statistic=-16.887547973142635, pvalue=4.8495519966939176e-33)

### Negative condition

Hypothesis: mean(% pluralized negative items chosen) - mean(% pluralized neutral items chosen) > 0

In [8]:
a = vsummary.loc[vsummary["Condition"] == "NEGATIVE"]["neg_lg"].values
b = vsummary.loc[vsummary["Condition"] == "NEGATIVE"]["neu_lg"].values
stats.ttest_rel(a, b)

Ttest_relResult(statistic=16.484022497063364, pvalue=1.790598179109982e-36)

Hypothesis: mean(% pluralized positive items chosen) - mean(% pluralized neutral items chosen) < 0

In [9]:
a = vsummary.loc[vsummary["Condition"] == "NEGATIVE"]["pos_lg"].values
b = vsummary.loc[vsummary["Condition"] == "NEGATIVE"]["neu_lg"].values
stats.ttest_rel(a, b)

Ttest_relResult(statistic=-6.89699300433243, pvalue=1.1303832653451923e-10)

### Combining conditions

Recode observations in the negative condition as the opposite of what participants chose: each proportion p becomes 1 - p, so that both conditions are scored in the same direction.

In [10]:
vpos = vsummary.loc[vsummary["Condition"] == "POSITIVE"]
# Copy the subset so the recoding below does not write to a slice of vsummary
# (avoids pandas' SettingWithCopyWarning).
vneg = vsummary.loc[vsummary["Condition"] == "NEGATIVE"].copy()
vneg["pos_lg"] = 1 - vneg["pos_lg"]
vneg["neu_lg"] = 1 - vneg["neu_lg"]
vneg["neg_lg"] = 1 - vneg["neg_lg"]
vrecoded = pd.concat([ vpos, vneg ])
assert len(vrecoded) == len(vpos) + len(vneg)


Hypothesis: mean(% pluralized positive items chosen) - mean(% pluralized neutral items chosen) > 0

In [11]:
a = vrecoded["pos_lg"].values
b = vrecoded["neu_lg"].values
stats.ttest_rel(a, b)

Ttest_relResult(statistic=10.929163693666554, pvalue=2.223102401083143e-23)

Hypothesis: mean(% pluralized negative items chosen) - mean(% pluralized neutral items chosen) < 0

In [12]:
a = vrecoded["neg_lg"].values
b = vrecoded["neu_lg"].values
stats.ttest_rel(a, b)

Ttest_relResult(statistic=-23.346557461441947, pvalue=1.4227184905461655e-67)