Pre-registered analyses for the in-lab version of Study 3b.

Pre-registration: https://osf.io/de935

In [1]:
from __future__ import division
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pickle
import re
from scipy import stats
from pyspan.utils import *
from pyspan.plurals.analysis import *
assert not mturk
from pyspan.plurals.preprocess import *
from pyspan.plurals.utils import *

### Between subjects

#### 60% item selection threshold

##### Logistic regression

Selection of positive word ~ Participant's political affiliation + Condition + Participant's political affiliation * Condition + Dummy indicating whether or not this was the first survey the participant took (including participant-level effects)

We hypothesize that the coefficient on Participant's political affiliation * Condition will be positive. If the order dummy has a non-zero coefficient, we commit to discarding all politics-survey data from participants who did not take that survey first.
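
The model specification can be illustrated with a single design-matrix row. This is a minimal sketch, assuming 0/1 dummy coding for affiliation, condition, and survey order; the function name `design_row` and the coding are hypothetical and are not taken from `pyspan`.

```python
def design_row(affiliation, condition, took_first):
    """Return [affiliation, condition, affiliation * condition, order].

    All inputs are assumed to be 0/1 dummies (hypothetical coding).
    """
    return [affiliation, condition, affiliation * condition, int(took_first)]


# One row per affiliation/condition cell, for participants who took the
# politics survey first.
rows = [design_row(a, c, True) for a in (0, 1) for c in (0, 1)]
```

Under this coding the interaction column is 1 only when both dummies are 1, so its coefficient captures how the effect of affiliation differs between the two conditions.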

In [2]:
pdummied, Y = dummy(politics, [ "REPUBLICAN", "DEMOCRAT" ],
                    np.stack((positive60, negative60)))
X, Y = df_to_matrix(pdummied, Y, columns = { 0: "id",
                                             1: "condition",
                                             2: (0,1), 3: "order" })

In [3]:
logit = SparseLR(Y, X); logit.coef[:4], logit.auc



(array([ 0.        , -0.05329993,  0.17951362,  0.        ]),
 0.5732944379940252)
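
One way to read the output above is to label the coefficients and check the pre-registered order rule. This sketch assumes the first four coefficients follow the order of the `columns` mapping (affiliation, condition, interaction, order); that ordering and the label names are assumptions, not something the output confirms.

```python
import math

# Values copied from the output above. The labels assume the first four
# coefficients follow the order of the `columns` mapping (an assumption).
coefs = [0.0, -0.05329993, 0.17951362, 0.0]
names = ["affiliation", "condition", "affiliation_x_condition", "order"]
named = dict(zip(names, coefs))

# Pre-registered rule: discard data from participants who did not take the
# politics survey first only if the order coefficient is non-zero.
apply_order_exclusion = named["order"] != 0

# A log-odds coefficient b corresponds to an odds ratio of exp(b).
interaction_odds_ratio = math.exp(named["affiliation_x_condition"])
```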

##### t-tests

In [4]:
# Copy the selection so the new "ppos" column is not assigned on a slice
psummary = politics[["Condition", "ident"]].copy()
dat = politics[ixs].values
props = np.apply_along_axis(get_prop, 1, dat, positive60, 
                            negative60)
psummary["ppos"] = props
assert psummary.values.shape == (len(politics), 3)
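
The quantity stored in `ppos` is the share of positive words a participant chose. The sketch below illustrates that quantity only; it is not the implementation of `get_prop`, and the function name, argument layout, and handling of words outside both lists are all assumptions.

```python
def proportion_positive(choices, positive, negative):
    """Share of chosen words that are positive, among valenced choices.

    `choices` is one participant's selected words; `positive` and
    `negative` are the word lists. All names here are hypothetical.
    """
    n_pos = sum(1 for word in choices if word in positive)
    n_neg = sum(1 for word in choices if word in negative)
    total = n_pos + n_neg
    # Return NaN when the participant chose no valenced words.
    return float(n_pos) / total if total else float("nan")
```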


Hypothesis: among participants in the Democrat condition, mean(% positive words chosen by Democrats) - mean(% positive words chosen by Republicans) > 0.

In [5]:
stats.ttest_ind(psummary.loc[(psummary["Condition"] == "DEMOCRAT") & (psummary["ident"] == "DEMOCRAT")]["ppos"],
                psummary.loc[(psummary["Condition"] == "DEMOCRAT") & (psummary["ident"] == "REPUBLICAN")]["ppos"],
                equal_var = False)

Ttest_indResult(statistic=1.2026057556474177, pvalue=0.2762799568398802)
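
Passing `equal_var = False` makes `stats.ttest_ind` run Welch's unequal-variance t-test. The statistic and its degrees of freedom can be sketched with the standard library alone; `welch_t` is a hypothetical helper written for illustration and is not part of `scipy` or `pyspan`.

```python
import math


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / float(na)
    mean_b = sum(b) / float(nb)
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1.0)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1.0)
    # Squared standard errors of the two group means.
    se_a, se_b = var_a / na, var_b / nb
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1.0) + se_b ** 2 / (nb - 1.0))
    return t, df
```

The sign of `t` follows the order of the arguments, so the group hypothesized to have the larger mean goes first, as in the calls in this notebook.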

Hypothesis: among participants in the Republican condition, mean(% positive words chosen by Republicans) - mean(% positive words chosen by Democrats) > 0.

In [6]:
stats.ttest_ind(psummary.loc[(psummary["Condition"] == "REPUBLICAN") & (psummary["ident"] == "REPUBLICAN")]["ppos"],
                psummary.loc[(psummary["Condition"] == "REPUBLICAN") & (psummary["ident"] == "DEMOCRAT")]["ppos"],
                equal_var = False)

Ttest_indResult(statistic=2.2064865414198467, pvalue=0.0558938039508809)
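
The hypotheses in this section are directional (difference > 0), while `stats.ttest_ind` as called here reports a two-sided p-value. Because the t distribution is symmetric, a one-sided p-value can be derived from the two-sided one and the sign of the statistic. `one_sided_p` is a hypothetical helper for illustration.

```python
def one_sided_p(statistic, two_sided_p):
    """One-sided p-value for H1: difference in means > 0."""
    half = two_sided_p / 2.0
    # If the statistic has the hypothesized sign, the one-sided p-value is
    # half the two-sided one; otherwise it is the complement.
    return half if statistic > 0 else 1.0 - half


# Values copied from the result above.
p_greater = one_sided_p(2.2064865414198467, 0.0558938039508809)
```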

Hypothesis: mean(% positive words chosen by participants whose condition matched their political identity) - mean(% positive words chosen by participants whose condition did not match their political identity) > 0.

In [7]:
stats.ttest_ind(psummary.loc[psummary["Condition"] == psummary["ident"]]["ppos"],
                psummary.loc[psummary["Condition"] != psummary["ident"]]["ppos"],
                equal_var = False)

Ttest_indResult(statistic=4.16239640723614, pvalue=6.97282540028223e-05)
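
The pooled comparison splits participants by whether their condition equals their political identity. A minimal sketch of that split, using a hypothetical tuple layout and made-up values that are not data from the study:

```python
def matched_minus_mismatched(records):
    """Difference in mean ppos between matched and mismatched participants.

    Each record is a (condition, ident, ppos) tuple; the layout is
    hypothetical.
    """
    matched = [p for cond, ident, p in records if cond == ident]
    mismatched = [p for cond, ident, p in records if cond != ident]
    mean_matched = sum(matched) / float(len(matched))
    mean_mismatched = sum(mismatched) / float(len(mismatched))
    return mean_matched - mean_mismatched


# Made-up values for illustration only; not data from the study.
toy = [("DEMOCRAT", "DEMOCRAT", 0.6), ("DEMOCRAT", "REPUBLICAN", 0.5),
       ("REPUBLICAN", "REPUBLICAN", 0.7), ("REPUBLICAN", "DEMOCRAT", 0.4)]
diff = matched_minus_mismatched(toy)
```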

#### 80% item selection threshold

##### Logistic regression

In [8]:
pdummied, Y = dummy(politics, 
                    [ "REPUBLICAN", "DEMOCRAT" ],
                    np.stack((positive80, negative80)))
X, Y = df_to_matrix(pdummied, Y, columns = { 0: "id",
                                             1: "condition",
                                             2: (0,1), 3: "order" })

In [9]:
logit = SparseLR(Y, X); logit.coef[:4], logit.auc

(array([0., 0., 0., 0.]), 0.5)
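
An AUC of exactly 0.5 is consistent with a model that assigns every observation the same score, which is what would be expected if all coefficients were shrunk to zero. Only the first four coefficients are shown above, so this reading is an assumption. The sketch below uses the pairwise (Mann-Whitney) formulation of AUC; `auc` is a hypothetical helper, not the one used by `SparseLR`.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation of AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Identical scores for every observation: every pair is a tie.
constant_auc = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```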

##### t-tests

In [10]:
# Copy the selection so the new "ppos" column is not assigned on a slice
psummary = politics[["Condition", "ident"]].copy()
dat = politics[ixs].values
props = np.apply_along_axis(get_prop, 1, dat, positive80,
                            negative80)
psummary["ppos"] = props
assert psummary.values.shape == (len(politics), 3)


Hypothesis: among participants in the Democrat condition, mean(% positive words chosen by Democrats) - mean(% positive words chosen by Republicans) > 0.

In [11]:
stats.ttest_ind(psummary.loc[(psummary["Condition"] == "DEMOCRAT") & (psummary["ident"] == "DEMOCRAT")]["ppos"],
                psummary.loc[(psummary["Condition"] == "DEMOCRAT") & (psummary["ident"] == "REPUBLICAN")]["ppos"],
                equal_var = False)

Ttest_indResult(statistic=1.5850334707561222, pvalue=0.16692300537698423)

Hypothesis: among participants in the Republican condition, mean(% positive words chosen by Republicans) - mean(% positive words chosen by Democrats) > 0.

In [12]:
stats.ttest_ind(psummary.loc[(psummary["Condition"] == "REPUBLICAN") & (psummary["ident"] == "REPUBLICAN")]["ppos"],
                psummary.loc[(psummary["Condition"] == "REPUBLICAN") & (psummary["ident"] == "DEMOCRAT")]["ppos"],
                equal_var = False)

Ttest_indResult(statistic=0.8125590115861742, pvalue=0.4612148528333456)

Hypothesis: mean(% positive words chosen by participants whose condition matched their political identity) - mean(% positive words chosen by participants whose condition did not match their political identity) > 0.

In [13]:
stats.ttest_ind(psummary.loc[psummary["Condition"] == psummary["ident"]]["ppos"],
                psummary.loc[psummary["Condition"] != psummary["ident"]]["ppos"],
                equal_var = False)

Ttest_indResult(statistic=2.866418936849676, pvalue=0.005135596816726479)

### Within-subjects

#### Logistic regression

Selection of positive word ~ Participant's political identity + Condition + Participant's political identity * Condition + Dummy indicating whether or not the participant took the politics survey before the valence survey (including participant-level effects)

We hypothesize that the coefficient on Participant's political identity * Condition will be positive. 

In [14]:
# Pre-registered: "The within-subject analyses would be run
# both for items within each pre-specified valence category
# (25 positive items, 25 negative items and 25 neutral
# items), and using all 75 non-distractor items."
#
# Change the ixs_ variable below to restrict the subset of
# stims used for analysis
#ixs_ = np.arange(100, 125) # Positive items
#ixs_ = np.arange(125, 150) # Negative items
#ixs_ = np.arange(150, 175) # Neutral items
ixs_ = np.arange(100, 175) # Positive, negative and neutral
# items
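
Rather than commenting lines in and out, the four pre-specified item subsets could be kept in a lookup table. This is only a sketch: the index ranges come from the comments in the cell above, while `ITEM_RANGES` and `select_items` are hypothetical names.

```python
# Index ranges taken from the comments in the cell above.
ITEM_RANGES = {
    "positive": range(100, 125),
    "negative": range(125, 150),
    "neutral": range(150, 175),
    "all": range(100, 175),
}


def select_items(category):
    """Return the list of item indices for one pre-specified category."""
    return list(ITEM_RANGES[category])
```

The within-subject analyses could then loop over the four keys, passing each index list as `ixs_`.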

In [15]:
pdummied_ws, Y = dummy(politics, 
                       classes = [ "REPUBLICAN", 
                                   "DEMOCRAT" ], 
                       within = True, 
                       ixs = ixs_)
X, Y = df_to_matrix(pdummied_ws, Y, 
                    columns = { 0: "id", 1: "condition", 
                                2: (0,1), 3: "order" }, 
                    ixs = ixs_)


In [16]:
logit = SparseLR(Y, X); logit.coef[:4], logit.auc 

(array([ 0.        , -0.00734122,  0.19955015,  0.16982531]), 0.58440026452542)

#### t-tests

In [17]:
psummary_ws = summarize(pdummied_ws, ixs_)


Hypothesis: among participants in the Republican condition, mean(% positive words chosen by Republicans) - mean(% positive words chosen by Democrats) > 0.

In [18]:
stats.ttest_ind(psummary_ws.loc[(psummary_ws["Condition"] == 1) & (psummary_ws["ident"] == 1)]["p"],
                psummary_ws.loc[(psummary_ws["Condition"] == 1) & (psummary_ws["ident"] == 0)]["p"],
                equal_var = False)

Ttest_indResult(statistic=2.876381791832602, pvalue=0.012396006374019216)

Hypothesis: among participants in the Democrat condition, mean(% positive words chosen by Democrats) - mean(% positive words chosen by Republicans) > 0.

In [19]:
stats.ttest_ind(psummary_ws.loc[(psummary_ws["Condition"] == 0) & (psummary_ws["ident"] == 0)]["p"],
                psummary_ws.loc[(psummary_ws["Condition"] == 0) & (psummary_ws["ident"] == 1)]["p"],
                equal_var = False)

Ttest_indResult(statistic=1.2891890895418727, pvalue=0.27639030967262507)

Hypothesis: mean(% positive words chosen by participants whose condition matched their political identity) - mean(% positive words chosen by participants whose condition did not match their political identity) > 0.

In [20]:
stats.ttest_ind(psummary_ws.loc[psummary_ws["Condition"] == psummary_ws["ident"]]["p"],
                psummary_ws.loc[psummary_ws["Condition"] != psummary_ws["ident"]]["p"],
                equal_var = False)

Ttest_indResult(statistic=3.1121239270561794, pvalue=0.002620759792461148)