# Stats Task
#### 3/3/23

For stats, I want to use a linear mixed-effects model (like I would with `lmer` from the lme4 package in R) with the following formula: `z_score ~ that-trace * island + (1 | SubjID) + (1 | Item)`. I would like to include as many by-subject and by-item random effects as possible while making sure that the model still converges.

-Maho
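
Written out as a model equation, the requested formula corresponds roughly to the following. The symbols are notation assumed here for illustration (s indexes subjects, i indexes items, T and I are 0/1 indicators for that-trace and island); they do not come from the original request.

```latex
z_{si} = \beta_0 + \beta_1 T_{si} + \beta_2 I_{si} + \beta_3 T_{si} I_{si} + u_s + w_i + \varepsilon_{si},
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2),\quad
w_i \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2)
```

Here u_s is the by-subject random intercept from `(1 | SubjID)` and w_i is the by-item random intercept from `(1 | Item)`.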

In [None]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

In [None]:
df_ju = pd.read_csv("judgment_data_clean.csv")
df_bg = pd.read_csv("background_data_clean.csv")

In [None]:
# Only include critical items
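# .copy() so the new columns below are added to an independent frame,
# which avoids pandas' SettingWithCopyWarning
df_ju = df_ju[df_ju["trial_type"] == "experiment-critical"].copy()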

# Using the condition column, create a that-trace and an island column
df_ju["that_trace"] = df_ju["condition"].apply(lambda s: "crossed" in s)
df_ju["island"] = df_ju["condition"].apply(lambda s: "_island" in s)
df_ju.head()

| Unnamed: 0 | subj_id | list | item | trial_type | condition | sentence | judgment | zscore | that_trace | island |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 1 | 11 | experiment-critical | nested_island | I talked about the man that I like the truck t... | 3 | -0.476923 | False | True |
| 6 | 1 | 1 | 8 | experiment-critical | nested_no-island | I looked up the hospital that I believe I sent... | 5 | 0.570437 | False | False |
| 10 | 1 | 1 | 6 | experiment-critical | nested_no-island | I wrote about the farmer that I believe I rece... | 6 | 1.094118 | False | False |
| 14 | 1 | 1 | 16 | experiment-critical | crossed_no-island | I heard the politician that I believe I helped... | 3 | -0.476923 | True | False |
| 17 | 1 | 1 | 13 | experiment-critical | crossed_no-island | I looked for the purse that I believe I receiv... | 2 | -1.000603 | True | False |
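
A quick way to sanity-check the two derived columns is to cross-tabulate them on the distinct condition labels. The sketch below uses a small toy frame rather than `df_ju`. The label `crossed_island` is an assumption: it does not appear in the rows printed above, but it is the presumed fourth cell of the 2x2 design.

```python
import pandas as pd

# Toy frame with one row per condition label.
# "crossed_island" is assumed; the other three labels appear in the data above.
toy = pd.DataFrame({
    "condition": [
        "nested_island",
        "nested_no-island",
        "crossed_island",
        "crossed_no-island",
    ]
})

# Same logic as the notebook cell: substring tests on the condition label.
# "_island" (with underscore) does not match "no-island", which uses a hyphen.
toy["that_trace"] = toy["condition"].str.contains("crossed", regex=False)
toy["island"] = toy["condition"].str.contains("_island", regex=False)

# Each of the four design cells should be hit exactly once.
check = pd.crosstab(toy["that_trace"], toy["island"])
print(check)
```

Running the same `pd.crosstab` on `df_ju` would show whether the four cells are balanced in the real data.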


In [None]:
df_bg.head()

| Unnamed: 0 | subj_id | input_US_age | input_age | input_birth | input_comf | input_gender | input_lang | input_lang_comf | input_parent | submission_time |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | 20 | In the U.S. | English | Female | | | English | 2023-02-14 17:08:33 |
| 1 | 2 | | 22 | In the U.S. | English | Female | | | English | 2023-02-14 17:43:31 |
| 2 | 3 | | 22 | In the U.S. | English | Female | | | English | 2023-02-14 18:52:07 |
| 3 | 5 | | 20 | In the U.S. | English | Female | Vietnamese | | Others | 2023-02-14 21:13:03 |
| 4 | 7 | | 20 | In the U.S. | English | Female | English and Spanish | | Others | 2023-02-14 21:27:39 |


In [None]:
# Remove rows where the zscore is NaN
df_ju = df_ju[df_ju["zscore"].notnull()]

In [None]:
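# NOTE: statsmodels formulas do not use lmer-style "(1 | g)" random-effect terms.
# As far as I can tell, patsy evaluates them as plain Python expressions, so they
# may end up as ordinary fixed-effect columns. The only random effect fitted here
# is the per-subject random intercept implied by `groups`.
formula = "zscore ~ that_trace * island + (1 | subj_id) + (1 | item)"
model = smf.mixedlm(formula, groups="subj_id", data=df_ju)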
result = model.fit()
print(result.summary())

                    Mixed Linear Model Regression Results
Model:                    MixedLM        Dependent Variable:        zscore   
No. Observations:         480            Method:                    REML     
No. Groups:               30             Scale:                     0.3999   
Min. group size:          16             Log-Likelihood:            -475.9360
Max. group size:          16             Converged:                 Yes      
Mean group size:          16.0                                               
-----------------------------------------------------------------------------
                                  Coef.  Std.Err.    z    P>|z| [0.025 0.975]
-----------------------------------------------------------------------------
Intercept                          0.489    0.096   5.096 0.000  0.301  0.677
that_trace[T.True]                -0.161    0.082  -1.973 0.049 -0.321 -0.001
island[T.True]                    -0.936    0.082 -11.408 0.000 -1.097 -0.775
that_t
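
To get crossed random intercepts for both subject and item in statsmodels, one documented pattern is to treat the whole dataset as a single group and declare each factor as a variance component through `vc_formula`. The block below is a minimal sketch of that pattern on simulated data. The toy frame, the constant `all_data` column, and the effect sizes are all made up for illustration; none of it reproduces the experiment's data or results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_item = 12, 16

# Fully crossed toy design: every subject sees every item.
toy = pd.DataFrame(
    [(s, i) for s in range(n_subj) for i in range(n_item)],
    columns=["subj_id", "item"],
)
toy["that_trace"] = toy["item"] % 2 == 0
toy["island"] = toy["item"] % 4 < 2

# Simulated response: fixed effects + subject and item intercepts + noise.
subj_eff = rng.normal(0, 0.3, n_subj)
item_eff = rng.normal(0, 0.3, n_item)
toy["zscore"] = (
    0.5
    - 0.2 * toy["that_trace"]
    - 0.9 * toy["island"]
    + subj_eff[toy["subj_id"].to_numpy()]
    + item_eff[toy["item"].to_numpy()]
    + rng.normal(0, 0.5, len(toy))
)

# Single group covering all rows; subject and item enter as
# variance components, which makes their intercepts crossed.
toy["all_data"] = 1
vc = {"subj_id": "0 + C(subj_id)", "item": "0 + C(item)"}
crossed = smf.mixedlm(
    "zscore ~ that_trace * island",
    data=toy,
    groups="all_data",
    vc_formula=vc,
)
crossed_fit = crossed.fit()
print(crossed_fit.summary())
```

With `df_ju`, the same pattern would need a constant grouping column added first. Whether richer structures (for example, random slopes) still converge would have to be checked case by case.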

