# Analysis of transitions at probe level

In [2]:
# Load packages -----
library(tidyverse)
library(lme4)
library(lmerTest)


In [3]:

# Load data -------
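probdf <- read.csv(file.path('..', '..', 'data', 'probe_as_embedding_transitions.csv'),
                   sep='\t', fileEncoding='UTF-8')  # tab-separated despite the .csv extension
cprobdf <- probdf  # working copy used by all models below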

In [4]:

# Fixed effects models ------

fe01 <- lm(length ~ ADHD, data=cprobdf)

# Mixed effects models ------

me01 <- lmer(length ~ ADHD + (1|suj), data=cprobdf)
me01c <- lmer(length ~ ADHD + genre + age + (1|suj), data=cprobdf)

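# Does not converge. Likely cause: ADHD appears to be a subject-level score
# (constant within suj), so the random intercept absorbs all of its variance.
me02 <- lmer(ADHD ~ length + (1|suj), data=cprobdf)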

me03 <- lmer(length ~ ADHD_inatt + ADHD_impuls + (1|suj), data=cprobdf)
me03c <- lmer(length ~ ADHD_inatt + ADHD_impuls + genre + age + (1|suj), data=cprobdf)

# Here MEWS does appear to predict length
me04 <- lmer(length ~ ADHD + MEWS + (1|suj), data=cprobdf)
me04c <- lmer(length ~ ADHD + MEWS + age + genre + (1|suj), data=cprobdf)
me05 <- lmer(length ~ ADHD_inatt + ADHD_impuls + MEWS + (1|suj), data=cprobdf)
me05c <- lmer(length ~ ADHD_inatt + ADHD_impuls + MEWS + age + genre + (1|suj), data=cprobdf)

# ----------
# Goals:
# 1. "Keep it maximal": fit the most complex model consistent with the
#    experimental design that does not result in a singular fit
# 2. Then compare the candidate models using a criterion such as AIC or BIC


"Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?"
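
The warning above suggests rescaling. The output does not record which model raised it, so the following is only a sketch of one common response: z-scoring the numeric predictors before fitting. The data frame `toy` and all of its values are made up for illustration; only the column names `suj`, `age` and `MEWS` come from the models above.

```r
# Hypothetical toy data standing in for cprobdf (values are invented)
toy <- data.frame(
  suj  = rep(c("s01", "s02", "s03", "s04"), each = 3),
  age  = rep(c(21, 34, 28, 45), each = 3),
  MEWS = rep(c(5, 18, 11, 30), each = 3)
)

# Columns to standardise (assumed to be numeric scores)
num_cols <- c("age", "MEWS")

# Centre each predictor on its mean and divide by its standard deviation;
# as.numeric() drops the matrix attributes that scale() returns
toy_z <- toy
toy_z[num_cols] <- lapply(toy[num_cols], function(x) as.numeric(scale(x)))

print(sapply(toy_z[num_cols], mean))
print(sapply(toy_z[num_cols], sd))
```

After standardising, each slope is read as the change in the outcome per one standard deviation of the predictor, which usually brings the predictors onto comparable scales.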

In [5]:
step(me03c)

Backward reduced random-effect table:

          Eliminated npar logLik     AIC   LRT Df Pr(>Chisq)    
<none>                  7 893.87 -1773.8                        
(1 | suj)          0    6 680.17 -1348.3 427.4  1  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite 

            Eliminated   Sum Sq  Mean Sq NumDF  DenDF F value   Pr(>F)    
age                  1 0.002859 0.002859     1 82.475  0.3444 0.558904    
genre                2 0.003307 0.003307     1 81.051  0.3983 0.529751    
ADHD_inatt           3 0.030721 0.030721     1 81.934  3.7001 0.057881 .  
ADHD_impuls          0 0.104806 0.104806     1 83.067 12.6237 0.000631 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model found:
length ~ ADHD_impuls + (1 | suj)
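
`step()` arrives at `length ~ ADHD_impuls + (1 | suj)` by backward elimination. The other route mentioned in the goals, comparing models by AIC or BIC, might look like the sketch below. It runs on simulated data, not the study data: the data frame `toy`, the effect sizes, and the object names `full` and `reduced` are all assumptions made for illustration. The models are fitted with `REML = FALSE` because likelihood-based comparisons of models that differ in their fixed effects are normally done on maximum-likelihood fits.

```r
library(lme4)

set.seed(1)

# Hypothetical toy data: 30 subjects with 12 probes each (not the study data)
n_suj   <- 30
n_probe <- 12
subj <- data.frame(
  suj         = factor(seq_len(n_suj)),
  ADHD_inatt  = rnorm(n_suj, mean = 15, sd = 5),
  ADHD_impuls = rnorm(n_suj, mean = 12, sd = 5)
)
toy <- subj[rep(seq_len(n_suj), each = n_probe), ]

# Simulated outcome: subject-specific intercept plus an assumed impulsivity effect
u <- rnorm(n_suj, mean = 0, sd = 0.08)
toy$length <- 0.5 - 0.005 * toy$ADHD_impuls +
  u[as.integer(toy$suj)] + rnorm(nrow(toy), mean = 0, sd = 0.09)

# Maximum-likelihood fits of the two candidate models
full    <- lmer(length ~ ADHD_inatt + ADHD_impuls + (1 | suj),
                data = toy, REML = FALSE)
reduced <- lmer(length ~ ADHD_impuls + (1 | suj),
                data = toy, REML = FALSE)

# Information criteria side by side (lower is better)
ic <- data.frame(
  model = c("full", "reduced"),
  AIC   = c(AIC(full), AIC(reduced)),
  BIC   = c(BIC(full), BIC(reduced))
)
print(ic)

# Likelihood-ratio test of the nested pair
lrt <- anova(reduced, full)
print(lrt)
```

On the real data the same calls would take `data = cprobdf`. BIC penalises extra parameters more heavily than AIC, so the two criteria can disagree on borderline terms such as `ADHD_inatt`.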

In [6]:
summary(me03)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: length ~ ADHD_inatt + ADHD_impuls + (1 | suj)
   Data: cprobdf

REML criterion at convergence: -1802.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3557 -0.5826 -0.0889  0.5257  4.2793 

Random effects:
 Groups   Name        Variance Std.Dev.
 suj      (Intercept) 0.007092 0.08422 
 Residual             0.008303 0.09112 
Number of obs: 1042, groups:  suj, 85

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)  0.514145   0.044715 81.583004  11.498  < 2e-16 ***
ADHD_inatt  -0.003971   0.002064 81.933951  -1.924  0.05788 .  
ADHD_impuls -0.005683   0.001935 82.188024  -2.936  0.00431 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) ADHD_n
ADHD_inatt  -0.709       
ADHD_impuls -0.449 -0.277
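
The random-effects block reports a between-subject variance of 0.007092 and a residual variance of 0.008303. The share of variance attributable to subjects (the intraclass correlation) follows from those two numbers. The sketch below copies them by hand from the output above; the variable names are illustrative only.

```r
# Variance components copied from the summary(me03) output above
var_suj   <- 0.007092  # suj (Intercept)
var_resid <- 0.008303  # Residual

# Intraclass correlation: between-subject variance over total variance
icc <- var_suj / (var_suj + var_resid)
print(round(icc, 3))
```

This gives roughly 0.46, meaning that close to half of the variance left after the fixed effects lies between subjects. That is consistent with the large likelihood-ratio statistic for `(1 | suj)` in the `step()` output. With the fitted model available, the same components could be read from `as.data.frame(VarCorr(me03))` instead of being typed in.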