# Chapter 14.5: Trial-to-trial learning in Dutch (Analysis in R)

First, load the packages for fitting generalised additive models (`mgcv`) and for data visualisation (`ggplot2`, `GGally`).

In [None]:
library(mgcv)
library(ggplot2)
library(GGally)

## Data preparation

Load the two datasets created in the simulation part. From the dynamic dataset we only retain the columns with the measures and append `.dynamic` to their names. The two are then combined column-wise into one data frame; this assumes that both files list the trials in the same order.

In [None]:
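dat_static = read.csv("../res/dlp-trial-measures-static.csv")
# keep only the measure columns (6, 22 and 23) of the dynamic dataset
dat_dynamic = read.csv("../res/dlp-trial-measures-dynamic.csv")[,c(6, 22:23)]
colnames(dat_dynamic) <- paste(colnames(dat_dynamic), "dynamic", sep=".")
# column-wise combination: rows are paired by position
dat = cbind(dat_static, dat_dynamic)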
head(dat)
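Because `cbind()` pairs rows purely by position, the combination above is only correct if both files contain the same trials in the same order. A minimal sketch of the renaming plus a cheap sanity check, using small hypothetical data frames in place of the real CSV files (the column names and values are illustrative only):

```r
# Hypothetical stand-ins for the static and dynamic measure tables
static <- data.frame(spelling = c("hond", "kat", "boom"),
                     L1Chat = c(0.8, 1.2, 0.5))
dynamic <- data.frame(L1Chat = c(0.7, 1.1, 0.6),
                      SemanticDensity = c(0.3, 0.4, 0.2))

# Append ".dynamic" to every measure name, as in the notebook
colnames(dynamic) <- paste(colnames(dynamic), "dynamic", sep = ".")

# cbind() matches rows by position only, so at least check that the row counts agree
stopifnot(nrow(static) == nrow(dynamic))
combined <- cbind(static, dynamic)
print(colnames(combined))
```

If both files carried a shared trial identifier, merging on that column would be a stricter alternative to relying on row order.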

Since we also require word frequency and orthographic neighbourhood density, we now load the `dlp-stimuli` dataset from the Dutch Lexicon Project (Keuleers et al., 2010). If you haven't done so before, download it from [here](https://osf.io/uw7t6/).

In [None]:
dlp_items = read.csv("../dat/dlp-stimuli.txt", sep="\t")
head(dlp_items)

We merge the two data frames on the `spelling` and `lexicality` columns.

In [None]:
dat = merge(dat, dlp_items, by=c("spelling", "lexicality"))  # merge() takes the key columns via `by`
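Base R's `merge()` receives the key columns through its `by` argument. By default it performs an inner join, so rows without a match are dropped, and it sorts the result on the keys. A small sketch with hypothetical trial and item tables:

```r
# Hypothetical trial-level data: "zwaan" has no entry in the item table
trials <- data.frame(spelling = c("kat", "hond", "blorf", "zwaan"),
                     lexicality = c("W", "W", "N", "W"),
                     rt = c(520, 610, 700, 580))
# Hypothetical item-level data
items <- data.frame(spelling = c("hond", "kat", "blorf"),
                    lexicality = c("W", "W", "N"),
                    coltheart.N = c(5, 12, 3))

# Join on both key columns; the unmatched trial ("zwaan") is dropped
merged <- merge(trials, items, by = c("spelling", "lexicality"))
print(merged)   # rows come back sorted by the key columns
```

Passing `all.x = TRUE` would instead keep unmatched trials, with `NA` in the item columns.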

Next, we exclude all rows where the reaction time is missing (NA or NaN) or infinite and, for simplicity, restrict ourselves to trials with a "word" response:

In [None]:
dat = dat[!is.na(dat$rt.raw),]
dat = dat[!is.infinite(dat$rt.raw),]
dat = dat[(dat$response == "W"),]
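`is.finite()` returns `FALSE` for `NA`, `NaN`, `Inf` and `-Inf`, so the two reaction-time filters can be collapsed into a single test. A sketch on a hypothetical data frame that mimics the `rt.raw` and `response` columns; wrapping the condition in `which()` also drops rows where the comparison itself yields `NA`:

```r
# Hypothetical reaction times containing the problem values the notebook removes
trials <- data.frame(rt.raw = c(540, NaN, Inf, 610, NA),
                     response = c("W", "W", "W", "N", "W"))

# One finiteness test covers NA, NaN and infinite values;
# which() keeps only rows where the whole condition is TRUE
keep <- which(is.finite(trials$rt.raw) & trials$response == "W")
clean <- trials[keep, ]
print(clean)
```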

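Next, we transform and normalise several variables. Reaction times are inverse-transformed as -1000/RT; the minus sign preserves the original direction, so larger values still mean slower responses. Missing frequencies are set to 0. Word frequency and orthographic neighbourhood density (Coltheart's N) are then log-transformed and z-scored, but only for items with a non-zero value, because the logarithm of zero is undefined; items with a value of 0 keep that value. The indicator variables `has_frequency` and `has_neighbour` (and their factor versions) separate these two groups of items, which we use later to avoid introducing bimodality in these predictors.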

In [None]:
dat$RTinv = -1000/dat$rt.raw          # inverse RT transform
dat$nletters = nchar(dat$spelling)    # word length in letters
# frequency: missing values become 0; log-transform and z-score the non-zero values only
dat[is.na(dat$subtlex.frequency), "subtlex.frequency"] = 0
dat$has_frequency = ifelse(dat$subtlex.frequency > 0, 1, 0)
dat$has_frequency_fac = as.factor(dat$has_frequency)
dat$subtlex.frequency.log = dat$subtlex.frequency
dat$subtlex.frequency.log[dat$has_frequency == 1] = as.numeric(scale(log(dat$subtlex.frequency[dat$has_frequency == 1])))
# Coltheart's N: log-transform and z-score the non-zero values only
dat$has_neighbour = ifelse(dat$coltheart.N > 0, 1, 0)
dat$has_neighbour_fac = as.factor(dat$has_neighbour)
dat$coltheart.N.log = dat$coltheart.N
dat$coltheart.N.log[dat$has_neighbour == 1] = as.numeric(scale(log(dat$coltheart.N[dat$has_neighbour == 1])))
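The same "log-transform and z-score only the non-zero values" step is applied twice above, so it can be wrapped in a helper. A minimal sketch; `log_scale_nonzero` is a hypothetical name, and it recodes missing values to 0 as the notebook does for frequency. Note that items without a value end up at 0, which is exactly the mean of the z-scored values, so the transformed variable on its own mixes two different groups of items; the returned indicator factor is what lets the model keep them apart.

```r
# Hypothetical helper: log-transform and z-score the non-zero values,
# keep zeros (and missing values, recoded to 0) at 0, return an indicator factor
log_scale_nonzero <- function(x) {
  x[is.na(x)] <- 0
  nonzero <- x > 0
  out <- x
  out[nonzero] <- as.numeric(scale(log(x[nonzero])))
  list(value = out, indicator = factor(as.integer(nonzero)))
}

freq <- c(0, 1, 10, 100, NA)
res <- log_scale_nonzero(freq)
print(res$value)       # zeros stay at 0; log(1), log(10), log(100) become -1, 0, 1
print(res$indicator)
```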

In [None]:
table(dat$has_frequency_fac)  # number of trials with zero vs non-zero frequency

In [None]:
dat = dat[order(dat$order),]  # restore trial order (merge() sorted the rows by its key columns)

## Classical model

At this point Coltheart's N still appears to have a bimodal distribution; this is dealt with in the model below through the `has_neighbour_fac` factor.

In [None]:
ggpairs(dat[, c("subtlex.frequency.log", "coltheart.N.log")])

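Fit the classical GAM, with a smooth over trial order, separate smooths of frequency and of neighbourhood density for items with and without a non-zero value, and a smooth of word length: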

In [None]:
model.classical = gam(RTinv ~  s(order, k = 200) + 
                                s(subtlex.frequency.log, by=has_frequency_fac) + 
                                has_frequency_fac + 
                                s(coltheart.N.log, by=has_neighbour_fac) + 
                                has_neighbour_fac + s(nletters),
                      data=dat)
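In a term such as `s(subtlex.frequency.log, by=has_frequency_fac)`, mgcv fits a separate smooth for each level of the factor. These by-smooths are subject to centring (sum-to-zero) constraints, so the factor itself is also entered as a parametric term to capture differences in mean level between the groups. A sketch of the same construction on simulated data (the variables `x`, `grp` and `y` are made up):

```r
library(mgcv)

set.seed(1)
n <- 400
grp <- factor(rep(c("a", "b"), each = n / 2))
x <- runif(n)
# Different curves, and different mean levels, in the two groups
y <- ifelse(grp == "a", sin(2 * pi * x), 0.5 + x^2) + rnorm(n, sd = 0.2)
sim <- data.frame(y, x, grp)

# One smooth of x per level of grp, plus the grp main effect for the level shift
m <- gam(y ~ grp + s(x, by = grp), data = sim)
print(sapply(m$smooth, function(s) s$label))
summary(m)
```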

In [None]:
summary(model.classical)

In [None]:
options(repr.plot.width=15, repr.plot.height=10)
#pdf("../../fig/trial.gam.classical.pdf", he=10, wi=15)
par(mfrow=c(2,3), mar=c(5.1, 5.1, 4.1, 2.1))
plot(model.classical, scale=F, scheme=1, rug=T, shade.col="steelblue2", ylab="RTinv", cex.lab=2.5, cex.axis=2)
#dev.off()

In [None]:
options(repr.plot.width=20, repr.plot.height=5)
#pdf("../../fig/trial.gam.classical.pdf", he=5, wi=20)
par(mfrow=c(1,4), mar=c(5.1, 5.1, 4.1, 2.1))
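# smooths 1, 3, 5, 6: trial order, frequency and neighbourhood density (non-zero items), word length
for (i in c(1,3,5,6)) {
    plot(model.classical, scheme=1, rug=T, shade.col="steelblue2", ylab="RTinv",
         cex.lab=2.5, cex.axis=2, select=i, #ylim=c(-0.4, 0.25),
         scale=F)
    abline(h=0, col="indianred")
}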
#dev.off()
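The indices passed to `select=` refer to the order of the smooth terms in the fitted model, where each `by=` smooth contributes one term per factor level. Instead of counting by hand, the indices can be looked up from the term labels. A sketch on a small simulated model (all variable names hypothetical) that skips the smooth fitted for level "0" of a factor:

```r
library(mgcv)

set.seed(3)
n <- 300
d <- data.frame(x = runif(n), z = runif(n),
                grp = factor(sample(c("0", "1"), n, replace = TRUE)))
d$y <- sin(2 * pi * d$x) + d$z + rnorm(n, sd = 0.3)
m <- gam(y ~ s(x) + grp + s(z, by = grp), data = d)

# Labels are listed in the order used by plot(..., select = i)
labels <- sapply(m$smooth, function(s) s$label)
print(labels)
# Keep every smooth except the one for level "0" of grp
keep <- which(!grepl(":grp0$", labels))

par(mfrow = c(1, length(keep)))
for (i in keep) {
  plot(m, select = i, scale = 0, rug = TRUE, shade = TRUE)
  abline(h = 0, col = "indianred")
}
```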

Check the model for concurvity and run the standard `gam.check()` diagnostics:

In [None]:
concurvity(model.classical)

In [None]:
gam.check(model.classical)
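`concurvity()` reports, for each smooth, how well it can be approximated by the other terms in the model, on a scale from 0 (no problem) to 1 (complete confounding). A sketch on simulated data in which one predictor is a near-copy of another, flagging smooths whose worst-case value exceeds 0.8; that threshold is a common rule of thumb assumed here, not a value taken from this analysis:

```r
library(mgcv)

set.seed(2)
n <- 300
d <- data.frame(x1 = runif(n), x3 = runif(n))
d$x2 <- d$x1 + rnorm(n, sd = 0.05)   # nearly a copy of x1
d$y <- sin(2 * pi * d$x1) + d$x3 + rnorm(n, sd = 0.3)

m <- gam(y ~ s(x1) + s(x2) + s(x3), data = d)

# full = TRUE: one column per term, rows "worst", "observed" and "estimate"
cc <- concurvity(m, full = TRUE)
print(round(cc, 2))

# Ignore the parametric part and flag smooths above the assumed threshold
worst <- cc["worst", colnames(cc) != "para"]
flagged <- names(worst)[worst > 0.8]
print(flagged)
```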

## Static model

We first log-transform the static and dynamic `L1Chat` measures, adding a small constant (0.002) before taking the logarithm.

In [None]:
dat$L1Chat.log = log(dat$L1Chat + 0.002)
dat$L1Chat.dynamic.log = log(dat$L1Chat.dynamic + 0.002)
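Adding a constant keeps the transform finite for values of exactly zero, where `log()` alone returns `-Inf`, and it compresses the lower tail of the distribution. A toy illustration with made-up values, assuming for the sake of the example that zeros can occur in the measure:

```r
# Hypothetical L1Chat-like values, including an exact zero
l1chat <- c(0, 0.001, 0.05, 1.3)

plain <- log(l1chat)            # the zero becomes -Inf
shifted <- log(l1chat + 0.002)  # all values stay finite, order is preserved
print(rbind(plain, shifted))
```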

Both distributions now look acceptable:

In [None]:
ggpairs(dat[, c("SemanticDensity", "L1Chat.log")])

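Fit the static model, which keeps trial order, frequency and word length but replaces Coltheart's N with the two static measures `SemanticDensity` and `L1Chat.log`: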

In [None]:
model.static = gam(RTinv ~ s(order, k=200) +
                   s(subtlex.frequency.log, by=has_frequency_fac) + has_frequency_fac +
                   s(nletters) +
                   s(SemanticDensity) + s(L1Chat.log),
                   data=dat)

In [None]:
summary(model.static)

In [None]:
options(repr.plot.width=15, repr.plot.height=10)
#pdf("../../fig/trial.gam.static.pdf", he=10, wi=15)
par(mfrow=c(2,3), mar=c(5.1, 5.1, 4.1, 2.1))
plot(model.static, scale=F, scheme=1, rug=T, shade.col="steelblue2", ylab="RTinv", cex.lab=2.5, cex.axis=2)
#dev.off()

In [None]:
options(repr.plot.width=15, repr.plot.height=10)
pdf("../fig/trial.gam.static_bw.pdf", he=10, wi=15)
par(mfrow=c(2,3), mar=c(5.1, 5.1, 4.1, 2.1))
# all smooths except term 2 (frequency for zero-frequency items)
for (i in c(1, 3:6)) {
    plot(model.static, scale=F, scheme=1, rug=T, select=i, #ylim=c(-0.45, 0.30),
         ylab="RTinv", cex.lab=2.5, cex.axis=2)
    abline(h=0)
}
dev.off()

Some model checks:

In [None]:
concurvity(model.static)

In [None]:
gam.check(model.static)

## Dynamic model

The distributions of the dynamic predictors look acceptable:

In [None]:
ggpairs(dat[, c("SemanticDensity.dynamic", "L1Chat.dynamic.log")])

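Fit the dynamic model, which is identical to the static model except that the two measures are replaced by their dynamic counterparts: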

In [None]:
model.dynamic = gam(RTinv ~ s(order, k=200) + 
                    s(subtlex.frequency.log, by=has_frequency_fac) + has_frequency_fac + 
                    s(nletters) +
                    s(SemanticDensity.dynamic) + 
                    s(L1Chat.dynamic.log),
                    data=dat)

In [None]:
summary(model.dynamic)

Some model checks:

In [None]:
concurvity(model.dynamic)

In [None]:
gam.check(model.dynamic)

In [None]:
#options(repr.plot.width=15, repr.plot.height=10)
#pdf("../../fig/trial.gam.dynamic.pdf", he=10, wi=15)
par(mfrow=c(2,3), mar=c(5.1, 5.1, 4.1, 2.1))
for (i in c(1, 3:6)) {
    plot(model.dynamic, scale=F, scheme=1, rug=T, select=i,
         #ylim=c(-0.45, 0.30),
         shade.col="steelblue2", ylab="RTinv", cex.lab=2.5, cex.axis=2)
    abline(h=0, col="indianred")
    # term 5 is s(SemanticDensity.dynamic): mark the median of the predictor
    if (i == 5) abline(v=median(dat$SemanticDensity.dynamic))
}
#dev.off()

## Model comparison

In [None]:
AIC(model.classical)

In [None]:
AIC(model.static)

In [None]:
AIC(model.dynamic)
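`AIC()` also accepts several fitted models at once and returns a data frame with their degrees of freedom and AIC values, one row per model; lower AIC indicates a better trade-off between fit and complexity. With the notebook's objects the call would be `AIC(model.classical, model.static, model.dynamic)`. A sketch on simulated models (names hypothetical), adding the difference to the best model as an extra column:

```r
library(mgcv)

set.seed(4)
n <- 300
d <- data.frame(x = runif(n), z = runif(n))
d$y <- sin(2 * pi * d$x) + rnorm(n, sd = 0.3)   # only x matters

m_x  <- gam(y ~ s(x), data = d)
m_xz <- gam(y ~ s(x) + s(z), data = d)
m_z  <- gam(y ~ s(z), data = d)

# One row per model, named after the objects passed in
tab <- AIC(m_x, m_xz, m_z)
tab$delta <- tab$AIC - min(tab$AIC)   # difference to the best model
print(tab[order(tab$AIC), ])
```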

## Competition between order and learning

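The dynamic model captures more of the trial effect, so the smooth over trial order contributes less to its fit. To quantify this, we refit the three models without `s(order)` and compare their AIC values with those of the full models.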

In [None]:
model.classical1 = gam(RTinv ~ s(subtlex.frequency.log, by=has_frequency_fac) + has_frequency_fac + 
                               s(coltheart.N.log, by=has_neighbour_fac) + has_neighbour_fac + s(nletters),
                      data=dat)

In [None]:
model.static1 = gam(RTinv ~ #s(order, k=200) +
                   s(subtlex.frequency.log, by=has_frequency_fac) + has_frequency_fac + s(nletters) + 
                   s(SemanticDensity) + s(L1Chat.log) ,data=dat)

In [None]:
model.dynamic1 = gam(RTinv ~ #s(order, k=200) + 
                    s(subtlex.frequency.log, by=has_frequency_fac) + has_frequency_fac + 
                    s(nletters) +
                    s(SemanticDensity.dynamic) + 
                    s(L1Chat.dynamic.log),
                    data=dat)

In [None]:
AIC(model.static1)-AIC(model.static)

In [None]:
AIC(model.dynamic1)-AIC(model.dynamic)

In [None]:
AIC(model.dynamic1)

In [None]:
AIC(model.static1)

In [None]:
AIC(model.classical1)
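Instead of writing out each reduced model by hand, `update()` can drop the trial-order smooth from an existing fit; the AIC difference between the reduced and the full model then indicates how much the order term contributes. A sketch on simulated data with a linear practice trend (the variables and the basis size `k = 20` are illustrative assumptions):

```r
library(mgcv)

set.seed(5)
n <- 400
d <- data.frame(order = 1:n, x = runif(n))
# Responses drift linearly over the session, plus an effect of x
d$y <- 0.002 * d$order + sin(2 * pi * d$x) + rnorm(n, sd = 0.3)

full <- gam(y ~ s(order, k = 20) + s(x), data = d)
# Remove the order smooth; the term is written exactly as in the original formula
reduced <- update(full, . ~ . - s(order, k = 20))

# Positive values: dropping the order term makes the model worse
delta_aic <- AIC(reduced) - AIC(full)
print(delta_aic)
```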

Finally, we check the residuals for autocorrelation:

In [None]:
par(mfrow=c(1,2))
acf(resid(model.static))
acf(resid(model.dynamic))

Residual autocorrelation does not appear to be a real issue.
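The dashed lines in an `acf()` plot mark the approximate 95% band for white noise, ±1.96/√n. The same check can be done numerically; a sketch on a simulated AR(1) series standing in for model residuals (the series and its lag-1 coefficient of 0.3 are made up):

```r
set.seed(6)
n <- 500
# Stand-in for model residuals: an AR(1) series with lag-1 correlation 0.3
e <- as.numeric(arima.sim(model = list(ar = 0.3), n = n))

r <- acf(e, lag.max = 10, plot = FALSE)
bound <- qnorm(0.975) / sqrt(n)   # the white-noise band drawn by the acf plot
lag1 <- r$acf[2]                  # element 1 is lag 0, element 2 is lag 1
print(c(lag1 = lag1, bound = bound))

# Lags (excluding lag 0) whose autocorrelation falls outside the band
print(which(abs(r$acf[-1]) > bound))
```

Should residual autocorrelation turn out to matter, mgcv's `bam()` can fit an AR(1) error model through its `rho` argument, together with `AR.start` when the data consist of several separate series.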

In [None]:
par(mfrow=c(1,2))
qqnorm(resid(model.dynamic));qqline(resid(model.dynamic))
qqnorm(resid(model.static));qqline(resid(model.static))

The residuals are not perfectly normal, but the fit is good enough for our purposes.

# References

Keuleers, E., Diependaele, K., and Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1:174.