# Chapter 14.4: Orthographic-Semantic-Consistency measure

Load the usual packages:

In [None]:
using CSV, RCall, JudiLing, JudiLingMeasures

The dataset with OSC measures included can be downloaded from [here](https://www.marcomarelli.net/resources/osc). Store it in `../dat/English_OSC.txt`, then we can continue:

In [None]:
eng = JudiLing.load_dataset("../dat/English_OSC.txt", delim=" ");

In [None]:
first(eng, 10)

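Back-transform the log frequencies to the frequency scale; these frequencies are used below to weight the mappings:

In [None]:
eng.FreqOSC = exp.(eng.log_frequency);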

Load fastText vectors for the words in `eng`. The returned `engft` holds the rows of `eng` for which a vector was found:

In [None]:
engft, S = JudiLing.load_S_matrix_from_fasttext(eng, :en, target_col=:word)

In [None]:
size(engft), size(S)

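Generate the cue object from letter trigrams, estimate the frequency-weighted mappings F (form to meaning) and G (meaning to form), and compute the predicted semantic and cue matrices: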

In [None]:
cue_obj = JudiLing.make_cue_matrix(engft, grams=3, target_col="word");
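
To make the role of `grams=3` concrete, here is a minimal sketch of how a word can be split into letter trigrams. The helper `letter_ngrams` is hypothetical and not part of JudiLing, and the `#` boundary marker is an assumption for illustration; it only shows the kind of cues that end up as columns of the cue matrix.

```julia
# Hypothetical helper (not JudiLing's API): split a word into overlapping
# letter n-grams, padding it with a boundary marker on both sides.
function letter_ngrams(word::AbstractString; n::Int=3, boundary::Char='#')
    chars = collect(boundary * word * boundary)
    return [String(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

trigrams = letter_ngrams("walk")
println(trigrams)
```

Each distinct trigram across all words would correspond to one column of `cue_obj.C`, with a row per word marking which trigrams it contains.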

In [None]:
F = JudiLing.make_transform_matrix(cue_obj.C, S, engft.FreqOSC);
Shat = cue_obj.C * F;

In [None]:
G = JudiLing.make_transform_matrix(S, cue_obj.C, engft.FreqOSC);
Chat = S * G;
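
As a rough illustration of what a frequency-weighted linear mapping amounts to, the sketch below solves a weighted least-squares problem on toy matrices using only the standard library. The weighting scheme (rows scaled by the square root of relative frequency) and all numbers are assumptions made for illustration; this is not JudiLing's implementation, which may differ in detail.

```julia
using LinearAlgebra

# Toy data, made up for illustration: 4 words, 3 cues, 2 semantic dimensions
C = [1.0 0.0 1.0;
     0.0 1.0 1.0;
     1.0 1.0 0.0;
     0.0 0.0 1.0]
S = [ 0.5  1.0;
     -1.0  0.2;
      0.3 -0.4;
      0.8  0.8]
freq = [10.0, 1.0, 5.0, 2.0]

# Assumed weighting: scale each row by the square root of its relative frequency,
# so that frequent words contribute more to the squared error
W = Diagonal(sqrt.(freq ./ maximum(freq)))

# Weighted least-squares solution of C * F ≈ S
F = pinv(W * C) * (W * S)

# Predicted semantic vectors for the unweighted cue matrix
Shat = C * F

# Unweighted solution, for comparison
F_unweighted = pinv(C) * S
```

Comparing `F` with `F_unweighted` shows how the weights pull the mapping towards the high-frequency rows.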

Compute all low-cost measures:

In [None]:
res = JudiLingMeasures.compute_all_measures_train(engft, 
                 cue_obj, Chat, S, Shat, F, G, 
                 low_cost_measures_only=true);

In [None]:
first(res, 3)

We now merge in reaction times from the British Lexicon Project and save the result. If you haven't done so before, download [blp-items.txt.zip](https://osf.io/b5sdk/#!/blp-items.txt.zip), store the file in `../dat`, and unzip it.

In [None]:
@rput res;

In [None]:
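R"""
# look up each word's RT in the British Lexicon Project by its spelling
blp = read.table("../dat/blp-items.txt", header=TRUE)
rt = blp$rt
names(rt)=blp$spelling
res$rt = rt[as.character(res$word)]
# keep only the words that have an RT
res = res[!is.na(res$rt),]
# flatten list columns to plain vectors
res$rank = unlist(res$rank)
res$SemanticSupportForForm = unlist(res$SemanticSupportForForm)
res$Support = unlist(res$Support)
# negative inverse RT: larger values still mean slower responses
res$RTinv = -1000/res$rt
res$WordLength = nchar(res$word)
write.csv(res, file="../res/engft_measures.csv")
""";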

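For readers more at home in Julia than in R, the lookup-and-transform step of the R cell above can be sketched on toy data as follows. The words and reaction times are made up, and names such as `rt_by_word` are hypothetical; the actual merge is the one done in R.

```julia
# Toy stand-ins for the BLP item table (values are made up for illustration)
blp_spelling = ["cat", "dog", "house"]
blp_rt = [520.0, 540.0, 610.0]

# Words for which measures were computed; "zebra" has no RT in the toy table
words = ["dog", "zebra", "cat"]

# Look up each word's RT by spelling, giving `missing` when there is none
rt_by_word = Dict(zip(blp_spelling, blp_rt))
rt = [get(rt_by_word, w, missing) for w in words]

# Keep only the words that have an RT
keep = .!ismissing.(rt)
words_kept = words[keep]
rt_kept = Float64.(rt[keep])

# Negative inverse transform, as in the R cell: -1000 / rt
RTinv = -1000 ./ rt_kept
```

Because the transform is monotone increasing in `rt`, a larger `RTinv` still corresponds to a slower response.
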
Inspect selected measures and RTs in a pairs plot:

In [None]:
R"""
if (!requireNamespace("languageR", quietly=TRUE)) install.packages("languageR")
library(languageR)
pairscor.fnc(res[,c("RTinv", "OSC", "TargetCorrelation", "SemanticSupportForForm", "SemanticDensity", "NNC")])
"""

Assess the effects of OSC, word length, and two DLM-based predictors (TargetCorrelation and NNC) on reaction times:

In [None]:
R"""
library(mgcv)
res.gam0 = gam(RTinv ~ s(OSC) + s(WordLength) +
                          s(TargetCorrelation) + 
                          s(NNC),
              data=res);
summary(res.gam0)
"""

In [None]:
R"""
plot(res.gam0, pages=1);
"""

The wiggly effect of OSC suggests overfitting, so we restrict the basis dimension of its smooth to k=4:

In [None]:
R"""
res.gam1 = gam(RTinv ~ s(OSC, k=4) + s(WordLength) +
                          s(TargetCorrelation) + 
                          s(NNC),
              data=res);
summary(res.gam1)
"""

Compute the contribution of each predictor to the AIC by refitting the model with that predictor left out:

In [None]:
R"""
res.no.OSC.gam       = gam(RTinv ~ s(WordLength) +
                                       s(TargetCorrelation) + s(NNC),
                           data=res);
res.no.length.gam    = gam(RTinv ~ s(OSC, k=4) + 
                                       s(TargetCorrelation) + s(NNC),
                           data=res);
res.no.TargetCor.gam = gam(RTinv ~ s(OSC, k=4) +  s(WordLength)+
                                       s(NNC),
                           data=res);
res.no.NNC.gam       = gam(RTinv ~ s(OSC, k=4) +  s(WordLength)+
                                       s(TargetCorrelation),
                           data=res);
"""

In [None]:
R"""
aics = AIC(res.gam1, res.no.OSC.gam, res.no.length.gam, res.no.TargetCor.gam, res.no.NNC.gam)
aics = aics[order(aics$AIC),]
aics$dAIC = aics$AIC-aics$AIC[1]
aics
"""

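The ordering and differencing done in the R cell above can be sketched in Julia as follows. The model names and AIC values here are hypothetical and made up purely for illustration; they are not the values obtained from the fitted GAMs.

```julia
# Hypothetical AIC values, made up purely for illustration
aics = Dict("full" => 100.0, "no_A" => 130.5, "no_B" => 101.25)

# Rank the models from lowest (best) to highest AIC
ranked = sort(collect(aics), by = last)

# Difference of each model's AIC from that of the best model
best = last(first(ranked))
dAIC = [name => aic - best for (name, aic) in ranked]

for (name, d) in dAIC
    println(rpad(name, 6), " dAIC = ", d)
end
```

In this toy example, a large `dAIC` for a reduced model means that the predictor left out of it contributes substantially to the fit of the full model.
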
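Visualise the partial effects. The d(AIC) value above each panel is the increase in AIC when that predictor is removed from the model: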

In [None]:
R"""
par(mfrow=c(2,2))
plot(res.gam1, select=1, scheme=1, shade.col="steelblue2",
 ylab="partial effect RTinv") 
mtext("d(AIC) = 125.7", 3, 1.4)
abline(h=0, col="indianred")
plot(res.gam1, select=2, scheme=1, shade.col="steelblue2",
  xlab="Word Length", ylab="partial effect RTinv") 
mtext("d(AIC) = 957.7", 3, 1.4)
abline(h=0, col="indianred")
plot(res.gam1, select=3, scheme=1, shade.col="steelblue2",
 ylab="partial effect RTinv") 
abline(h=0, col="indianred")
mtext("d(AIC) = 3828.8", 3, 1.4)
plot(res.gam1, select=4, scheme=1, shade.col="steelblue2",
 ylab="partial effect RTinv") 
mtext("d(AIC) = 4.8", 3, 1.4)
abline(h=0, col="indianred")
"""

Save the same four panels to a PDF file:

In [None]:
R"""
pdf("../fig/osc_partial_effects.pdf", height=6, width=6)
par(mfrow=c(2,2), oma=rep(0,4), mar=c(5,4,3,1))
plot(res.gam1, select=1, scheme=1, shade.col="steelblue2",
 ylab="partial effect RTinv") 
mtext("d(AIC) = 125.7", 3, 1.4)
abline(h=0, col="indianred")
plot(res.gam1, select=2, scheme=1, shade.col="steelblue2",
  xlab="Word Length", ylab="partial effect RTinv") 
mtext("d(AIC) = 957.7", 3, 1.4)
abline(h=0, col="indianred")
plot(res.gam1, select=3, scheme=1, shade.col="steelblue2",
 ylab="partial effect RTinv") 
abline(h=0, col="indianred")
mtext("d(AIC) = 3828.8", 3, 1.4)
plot(res.gam1, select=4, scheme=1, shade.col="steelblue2",
 ylab="partial effect RTinv") 
mtext("d(AIC) = 4.8", 3, 1.4)
abline(h=0, col="indianred")
dev.off()
"""