# Chapter 14.6: Comparing EL, FIL, DDL and FIDDL

## Preparation

Load the necessary packages:

In [None]:
using Flux
using JudiLing, DataFrames, JudiLingMeasures, RCall

Prepare the data. The DLP data (Keuleers et al., 2010) can be found [here](https://osf.io/uw7t6/) if you haven't downloaded it previously.

In [None]:
dlp_items = JudiLing.load_dataset("../dat/dlp-items.txt", delim="\t", missingstring="NA")
dlp_stimuli = JudiLing.load_dataset("../dat/dlp-stimuli.txt", delim="\t", missingstring="NA")

dlp = leftjoin(dlp_items, dlp_stimuli, on=:spelling)

words = dlp[dlp.lexicality .== "W",:]

# fill in NAs in celex frequency with zeros and make sure they are all integers
words[ismissing.(words."celex.frequency"),"celex.frequency"] .= 0
words."celex.frequency" = Int.(words."celex.frequency")

# scale down the frequencies to make sure this code runs in a reasonable time frame for demonstration purposes
words."celex.frequency.scaled" = Int.(ceil.(words."celex.frequency"/100))
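
To see what the scaling step does to individual values, here is a small sketch on made-up frequencies (the numbers are illustrative and not taken from the DLP data). Note that a frequency of zero stays zero, while every non-zero frequency is rounded up to at least one.

```julia
# Toy frequencies (illustrative values, not taken from the DLP data)
freq = [0, 1, 99, 100, 101, 2500]

# Same transformation as above: divide by 100, round up, convert to Int
freq_scaled = Int.(ceil.(freq / 100))

println(freq_scaled)
```

Words whose CELEX frequency was missing and set to zero therefore keep a scaled frequency of zero.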

Prepare C and S matrices:

In [None]:
words, S = JudiLing.load_S_matrix_from_fasttext(words, :nl, target_col="spelling")

In [None]:
cue_obj = JudiLing.make_cue_matrix(words, target_col="spelling", grams=3);

## Compute the various mappings

### Endstate of Learning (EL)

In [None]:
F = JudiLing.make_transform_matrix(cue_obj.C, S)
Shat_el = cue_obj.C * F
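
Conceptually, the endstate-of-learning mapping is the least-squares solution for mapping cues onto semantics. The following sketch illustrates this on a toy example using the pseudoinverse from Julia's standard library. The matrices `C_toy` and `S_toy` are made up, and `JudiLing.make_transform_matrix` may use a different numerical method internally, so this is an illustration of the idea rather than of the package's implementation.

```julia
using LinearAlgebra

# Toy cue matrix (4 words x 3 cues) and semantic matrix (4 words x 2 dimensions);
# the values are made up for illustration.
C_toy = [1.0 0.0 1.0;
         0.0 1.0 1.0;
         1.0 1.0 0.0;
         1.0 0.0 0.0]
S_toy = [0.5 -0.2;
         0.1  0.9;
        -0.3  0.4;
         0.7  0.0]

# Least-squares mapping from cues to semantics via the pseudoinverse
F_toy = pinv(C_toy) * S_toy

# Predicted semantic vectors
Shat_toy = C_toy * F_toy

# For a least-squares solution the residuals are orthogonal to the cue columns,
# so this norm should be numerically zero
println(norm(C_toy' * (S_toy - Shat_toy)))
```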

Compute target correlation:

In [None]:
words[!, "TargetCorrelationEL"] = JudiLingMeasures.target_correlation(Shat_el, S) 
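
Target correlation is the correlation between a word's predicted semantic vector and its own target vector. A minimal sketch of this measure, with a hypothetical helper `toy_target_correlation` and made-up matrices, could look as follows (the package function may differ in details):

```julia
using Statistics

# Hypothetical helper: correlation between each predicted row and its own target row
toy_target_correlation(Shat, S) = [cor(Shat[i, :], S[i, :]) for i in 1:size(S, 1)]

S_toy = [1.0 2.0 3.0;
         1.0 0.0 -1.0]
Shat_toy = [2.0 4.0 6.0;    # perfectly aligned with the first target
            -1.0 0.0 1.0]   # perfectly anti-aligned with the second target

println(toy_target_correlation(Shat_toy, S_toy))
```

The first word receives a target correlation of 1 and the second a target correlation of -1.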

### Frequency-informed Learning (FIL)

In [None]:
F_fil = JudiLing.make_transform_matrix(cue_obj.C, S, words."celex.frequency.scaled")
Shat_fil = cue_obj.C * F_fil
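
One way to think about frequency-informed learning is as a weighted least-squares problem in which each word counts as often as its frequency. The sketch below, on made-up matrices, shows that weighting each row by the square root of its frequency gives the same mapping as ordinary least squares on a dataset in which each word is repeated according to its frequency. How exactly JudiLing implements the weighting is not shown here, so treat this as an illustration under that assumption.

```julia
using LinearAlgebra

# Toy cue matrix, semantic matrix and frequencies (made up for illustration)
C_toy = [1.0 0.0; 0.0 1.0; 1.0 1.0]
S_toy = [1.0 0.0; 0.0 1.0; 0.5 0.2]
freq_toy = [3, 1, 2]

# Weighted least squares: scale each row by the square root of its frequency
w = sqrt.(freq_toy)
F_weighted = pinv(w .* C_toy) * (w .* S_toy)

# Ordinary least squares on a dataset where each word is repeated `freq` times
idx = reduce(vcat, [fill(i, f) for (i, f) in enumerate(freq_toy)])
F_repeated = pinv(C_toy[idx, :]) * S_toy[idx, :]

# Both routes lead to the same mapping
println(isapprox(F_weighted, F_repeated; atol=1e-10))
```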

In [None]:
words[!, "TargetCorrelationFIL"] = JudiLingMeasures.target_correlation(Shat_fil, S) 

### Deep Discriminative Learning (DDL)

Train a model with the default parameters, setting only the batch size explicitly:

In [None]:
res = JudiLing.get_and_train_model(cue_obj.C, 
                    S,
                    "../res/dlp_ddl_model.bson", 
                    batchsize=512)

Predict semantic matrix and compute target correlations:

In [None]:
Shat_ddl = JudiLing.predict_from_deep_model(res.model, cue_obj.C)
words[!, "TargetCorrelationDDL"] = JudiLingMeasures.target_correlation(Shat_ddl, S) 

### Frequency-informed deep discriminative learning (FIDDL)

In [None]:
# generate a learning sequence based on words' frequencies:
learn_seq = JudiLing.make_learn_seq(words."celex.frequency.scaled", random_seed=0422)
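
A learning sequence of this kind is a list of word indices in which each word occurs as often as its frequency, presented in random order. The sketch below builds such a sequence with a hypothetical helper `toy_learn_seq`. It is only meant to illustrate the idea, and `JudiLing.make_learn_seq` may differ in its details.

```julia
using Random

# Hypothetical helper: repeat each word index as often as its frequency, then shuffle
function toy_learn_seq(freq; random_seed=1)
    seq = reduce(vcat, [fill(i, f) for (i, f) in enumerate(freq)])
    return shuffle(MersenneTwister(random_seed), seq)
end

freq_toy = [3, 0, 1, 2]
seq = toy_learn_seq(freq_toy)

# The sequence has as many entries as the summed frequencies, and each index
# occurs exactly as often as its frequency (index 2 never occurs)
println(length(seq))
println([count(==(i), seq) for i in eachindex(freq_toy)])
```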

Run the `fiddl` function:

In [None]:
res = JudiLing.fiddl(cue_obj.C, 
                    S, 
                    learn_seq, 
                    words, 
                    "spelling", 
                    "../res/dlp_fiddl_model.bson", 
                    batchsize=512, 
                    n_batch_eval=1000)

Predict semantic matrix and compute target correlations:

In [None]:
Shat_fiddl = JudiLing.predict_from_deep_model(res.model, cue_obj.C)

In [None]:
words[!, "TargetCorrelationFIDDL"] = JudiLingMeasures.target_correlation(Shat_fiddl, S) 

## Statistical analysis

Move to R and run a separate GAM for the target correlations from each of the four models:

In [None]:
@rput words;

In [None]:
R"""
library(mgcv)
library(itsadug)

words$RTinv = -1000/words$rt

""";

In [None]:
R"""
gm_el = gam(RTinv ~ s(TargetCorrelationEL), data=words)

print(AIC(gm_el))
summary(gm_el)
"""

In [None]:
R"""
gm_fil = gam(RTinv ~ s(TargetCorrelationFIL), data=words)

print(AIC(gm_fil))
summary(gm_fil)
"""

In [None]:
R"""
gm_ddl = gam(RTinv ~ s(TargetCorrelationDDL), data=words)

print(AIC(gm_ddl))
summary(gm_ddl)
"""

In [None]:
R"""
gm_fiddl = gam(RTinv ~ s(TargetCorrelationFIDDL), data=words)

print(AIC(gm_fiddl))
summary(gm_fiddl)
"""

The model with FIDDL ends up with the lowest AIC.

## Development of AICs over time

Define a function for computing target correlations:

In [None]:
function compute_target_corr(X_train, Y_train, X_val, Y_val,
                                    Yhat_train, Yhat_val, data_train,
                                    data_val, target_col, model, epoch)
    data_train[!, string("target_corr_", epoch)] = JudiLingMeasures.target_correlation(Yhat_train, Y_train)
    return(data_train, data_val)
end

Train a model while computing target correlations after each epoch:

In [None]:
res_ddl_eval = JudiLing.get_and_train_model(cue_obj.C, 
                    S,
                    "../res/dlp_ddl_model2.bson",
                    data_train=words, 
                    target_col="spelling", 
                    batchsize=512,
                    measures_func = compute_target_corr,
                    return_train_acc=true) # set return_train_acc to true so that we can inspect training accuracies

Inspect a few of the new columns in the training data:

In [None]:
res_ddl_eval.data_train[1:5, [:spelling, :target_corr_1, :target_corr_50, :target_corr_100]]

Move the dataframe to R:

In [None]:
ddl_data = res_ddl_eval.data_train
@rput ddl_data

Compute AICs for the target correlations across epochs:

In [None]:
R"""
ddl_data$RTinv = -1000/ddl_data$rt

aics = c()
for (i in 1:100){

    f = formula(paste0("RTinv ~ s(target_corr_", i, ")"))
    gm = gam(f, data=ddl_data)
    aics = c(aics, AIC(gm))
}
"""

Plot AICs and accuracies:

In [None]:
@rget aics;

In [None]:
using Plots
default(fmt=:jpg)

In [None]:
scatter(aics, label=false, xlab="epoch", ylab="AIC", size=(400,300))

In [None]:
savefig("../fig/ddl_aic_dev.pdf")

In [None]:
scatter(res_ddl_eval.accs_train, label=false, xlab="epoch", ylab="Correlation accuracy", size=(400,300))

In [None]:
savefig("../fig/ddl_acc_dev.pdf")

## Exercises

### Exercise 1

Comparing incremental learning to the other models:

Train a linear model with incremental Widrow-Hoff learning (WHL), re-using the learning sequence generated for FIDDL. We leave the learning rate at its default value.

In [None]:
F_whl = JudiLing.wh_learn(cue_obj.C, S, learn_seq=learn_seq, n_epochs=1, verbose=true)
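
The Widrow-Hoff rule updates the mapping after every learning event by moving the prediction for the current word a small step towards its target. The following sketch implements this update with a hypothetical helper `toy_wh_learn` on made-up matrices. The learning rate `eta=0.1` is chosen for the toy example only and is not the package default.

```julia
# Hypothetical helper implementing the Widrow-Hoff (delta rule) update
function toy_wh_learn(C, S, learn_seq; eta=0.01)
    F = zeros(size(C, 2), size(S, 2))
    for i in learn_seq
        c = C[i:i, :]                      # 1 x n_cues row of the current word
        s = S[i:i, :]                      # 1 x n_dims target row
        F .+= eta .* (c' * (s .- c * F))   # move the prediction towards the target
    end
    return F
end

# Toy data for which an exact mapping exists (the identity matrix)
C_toy = [1.0 0.0; 0.0 1.0; 1.0 1.0]
S_toy = [1.0 0.0; 0.0 1.0; 1.0 1.0]
seq_toy = repeat([1, 2, 3], 2000)

F_toy = toy_wh_learn(C_toy, S_toy, seq_toy; eta=0.1)

# Largest remaining prediction error after training
println(maximum(abs.(C_toy * F_toy .- S_toy)))
```

Because the mapping depends on the order and number of learning events, words that occur often in the learning sequence are learned better than words that occur rarely.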

Predict semantic matrix and compute target correlations.

In [None]:
Shat_whl = cue_obj.C * F_whl

In [None]:
words[!, "TargetCorrelationWHL"] = JudiLingMeasures.target_correlation(Shat_whl, S) 

Move to R and compute AIC:

In [None]:
@rput words;
R"""
words$RTinv = -1000/words$rt

gm_whl = gam(RTinv ~ s(TargetCorrelationWHL), data=words)

print(AIC(gm_whl))
summary(gm_whl)
"""

The AIC is lower than with EL and higher than with FIDDL. Surprisingly, it is also lower than with FIL. Why could this be the case?

Let's first inspect the accuracies of the two models:

In [None]:
JudiLing.eval_SC(Shat_whl, S)

In [None]:
JudiLing.eval_SC(Shat_fil, S)
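
The accuracy reported here is correlation-based: a word counts as correct if its predicted vector correlates more strongly with its own target vector than with any other word's target vector. A minimal sketch of this idea, with a hypothetical helper `toy_eval_SC` and made-up matrices, is given below. The actual `eval_SC` function offers further options that are not reproduced here.

```julia
using Statistics

# Hypothetical helper: share of words whose predicted vector correlates most
# strongly with its own target vector
function toy_eval_SC(Shat, S)
    # R[i, j] = correlation of predicted row i with target row j
    R = cor(permutedims(Shat), permutedims(S))
    best = [argmax(R[i, :]) for i in 1:size(R, 1)]
    return mean(best .== 1:size(R, 1))
end

S_toy = [1.0 0.0 0.0;
         0.0 1.0 0.0;
         0.0 0.0 1.0]
Shat_toy = [0.9 0.1 0.0;   # closest to target 1 (correct)
            0.2 0.7 0.1;   # closest to target 2 (correct)
            0.6 0.1 0.3]   # closest to target 1 (incorrect)

println(toy_eval_SC(Shat_toy, S_toy))
```

In this toy example two out of three words are recognised correctly.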

WHL shows higher accuracy than FIL. How do the target correlations computed by the two models compare to each other?

In [None]:
scatter(words.TargetCorrelationFIL, words.TargetCorrelationWHL, xlab="Target correlation FIL", ylab="Target correlation WHL",
label=false)
plot!([0,1], [0,1], linewidth=2, label="x=y")

Comparison with the x=y line shows that, relative to WHL, FIL yields higher target correlations for words with low WHL target correlations and lower ones for words with high WHL target correlations. Let's now inspect the effects the two target correlations have on reaction times:

In [None]:
R"""
plot(gm_fil)
""";

In [None]:
R"""
plot(gm_whl)
""";

This suggests that the inflated target correlations at the lower end of the FIL range make them less predictive of reaction times than those from WHL, as indicated by the flatter curve for low target correlations in the FIL model.

### Exercise 2

Target correlations across time with FIDDL:

Adapt the `compute_target_corr` function to run with the `fiddl` function:

In [None]:
function compute_target_corr2(X, Y, Yhat, data, target_col, model, step)
    data[!, string("target_corr_", step)] = JudiLingMeasures.target_correlation(Yhat, Y)
    return(data)
end

Run `fiddl` while supplying `compute_target_corr2`:

In [None]:
res_fiddl_target_corr = JudiLing.fiddl(cue_obj.C, 
                    S, 
                    learn_seq, 
                    words, 
                    "spelling", 
                    "../res/dlp_fiddl_model2.bson", 
                    batchsize=512, 
                    n_batch_eval=10,
                    measures_func=compute_target_corr2)

In [None]:
res_fiddl_target_corr.data[1:5, 49:end]

Move to R for the statistical analysis:

In [None]:
res_fiddl_target_corr_data = res_fiddl_target_corr.data
@rput res_fiddl_target_corr_data;

In [None]:
R"""
head(res_fiddl_target_corr_data[,c(49:(49+66))])
"""

Run a GAM for target correlations:

In [None]:
R"""
res_fiddl_target_corr_data$RTinv = -1000/res_fiddl_target_corr_data$rt

aics_fiddl = c()
for (col in colnames(res_fiddl_target_corr_data[c(49:(49+66))])){

    f = formula(paste0("RTinv ~ s(", col, ")"))
    gm = gam(f, data=res_fiddl_target_corr_data)
    aics_fiddl = c(aics_fiddl, AIC(gm))
}
"""

In [None]:
@rget aics_fiddl;

In [None]:
scatter(aics_fiddl, label=false, xlab="evaluation step", ylab="AIC", size=(400,300))

At the beginning we see an uptick similar to the one in the DDL model, where AIC initially rises as accuracy increases. AIC then drops quickly and keeps decreasing the longer the model is trained.

# References

Keuleers, E., Diependaele, K., and Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1:174.