# Chapter 15.1: Morphological decomposition? 

In this notebook we model a priming study reported by Creemers et al. (2020) with LDL-EL, replicating Chuang, Kang, Luo and Baayen (2022).

In [None]:
import Pkg; Pkg.add("StatsPlots")
using JudiLing, JudiLingMeasures, CSV, DataFrames, StatsPlots, Statistics, RCall

The input for modeling was a set of 7803 Dutch words selected from the lemma database in CELEX
(Baayen et al., 1995). To be included, a word had to have a frequency of occurrence exceeding
100 (per 42 million) and at most two constituents, or it had to be listed as a prime or target
in Experiment 1 of Creemers et al. (2020). The dataset loaded in the next code snippet contains four additional words.

In [None]:
phon = JudiLing.load_dataset("../dat/dutch_phon.csv");

In [None]:
size(phon)

In [None]:
first(phon, 6)

Load the semantic matrix S, whose rows are the fastText embeddings of the words.

In [None]:
S, words = JudiLing.load_S_matrix("../dat/dutch_phon_ft.txt", header = false, sep = " ");

Create the cue object, using trigrams over the phonological forms as cues.

In [None]:
cue_obj = JudiLing.make_cue_matrix(phon, grams=3, target_col=:phon);

In [None]:
size(cue_obj.C)
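
To make the structure of the cue matrix concrete, here is a minimal sketch in plain Julia of how a form can be split into trigram cues and how a binary word-by-cue matrix follows from that. The helper `ngrams`, the `#` boundary marker, the two toy forms, and the 0/1 coding are assumptions made for illustration; this is not the code that `JudiLing.make_cue_matrix` runs internally.

```julia
# Hypothetical helper, not part of JudiLing: split a form into overlapping
# n-grams after padding it with a word-boundary marker on both sides.
function ngrams(word::AbstractString; n::Int=3, boundary::Char='#')
    chars = collect(string(boundary, word, boundary))  # Unicode-safe indexing
    return [join(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

cues = ngrams("kat")  # trigram cues of a made-up three-phone form

# Toy word-by-cue matrix: one row per word, one column per distinct trigram.
toy_words = ["kat", "kar"]
inventory = unique(vcat(ngrams.(toy_words)...))
C_toy = [g in ngrams(w) ? 1 : 0 for w in toy_words, g in inventory]
```

Each row of `C_toy` marks which trigrams occur in the corresponding word; the two toy forms share only their initial trigram.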

Compute the comprehension matrix F, which maps C onto S, and the production matrix G, which maps S onto C.

In [None]:
F = JudiLing.make_transform_matrix(cue_obj.C, S);

In [None]:
G = JudiLing.make_transform_matrix(S, cue_obj.C);

Predict $\hat{\mathbf{S}}$ and $\hat{\mathbf{C}}$.

In [None]:
Shat = cue_obj.C * F;
Chat = S * G;
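
The two mappings can be illustrated on toy data. The sketch below assumes that the endstate solution is the least-squares one, so that $\mathbf{F} = \mathbf{C}^{+}\mathbf{S}$ and $\mathbf{G} = \mathbf{S}^{+}\mathbf{C}$, with $^{+}$ denoting the Moore-Penrose pseudoinverse. It uses `pinv` from the standard library; the numerical method JudiLing uses internally may differ, and all matrices and names ending in `_toy` are made up.

```julia
using LinearAlgebra

# Toy cue matrix: 3 words x 4 cues (full row rank).
C_toy = [1.0 1.0 0.0 0.0;
         0.0 1.0 1.0 0.0;
         0.0 0.0 1.0 1.0]
# Toy semantic matrix: 3 words x 2 dimensions (made-up numbers).
S_toy = [ 0.5 -1.0;
          1.5  0.2;
         -0.3  0.8]

F_toy = pinv(C_toy) * S_toy   # comprehension mapping, 4 x 2
G_toy = pinv(S_toy) * C_toy   # production mapping, 2 x 4

Shat_toy = C_toy * F_toy      # predicted semantic vectors
Chat_toy = S_toy * G_toy      # predicted cue supports
```

Because `C_toy` has full row rank, `Shat_toy` reproduces `S_toy` exactly (up to floating-point error). `Chat_toy` is only an approximation of `C_toy`, since two semantic dimensions cannot carry enough information to recover four cues for three words.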

Compute comprehension accuracy and the correlation matrix R between the predicted and the gold-standard semantic vectors.

In [None]:
accuracy, R = JudiLing.eval_SC(Shat, S, phon, :phon, R=true);
accuracy
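
For intuition, a correlation-based accuracy of this kind can be sketched on toy data as follows. The sketch assumes that a word counts as correctly understood when its predicted vector correlates more strongly with its own gold-standard vector than with that of any other word; the exact options and tie handling of `JudiLing.eval_SC` may differ. All values and names are made up.

```julia
using Statistics

# Toy gold-standard semantic vectors: 3 words x 4 dimensions.
S_gold = [1.0 0.0 2.0 1.0;
          0.0 3.0 1.0 0.0;
          2.0 1.0 0.0 3.0]
# Toy predictions: the gold vectors plus a small, fixed perturbation.
S_pred = S_gold .+ [ 0.1 -0.1  0.0 0.1;
                     0.0  0.1 -0.1 0.0;
                    -0.1  0.0  0.1 0.1]

# R_toy[i, j] is the correlation of predicted vector i with gold vector j.
R_toy = cor(S_pred, S_gold, dims=2)

# For each word, find the gold vector that best matches its prediction.
best = [argmax(R_toy[i, :]) for i in axes(R_toy, 1)]
accuracy_toy = mean(best .== axes(R_toy, 1))
```

With this orientation, rows index predicted vectors and columns index gold-standard vectors, which is the reading assumed when prime-target correlations are later extracted from `R`.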

Compute the low-cost measures with JudiLingMeasures, then save the measures and the predicted semantic matrix.

In [None]:
measures = JudiLingMeasures.compute_all_measures_train(phon, 
                                                       cue_obj, 
                                                       Chat, 
                                                       S, 
                                                       Shat, 
                                                       F, 
                                                       G, 
                                                       low_cost_measures_only=true);

In [None]:
CSV.write("../res/dutch_phon_EOL_measures.csv", measures);

In [None]:
CSV.write("../res/dutch_phon_EOL_Shat.csv", DataFrame(Shat, :auto));

Load the stimuli used in the experiment of Creemers et al. (2020).

In [None]:
stim = JudiLing.load_dataset("../dat/creemers_stimuli.csv")

Remove the one word that is missing from the training data, then extract the target words and define the prime types.

In [None]:
stim = stim[stim.Word .!= "omfietsen", :] # missing in the training data
targets = stim[stim.Type .== "target","Word"];
types = ["ms", "m", "ph", "c"]

For each target, get its primes, together with the prime type and the correlation between prime and target.

In [None]:
res = []
for target in targets
    set = stim[stim.Word .== target, "Set"] # get the set number for the target word
    set = stim[stim.Set .== set, :] # get all rows with the same set number
    for ty in types
        prime = set[set.Type .== ty, "Word"]
        if length(prime) > 0
            corr = vec(R[phon.orth .== prime, phon.orth .== target])
            append!(res, [(target, prime[1], ty, corr[1])])
        end
    end
end
res = DataFrame(res, ["target", "prime", "prime_type", "r"])

Count the number of rows per prime type.

In [None]:
combine(groupby(res, "prime_type"), nrow)

Plot the prime-target correlations by prime type.

In [None]:
boxplot(res.prime_type, res.r, group=res.prime_type, legend=false, title="prime-target correlation")

Load the experimental data and compute the mean log RT for each prime-target pair.

In [None]:
expdata = JudiLing.load_dataset("../dat/expdata1_cleaned.csv")
expdata[!, "combi"] = string.(expdata.prime, "_", expdata.target)
means = combine(groupby(expdata, :combi), "target_rt.log" => mean => "target_rt.log.mean")

Add the mean log RTs to the dataframe with the LDL correlations.

In [None]:
res[!, "prime_target"] = string.(res.prime, "_", res.target)
res[!, "MeanLogRT"] = [means[means.combi .== c, "target_rt.log.mean"][1] for c in res.prime_target]

Pass the data frame to R to compute the Pearson correlation between the LDL correlations and the mean log RTs.

In [None]:
@rput res
R"""
cor.test(res$r, res$MeanLogRT)
"""
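
For reference, with its default settings `cor.test` reports the Pearson correlation together with a $t$ test of the null hypothesis that the true correlation is zero. Writing $x_i$ for the LDL correlation and $y_i$ for the mean log RT of prime-target pair $i$ (symbols introduced here for illustration), and $n$ for the number of pairs:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = r_{xy}\sqrt{\frac{n-2}{1-r_{xy}^{2}}},
$$

where $t$ is evaluated against a $t$ distribution with $n-2$ degrees of freedom.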

Plot box plots of the mean log RTs and of the prime-target correlations side by side.

In [None]:
p1 = boxplot(res.prime_type, res.MeanLogRT, group=res.prime_type, legend=false, title="mean log RT")
p2 = boxplot(res.prime_type, res.r, group=res.prime_type, legend=false, title="prime-target correlation")
plot(p1, p2, size=(600,300))

In [None]:
savefig("../fig/creemers_boxplots_eol.pdf")

Compute the correlation between the embedding of each prime and the embedding of its target.

In [None]:
res[!, "rft"] = [cor(S[phon.orth .== row.prime, :], S[phon.orth .== row.target,:], dims=2)[1] for row in eachrow(res)];

Plot the embedding correlations by prime type.

In [None]:
boxplot(res.prime_type, res.rft, group = res.prime_type, legend=false, 
    title="mean correlation of prime\nand target embeddings", size=(300,300))

In [None]:
savefig("../fig/creemers_boxplots_fasttext.pdf")

# Exercises

Solutions to the exercises can be found in the following notebooks.

# References

Chuang, Y.-Y., Kang, M., Luo, X., and Baayen, R. H. (2022). Vector space morphology with linear discriminative learning. In Crepaldi, D., editor, Linguistic morphology in the mind and brain.

Creemers, A., Davies, A. G., Wilder, R. J., Tamminga, M., and Embick, D. (2020). Opacity, transparency, and morphological priming: A study of prefixed verbs in Dutch. Journal of Memory and Language, 110:104055.