# Chapter 14.5: Trial-to-trial learning in Dutch (Simulation in Julia)

Importing packages:

In [None]:
using JudiLing, JudiLingMeasures, CSV, DataFrames, ProgressMeter

## Data preparation

First, if you haven't done so before, download the trial-level data from the Dutch Lexicon Project (Keuleers et al., 2010) from [here](https://osf.io/uw7t6/) and store it as `dlp-trials.txt` in the `dat` directory.

Now we load the file into Julia:

In [None]:
dlp = JudiLing.load_dataset("../dat/dlp-trials.txt", delim="\t");

We subset the file to only include responses of the first participant:

In [None]:
dlp_part1 = dlp[dlp.participant .== 1,:];

In [None]:
first(dlp_part1, 5)

Divide the data into words and nonwords:

In [None]:
dlp_words = dlp_part1[dlp_part1.lexicality .== "W",:]
dlp_nonwords = dlp_part1[dlp_part1.lexicality .== "N",:]

Sort each by the order they were presented to the participant:

In [None]:
dlp_words = sort(dlp_words, [:order])
dlp_nonwords = sort(dlp_nonwords, [:order])

## Initialising the model

Load an S matrix using fasttext vectors:

In [None]:
dlp_words_small, S = JudiLing.load_S_matrix_from_fasttext(dlp_words, :nl, target_col=:spelling);

Create cue objects for the words and nonwords, respectively:

In [None]:
cue_obj_words, cue_obj_nonwords = JudiLing.make_combined_cue_matrix(dlp_words_small, dlp_nonwords,
                                                            grams=3, target_col=:spelling);
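To make the form representation concrete, here is a minimal sketch of how a spelling can be split into letter trigrams (`grams=3`). It assumes that word boundaries are marked with `#`. The helper `trigram_cues` is hypothetical and not part of JudiLing, whose own tokenisation may differ in detail.

```julia
# Hypothetical helper, not JudiLing's implementation: split a spelling into
# overlapping letter n-grams, with `boundary` marking the word edges.
function trigram_cues(word::AbstractString; n::Int=3, boundary::AbstractString="#")
    chars = collect(boundary * word * boundary)
    return [join(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

cues_niet = trigram_cues("niet")    # ["#ni", "nie", "iet", "et#"]
```

Each distinct trigram then corresponds to one column of the C matrix, and a row has nonzero entries for the trigrams that occur in that word or nonword.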

Calculate F and G mappings for the words:

In [None]:
F = JudiLing.make_transform_matrix(cue_obj_words.C, S)
G = JudiLing.make_transform_matrix(S, cue_obj_words.C)
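Conceptually, F is a least-squares solution to $CF \approx S$, and G a least-squares solution to $SG \approx C$. The sketch below illustrates this on made-up toy matrices using the pseudoinverse from `LinearAlgebra`. This is an assumption chosen for brevity; the solver that `make_transform_matrix` actually uses may differ.

```julia
using LinearAlgebra

# Toy cue matrix: 3 words × 4 trigram cues (hypothetical values).
C_toy = [1.0 1.0 0.0 0.0;
         0.0 1.0 1.0 0.0;
         0.0 0.0 1.0 1.0]

# Toy semantic matrix: 3 words × 2 semantic dimensions (hypothetical values).
S_toy = [ 0.5 -0.2;
          0.1  0.9;
         -0.4  0.3]

F_toy = pinv(C_toy) * S_toy    # form -> meaning, 4 × 2
G_toy = pinv(S_toy) * C_toy    # meaning -> form, 2 × 4

Shat_toy = C_toy * F_toy       # predicted semantic vectors
Chat_toy = Shat_toy * G_toy    # predicted form vectors
```

Because the rows of `C_toy` are linearly independent, `Shat_toy` reproduces `S_toy` exactly here. `Chat_toy`, by contrast, only approximates `C_toy`, because two semantic dimensions cannot recover four cue dimensions.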

Now, we create the target semantic vectors for the nonwords. For this, we first project the nonword form vectors into semantic space:

In [None]:
S_nonwords = cue_obj_nonwords.C * F

Next, we add the semantic vectors of "niet" and "woord" to the predicted vectors:

In [None]:
niet_vec = S[dlp_words_small.spelling .== "niet",:]
woord_vec = S[dlp_words_small.spelling .== "woord",:]

In [None]:
for i in 1:size(S_nonwords, 1)
    S_nonwords[i, :] = S_nonwords[i, :] + vec(niet_vec) + vec(woord_vec)
end
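The loop adds the same two row vectors to every row of `S_nonwords`. As an aside, the same arithmetic can be sketched with broadcasting. The toy values below are made up, and the `_toy` names are hypothetical.

```julia
# Toy predicted nonword vectors: 2 nonwords × 2 dimensions.
S_nonwords_toy = [1.0 2.0;
                  3.0 4.0]

# Toy 1 × 2 row vectors standing in for the "niet" and "woord" vectors.
niet_toy  = [0.1 0.2]
woord_toy = [0.5 0.5]

# Broadcasting adds the two row vectors to every row at once.
shifted_toy = S_nonwords_toy .+ niet_toy .+ woord_toy
```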

## Simulating the experiment

First, we need to create a dataframe with all words and nonwords, as well as an S matrix with the target vectors for both:

In [None]:
dlp_part1_final = vcat(dlp_words_small, dlp_nonwords)
S_part1 = vcat(S, S_nonwords)

We sort both by the order they were presented to the participant:

In [None]:
S_part1_ordered = S_part1[sortperm(dlp_part1_final.order), :]
dlp_part1_ordered = sort(dlp_part1_final, [:order])
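Indexing the matrix with `sortperm` applies the same reordering to its rows that `sort` applies to the dataframe, so row `i` of the matrix still belongs to row `i` of the dataframe. A small sketch with made-up values:

```julia
# Hypothetical presentation order for three stimuli.
order_toy = [3, 1, 2]

# Row i of M_toy belongs to the stimulus with order_toy[i].
M_toy = [30.0 31.0;
         10.0 11.0;
         20.0 21.0]

perm = sortperm(order_toy)      # indices that sort order_toy
M_ordered = M_toy[perm, :]      # rows now follow presentation order
order_sorted = order_toy[perm]
```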

Now create a cue object for the combined dataset. Provide `cue_obj_words` to the function so that its `i2f` and `f2i` index mappings are reused and the columns of the new C matrix line up with those of the existing one:

In [None]:
cue_obj = JudiLing.make_cue_matrix(dlp_part1_ordered, cue_obj_words, grams=3, target_col=:spelling)

Now we "run" the static simulation. Since F and G do not change over the course of the experiment, we can simply multiply C by F to obtain Shat, and Shat by G to obtain Chat, in the usual way:

In [None]:
Shat_collection_static = cue_obj.C * F
Chat_collection_static = Shat_collection_static * G

Extract measures from the static simulation. For simplicity, we restrict ourselves to words only.

In [None]:
measures_static = deepcopy(dlp_words_small)
acc_comp, cor_s = JudiLing.eval_SC(Shat_collection_static[dlp_part1_ordered.lexicality .== "W",:], S, R=true)
measures_static[!,"SemanticDensity"] = JudiLingMeasures.density(cor_s, n=8)
measures_static[!,"L1Chat"] = JudiLingMeasures.L1Norm(Chat_collection_static[dlp_part1_ordered.lexicality .== "W",:])
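To give an intuition for the two measures, here is a sketch in plain Julia. It assumes that `L1Norm` is the row-wise sum of absolute values and that `density` is the mean of the `n` highest correlations in each row of the correlation matrix. The functions `l1norm_sketch` and `density_sketch` are hypothetical stand-ins, not the JudiLingMeasures implementations.

```julia
using Statistics

# Row-wise L1 norm: sum of absolute values of each predicted form vector.
l1norm_sketch(M::AbstractMatrix) = vec(sum(abs, M, dims=2))

# Mean of the n largest correlations in each row of a correlation matrix R.
function density_sketch(R::AbstractMatrix; n::Int=8)
    return [mean(sort(R[i, :], rev=true)[1:n]) for i in 1:size(R, 1)]
end

# Made-up toy inputs.
Chat_example = [1.0 -2.0;
                0.5  0.5]
R_example = [1.0 0.2 0.6;
             0.1 0.9 0.5]

l1_example = l1norm_sketch(Chat_example)
density_example = density_sketch(R_example; n=2)
```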

For the dynamic simulation, we loop over all trials. On each trial, we first compute and save the $\hat{s}$ and $\hat{c}$ vectors. Then we update the mappings F and G to reduce the error between the predicted and target vectors (meaning for F, form for G) of the currently presented stimulus:

In [None]:
Shat_collection_dynamic = zeros(size(S_part1_ordered))
Chat_collection_dynamic = zeros(size(cue_obj.C))

@showprogress for trial in 1:size(dlp_part1_ordered, 1)
    shat = cue_obj.C[trial:trial, :] * F
    chat = shat * G
    Shat_collection_dynamic[trial:trial,:] = shat
    Chat_collection_dynamic[trial:trial,:] = chat
    
    F = JudiLing.wh_learn(cue_obj.C[trial:trial, :], S_part1_ordered[trial:trial, :], eta=0.001, weights = F,
                          n_epochs=1)
    G = JudiLing.wh_learn(S_part1_ordered[trial:trial, :], cue_obj.C[trial:trial, :], eta=0.001, weights = G,
                          n_epochs=1)
end
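The update inside the loop can be illustrated with a single step of the Widrow-Hoff (delta) rule, $W \leftarrow W + \eta\, x^\top (y - xW)$. The sketch below assumes that `wh_learn` applies this standard rule once per trial when `n_epochs=1`. The function `wh_step_sketch` and the toy values are hypothetical and do not come from JudiLing.

```julia
using LinearAlgebra

# One Widrow-Hoff update for a single trial.
# x: 1 × p input row, y: 1 × q target row, W: p × q weight matrix.
function wh_step_sketch(W, x, y; eta=0.001)
    prediction_error = y - x * W
    return W + eta * x' * prediction_error
end

x_toy = [1.0 0.0 1.0]    # toy form vector with two active cues
y_toy = [0.5 -0.5]       # toy target semantic vector
W0 = zeros(3, 2)         # untrained mapping

# A larger learning rate than in the simulation, to make the effect visible.
W1 = wh_step_sketch(W0, x_toy, y_toy; eta=0.1)

err_before = norm(y_toy - x_toy * W0)
err_after  = norm(y_toy - x_toy * W1)
```

After the step, the prediction for this stimulus has moved towards its target, so `err_after` is smaller than `err_before`. With the small learning rate `eta=0.001` used in the simulation, each trial changes the mappings only slightly.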

Extract dynamic measures.

In [None]:
measures_dynamic = deepcopy(dlp_words_small)
acc_comp, cor_s = JudiLing.eval_SC(Shat_collection_dynamic[dlp_part1_ordered.lexicality .== "W",:], S, R=true)
measures_dynamic[!,"SemanticDensity"] = JudiLingMeasures.density(cor_s, n=8)
measures_dynamic[!,"L1Chat"] = JudiLingMeasures.L1Norm(Chat_collection_dynamic[dlp_part1_ordered.lexicality .== "W",:])

Save measures.

In [None]:
CSV.write("../res/dlp-trial-measures-static.csv", measures_static)

In [None]:
CSV.write("../res/dlp-trial-measures-dynamic.csv", measures_dynamic)

# References

Keuleers, E., Diependaele, K., and Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1:174.