This repository contains the R scripts and data files used to produce the results reported in Zafar and Nicholls (2023), "An Embedded Diachronic Sense Change Model with a Case Study from Ancient Greek".
Data extraction and snippets: The ancient Greek data used to extract snippets for the target words "kosmos", "mus" and "harmonia" is freely available online. Links included in the file "Greek data extraction.R" point to the data sources, and the code may be used to extract the snippets. Alternatively, the files "kosmos_snippets.RData", "mus_snippets.RData" and "harmonia_snippets.RData" contain the extracted snippets ready to use.
The English data used to extract snippets for the target word "bank" must be purchased from https://www.english-corpora.org/coha/. (The COHA data used in this paper was the version as at 3 June 2020.) The file "COHA data extraction.R" may be used to extract the snippets once the data has been obtained. With permission from the data publisher, the file "bank_snippets.words.RData" contains the extracted snippets together with our manual sense annotation for them, ready to use.
Embeddings: The files "Embeddings - Greek data.R" and "Embeddings - COHA.R" can be used to generate the GloVe embedding vectors for the ancient Greek and COHA data respectively. Alternatively, "word vectors" contains the embedding vectors for all context words (after data filtering) for the four target words ready to use.
Models and samplers: R scripts used to fit the models with HMC and MALA (implemented without Stan) and with NUTS and ADVI (via Stan).
Scalability analysis: R scripts used to simulate the data and perform the runs used in Figure 9 in the paper.
figures and tables.R: Contains the code used to produce all the figures, tables and results reported in the paper.
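Inspecting the ready-to-use snippet files: the names and structure of the R objects stored in each .RData file are not described here, so a cautious first step is to load a file into a separate environment and list what it contains. The sketch below assumes the snippet files sit in the current working directory; the helper inspect_rdata is hypothetical and not part of the repository scripts.

```r
# Hypothetical helper (not part of the repository): load one .RData file
# into its own environment and report the objects it contains.
inspect_rdata <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  env <- new.env()
  # load() returns the names of the restored objects
  object_names <- load(path, envir = env)
  data.frame(
    object = object_names,
    class = vapply(
      object_names,
      function(n) class(get(n, envir = env))[1],
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# File names as listed above; the location (working directory) is an assumption.
snippet_files <- c(
  "kosmos_snippets.RData",
  "mus_snippets.RData",
  "harmonia_snippets.RData",
  "bank_snippets.words.RData"
)

# Only inspect the files that are actually present.
for (f in snippet_files[file.exists(snippet_files)]) {
  cat("Contents of", f, "\n")
  print(inspect_rdata(f))
}
```

Loading into a separate environment avoids overwriting objects already in the workspace, which matters if several snippet files happen to store objects under the same name.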