A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change

This is the official repository for our paper "A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change".

Abstract

Contextualized embeddings are the preferred tool for modeling Lexical Semantic Change (LSC). Current evaluations typically focus on a specific task known as Graded Change Detection (GCD). However, performance comparisons across works are often misleading due to their reliance on diverse settings. In this paper, we evaluate state-of-the-art models and approaches for GCD under equal conditions. We further break the LSC problem into Word-in-Context (WiC) and Word Sense Induction (WSI) tasks, and compare models across these different levels. Our evaluation is performed across different languages on eight available benchmarks for LSC, and shows that (i) APD outperforms other approaches for GCD; (ii) XL-LEXEME outperforms other contextualized models for WiC, WSI, and GCD, while being comparable to GPT-4; (iii) there is a clear need to improve the modeling of word meanings, and to focus on how, when, and why these meanings change, rather than solely on the extent of semantic change.

Getting Started

Before you begin, ensure you have met the following requirements:

Python 3.10.4
Required Python packages (listed in requirements.txt)

To install the required packages, you can use pip:

pip install -r requirements.txt
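
As a quick sanity check after installation, the following sketch compares the running interpreter with the Python version listed above and reports which packages named in requirements.txt are not installed. It is not part of the repository: the helper names `requirement_names` and `missing_packages` are hypothetical, and the name parsing is deliberately simple (it assumes plain `name==version` style lines and ignores pip option lines).

```python
import re
import sys
from importlib import metadata
from pathlib import Path

EXPECTED_PYTHON = (3, 10)  # the README lists Python 3.10.4


def requirement_names(text):
    """Extract distribution names from requirements-style text (simplified)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if sys.version_info[:2] != EXPECTED_PYTHON:
    print("Note: running Python %d.%d, the README lists 3.10.4" % sys.version_info[:2])

requirements = Path("requirements.txt")
if requirements.is_file():
    absent = missing_packages(requirement_names(requirements.read_text()))
    print("Missing packages:", absent if absent else "none")
```

The check only confirms that a distribution is installed, not that its version matches the pinned one.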

Reproducing Results

Reproducing all experiments at once is possible using our comprehensive script, run_experiments.sh. However, please be aware that this may consume a significant amount of time and could be interrupted if your system lacks sufficient GPU resources or if you don't have an OpenAI account. Therefore, we highly recommend executing each command line by line to maintain full control over the experiments.

You can find our code in the src folder. Feel free to contact us if you face any issues!

As GPT-4 is currently an expensive model, we provide our data for the English benchmark in the chatgpt_annotation directory. Please note that the data is in a raw format suitable for testing our experiments and making comparisons. If you intend to use the data, you might need to process it accordingly.

For those interested in annotating data in languages other than English, input your OpenAI token into the chatgpt_api.sh file.

If you choose to leverage our data, please remember to mention our work :)

Evaluation Results

We share our evaluation results in the stats folder. The folder contains three files: layer-stats.tsv, layer-combination-stats.tsv, and wic-wsi-gcd-stats.tsv.

layer-stats.tsv contains the results of Tables 1 and 4 of our paper.
layer-combination-stats.tsv contains the results of Figures 2 and 3 of our paper.
wic-wsi-gcd-stats.tsv contains the results of Table 2 of our paper.
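
To get a first look at these files, a minimal sketch such as the one below may help. It uses only the standard library and assumes nothing about the files beyond tab separation and a header row, since the column layout is not documented here. The helper names `load_tsv` and `summarize` are hypothetical and not part of the repository.

```python
import csv
from pathlib import Path

# File names as listed above; their columns are not documented in this README,
# so the code relies only on each file having a header row.
STATS_FILES = [
    "layer-stats.tsv",
    "layer-combination-stats.tsv",
    "wic-wsi-gcd-stats.tsv",
]


def load_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def summarize(stats_dir="stats"):
    """Return {file name: (row count, column names)} for the files that exist."""
    summary = {}
    for name in STATS_FILES:
        path = Path(stats_dir) / name
        if not path.is_file():
            continue  # skip quietly so the sketch also runs outside the repository
        rows = load_tsv(path)
        columns = list(rows[0].keys()) if rows else []
        summary[name] = (len(rows), columns)
    return summary


for name, (n_rows, columns) in summarize().items():
    print(f"{name}: {n_rows} rows, columns = {columns}")
```

All values are read as strings; convert the score columns to numbers once you have checked which columns hold them.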

References

@Misc{periti2024systematic,
      title = {{A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change}}, 
      author = {Francesco Periti and Nina Tahmasebi},
      year = {2024},
      eprint = {2402.12011},
      archivePrefix = {arXiv},
      primaryClass = {cs.CL},
      url = {https://arxiv.org/abs/2402.12011}
}

@Misc{montanelli2023survey,
      title = {{A Survey on Contextualised Semantic Shift Detection}}, 
      author = {Stefano Montanelli and Francesco Periti},
      year = {2023},
      eprint = {2304.01666},
      archivePrefix = {arXiv},
      primaryClass = {cs.CL},
      url = {https://arxiv.org/abs/2304.01666}
}

@InBook{tahmasebi2021survey,
    author = {Tahmasebi, Nina  and Borin, Lars and Jatowt, Adam},
    title = {{Survey of Computational Approaches to Lexical Semantic Change Detection}},
    booktitle = {Computational approaches to semantic change},
    pages = {1--91},
    publisher = {Language Science Press},
    year = {2021},
    address = {Berlin},
    doi = {10.5281/zenodo.5040302}
}
