Interpretable Word Sense Representations via Definition Generation

This repository accompanies the paper Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis (ACL'2023) by Mario Giulianelli, Iris Luden, Raquel Fernández and Andrey Kutuzov.

The project is a collaboration between the Dialogue Modelling Group at the University of Amsterdam and the Language Technology Group at the University of Oslo.

Definition generation models for English:

Usage

Download datasets

wordnet and oxford

*.txt files are tab-separated (TSV) files containing the target words and their gold-standard definitions.

*.eg files are tab-separated (TSV) files containing the target words and their usage examples.
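
As a rough illustration of working with these files, the sketch below reads a tab-separated file (plain or gzipped) into lists of fields. The helper name read_tsv_rows is hypothetical and not part of this repository, and no particular column order is assumed, since the layout is not spelled out here.

```python
import csv
import gzip
from pathlib import Path


def read_tsv_rows(path, limit=None):
    """Read rows from a tab-separated file, plain or gzip-compressed.

    Hypothetical helper for inspecting *.txt / *.eg files; it makes no
    assumption about which column holds the word, definition or example.
    """
    path = Path(path)
    # Pick the opener from the file extension: .gz files are decompressed.
    opener = gzip.open if path.suffix == ".gz" else open
    rows = []
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside definitions untouched.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for i, row in enumerate(reader):
            if limit is not None and i >= limit:
                break
            rows.append(row)
    return rows
```

Calling read_tsv_rows("test.txt", limit=5) would return the first five rows, which is usually enough to check the column order of a downloaded dataset.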

CoDWoE

Predict definitions

Gzip the test.txt and test.eg files, put them in the same folder, and run code/modeling/generate_t5.py with that folder as --testdata, e.g.

python3 code/modeling/generate_t5.py --model ltg/flan-t5-definition-en-base --testdata testdata
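
The preparation step (gzip both files into one folder) could be scripted along these lines. This is a minimal sketch: gzip_into is a hypothetical helper, and the output names test.txt.gz and test.eg.gz are an assumption about what the script expects, so check them against generate_t5.py before relying on it.

```python
import gzip
import shutil
from pathlib import Path


def gzip_into(source, target_dir):
    """Compress `source` with gzip and write it to `target_dir/<name>.gz`.

    Hypothetical helper; the ".gz" naming is an assumption, not something
    confirmed by the repository.
    """
    source = Path(source)
    target_dir = Path(target_dir)
    # Create the data folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (source.name + ".gz")
    # Stream the bytes so large datasets are not loaded into memory at once.
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Example usage, with "testdata" matching the --testdata argument above:
# for name in ("test.txt", "test.eg"):
#     gzip_into(name, "testdata")
```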

Generate DistilRoBERTa sentence embeddings for the definitions

code/modeling/generate_t5.py writes a TSV file whose name ends in _post_predicted.tsv. The gold-standard definitions are in the Definition column, and the predicted ones are in the Definitions column (note the plural). To embed them, run code/embed_definitions.py, e.g.

python3 code/embed_definitions.py --input_path "what_is_the_definition_of_<trg>?_post_predicted.tsv" --key_to_entry_id Sense

The value of --key_to_entry_id depends on the dataset used; for wordnet it is Sense.
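
Before embedding, it can help to look at the predictions file itself. The sketch below pairs each gold definition (Definition column) with its prediction (Definitions column). It assumes the file has a header row with those column names; load_predictions and exact_match_rate are hypothetical helpers, and exact match is only a crude sanity check, not the evaluation used in the paper.

```python
import csv


def load_predictions(path):
    """Return (gold, predicted) definition pairs from a predictions TSV.

    Assumes a header row containing "Definition" (gold) and
    "Definitions" (predicted), as described above.
    """
    pairs = []
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            pairs.append((row["Definition"], row["Definitions"]))
    return pairs


def exact_match_rate(pairs):
    """Share of pairs whose texts match after trimming and lowercasing."""
    if not pairs:
        return 0.0
    hits = sum(gold.strip().lower() == pred.strip().lower() for gold, pred in pairs)
    return hits / len(pairs)
```

A very low or very high rate here mostly signals a column mix-up or a formatting problem rather than model quality.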

Citation

@inproceedings{giulianelli-etal-2023-interpretable,
    title = "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis",
    author = "Giulianelli, Mario  and
      Luden, Iris  and
      Fern{\'a}ndez, Raquel  and
      Kutuzov, Andrey",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.176",
    doi = "10.18653/v1/2023.acl-long.176",
    pages = "3130--3148",
    abstract = "We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users {---} historical linguists, lexicographers, or social scientists {---} to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the {`}definitions as representations{'} paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.",
}
