jadeleiyu/word_sense_extension

Word Sense Extension

This repository contains the code and data for the following ACL 2023 paper:

Yu, L. and Xu, Y. (2023) Word sense extension. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics.

Data

The /data/ directory contains 8 .csv files with the word usage data used in the WSE and WSD experiments. These dataframes can be regenerated by running the code under the /code/data_prep/ directory.
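To get a quick overview of the usage dataframes before running any experiment, a small helper such as the one below can list each .csv file with its column names and row count. This is a minimal sketch using only the standard library: the function name `summarize_csv_files` is hypothetical, the column names of the actual files are not documented here and are simply read from each header row, and the scripts in this repository may well load the files with pandas instead.

```python
import csv
from pathlib import Path


def summarize_csv_files(data_dir):
    """Return {file name: (column names, number of data rows)} for each .csv in data_dir.

    Hypothetical helper for inspection only; it is not part of the repository code.
    """
    summary = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])        # first row holds the column names
            n_rows = sum(1 for _ in reader)  # every remaining row is a data row
        summary[path.name] = (header, n_rows)
    return summary
```

Calling `summarize_csv_files("data")` from the repository root (an assumed working directory) returns one entry per file, which makes it easy to check that all 8 files are present and non-empty.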

Code for data preparations

Python files for preparing the usage data are located under the /code/data_prep/ directory:

-- preprocess.py: code for preprocessing the WikiText-103 corpus using spaCy.

-- wsd.py: code for annotating the preprocessed WikiText-103 corpus with word sense labels using automated WSD models.

-- sense_usage_data_prep.py: code for preparing usage dataframes for WSE and WSD experiments.

-- cse_data_prep.py: code for preparing datasets for sense-extensional space learning.

Code for word sense extension (WSE) experiments

Python files for WSE experiments are located under the /code/wse/ directory:

-- pretrain_mlm.py: code for pretraining BERT-based language models from scratch on usage data of partitioned tokens.

-- wse_train.py: code for learning the sense-extensional semantic space and evaluating the learned WSE models.
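The "partitioned tokens" mentioned above can be illustrated with a toy example: the senses of a word are split into two groups, and each usage of the word is rewritten with a pseudo-token that marks the group its sense belongs to. The sketch below is only an illustration under stated assumptions. The function names, the `word_0` / `word_1` naming scheme, and the alternating split are all hypothetical; the partitioning strategy actually used in pretrain_mlm.py may differ.

```python
def partition_senses(sense_ids):
    """Split sense ids into two disjoint groups by alternating over the sorted ids.

    The alternating split is a placeholder assumption, not the repository's method.
    Both groups are non-empty whenever there are at least two senses.
    """
    ordered = sorted(sense_ids)
    return set(ordered[0::2]), set(ordered[1::2])


def to_pseudo_token(word, sense_id, groups):
    """Map a (word, sense) usage to the pseudo-token of the group containing the sense."""
    for index, group in enumerate(groups):
        if sense_id in group:
            return f"{word}_{index}"  # hypothetical naming scheme
    raise ValueError(f"sense {sense_id!r} is not in any group")


def relabel_usage(tokens, position, word, sense_id, groups):
    """Return a copy of tokens with the target at `position` replaced by its pseudo-token."""
    relabeled = list(tokens)
    relabeled[position] = to_pseudo_token(word, sense_id, groups)
    return relabeled


# Toy usage with made-up sense ids.
groups = partition_senses({"bank.n.01", "bank.n.02", "bank.n.03"})
sentence = ["the", "bank", "closed", "early"]
relabeled = relabel_usage(sentence, 1, "bank", "bank.n.02", groups)
```

A language model pretrained on text relabeled this way sees the two pseudo-tokens as distinct vocabulary items, which is what allows a WSE model to be asked whether one pseudo-token can be extended to the senses held out under the other.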

Code for word sense disambiguation (WSD) experiments

Python files for WSD experiments are located under the /code/wsd/ directory:

-- wsd_models.py: implementations of the original BEM WSD model (Blevins et al., 2020).

-- wsd_main.py: code for training and evaluating BEMs with and without sense-extensional space learning on the standard WSD evaluation framework (Raganato et al., 2017).
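For readers unfamiliar with BEM, its scoring step can be sketched in a few lines: a context encoder produces a vector for the target word in context, a gloss encoder produces one vector per candidate sense, and the predicted sense is the one whose gloss vector has the highest dot product with the context vector. The example below shows only this scoring step on hand-written toy vectors. It assumes the embeddings have already been computed, the function names are hypothetical, and it does not reproduce the BERT encoders or the training code in wsd_models.py.

```python
def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def rank_senses(context_vec, gloss_vecs):
    """Return (sense, score) pairs sorted by descending dot-product score."""
    scores = {sense: dot(context_vec, vec) for sense, vec in gloss_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def predict_sense(context_vec, gloss_vecs):
    """Pick the candidate sense whose gloss vector scores highest."""
    return rank_senses(context_vec, gloss_vecs)[0][0]


# Toy vectors standing in for encoder outputs (made-up values and sense ids).
context_vec = [1.0, 0.0, 2.0]
gloss_vecs = {
    "bank.n.01": [0.5, 1.0, 0.0],
    "bank.n.02": [1.0, 0.0, 1.0],
}
ranking = rank_senses(context_vec, gloss_vecs)
prediction = predict_sense(context_vec, gloss_vecs)
```

Because senses are scored through their gloss embeddings rather than through a fixed output layer, changing how the embedding space is learned (for example, with sense-extensional space learning) changes the predictions without changing this scoring rule.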
