This repository contains the code and data for the following ACL 2023 paper:
Yu, L. and Xu, Y. (2023) Word sense extension. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics.
See the 8 .csv files under the /data/ directory for the word usage data used in the word sense extension (WSE) and word sense disambiguation (WSD) experiments. These dataframes can be regenerated by running the code under the /code/data_prep/ directory.
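To get a quick overview of the usage data before running any experiment, the .csv files can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the repository: the helper name `summarize_csv_files` is hypothetical, and it assumes only that each file has a header row. It makes no assumption about the column names.

```python
import csv
from pathlib import Path


def summarize_csv_files(data_dir):
    """Return {file name: (column names, number of data rows)} for each .csv file."""
    summary = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])        # first row is assumed to be the header
            n_rows = sum(1 for _ in reader)  # every remaining row counts as data
        summary[path.name] = (header, n_rows)
    return summary
```

Calling `summarize_csv_files("data")` from the repository root should return one entry per .csv file, which is a cheap way to check that all 8 files are present and to see which columns each one provides.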
Python files for preparing the usage data are located under the /code/data_prep/ directory:
-- preprocess.py: code for preprocessing the wikitext-103 corpus using spaCy.
-- wsd.py: code for annotating the preprocessed wikitext-103 corpus with word sense labels using automated WSD models.
-- sense_usage_data_prep.py: code for preparing usage dataframes for WSE and WSD experiments.
-- cse_data_prep.py: code for preparing datasets for sense-extensional space learning.
Python files for WSE experiments are located under the /code/wse/ directory:
-- pretrain_mlm.py: code for pretraining BERT-based language models from scratch on usage data of partitioned tokens.
-- wse_train.py: code for learning the sense-extensional semantic space and evaluating the learned WSE models.
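The idea of "partitioned tokens" can be illustrated with a small sketch. It assumes that each polysemous word type is split into two pseudo-tokens that each own a disjoint subset of the word's senses, so that a model trained on one pseudo-token never sees the senses held by the other. The random split, the `_0`/`_1` naming, and both function names are assumptions made for illustration; the actual partitioning logic lives in the repository code and may differ.

```python
import random


def partition_senses(word, senses, seed=0):
    """Split a word's sense labels into two disjoint groups, one per pseudo-token."""
    if len(senses) < 2:
        raise ValueError("need at least two senses to partition a word")
    shuffled = sorted(senses)              # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        f"{word}_0": set(shuffled[:half]),  # hypothetical pseudo-token names
        f"{word}_1": set(shuffled[half:]),
    }


def relabel(word, sense, partition):
    """Map a (word, sense) usage to the pseudo-token that owns that sense."""
    for token, owned in partition.items():
        if sense in owned:
            return token
    raise KeyError(f"sense {sense!r} not found for {word!r}")
```

Under this assumption, every usage sentence in the pretraining data would have its target word replaced by the pseudo-token returned by `relabel`, and extension is then tested by asking whether one pseudo-token can express a sense owned by its sibling.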
Python files for WSD experiments are located under the /code/wsd/ directory:
-- wsd_models.py: implementation of the original BEM WSD model of Blevins et al. (2020).
-- wsd_main.py: code for training and evaluating BEMs with and without sense-extensional space learning on the standard WSD evaluation framework of Raganato et al. (2017).
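For readers unfamiliar with BEM, its scoring step can be sketched in a few lines. A bi-encoder model embeds the target word in context with one encoder and each candidate sense gloss with another, then scores each sense by the dot product of the two vectors. The sketch below uses plain Python lists as stand-ins for the encoder outputs. The function names and toy vectors are hypothetical and do not reproduce the code in wsd_models.py.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_senses(context_vec, gloss_vecs):
    """Softmax over dot products between the context vector and each gloss vector."""
    scores = {sense: dot(context_vec, vec) for sense, vec in gloss_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {sense: math.exp(score - top) for sense, score in scores.items()}
    total = sum(exps.values())
    return {sense: value / total for sense, value in exps.items()}


def predict_sense(context_vec, gloss_vecs):
    """Return the candidate sense with the highest probability."""
    probs = score_senses(context_vec, gloss_vecs)
    return max(probs, key=probs.get)
```

In a trained model the vectors would come from the two BERT encoders, and the softmax probabilities would feed a cross-entropy loss over the candidate senses of the target word.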