Living-with-machines/TargetedSenseDisambiguation

When Time Makes Sense:
A Historically-Aware Approach to Targeted Sense Disambiguation

This repository provides underlying code and materials for the paper When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation.

Installation

We strongly recommend installation via Anaconda:

  • Create a new environment:
conda create -n py37_tsd python=3.7
  • Activate the environment:
conda activate py37_tsd
  • Install dependencies:
cd /path/to/my/TargetedSenseDisambiguation
pip install -r requirements.txt

We also use the spaCy model en_core_web_lg, which can be installed with:

python -m spacy download en_core_web_lg
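
As a quick check that the installation steps above succeeded, the following minimal sketch tests whether spaCy and the en_core_web_lg model can be found in the active environment. It relies on the fact that spaCy models are installed as regular Python packages; the helper name is_installed is our own and not part of this repository.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in the active environment."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of spaCy itself and of the model package.
for name in ("spacy", "en_core_web_lg"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If either line reports MISSING, make sure the py37_tsd environment is active and rerun the corresponding install command.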

Code

This section explains how to run the code. Most scripts require credentials for the Oxford Historical Dictionary Research API. These scripts are marked with **. More information on obtaining access to the API can be found here.

[WARNING] Results produced by this code may differ slightly from those in the paper because:

  • the source data (the quotations stored in the OED) may change over time
  • the order in which data is retrieved and stored changes with each run, resulting in different splits for train, validation and test. Please contact the author.

However, the authors have rerun the pipeline multiple times: the scores produced by these scripts are close to the ones reported in the paper, and the differences do not affect the conclusions drawn from the experiments.

The only exception may be the results for the curated experiments, which tend to be more volatile.
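
To illustrate why retrieval order matters for the splits mentioned in the warning above, here is a minimal sketch of an order-independent split. This is not the splitting procedure used in this repository: the function name, the seed, the proportions and the quotation ids are all hypothetical.

```python
import random


def deterministic_split(ids, seed=42, train=0.8, val=0.1):
    """Split ids into train/validation/test, independent of input order."""
    ordered = sorted(ids)                 # remove dependence on retrieval order
    random.Random(seed).shuffle(ordered)  # seeded shuffle, identical on every run
    n_train = int(len(ordered) * train)
    n_val = int(len(ordered) * val)
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )


# Hypothetical quotation ids, retrieved in two different orders.
first_run = [f"quotation_{i}" for i in range(10)]
second_run = list(reversed(first_run))
print(deterministic_split(first_run) == deterministic_split(second_run))
```

Without the sorting step, the same seed applied to differently ordered input would produce different splits, which is the effect described in the warning.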

Generate Dataframe

The script generate_dataframes.py downloads data from the API for a given headword and vectorizes the keyword of the quotations.

[WARNING] This script requires access to the historical BERT models, available on Zenodo. Please copy the bert_1760_1850 and bert_1760_1900 models to the models folder and adjust the paths in lines 7-8.

[WARNING] To download the data you need access to the OED API. More information on how to obtain credentials is available here. Once you have the credentials, add them to oed_credentials.json.

python generate_dataframes.py

All results should be saved in the /data folder. Almost all subsequent steps require these data as input.
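
Before running generate_dataframes.py, the prerequisites named in the warnings above can be checked with a small script. The sketch below assumes that the models folder and oed_credentials.json sit in the repository root; it only verifies that the credentials file is valid JSON and makes no assumption about the keys it should contain. The function name missing_prerequisites is our own.

```python
import json
from pathlib import Path


def missing_prerequisites(repo_root="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    root = Path(repo_root)
    problems = []
    # The two historical BERT models named in the warning above.
    for model in ("bert_1760_1850", "bert_1760_1900"):
        if not (root / "models" / model).exists():
            problems.append(f"models/{model} not found")
    # Assumption: the credentials file lives in the repository root.
    credentials = root / "oed_credentials.json"
    if not credentials.is_file():
        problems.append("oed_credentials.json not found")
    else:
        try:
            json.loads(credentials.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("oed_credentials.json is not valid JSON")
    return problems
```

Calling missing_prerequisites("/path/to/my/TargetedSenseDisambiguation") and printing the returned list shows what still needs to be set up.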

Running Experiments

Comparing BERT models

The code snippet below runs the main experiment that tests the effect of plugging in historical BERT models.

[WARNING] This script requires access to a historical word2vec model, which is available on Zenodo. Please copy the w2v_1760_1900 model to the models folder.

[WARNING] In line 15 of run_main_experiment.py, change the path to the word2vec model.

python run_main_experiment.py 

All results should be saved in the result_{year} folder.

Time-sensitive approaches

[WARNING] In line 15 of run_experiment_ts_disambiguation.py, change the path to the word2vec model.

To create results files for the time-sensitive methods, run:

python run_experiment_ts_disambiguation.py

Then run run_experiment_ts_disambiguation.py to run the experiments with time-sensitive disambiguation.

python run_experiment_ts_disambiguation.py

Case-studies

[WARNING] In line 15 of run_experiment_curated_cases.py, change the path to the word2vec model.

To run the case studies, execute:

python run_experiment_curated_cases.py 
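
The scripts described so far can also be chained from Python. The following is a minimal sketch, not part of this repository: it assumes the scripts are run from the repository root in the order in which this README presents them, and the helper name run_scripts is our own. Only generate_dataframes.py is stated to be a prerequisite for the later steps; the relative order of the experiments is an assumption.

```python
import subprocess
import sys

# Script names as listed in this README, in the order they are described.
PIPELINE = [
    "generate_dataframes.py",
    "run_main_experiment.py",
    "run_experiment_ts_disambiguation.py",
    "run_experiment_curated_cases.py",
]


def run_scripts(scripts, cwd="."):
    """Run each script with the current interpreter; stop at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run([sys.executable, script], cwd=cwd, check=True)
```

Calling run_scripts(PIPELINE, cwd="/path/to/my/TargetedSenseDisambiguation") runs the full sequence. Because a failed step raises an exception, later experiments never start from incomplete data.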

Create Results

To create the results from the output generated by the experiments, run the cells in create_results_tables.ipynb. This notebook can be run using the .csv files with results produced by the previous scripts.

Explore Results

To explore results and recreate Figure 1, run the cells in explore_results.ipynb. This notebook requires output from generate_dataframes.py (saved in the ./data folder).

Fin.
