A Synset Relation-enhanced Framework with a Try-again Mechanism for Word Sense Disambiguation

This repository contains the open-source code for SREF, a knowledge-enhanced sense embedding method. Many modules come from LMMS; we thank the authors for open-sourcing their valuable modules.

Quick Evaluation

For a quick evaluation of our systems' (SREFkb, SREFsup) results on the five WSD datasets (SE2, SE3, SE07, SE13 and SE15) and the combined dataset (ALL), run command.sh on a Linux machine or through a Git Bash tool on Windows.
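For example, from the repository home (invoking it with bash is an assumption; any POSIX shell should work):

$ bash command.sh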

Table of Contents

Requirements
Crawl augmented_gloss
Loading BERT
Basic sense embeddings
Sense embeddings enhancement
WSD evaluation

Requirements

The whole project is built under Python 3.6, with Anaconda providing the basic packages. Additional packages are listed in requirements.txt and can be installed with:

pip install -r requirements.txt

The WordNet package for NLTK isn't installed by pip, but we can install it easily with:

$ python -c "import nltk; nltk.download('wordnet')"

We use WordNet-3.0 for the entire experiment. NOTE: the online version of WordNet is the latest release, 3.1, which can return different results from WordNet-3.0 as accessed from Python.
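You can confirm which WordNet version your local NLTK installation loads with:

$ python -c "import nltk; print(nltk.corpus.wordnet.get_version())"

This should print 3.0.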

The basic sense representations are learned from BERT-Large-Cased, so download it with the following commands:

$ cd data/bert  # from repo home
$ wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-24_H-1024_A-16.zip
$ unzip cased_L-24_H-1024_A-16.zip

Crawl augmented_gloss

Use example_expand.py to crawl sentences from a translation website. Note that we do not use any of the translation results or other information provided by the website. After this process, run example_filter.py to filter out noisy sentences. NOTE: this process takes a considerably long time, especially for nouns. One can run four processes (the synsets of the 4 POS) on different machines. Alternatively, you can download the processed files from here (put them in ./).

$ python example_expand.py
$ python example_filter.py
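The per-POS split follows directly from WordNet's synset inventory; the snippet below is an illustrative sketch (not a repo module) of how one might enumerate the four POS groups to be crawled on separate machines:

from nltk.corpus import wordnet as wn

# Count synsets per POS; nouns dominate, which is why their crawl is slowest.
for pos in ('n', 'v', 'a', 'r'):
    print(pos, sum(1 for _ in wn.all_synsets(pos)))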

Loading BERT

The project relies on bert-as-service to retrieve BERT embeddings. This process requires a GPU device with at least 5 GB of memory (more memory means faster processing). You should not set -max_seq_len to a large number unless necessary, because a large value slows the process down dramatically (padding with 0s causes unnecessary computation).

Parameter choice:
-pooling_strategy REDUCE_MEAN, for basic sense embedding learning
-pooling_strategy NONE, for evaluation

$ bert-serving-start -pooling_strategy REDUCE_MEAN -model_dir data/bert/cased_L-24_H-1024_A-16 -pooling_layer -1 -2 -3 -4 -max_seq_len NONE -max_batch_size 32 -num_worker=1 -device_map 0 -cased_tokenization
$ bert-serving-start -pooling_strategy NONE -model_dir data/bert/cased_L-24_H-1024_A-16 -pooling_layer -1 -2 -3 -4 -max_seq_len NONE -max_batch_size 32 -num_worker=1 -device_map 0 -cased_tokenization

You should see the following message when the server is ready:

I:VENTILATOR:[__i:_ru:163]:all set, ready to serve request!
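The repository's scripts retrieve embeddings through the companion client; below is a minimal connectivity check (assumes the bert-serving-client package and the REDUCE_MEAN server above):

from bert_serving.client import BertClient

bc = BertClient()  # connects to localhost:5555/5556 by default
vecs = bc.encode(['WordNet is a lexical database.'])
print(vecs.shape)  # one pooled vector per input sentence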

This server should be left running until we start the evaluation process. If you want to finish the whole experiment in one session, you can run it in the background with nohup:

$ nohup bert-serving-start -pooling_strategy REDUCE_MEAN -model_dir data/bert/cased_L-24_H-1024_A-16 -pooling_layer -1 -2 -3 -4 -max_seq_len NONE -max_batch_size 32 -num_worker=1 -device_map 0 -cased_tokenization > nohup.out &

When you start the evaluation process, use the following command to kill the above server. NOTE: this will kill all processes related to 'bert-serving-start'.

$ ps -ef | grep bert-serving-start | grep -v grep | awk '{print "kill -9 "$2}' | sh
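If pkill is available on your machine, pkill -f bert-serving-start achieves the same effect.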

Basic sense embeddings

When the BERT server is ready, run emb_glosses.py to get the basic sense embeddings from the sense gloss, augmented sentences, and example sentences (usage):

Parameter choice:
-emb_strategy aug_gloss+examples, for both SREFkb and SREFsup

$ python emb_glosses.py -emb_strategy aug_gloss+examples

We also provide the resulting file, aug_gloss+examples, so that you can run the following steps conveniently (put it in data/vectors).
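Conceptually, the basic vector of a sense pools the BERT embeddings of the text WordNet attaches to it plus the crawled sentences. The sketch below illustrates the idea with plain averaging (basic_sense_vector is a hypothetical helper; the actual weighting and preprocessing live in emb_glosses.py):

import numpy as np
from nltk.corpus import wordnet as wn
from bert_serving.client import BertClient

bc = BertClient()  # server started with -pooling_strategy REDUCE_MEAN

def basic_sense_vector(synset, augmented_sentences=()):
    # Pool the gloss, WordNet example sentences, and crawled sentences.
    texts = [synset.definition()] + synset.examples() + list(augmented_sentences)
    return np.mean(bc.encode(texts), axis=0)

print(basic_sense_vector(wn.synset('bank.n.01')).shape)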

Sense embeddings enhancement

Run synset_expand.py to enhance the basic sense embeddings:

$ python synset_expand.py

We also provide the file emb_wn for convenient reproduction.
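The enhancement draws on WordNet's synset relations. Below is a minimal sketch, assuming each basic vector is averaged with the vectors of its related synsets (related_synsets and enhanced_vector are illustrative names; the exact relation set and weighting are defined in synset_expand.py):

import numpy as np
from nltk.corpus import wordnet as wn

def related_synsets(synset):
    # A sample of the relation types WordNet exposes.
    return (synset.hypernyms() + synset.hyponyms() +
            synset.member_holonyms() + synset.part_meronyms() +
            synset.similar_tos())

def enhanced_vector(synset, basic_vecs):
    # basic_vecs: dict mapping synset name -> basic embedding (np.ndarray)
    neighbors = [basic_vecs[s.name()] for s in related_synsets(synset)
                 if s.name() in basic_vecs]
    return np.mean([basic_vecs[synset.name()]] + neighbors, axis=0)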

WSD evaluation

Before evaluation, stop the previous bert-as-service process and start a new one with -pooling_strategy set to NONE.
When the basic embeddings and the BERT server are ready, run eval_nn.py to evaluate our method. Note that the synset expansion algorithm (synset_expand.py) is merged into this file as a function. You should get the results below for the knowledge-based system.

Parameter choice:
-emb_strategy aug_gloss+examples, for SREFkb
-emb_strategy aug_gloss+examples+lmms, for SREFsup
-sec_wsd False, to disable the try-again (second WSD) mechanism

$ python eval_nn.py -sec_wsd True -emb_strategy aug_gloss+examples

For SREFsup, you first need to obtain the LMMS supervised sense embeddings via train.py, which relies on SemCor to learn sense embeddings as a starting point. Running eval_nn.py then yields the following results.

$ python eval_nn.py -emb_strategy aug_gloss+examples+lmms
         SE2    SE3    SE07   SE13   SE15   ALL
SREFkb   72.7   71.5   61.5   76.4   79.5   73.5
SREFsup  78.6   76.6   72.1   78.0   80.5   77.8
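At its core, the evaluation is nearest-neighbor matching between the contextual BERT embedding of a target word and its candidate sense vectors. Below is a minimal sketch of that matching step (disambiguate is an illustrative name; the try-again mechanism, which re-scores among the top candidates, is omitted here):

import numpy as np
from nltk.corpus import wordnet as wn

def disambiguate(context_vec, lemma, pos, sense_vecs):
    # sense_vecs: dict mapping synset name -> enhanced sense embedding
    candidates = [s for s in wn.synsets(lemma, pos=pos) if s.name() in sense_vecs]
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # 1-nearest-neighbor: the most similar sense vector wins.
    return max(candidates, key=lambda s: cosine(context_vec, sense_vecs[s.name()]))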
