Automatic Sense Disambiguation of Potentially Idiomatic Expressions

This is the source code for a system that automatically disambiguates potentially idiomatic expressions (PIEs, for short) in text. It implements four methods: a baseline most-frequent-sense method, a baseline canonical-form-based method (Fazly et al., 2009), a lexical cohesion graph-based method (Sporleder & Li, 2009), and a variation on that method that uses literal representations of idioms' figurative senses. It evaluates those methods on a combination of four corpora: the VNC-Tokens corpus, the IDIX corpus, the PIE Corpus, and the SemEval-2013 Task 5b dataset. For a detailed description of the systems, see our LAW-MWE-CxG paper.

Requirements

To run this code, you'll need the following Python setup:

Python 2.7.6
beautifulsoup4 4.5.1
numpy 1.14.0
scipy 0.19.1
spacy 2.0.6 + en_core_web_sm 2.0.0

Other versions may work just as well, but this has not been tested and cannot be guaranteed.
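To see how an existing environment compares with the versions listed above, something like the following sketch may help. It is not part of the repository: the function names are made up for illustration, and it assumes each package exposes a __version__ attribute under the import names bs4, numpy, scipy and spacy. It does not check the Python version or the en_core_web_sm model.

```python
import importlib

# Versions pinned in the Requirements list, keyed by import name
# (beautifulsoup4 is imported as bs4).
PINNED = {
    "bs4": "4.5.1",
    "numpy": "1.14.0",
    "scipy": "0.19.1",
    "spacy": "2.0.6",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(pinned, lookup=installed_version):
    """Map each package to a (wanted, found, status) tuple.

    status is "ok", "differs", or "not found" (not importable, or no
    __version__ attribute).
    """
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None:
            status = "not found"
        elif found == wanted:
            status = "ok"
        else:
            status = "differs"
        report[name] = (wanted, found, status)
    return report
```

Calling check_versions(PINNED) returns one entry per package. A "differs" status is not necessarily a problem, since other versions may work, but it is a reasonable first place to look if something fails.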

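You'll also need the external resources that the Getting Started steps link into ext: a copy of the BNC, GloVe embeddings, and the four corpora (VNC-Tokens, IDIX, PIE Corpus, and SemEval-2013 Task 5b).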

Getting Started

Clone the repository
Create subdirectories called working and ext
Add these symlinks (or edit config.py):
- create a symlink ext/BNC to the Texts directory of your copy of the BNC
- create a symlink ext/glove to the directory containing the GloVe embeddings
- create symlinks ext/VNC, ext/IDIX, ext/PIE_Corpus, and ext/SemEval to the main directory of the respective corpora
Try running the system with python psd.py -c 0 -m cg -gs 0s. This should run a basic lexical cohesion graph method and evaluate it on the development set of the combined corpora.
Get an overview of all options by running python psd.py --help
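The directory and symlink steps above can also be scripted. The following is a minimal sketch, not part of the repository: prepare_layout is a hypothetical helper, and the source paths in the usage comment are placeholders to be replaced with the locations of your own copies. It assumes a platform where os.symlink is permitted.

```python
import os


def prepare_layout(repo_root, sources):
    """Create working/ and ext/ under repo_root, then add ext/<name> symlinks.

    sources maps a link name (e.g. "BNC") to the directory it should point
    to. Existing links or files are left untouched. Returns the list of
    links that were newly created.
    """
    for sub in ("working", "ext"):
        path = os.path.join(repo_root, sub)
        if not os.path.isdir(path):
            os.makedirs(path)

    created = []
    for name, target in sorted(sources.items()):
        link = os.path.join(repo_root, "ext", name)
        if os.path.lexists(link):
            continue  # do not overwrite anything already in place
        os.symlink(target, link)
        created.append(link)
    return created


# Example usage, run from the repository root (all paths are placeholders):
# prepare_layout(".", {
#     "BNC": "/path/to/BNC/Texts",
#     "glove": "/path/to/glove",
#     "VNC": "/path/to/VNC-Tokens",
#     "IDIX": "/path/to/IDIX",
#     "PIE_Corpus": "/path/to/PIE_Corpus",
#     "SemEval": "/path/to/SemEval",
# })
```

If the resources live elsewhere and symlinks are not an option, editing config.py as mentioned above remains the alternative.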

Contact

For any questions about (running) the system, feel free to contact me.
