Automatic Detection of Potentially Idiomatic Expressions

This is the source code for a system that automatically detects potentially idiomatic expressions (PIEs) in text. It offers four detection methods: exact string match, fuzzy string match, inflectional string match, and parse-based matching. The system draws its inventory of expressions from a set of digitally available idiom dictionaries and extracts all instances of those expressions, with context, from the input corpus.
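
To give a feel for the simplest of the four methods, here is a minimal sketch of exact string matching. It is an illustration only, not the code in detect_pies.py: the function name find_exact_matches, the example idioms, and the example sentences are all hypothetical, and case-insensitive matching on word boundaries is an assumption.

```python
import re


def find_exact_matches(idioms, sentences):
    """Return (idiom, sentence_index, start, end) for every exact match.

    Hypothetical helper for illustration; matching is case-insensitive
    and anchored on word boundaries (an assumption, not the repo's rule).
    """
    matches = []
    for idiom in idioms:
        # re.escape keeps punctuation in the idiom from acting as regex syntax.
        pattern = re.compile(r"\b" + re.escape(idiom) + r"\b", re.IGNORECASE)
        for index, sentence in enumerate(sentences):
            for m in pattern.finditer(sentence):
                matches.append((idiom, index, m.start(), m.end()))
    return matches


# Made-up inventory and corpus, standing in for a dictionary and an input file.
idioms = ["spill the beans", "kick the bucket"]
sentences = [
    "He did not want to spill the beans.",
    "She spilled the beans yesterday.",
    "Nobody expected him to kick the bucket.",
]

matches = find_exact_matches(idioms, sentences)
for idiom, index, start, end in matches:
    print("%s | sentence %d | chars %d-%d" % (idiom, index, start, end))
```

Note that the second sentence ("spilled the beans") is not found by this sketch, which is the kind of case the fuzzy, inflectional, and parse-based methods are meant to cover.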

Requirements

To run this code, you'll need the following Python setup:

Python 2.7.6
beautifulsoup4 4.5.1
requests 2.17.3
nltk 3.2.4
spacy 2.0.6 + en_core_web_sm 2.0.0
lxml 3.3.3

Other versions may work just as well, but this is not guaranteed.

You might also need:

morph, if you want to run inflectional string matching.
Stanford CoreNLP, if you want to run the parse-based method with the Stanford parser.
the Python library stanfordcorenlp 3.7.0.2
the British National Corpus, if you want to extract PIEs from that.

Getting Started

Clone the repository
Create subdirectories called working and ext
If necessary:
- create a symlink ext/morph to the main directory of the morph tools
- create a symlink ext/stanford to the main directory of your Stanford CoreNLP installation
- create a symlink ext/BNC to the Texts directory of your copy of the BNC
Try running the system with python detect_pies.py data/input_sample.txt -d wiktionary -t plain -m exact. This should extract a list of idioms from Wiktionary and use the exact string match method to extract PIEs from the input sample file.
Get an overview of all options by simply running python detect_pies.py --help
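
The directory and symlink steps above can also be scripted. The following is a minimal sketch, assuming a platform that supports symlinks; the helper name prepare_workspace is hypothetical and not part of this repository, and the demo runs against throwaway temporary directories rather than a real checkout.

```python
import os
import tempfile


def prepare_workspace(repo_root, links=None):
    """Create working/ and ext/ under repo_root, plus optional symlinks in ext/.

    Hypothetical helper: `links` maps a link name (e.g. "morph", "stanford",
    "BNC") to the directory it should point to.
    """
    paths = []
    for name in ("working", "ext"):
        path = os.path.join(repo_root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
        paths.append(path)
    for link_name, target in (links or {}).items():
        link_path = os.path.join(repo_root, "ext", link_name)
        # Leave any existing link or file in place instead of overwriting it.
        if not os.path.lexists(link_path):
            os.symlink(target, link_path)
        paths.append(link_path)
    return paths


# Demo on temporary directories; in practice repo_root would be the cloned
# repository and the target would be your own morph tools directory.
demo_root = tempfile.mkdtemp()
demo_target = tempfile.mkdtemp()
paths = prepare_workspace(demo_root, {"morph": demo_target})
print(paths)
```

Calling the helper a second time is harmless, since existing directories and links are left untouched.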

Contact

For any questions about (running) the system, feel free to contact me.

hslh/pie-detection
