This repo contains the implementation of the work described in *An Analysis of Attention in German Verbal Idiom Disambiguation*.
To use it, make sure to install the packages listed in `requirements.txt`. Furthermore, in order to parse the sentences, you will need to download the German transformer pipeline for spaCy with:

```
python -m spacy download de_dep_news_trf
```
After that, you can perform the following steps:
- Run
  ```
  python build.py --corpus_dir <path_to_COLF-VID_1.0_data>
  ```
  to perform a balanced 70/15/15 split of the data set. This will create the folder `data` with three subfolders `train`, `dev` and `test`, with the following files in each folder:
  - sentences.txt
  - pie_idxs.txt (The indices of the PIE components)
  - labels.txt
  - pos_tags.txt
  - sent_ids.txt
  - pie_types.txt (The PIE types an instance belongs to)
  Additionally, it will create a file containing the vocabulary (`vocab.txt`) and the label set (`label_set.txt`), as well as a file with the number of instances per PIE type (`num_instances_per_type.json`).
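
  For illustration, a balanced stratified split of this kind could be done with scikit-learn's `train_test_split`. The sketch below is an assumed approach (stratifying on the PIE type), not the actual `build.py` implementation:

  ```python
  # Sketch of a stratified 70/15/15 split (assumed approach, not build.py itself).
  from sklearn.model_selection import train_test_split

  def split_70_15_15(instances, pie_types, seed=42):
      # Split off 70% for training, preserving the per-type distribution.
      train, rest, train_t, rest_t = train_test_split(
          instances, pie_types, train_size=0.7, stratify=pie_types, random_state=seed
      )
      # Split the remaining 30% evenly into dev and test (15% each overall).
      dev, test, _, _ = train_test_split(
          rest, rest_t, train_size=0.5, stratify=rest_t, random_state=seed
      )
      return train, dev, test
  ```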
- The model takes fastText embeddings as input and expects a file with embeddings for all words in the vocab. To create this file, download the binary file for the German model and then run
  ```
  python fetch_fasttext_embs.py --model <path_to_fasttext_binary_file> --vocab_path <path_to_vocab_file>
  ```
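
  In essence, this step looks up a vector for every vocabulary word. A minimal sketch of that lookup with the `fasttext` Python package follows; the file names and output format here are illustrative assumptions, not necessarily what `fetch_fasttext_embs.py` produces:

  ```python
  # Sketch: look up a fastText vector for every word in the vocabulary.
  # File names and output format are assumptions for illustration.
  import fasttext

  model = fasttext.load_model("cc.de.300.bin")  # path to the German binary

  with open("data/vocab.txt", encoding="utf-8") as vocab_file, \
          open("embeddings.txt", "w", encoding="utf-8") as out_file:
      for word in (line.strip() for line in vocab_file):
          vec = model.get_word_vector(word)  # subword model also covers OOV words
          out_file.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")
  ```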
- Run `parse_sents.py` with `python parse_sents.py` to create two additional files:
  - heads.txt (The head for every word in a sentence)
  - deprels.txt (The dependency relations)
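
  Conceptually, this step runs the spaCy pipeline downloaded above over the sentences and records each token's head index and dependency relation. A rough sketch (the real `parse_sents.py` may tokenize and write the files differently):

  ```python
  # Sketch of the dependency-parsing step; file paths are illustrative.
  import spacy

  nlp = spacy.load("de_dep_news_trf")  # the pipeline downloaded above

  with open("data/train/sentences.txt", encoding="utf-8") as sents, \
          open("data/train/heads.txt", "w", encoding="utf-8") as heads_f, \
          open("data/train/deprels.txt", "w", encoding="utf-8") as deprels_f:
      for doc in nlp.pipe(line.strip() for line in sents):
          heads_f.write(" ".join(str(tok.head.i) for tok in doc) + "\n")
          deprels_f.write(" ".join(tok.dep_ for tok in doc) + "\n")
  ```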
- To train a model, run `python train.py --model_name <model_name>`. The trained model will be saved in the directory `trained_models`.
- To evaluate a model, run `python evaluation.py --model_name <model_name>`. This will not only evaluate the model, but also collect the attention properties of the individual data points, which are saved in the directory `stats` as `attn_stats.json`. These can then be used as a basis for statistical analysis.
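
  For example, the collected properties can be loaded for analysis roughly as follows; the internal structure of `attn_stats.json` is not documented here, so the list-of-dicts layout and the key `attn_to_pie` below are hypothetical placeholders:

  ```python
  # Sketch: load the collected attention properties for statistical analysis.
  # The structure of attn_stats.json is assumed; adapt the keys to the real file.
  import json
  import statistics

  with open("stats/attn_stats.json", encoding="utf-8") as f:
      attn_stats = json.load(f)

  values = [entry["attn_to_pie"] for entry in attn_stats]  # hypothetical key
  print(f"mean={statistics.mean(values):.4f}, stdev={statistics.stdev(values):.4f}")
  ```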