
PIE-attention

This repo contains the implementation of the work described in the paper "An Analysis of Attention in German Verbal Idiom Disambiguation".

To use it, install the packages listed in requirements.txt. To parse the sentences, you will also need to download the German transformer pipeline for spaCy with:

python -m spacy download de_dep_news_trf
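If you are starting from a clean environment, the requirements can be installed with pip first:

pip install -r requirements.txt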

After that, you can perform the following steps:

  1. Run python build.py --corpus_dir <path_to_COLF-VID_1.0_data> to perform a balanced 70/15/15 split of the dataset. This will create the folder data with three subfolders train, dev and test, each containing the following files:
  • sentences.txt
  • pie_idxs.txt (The indices of the PIE components)
  • labels.txt
  • pos_tags.txt
  • sent_ids.txt
  • pie_types.txt (The PIE types an instance belongs to)

Additionally, it will create a file containing the vocabulary (vocab.txt) and the label set (label_set.txt), as well as a file with the number of instances per PIE type (num_instances_per_type.json).
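For illustration, the balanced split could be implemented roughly as follows. This is a minimal sketch using scikit-learn's stratified splitting; the function and variable names are hypothetical and the actual build.py may differ:

from sklearn.model_selection import train_test_split

def balanced_split(instances, pie_types, seed=42):
    """Hypothetical sketch of a balanced 70/15/15 split, stratified by
    PIE type so each split keeps roughly the same type distribution."""
    # Carve off 70% of the data for training.
    train, rest, _, rest_types = train_test_split(
        instances, pie_types, train_size=0.7,
        stratify=pie_types, random_state=seed)
    # Split the remaining 30% evenly into dev and test (15% each).
    dev, test = train_test_split(
        rest, train_size=0.5, stratify=rest_types, random_state=seed)
    return train, dev, test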

  2. The model takes FastText embeddings as input and expects a file with embeddings for all words in the vocab. To create this file, download the binary file for the German FastText model and then run python fetch_fasttext_embs.py --model <path_to_fasttext_binary_file> --vocab_path <path_to_vocab_file>.
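Conceptually, the script looks up (or composes from character n-grams) a FastText vector for every vocabulary word and writes it out. A minimal sketch using the official fasttext Python bindings, with illustrative file names:

import fasttext

model = fasttext.load_model("cc.de.300.bin")  # German FastText binary

with open("data/vocab.txt", encoding="utf-8") as f:
    vocab = [line.strip() for line in f if line.strip()]

# FastText composes vectors from character n-grams, so it can return an
# embedding even for words never seen during training.
with open("fasttext_embs.txt", "w", encoding="utf-8") as out:
    for word in vocab:
        vec = model.get_word_vector(word)
        out.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")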

  3. Run python parse_sents.py to create two additional files (see the sketch after the list):

  • heads.txt (The head for every word in a sentence)
  • deprels.txt (The dependency relations)
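A minimal sketch of what this parsing step amounts to, using the spaCy pipeline downloaded above. The paths and output format are illustrative, and the sentences may additionally need to be aligned with the existing tokenization:

import spacy

nlp = spacy.load("de_dep_news_trf")

with open("data/train/sentences.txt", encoding="utf-8") as f:
    sentences = [line.rstrip("\n") for line in f]

with open("data/train/heads.txt", "w", encoding="utf-8") as heads_f, \
     open("data/train/deprels.txt", "w", encoding="utf-8") as deprels_f:
    for doc in nlp.pipe(sentences):
        # token.head.i is the index of each token's syntactic head;
        # token.dep_ is the dependency relation label.
        heads_f.write(" ".join(str(tok.head.i) for tok in doc) + "\n")
        deprels_f.write(" ".join(tok.dep_ for tok in doc) + "\n")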
  4. To train a model, run python train.py --model_name <model_name>. The trained model will be saved in the directory trained_models.
  5. To evaluate a model, run python evaluation.py --model_name <model_name>. This not only evaluates the model but also collects the attention properties of the individual data points, which are saved in the directory stats as attn_stats.json. These can then be used as a basis for statistical analysis.
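The exact schema of attn_stats.json is defined by evaluation.py; as a hypothetical starting point for the statistical analysis, the file can be loaded like any other JSON (the structure probed below is an assumption, not the script's documented output):

import json

with open("stats/attn_stats.json", encoding="utf-8") as f:
    attn_stats = json.load(f)

# Inspect the collected per-instance attention properties; the actual
# structure depends on what evaluation.py stores.
print(type(attn_stats))
if isinstance(attn_stats, list) and attn_stats:
    print(attn_stats[0])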
