Code for a paper about analysing auxiliary verb constructions in parsing

mdelhoneux/avc_analyser

AVC analyser

Code for reproducing the experiments in the following paper:
What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions? by Miryam de Lhoneux, Sara Stymne and Joakim Nivre
available on arXiv

Training the parser

The uuparser folder contains a modified version of UUParser where we save intermediate representations in different parts of the network. See that repository for more details on how to train models.

Required software

AVC transformation

To transform the representation of AVCs, see this repository.

AVC analysis

First, edit config.py to specify the paths to the trained models and to the datasets.
A single main script, avc_analyser.py, runs everything from data collection to the diagnostic classifiers. Some options are relevant to all steps:

  • --include: list of iso codes
  • --word_types: list of word types from [main_verb, finite_verb, aux, punct]
  • --vec_types: list of vector types from [contextual, type, char, word2vec, composed]
    Those lists can be specified as a string with spaces or in a file with one word per line.
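
As an illustration of the two accepted forms for list options, here is a minimal sketch of how such a value could be resolved. The helper name parse_list_option is hypothetical and is not taken from avc_analyser.py; the actual script may handle this differently.

```python
import os


def parse_list_option(value):
    """Resolve a list option given either as a string or as a file path.

    Hypothetical helper, not part of avc_analyser.py: it only illustrates
    the two input forms described above.
    """
    if os.path.isfile(value):
        # File form: one word per line, blank lines ignored.
        with open(value, encoding="utf-8") as handle:
            return [line.strip() for line in handle if line.strip()]
    # String form: words separated by whitespace.
    return value.split()
```

With this sketch, parse_list_option("main_verb aux") and a file containing the same two words on separate lines would yield the same list.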

Run the steps in turn, each with its own option:

  • --create_avc_gold: create the data
  • --train_word2vec: train a word2vec model on the training sets
  • --dump_vecs: dump vectors from all models involved
  • --predict: do the classification
  • --evaluate: evaluate the results
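
The steps above can be sketched as a sequence of command lines, one per step, each carrying the shared options. This is only a sketch under assumptions: the ISO codes "en fi" are placeholder values, build_commands is a hypothetical helper, and whether every shared option is needed at every step should be checked against python avc_analyser.py --help. The code prints the commands rather than running them.

```python
import shlex

# Step options in the order listed above.
STEPS = [
    "--create_avc_gold",
    "--train_word2vec",
    "--dump_vecs",
    "--predict",
    "--evaluate",
]


def build_commands(include, word_types, vec_types, script="avc_analyser.py"):
    """Return one argument list per pipeline step (hypothetical helper)."""
    shared = [
        "--include", include,
        "--word_types", word_types,
        "--vec_types", vec_types,
    ]
    return [["python", script, step] + shared for step in STEPS]


if __name__ == "__main__":
    # "en fi" is an assumed example value; the other values come from the option lists above.
    for command in build_commands("en fi", "main_verb aux", "contextual type"):
        print(" ".join(shlex.quote(part) for part in command))
```

Each list option is passed as a single string with spaces, which is one of the two forms described above.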

The script creates many intermediate folders and files; the final results table is written to 'res.csv'.
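
To inspect the results table afterwards, a minimal sketch using only the standard library is given below. It assumes that res.csv is comma-separated with a header row, which is not confirmed here, and it makes no assumption about the column names; load_results is a hypothetical helper.

```python
import csv


def load_results(path="res.csv"):
    """Read the results table into a list of dicts, one per row.

    Assumes a comma-separated file whose first row is a header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling load_results() in the directory where the pipeline was run would return the rows keyed by whatever column names the file actually contains.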

To see all options:

python avc_analyser.py --help

By default, the script expects a UD representation for all parts of the pipeline; use the option --style ms to work with an MS representation style instead.

License

This software is released under the terms of the Apache License, Version 2.0.

Contact

For questions and usage issues, please contact miryam dot de underscore lhoneux at lingfil dot uu dot se
