morphological-dependency-parsing

Contains code for running the dependency parsing experiments described in our paper Enhancing deep neural networks with morphological information:

@misc{klemen2020enhancing,
      title={Enhancing deep neural networks with morphological information}, 
      author={Matej Klemen and Luka Krsnik and Marko Robnik-Šikonja},
      year={2020},
      eprint={2011.12432},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

This code contains some modifications of the Biaffine Dependency Parser (Dozat and Manning, 2017) from SuPar. The original repository contains additional models for dependency and constituency parsing, as well as richer documentation.

The modifications made here are:

  1. Decoupled features used as additional input: the original implementation allowed only one of {character embeddings | UPOS embeddings | BERT embeddings} to be used at a time in addition to word embeddings.
  2. Option to use Universal Features embeddings as additional input: each feature gets embedded with a D-dimensional vector, in total creating a concatenated vector of length 23 * <ufeats_embedding_size> (Typo is not used); see the sketch after this list.
  3. Minor component tweaks: e.g., BERT is fine-tuned together with the parser (previously frozen), BERT parameters are tuned with a smaller (fixed) learning rate, and all BERT layers are used instead of just the last four.
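
To make modification 2 concrete, below is a minimal PyTorch sketch of how one embedding table per universal feature can be concatenated into a 23 * <ufeats_embedding_size> vector per token. This is an illustration only, not the repository's actual code; the class and variable names are hypothetical.

import torch
import torch.nn as nn

N_UFEATS = 23            # number of Universal Features used (Typo excluded)
UFEATS_EMB_SIZE = 30     # corresponds to the --ufeats_emb_size default

class UFeatsEmbedding(nn.Module):
    def __init__(self, n_values_per_feat, emb_size=UFEATS_EMB_SIZE):
        super().__init__()
        # one embedding table per universal feature (e.g. Case, Number, ...)
        self.embeds = nn.ModuleList(
            nn.Embedding(n_values, emb_size) for n_values in n_values_per_feat
        )

    def forward(self, feats):
        # feats: [batch, seq_len, N_UFEATS] integer ids, one column per feature
        parts = [emb(feats[..., i]) for i, emb in enumerate(self.embeds)]
        # concatenation yields a [batch, seq_len, N_UFEATS * emb_size] tensor
        return torch.cat(parts, dim=-1)

# hypothetical value inventories: 10 possible values per feature
module = UFeatsEmbedding([10] * N_UFEATS)
out = module(torch.zeros(2, 5, N_UFEATS, dtype=torch.long))
print(out.shape)  # torch.Size([2, 5, 690]), i.e. 23 * 30 = 690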

Installation

$ git clone https://github.com/matejklemen/morphological-dependency-parsing && cd morphological-dependency-parsing
$ python setup.py install

Usage

For the full list of options, please check supar/cmds/{biaffine_dependency.py, cmd.py}. Many of the parameters are self-explanatory, so here are just some specifics:

  1. --path is the path where the best checkpoint will be saved to or loaded from.
  2. --embed is the path to load the pretrained word embeddings from (if not provided, they will be trained from scratch). In our case, we use word embeddings extracted from fastText.
  3. --include_char, --include_bert, --include_upos, --include_ufeats, --include_lstm are flags that determine which features are used in addition to word embeddings.
    The character embeddings are fixed to size 50 and the BERT embeddings have the hidden size of the used model, while the POS, universal feature and LSTM embedding sizes are tunable with --upos_emb_size (default: 50), --ufeats_emb_size (default: 30) and --lstm_emb_size (default: 128).
  4. --bert determines which BERT model (actually, any transformers-compatible model) is used for the BERT embeddings.
  5. --patience is the early-stopping tolerance: training is stopped once the mean of UAS and LAS has not improved for the specified number of rounds.

Example training command:
$ python3 -m supar.cmds.biaffine_dependency train \
    --path="en_model_with_bert_upos/model" \
    --tree \
    --device 0 \
    --build  \
    --batch_size=64 \
    --punct \
    --train="UD_English-EWT/en_ewt-ud-train.conllu" \
    --dev="UD_English-EWT/en_ewt-ud-dev.conllu" \
    --test="UD_English-EWT/en_ewt-ud-test.conllu" \
    --embed="" \
    --n_embed=100 \
    --include_upos \
    --upos_emb_size=50 \
    --include_bert \
    --bert="bert-base-multilingual-uncased"

References

Timothy Dozat and Christopher D. Manning. 2017. Deep Biaffine Attention for Neural Dependency Parsing. In Proceedings of the 5th International Conference on Learning Representations (ICLR).
