No description, website, or topics provided.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
CONTRIBUTING.md first commit Jul 15, 2018
LICENSE first commit Jul 15, 2018
README.md first commit Jul 15, 2018
config.json first commit Jul 15, 2018
layers.py first commit Jul 15, 2018
reader.py first commit Jul 15, 2018
test.py first commit Jul 15, 2018
test_cw.py first commit Jul 15, 2018
test_short_cw.py first commit Jul 15, 2018
train_cw.py first commit Jul 15, 2018

README.md

Software and hardware requirements

  • python 2.7
  • numpy
  • Tensorflow 1.5+
  • For fast training, a Nvidia graphic card or GPU

Credits

This code is based on the paper: https://arxiv.org/abs/1805.08237

Bernd Bohnet, Ryan McDonald, Gonçalo Simões, Daniel Andor, Emily Pitler, Joshua Maynez. Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings. ACL, 2018.

Our tagger ranked 1st for morphological features in the CoNLL-2018 Shared Task and had strong results for many languages on upos tags. The tagger is especially strong for cases where a wider context is required to determine the correct tag as for xpos and morphological features tagging.

Contributions: Bernd Bohnet, Ryan McDonald, Gonçalo Simões, Daniel Andor, Emily Pitler, Joshua Maynez, Terry Koo.

Training a tagger

python train_cw.py --train='en-wsj-std-train-stanford-3.3.0.conll'
--dev='en-wsj-std-dev-stanford-3.3.0.conll'
--embeddings='glove.6B.100d.txt'
--task='xtag'
--config='config.json'

The paths need to be adapted. The 'config.json' file contains the settings for the hyperparamerters. The settings for the number of LSTM layers, cells, etc. are smaller than the sizes used in the paper.

The input and output files are in CoNLL-U format: http://universaldependencies.org/format.html

The tagger supports three tasks: --task='upos' | 'xtag' | 'feats'

Applying a tagger

python test_cw.py --test='en-wsj-std-test-stanford-3.3.0.conll'
--task='xtag'
--output_dir='model_save_dir'
--out='output.conll'