Release v0.5.0 · hopsparser/hopsparser

The performance of the contemporary models in this release has improved, most notably for models
not using BERT.

Added the scripts/zenodo_upload.py script, a helper for uploading files to a Zenodo deposit.

The CharRNN lexer now represents words with the last hidden (instead of cell) state of the LSTM and
does not run on padding anymore.
The minimum PyTorch version is now 1.9.0.
The minimum Transformers version is now 4.19.0.
All the parser methods now use torch.inference_mode instead of torch.no_grad.
BERT lexer batches no longer have an obsolete, always-zero word_indices attribute.
DependencyDataset does not have lexicon attributes (ito(lab|tag) and their inverses) anymore, since
we don't need them.
The train_model script now skips incomplete runs with a warning.
The train_model script has nicer logging, including progress bars to help keep track of the
experiments.
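
As a rough illustration of the switch from torch.no_grad to torch.inference_mode listed above, the sketch below runs the same forward pass under both context managers. The small torch.nn.Linear model and the random input are placeholders chosen for this example and are not hopsparser code.

```python
import torch

# Placeholder model and input, NOT hopsparser code (assumption for illustration).
torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
model.eval()  # switch off training-only behaviour such as dropout
inputs = torch.randn(3, 4)

# Older pattern: gradients are not recorded, outputs are ordinary tensors.
with torch.no_grad():
    out_no_grad = model(inputs)

# Newer pattern: outputs are additionally flagged as inference tensors,
# which lets PyTorch skip some autograd bookkeeping.
with torch.inference_mode():
    out_inference = model(inputs)

# Both context managers compute the same values.
same_values = torch.allclose(out_no_grad, out_inference)
```

Tensors created under torch.inference_mode cannot later take part in computations recorded by autograd, so this mode fits prediction-only code paths such as parsing.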

The first word in the word embeddings lexer vocabulary is not used as padding anymore and has a
real embedding.
BERT embeddings are now correctly computed with an attention mask to ignore padding.
The root token embedding coming from BERT lexers is now an average of non-padding words'
embeddings.
FastText embeddings are now computed by averaging over non-padding subwords' embeddings.
In server mode, models are now correctly in eval mode and processing is done
in torch.inference_mode.
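
The averaging fixes above share one idea: padding positions must be left out of both the sum and the count. The following is a minimal pure-Python sketch of such a masked mean. The function name masked_mean and the toy vectors are hypothetical and do not reflect hopsparser's actual tensor-based implementation.

```python
from typing import List, Sequence


def masked_mean(
    embeddings: Sequence[Sequence[float]], mask: Sequence[bool]
) -> List[float]:
    """Average the rows of `embeddings` whose `mask` entry is True."""
    if len(embeddings) != len(mask):
        raise ValueError("embeddings and mask must have the same length")
    # Keep only the non-padding rows.
    kept = [row for row, keep in zip(embeddings, mask) if keep]
    if not kept:
        raise ValueError("at least one non-padding position is required")
    dim = len(kept[0])
    # Divide by the number of kept rows, not by the padded length.
    return [sum(row[i] for row in kept) / len(kept) for i in range(dim)]


# Toy data (hypothetical): two real subword vectors followed by one padding row.
subwords = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [True, True, False]

masked = masked_mean(subwords, mask)               # ignores the padding row
naive = masked_mean(subwords, [True, True, True])  # wrongly counts padding
```

Even when the padding row is all zeros, the naive average is biased because it divides by the padded length rather than by the number of real subwords.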

Full Changelog: v0.4.2...v0.5.0
