Stanza v1.13.0 Release Notes

Download Improvements

Switch model downloads from raw requests calls to the huggingface_hub library when downloading from Hugging Face. This should be more reliable, take advantage of XET when available, and benefit from the HF local cache. Addresses #1619, a report of intermittent download failures. #1614
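
As a rough illustration of what a cached download through huggingface_hub looks like, here is a minimal sketch. It is not Stanza's actual download code: the helper names are hypothetical, and the repo naming (stanfordnlp/stanza-LANG) and file layout (models/PACKAGE.zip) are assumptions. Only hf_hub_download itself is a real library call.

```python
def stanza_hf_location(lang, package="default"):
    """Map a language code and package name to (repo_id, filename).

    ASSUMPTION: one Hugging Face repo per language and one zip per package.
    The real layout may differ.
    """
    return f"stanfordnlp/stanza-{lang}", f"models/{package}.zip"


def fetch_from_hub(lang, package="default"):
    """Download a model archive via huggingface_hub and return its local path."""
    # Imported lazily so this sketch can be loaded without the library installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = stanza_hf_location(lang, package)
    # hf_hub_download stores the file in the local HF cache and returns its
    # path, so a repeated call for the same file can reuse the cached copy.
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

Calling fetch_from_hub("en") would need network access on first use. Stanza users should not need anything like this directly, since the library's own download path handles it.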

New Models

Add a Slovenian NER model trained on the UNER dataset, along with the scripts needed to process UNER data into Stanza's internal NER format. #1615
Update the default word vectors for Erzya (MYV) to use rootroo embeddings, and rebuild all MYV models accordingly. The rootroo vectors show clear improvements on POS (dev UPOS 90.81 vs 90.21 with mokha vectors) with a small gain on depparse as well. #1606

Constituency Parser

Fix a longstanding bug in the constituency parser output layer: the nonlinearity was missing between the last two linear layers. The buggy forward pass made those two layers mathematically fusable, so existing models have been condensed to 2 output layers with no loss in accuracy. #1610
Set the default number of output layers to 2. Experiments showed that 2- and 3-layer configurations perform equivalently (once the nonlinearity bug above is corrected), so models will now train with the smaller default. #1611
At the end of training, automatically condense output layer rows that weight decay has trained toward zero, shrinking the saved model and improving inference speed slightly. #1613
Make several efficiency improvements to the parser state representation, improving throughput by roughly 20%. Changes include using type() instead of isinstance() for type checks (with appropriate guards), switching TreeStack from a namedtuple to __slots__, and storing transition scheme information as attributes on transition objects to avoid repeated accessor calls. #1603 #1612
Add a script for visualizing constituency parser model weights: outputs heatmaps of linear layer weights and time-series plots of LSTM gate statistics, useful for analyzing training behavior. #1605
Add support for controlling forget gate initialization and for applying a separate weight decay to LSTM biases, based on Jozefowicz et al. Experiments showed a small improvement, so new models will be trained with this configuration by default. Future work: apply these LSTM training changes to other models, especially depparse. #1609
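
The output layer fix at the top of this section rests on a simple identity: with no nonlinearity in between, two stacked linear layers compute W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is a single linear layer. The sketch below checks that identity on small integer matrices in pure Python. The function names and values are made up for illustration and are not taken from the Stanza code.

```python
def linear(W, b, x):
    """Apply y = W x + b, where W is a list of rows and b, x are lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def fuse(W1, b1, W2, b2):
    """Fold two stacked linear layers into one: W = W2 W1, b = W2 b1 + b2."""
    cols = list(zip(*W1))  # columns of W1
    W = [[sum(a * c for a, c in zip(row2, col)) for col in cols] for row2 in W2]
    b = [sum(a * c for a, c in zip(row2, b1)) + bi for row2, bi in zip(W2, b2)]
    return W, b


# Hypothetical weights: 3 inputs -> 2 hidden units -> 2 outputs.
W1, b1 = [[1, 2, 0], [0, -1, 3]], [1, -2]
W2, b2 = [[2, 1], [-1, 4]], [0, 5]
x = [1, 2, 3]

two_layer = linear(W2, b2, linear(W1, b1, x))  # no nonlinearity in between
W, b = fuse(W1, b1, W2, b2)
fused = linear(W, b, x)

print(two_layer, fused)  # [17, 19] [17, 19]
```

Once a nonlinearity such as ReLU sits between the two layers, the identity no longer holds in general, which is why the fusion applies only to models trained with the buggy forward pass.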

Dependency Parser

Add a --gradient_checkpointing flag to the dependency parser training script, allowing fine-tuning of larger transformers under tighter memory constraints. #1592
Add a freeze → warmup → plateau learning rate scheduler (WarmupThenPlateauScheduler) for use in the dependency parser. This gives finer control over the stages of transformer fine-tuning. #1589
Detach transformer embeddings from the computation graph whenever the transformer is not actively being trained (e.g. during the frozen stage when bert_finetune=True). This reduces memory usage and speeds up the stages of training that don't update the transformer. #1590
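
To make the three stages of the scheduler concrete, here is a minimal pure-Python sketch of one plausible freeze, warmup, plateau schedule. It is not the WarmupThenPlateauScheduler implementation. The class name, its parameters, and the reading of "plateau" as "cut the learning rate when the dev score stops improving" are all assumptions, since the notes above only name the stages.

```python
class FreezeWarmupPlateau:
    """Hypothetical three-stage learning rate schedule (illustration only)."""

    def __init__(self, base_lr, freeze_steps, warmup_steps, patience=2, factor=0.5):
        self.base_lr = base_lr
        self.freeze_steps = freeze_steps
        self.warmup_steps = warmup_steps
        self.patience = patience  # dev evaluations without improvement before a cut
        self.factor = factor      # multiplier applied to the rate at each cut
        self.step_count = 0
        self.scale = 1.0
        self.best = None
        self.bad_evals = 0

    @property
    def lr(self):
        s = self.step_count
        if s < self.freeze_steps:
            return 0.0  # frozen stage: no updates
        if s < self.freeze_steps + self.warmup_steps:
            # linear ramp that reaches base_lr on the last warmup step
            return self.base_lr * (s - self.freeze_steps + 1) / self.warmup_steps
        return self.base_lr * self.scale  # plateau stage

    def step(self):
        self.step_count += 1

    def report(self, dev_score):
        """Record a dev score; only has an effect once warmup is finished."""
        if self.step_count < self.freeze_steps + self.warmup_steps:
            return
        if self.best is None or dev_score > self.best:
            self.best, self.bad_evals = dev_score, 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.scale *= self.factor
                self.bad_evals = 0


sched = FreezeWarmupPlateau(base_lr=1.0, freeze_steps=2, warmup_steps=4)
lrs = []
for _ in range(7):
    lrs.append(sched.lr)
    sched.step()
print(lrs)  # [0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
```

In this sketch, two dev evaluations in a row without improvement during the plateau stage halve the learning rate. The actual scheduler may use different criteria.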

Training Infrastructure

Update language codes and treebank names throughout the codebase to align with UD 2.18. #1599
Add a --additional_files flag for building combined training datasets, and add the ability to construct a combined dataset for any language that has train/dev/test splits. #1598
Refactor bert_layer_mix into a trainable parameter that is passed directly into the bert_embeddings function, removing the need for each model to process the returned embeddings separately. #1597
Unify word embedding storage across models: models that were storing word vectors directly now use PretrainedWordVocab from the shared Pretrain object instead. This removes redundant storage and makes embedding handling consistent across annotators. #1600
