
Stanza 1.3.0: LangID and Constituency Parser

@AngledLuffa released this 06 Oct 06:28
f91ca21

Overview

Stanza 1.3.0 introduces a language id model, a constituency parser, a dictionary in the tokenizer, and some additional features and bugfixes.

New features

  • Langid model and multilingual pipeline
    Based on "A reproduction of Apple's bi-directional LSTM models for language identification in short strings" by Toftrup et al. (2021)
    (154b0e8)

  • Constituency parser
    Based on "In-Order Transition-based Constituent Parsing" by Jiangming Liu and Yue Zhang. Currently an en_wsj model is available, with more to come.
    (9031802)

  • Evalb interface to CoreNLP
    Useful for evaluating the parser. Requires CoreNLP 4.3.0 or later.

  • Dictionary tokenizer feature
    Noticeably improved performance for ZH, VI, TH
    (#776)
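
A rough sketch of how the langid model and the constituency parser might be called from Python. It assumes Stanza 1.3.0 or later is installed, that the models can be downloaded, and that the default "en" download includes the constituency model. The function names identify_languages, parse_constituents and format_rows are hypothetical helpers made up for this example, not part of Stanza.

```python
def identify_languages(texts):
    """Return (text, language code) pairs from Stanza's langid processor."""
    # Imported lazily so the module loads even where Stanza is not installed.
    import stanza
    from stanza.models.common.doc import Document
    from stanza.pipeline.core import Pipeline

    stanza.download(lang="multilingual")  # fetches the langid model
    nlp = Pipeline(lang="multilingual", processors="langid")
    docs = [Document([], text=text) for text in texts]
    nlp(docs)  # annotates each Document in place with a .lang attribute
    return [(doc.text, doc.lang) for doc in docs]


def parse_constituents(text):
    """Return one bracketed parse tree string per sentence (English only)."""
    import stanza

    # Assumption: the default English package ships the constituency model.
    stanza.download("en")
    nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,constituency")
    doc = nlp(text)
    return [str(sentence.constituency) for sentence in doc.sentences]


def format_rows(pairs):
    """Format (text, label) pairs as tab-separated lines for printing."""
    return "\n".join(f"{text}\t{label}" for text, label in pairs)
```

Calling print(format_rows(identify_languages(["Hello world.", "Bonjour le monde."]))) should list each string next to its detected language code, and parse_constituents("This is a test.") should return a list with one tree string. Both calls download models on first use.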

Bugfixes / Reliability

  • HuggingFace integration
    No more git issues complaining about unavailable models! (Hopefully)
    (f7af504)

  • Fixed sentiment processor crashes on certain inputs
    (issue #804, fixed by e232f67)