Lingua 1.5.0

@pemistahl pemistahl released this 13 Jun 17:47

Features

  • The new method LanguageDetector.detect_multiple_languages_of() has been introduced. It detects multiple languages in mixed-language text. (#1)

  • The new method LanguageDetectorBuilder.with_low_accuracy_mode() has been introduced. When it is activated, detection accuracy for short text is reduced in favor of a smaller memory footprint and faster detection performance. (#119)

  • The new method LanguageDetector.compute_language_confidence() has been introduced. It retrieves the confidence value for one specific language only, given the input text. (#102)

Improvements

  • The computation of the confidence values has been revised and the softmax function is now applied to the values, making them more comparable because they behave more like real probabilities. (#120)

  • The WASM API has been revised. It now uses the same builder pattern as the Rust API. (#122)

  • The language model files are now compressed with the Brotli algorithm, which reduces the file size by 15 % on average. (#189)

  • The language model ngrams are now stored in a CompactString type, which reduces memory consumption by 20 %. (#198)

  • Several performance optimizations have been applied, which make the library nearly twice as fast as the previous version. Big thanks go out to @serega and @koute for their help. (#82, #148, #177)

  • The enums IsoCode639_1 and IsoCode639_3 now implement some new traits such as Copy, Hash and Serde's Serialize and Deserialize. The enum Language now implements Copy as well. (#175)
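
To illustrate why applying softmax makes the confidence values behave more like probabilities, here is a generic, numerically stable softmax in plain Rust. This is only a sketch of the mathematical function: the release notes do not state which raw scores the library feeds into it, and the function name `softmax` and its signature are assumptions for illustration, not the library's internal code.

```rust
/// Generic softmax sketch (not Lingua's internal implementation).
/// Maps arbitrary real-valued scores to positive values that sum to 1
/// while preserving their relative order.
fn softmax(scores: &[f64]) -> Vec<f64> {
    if scores.is_empty() {
        return Vec::new();
    }
    // Subtracting the maximum avoids overflow in exp() for large scores
    // and does not change the result.
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

Because the outputs are non-negative and sum to 1, values for different languages can be compared directly against each other, which is the property the revised confidence computation relies on.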