Releases: pemistahl/lingua-py

Lingua 1.3.5

03 Apr 12:33

Improvements

  • The language models are now stored in dictionaries instead of NumPy arrays. This significantly improves runtime performance at the cost of higher memory consumption (up to 3 GB when all models are loaded). The former approach was much too slow, and additional memory is comparatively cheap, so the trade-off is worthwhile.

  • The language model files are now compressed with the Brotli algorithm, which reduces the file size by 15% on average.

  • The characters Щщ are now correctly identified as possible indicators for the Ukrainian language, leading to slightly higher accuracy when identifying Ukrainian texts.
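
The storage trade-off in the first item can be illustrated with a minimal sketch. It is not the library's internal code: the n-grams, the frequencies and the function names are made up for illustration, and the standard-library `bisect` module stands in for a binary search over NumPy arrays.

```python
from bisect import bisect_left

# Hypothetical trigram model: relative frequencies keyed by n-gram.
# The n-grams must be sorted for the binary search to work.
ngrams = ["and", "ing", "the", "und"]
freqs = [0.021, 0.034, 0.058, 0.012]


def lookup_sorted(ngram):
    """Binary search over two parallel sorted sequences: compact, O(log n) per lookup."""
    i = bisect_left(ngrams, ngram)
    if i < len(ngrams) and ngrams[i] == ngram:
        return freqs[i]
    return None


# The same data as a dictionary: O(1) average lookup, but every entry
# carries hashing and per-object overhead, hence the higher memory use.
model = dict(zip(ngrams, freqs))


def lookup_dict(ngram):
    """Hash-based lookup in a dictionary."""
    return model.get(ngram)


print(lookup_sorted("the"), lookup_dict("the"))
print(lookup_sorted("xyz"), lookup_dict("xyz"))
```

Both functions return the same values. They differ only in how much time and memory each lookup structure needs.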

Miscellaneous

  • All dependencies have been updated to their latest versions.

Lingua 2.0.2

12 Dec 20:51

Improvements

  • Type stubs for the Python bindings are now available, allowing better static code analysis, better code completion in supported IDEs and easier understanding of the library's API. (#197)

Bug Fixes

  • The method LanguageDetector.detect_multiple_languages_of still returned character indices instead of byte indices when only a single DetectionResult was produced. This has been fixed. (#203, #205)

Please note: Due to project size limits on PyPI, the Python wheels for previous version 2.0.1 had to be deleted. Please use 2.0.2 instead.

Lingua 2.0.1

23 Nov 22:39

Bug Fixes

  • The method LanguageDetector.detect_multiple_languages_of returns byte indices. For creating string slices in Python, character indices are needed but were not provided. This resulted in incorrect DetectionResults for Python. This has been fixed now by converting the byte indices to character indices. Big thanks to @boltonn for the bug report. (#192)
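
Why byte indices break Python string slicing can be shown with a small sketch. The helper `byte_to_char_index` is a hypothetical name used for illustration only. It is not part of the Lingua API and may differ from how the bindings perform the conversion.

```python
def byte_to_char_index(text: str, byte_index: int) -> int:
    """Convert an offset into the UTF-8 encoding of `text` into a character offset."""
    encoded = text.encode("utf-8")
    if not 0 <= byte_index <= len(encoded):
        raise ValueError("byte index out of range")
    # Decoding the prefix raises UnicodeDecodeError if the byte index
    # falls in the middle of a multi-byte character.
    return len(encoded[:byte_index].decode("utf-8"))


text = "Grüße, Welt"
end_byte = 7  # "Grüße" occupies 7 bytes because "ü" and "ß" take 2 bytes each

print(repr(text[:end_byte]))                            # slices too far
print(repr(text[:byte_to_char_index(text, end_byte)]))  # slices "Grüße"
```

For pure ASCII input both kinds of index coincide, which is why a bug of this kind only shows up with texts that contain multi-byte characters.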

Please note: Due to project size limits on PyPI, the Python wheels for previous version 2.0.0 had to be deleted. Please use 2.0.1 instead.

Lingua 2.0.0

15 Nov 10:52

Features

  • Python bindings for the Rust implementation of Lingua have now replaced the pure Python implementation in order to benefit from Rust's performance in any Python software.

  • Parallel equivalents for all methods in LanguageDetector have been added to give the user the choice of using the library single-threaded or multi-threaded.

Lingua 1.3.4

07 Nov 22:16

Miscellaneous

  • This release resolves some dependency issues so that the latest versions of the dependencies NumPy, Pandas and Matplotlib can be used with Python >= 3.9 while older versions are used with Python 3.8.

  • All dependencies have been updated to their latest versions.

Lingua 1.3.3

27 Sep 08:18

Improvements

  • Processing the language models is now a little faster thanks to binary search on the language model NumPy arrays.

Bug Fixes

  • Several bugs in multiple languages detection have been fixed that caused incomplete results to be returned in several cases. (#143, #154)

  • A significant number of Kazakh texts were incorrectly classified as Mongolian. This has been fixed. (#160)

Miscellaneous

  • A new section on performance tips has been added to the README.

  • All dependencies have been updated to their latest versions.

Lingua 1.3.2

29 Jan 21:23

Improvements

  • After some internal optimizations, language detection is now approximately 20% to 30% faster. The speed improvement is greater for long input texts than for short ones.

Lingua 1.3.1

04 Jan 22:47

Bug Fixes

  • For long input texts, an error occurred while computing the confidence values due to numerical underflow when converting probabilities. This has been fixed. Thanks to @jordimas for reporting this bug. (#102)
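
The kind of underflow described here can be reproduced in a few lines. This is only a sketch of the general problem and of one common remedy (working in log space). The probabilities and function names are invented, and the actual fix in the library may look different.

```python
import math


def naive_product(probabilities):
    """Multiply probabilities directly; underflows to 0.0 for long inputs."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def log_sum(probabilities):
    """Sum of logarithms; stays finite where the product underflows."""
    return sum(math.log(p) for p in probabilities)


# Hypothetical per-n-gram probabilities for a long text in two candidate languages.
english = [1e-5] * 100
german = [1e-5] * 99 + [1e-6]

print(naive_product(english))      # the true value 1e-500 is not representable
print(log_sum(english))            # finite, roughly -1151
print(math.exp(log_sum(english)))  # converting back underflows again

# Comparing candidates through the difference of their log sums avoids the
# underflow: only the ratio relative to the best candidate is exponentiated.
ratio = math.exp(log_sum(german) - log_sum(english))
print(ratio)
```

The point is that the conversion back to a probability should happen only on differences between candidates, never on the raw log sums.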

Lingua 1.3.0

30 Dec 00:25

Improvements

  • The min-max normalization method for the confidence values has been replaced with applying the softmax function. This gives more realistic probabilities. Big thanks to @Alex-Kopylov for proposing and implementing this change. (#99)
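
The difference between the two normalization methods can be sketched as follows. The scores are invented log-likelihoods for three candidate languages, and the functions are illustrative rather than the library's implementation.

```python
import math


def min_max_normalize(scores):
    """Rescale scores linearly so that the best is 1.0 and the worst is 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def softmax(scores):
    """Turn scores into a probability distribution that sums to 1."""
    peak = max(scores)  # subtracting the maximum keeps math.exp in a safe range
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [-10.0, -11.0, -14.0]  # hypothetical summed log probabilities

print(min_max_normalize(scores))  # [1.0, 0.75, 0.0]
print(softmax(scores))
```

Min-max normalization always assigns 1.0 to the best candidate and 0.0 to the worst, however close their scores are. The softmax values instead sum to 1 and reflect the actual distances between the scores, which is why they read more naturally as probabilities.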

Lingua 1.2.1

27 Dec 20:56

Bug Fixes

  • Under certain circumstances, calling the method LanguageDetector.detect_multiple_languages_of() raised an IndexError. This has been fixed. Thanks to @Saninsusanin for reporting this bug. (#98)