Chinese breaks multi language detection #143

Closed
LutzSteinborn opened this issue Apr 19, 2023 · 2 comments
Labels
bug Something isn't working

@LutzSteinborn

Hello,
It looks like Chinese characters in a text break multi-language detection. I know it's experimental, but it usually works pretty well.
Example:

```python
from lingua import Language, LanguageDetectorBuilder

text = "Płaszczowo-rurowe wymienniki ciepła Uszczelkowe der blaue himmel über berlin 中文 the quick brown fox jumps over the lazy dog"

# Chinese is not among the languages the detector is built with.
detector = LanguageDetectorBuilder.from_languages(Language.ENGLISH, Language.GERMAN, Language.POLISH).build()
print(detector.detect_multiple_languages_of(text))
```

Output (the English part of the text is missing):

```
[DetectionResult(start_index=0, end_index=48, word_count=4, language=Language.POLISH), DetectionResult(start_index=48, end_index=77, word_count=5, language=Language.GERMAN)]
```
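One way to see what went missing is to check which characters of the input are not covered by any returned span. The following is a minimal sketch that hard-codes the `start_index`/`end_index` values reported above, so lingua itself is not needed; `uncovered_text` is a hypothetical helper written for illustration, not part of lingua's API:

```python
# Hypothetical helper for illustration only; it is not part of lingua's API.
def uncovered_text(text: str, spans: list[tuple[int, int]]) -> list[str]:
    """Return the runs of `text` that no half-open (start, end) span covers."""
    covered = [False] * len(text)
    for start, end in spans:
        for i in range(start, end):
            covered[i] = True

    gaps: list[str] = []
    run_start = None
    # The trailing True acts as a sentinel that closes a run reaching the end of the text.
    for i, is_covered in enumerate(covered + [True]):
        if not is_covered and run_start is None:
            run_start = i
        elif is_covered and run_start is not None:
            gaps.append(text[run_start:i])
            run_start = None
    return gaps


text = "Płaszczowo-rurowe wymienniki ciepła Uszczelkowe der blaue himmel über berlin 中文 the quick brown fox jumps over the lazy dog"

# Spans copied from the DetectionResults reported above.
reported = [(0, 48), (48, 77)]
print(uncovered_text(text, reported))
# → ['中文 the quick brown fox jumps over the lazy dog']
```

Everything from the Chinese characters onward falls outside the returned spans, which matches the observation that the English part is dropped.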

@pemistahl
Owner

Hi Lutz, thank you for your report. I will try to improve the multi-language detection algorithm in future releases. Your example might help with that.

@pemistahl
Owner

pemistahl commented Sep 11, 2023

It turns out that the cause of this issue is the same as that of issue #154, which I've just fixed with commit 67fdebc.

The output of your code after the fix is:

```
[
  DetectionResult(start_index=0, end_index=48, word_count=4, language=Language.POLISH),
  DetectionResult(start_index=48, end_index=80, word_count=7, language=Language.GERMAN),
  DetectionResult(start_index=80, end_index=123, word_count=9, language=Language.ENGLISH)
]
```

```
POLISH Płaszczowo-rurowe wymienniki ciepła Uszczelkowe
GERMAN der blaue himmel über berlin 中文
ENGLISH the quick brown fox jumps over the lazy dog
```
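The three labelled lines above correspond to slicing the input with each result's `start_index` and `end_index`. Below is a minimal self-contained sketch that reproduces them from the indices reported after the fix. The `Segment` named tuple is a hypothetical stand-in for lingua's `DetectionResult`, used only so the snippet runs without lingua installed:

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for lingua's DetectionResult (only the fields used here)."""
    start_index: int
    end_index: int
    language: str


text = "Płaszczowo-rurowe wymienniki ciepła Uszczelkowe der blaue himmel über berlin 中文 the quick brown fox jumps over the lazy dog"

# Indices copied from the output after the fix.
results = [
    Segment(0, 48, "POLISH"),
    Segment(48, 80, "GERMAN"),
    Segment(80, 123, "ENGLISH"),
]

for result in results:
    # Each span is half-open: text[start_index:end_index].
    print(result.language, text[result.start_index:result.end_index])

# The spans are contiguous and together cover the whole input.
assert results[0].start_index == 0 and results[-1].end_index == len(text)
assert all(a.end_index == b.start_index for a, b in zip(results, results[1:]))
```

With lingua installed, presumably the same loop over `detector.detect_multiple_languages_of(text)` works, using `result.language.name` for the label.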
