The mostFrequentAlphabet detection of filterLanguagesByRules uses maxByOrNull (lingua/src/main/kotlin/com/github/pemistahl/lingua/api/LanguageDetector.kt, line 318 in 7e415ae):
val mostFrequentAlphabet = detectedAlphabets.entries.maxByOrNull { it.value }!!.key
When the text contains words from multiple alphabets that have the same occurrence count, maxByOrNull picks only one of them as the most frequent one.
For example, the following will return only Greek with 1.0 confidence:
When instead getting the set of alphabets with the maximum count, the result is (roughly):
{GREEK=1.0, BENGALI=0.6349401470311586}
Or maybe in general it would be good to adjust the rule-based detection so that it does not make rash decisions. For example, when a text is half Japanese and half English (with the English part being the translation), Lingua will most likely return only Japanese with a confidence of 1.0.
Edit: Though maybe this is actually a feature request for detecting multiple languages in a text, similar to #38, except without having to know where the sections in different languages are; approximate percentages might suffice.
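To illustrate the tie behaviour described in this issue, here is a minimal sketch. It is not Lingua's actual code: the Alphabet enum and the function names singleMostFrequent and allMostFrequent are hypothetical stand-ins, and the counts map only imitates detectedAlphabets. It relies on the documented behaviour that maxByOrNull returns the first element yielding the largest value, so with equal counts the winner depends only on iteration order.

```kotlin
// Hypothetical stand-in for Lingua's alphabet enum; only the names matter here.
enum class Alphabet { GREEK, BENGALI, LATIN }

// Mirrors the single-winner lookup quoted in the issue: on a tie,
// maxByOrNull keeps the first maximal entry in iteration order.
fun singleMostFrequent(counts: Map<Alphabet, Int>): Alphabet? =
    counts.entries.maxByOrNull { it.value }?.key

// Assumed alternative: keep every alphabet whose count equals the maximum.
fun allMostFrequent(counts: Map<Alphabet, Int>): Set<Alphabet> {
    val maxCount = counts.values.maxOrNull() ?: return emptySet()
    return counts.filterValues { it == maxCount }.keys
}

fun main() {
    // One Greek word and one Bengali word: both alphabets have count 1.
    val counts = linkedMapOf(Alphabet.GREEK to 1, Alphabet.BENGALI to 1)
    println(singleMostFrequent(counts)) // GREEK (first entry wins the tie)
    println(allMostFrequent(counts))    // [GREEK, BENGALI]
}
```

With the set-based variant, the languages of all tied alphabets could be passed on to the statistical model instead of being filtered out early, which is presumably how a result such as the one quoted above comes about.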