
Language detection error #14

Closed
erikvullings opened this issue Jun 24, 2022 · 1 comment

Comments

@erikvullings

I like your library/tool because it is simple to use, compact, and generally produces good results, but I do have an issue/question. Why does it classify the following block as lt (Lithuanian)?

tinyld "Russia has been stripped of a multitude of tournaments following an IOC recommendation\n\nRussian Sports Minister Oleg Matytsin has warned the world of sport that the lack of competition from banned Russian athletes is harmful for all concerned, while stating the precise number of events which his country has been stripped of due to the conflict in Ukraine.\n\nRussia has lost major sporting showpieces in recent months following a recommendation from the International Olympic Committee (IOC) at the end of February that federations should neither invite Russian athletes to competitions nor host tournaments in the country.\n\nThat has led to Russia being deprived of events such as the UEFA Champions League final, which was scheduled for St. Petersburg in May, and the world championships in volleyball and ice hockey, planned for 2022 and 2023 respectively.\n\nSports Minister Matytsin has now put an exact figure on the number of events removed from Russia.\n\n“As of May 25, international sports organizations canceled/postponed 186 international sporting events planned in Russia in 2022-2023, including 36 major international sporting events,” said Matytsin, who has been heading a Russian delegation on a visit to India.\n\nThe minister added that Russian sports officials had been tasked with seeking compensation for canceled events – something the likes of the Russian Football Union (RFU) has already said it will do with UEFA and FIFA.\n\nBut as Russian and Belarusian athletes face widespread bans, Matytsin warned that it was not only athletes from the two countries who would suffer.\n\n“This theory [of a damaging absence of competition] applies not only to us, but to all world sports – the lack of competition with Russian athletes is harmful,” Matytsin said, as quoted by RIA Sport.\n\nMatytsin has previously cautioned that world sport cannot hope to develop “normally” without the participation of Russian athletes, arguing that various federations had already come to realize their errors in attempting to alienate Russian sport.\n\nOn the flip side, Matytsin said on Thursday that Russia was also stepping up its efforts to hold tournaments for its athletes and those from other countries.\n\n“From February to May 2022, more than 30 international competitions were held in Russia,” said the minister.\n\nAfter Russian athletes were banned on the eve of the Beijing Winter Paralympics in March, Russia promptly arranged an alternative event at the Siberian resort of Khanty-Mansiysk – something it has vowed to continue to do.\n\nMatytsin has taken the opportunity of his visit to India to discuss the strengthening of sporting ties between the two countries, suggesting that Russia would be more than willing to help India with organizing a future edition of the Olympic Games, should it be granted hosted rights."
[
  { lang: 'lt', accuracy: 0.7009174311926606 },
  { lang: 'en', accuracy: 0.29908256880733947 }
]
@kefniark
Contributor

kefniark commented Jul 23, 2022

Sorry for the late answer; summer vacation got in the way 😄

After investigating, it looks like this specific text is thrown off by the family name Matytsin, which contains a sequence of letters that is unusual for English and is repeated 8 times.
In a more recent build I slightly increased the number of chunks analyzed for long texts, which reduces the risk of this happening.
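The thread does not describe tinyld's internals, so this is a hypothetical illustration only. Assuming a detector that scores overlapping character trigrams, a repeated name injects the same rare trigrams over and over. The sketch below counts how often the trigrams of "Matytsin" occur in a short made-up sample; the helper names trigramCounts and wordTrigramShare, and the lowercasing step, are assumptions for this example, not tinyld's API.

```typescript
// Hypothetical illustration of n-gram counting; NOT tinyld's implementation.

// Count overlapping, lowercased character trigrams in `text`.
function trigramCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  const lower = text.toLowerCase();
  for (let i = 0; i + 3 <= lower.length; i++) {
    const gram = lower.slice(i, i + 3);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}

// For each distinct trigram of `word`, report how often it occurs anywhere in
// `text`, plus the fraction of all trigrams in `text` those occurrences make up.
function wordTrigramShare(text: string, word: string) {
  const textCounts = trigramCounts(text);
  let total = 0;
  textCounts.forEach((c) => (total += c));
  const perGram = Array.from(trigramCounts(word).keys()).map((gram) => ({
    gram,
    count: textCounts.get(gram) ?? 0,
  }));
  const fromWord = perGram.reduce((sum, g) => sum + g.count, 0);
  return { perGram, share: total > 0 ? fromWord / total : 0 };
}

const sample = "Matytsin said. Matytsin warned. Matytsin added.";
const { perGram, share } = wordTrigramShare(sample, "Matytsin");
console.log(perGram); // mat, aty, tyt, yts, tsi, sin: each occurs 3 times
console.log(share); // 0.4 (18 of the 45 trigrams in the sample)
```

In the full article the name's trigrams would be a small share of the total, but a short chunk that mentions the name several times gives them a much larger weight.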

For this quote, I get the following result with version 1.3.0:

[
  { lang: 'en', accuracy: 0.6393910561370124 },
  { lang: 'lt', accuracy: 0.36060894386298764 }
]
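The effect of analyzing more chunks of a long text can be illustrated with a hypothetical sketch; tinyld's actual chunking and scoring are not described in this thread. splitIntoChunks, detectByChunks and toyDetect are invented names, and toyDetect returns made-up scores that simply treat any chunk mentioning the name as mostly Lithuanian, so the numbers are illustrative, not measured.

```typescript
// Hypothetical sketch: score a text chunk by chunk and combine the results.
// `Detector` stands in for any per-chunk language detector.
type Score = { lang: string; accuracy: number };
type Detector = (chunk: string) => Score[];

// Split `text` into at most `count` chunks of roughly equal word counts.
function splitIntoChunks(text: string, count: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const size = Math.max(1, Math.ceil(words.length / Math.max(1, count)));
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

// Sum per-chunk accuracies per language, then renormalize so they sum to 1.
function detectByChunks(text: string, count: number, detect: Detector): Score[] {
  const totals = new Map<string, number>();
  for (const chunk of splitIntoChunks(text, count)) {
    for (const { lang, accuracy } of detect(chunk)) {
      totals.set(lang, (totals.get(lang) ?? 0) + accuracy);
    }
  }
  let sum = 0;
  totals.forEach((v) => (sum += v));
  return Array.from(totals.entries())
    .map(([lang, total]) => ({ lang, accuracy: sum > 0 ? total / sum : 0 }))
    .sort((a, b) => b.accuracy - a.accuracy);
}

// Toy detector with made-up scores: a chunk containing the name is judged
// mostly Lithuanian, every other chunk mostly English.
const toyDetect: Detector = (chunk) =>
  chunk.includes("Matytsin")
    ? [{ lang: "lt", accuracy: 0.7 }, { lang: "en", accuracy: 0.3 }]
    : [{ lang: "en", accuracy: 0.9 }, { lang: "lt", accuracy: 0.1 }];

const text = "Matytsin said that the lack of competition hurts";
console.log(detectByChunks(text, 1, toyDetect)); // lt ≈ 0.7, en ≈ 0.3
console.log(detectByChunks(text, 4, toyDetect)); // en ≈ 0.75, lt ≈ 0.25
```

With one chunk the toy result favors lt; with four chunks the single misleading chunk is outvoted and en wins. Renormalizing the summed scores keeps the accuracies summing to 1, as both result lists in this thread do.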
