Detection accuracy #1

Closed
vamseekm opened this issue Nov 17, 2011 · 3 comments

@vamseekm

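中国功夫是一门博大精深的武学艺术 , 中国功夫app , 介绍中国功夫的分类、特点、器材、门派等与中国功夫有关的内容!让广大读者能够更完整的了解中国功夫的精华!

(English translation: "Chinese kung fu is a broad and profound martial art. Chinese Kung Fu app: introduces the categories, characteristics, equipment, schools and other topics related to Chinese kung fu, so that readers can understand its essence more completely!")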

If I run the above snippet through your detection tool, I get "en" as the answer. This is due to a three-letter word in the snippet ("app"). Is it possible to fix this issue?

@vamseekm
Author

Sorry, my mistake: it was an encoding issue. Thanks for the great tool.

ghost assigned saffsd on Nov 18, 2011
@saffsd
Owner

saffsd commented Nov 18, 2011

No worries. Indeed, my own check confirms that zh is correctly detected:

中国功夫是一门博大精深的武学艺术 , 中国功夫app , 介绍中国功夫的分类、特点、器材、门派等与中国功夫有关的内容!让广大读者能够更完整的了解中国功夫的精华!
('zh', -1414.5709274662972)

Could you give me some details on how you used the tool? It would be good if I could detect potential encoding issues beforehand and address them, to keep the tool as simple as possible for the end user.
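One cheap way to flag this kind of encoding problem before classification is a round-trip heuristic. The Python sketch below is hypothetical (`looks_like_mojibake` is not part of langid.py) and assumes the most common failure: UTF-8 bytes that were decoded as Latin-1. Such text re-encodes to Latin-1 and then decodes cleanly as UTF-8 into a different string, whereas correctly decoded text usually either fails that round trip or comes back unchanged.

```python
def looks_like_mojibake(text: str) -> bool:
    """Hypothetical helper: guess whether `text` is UTF-8 that was
    wrongly decoded as Latin-1 (a heuristic, not a guarantee)."""
    try:
        # Undo the suspected wrong decode, then apply the right one.
        candidate = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or bytes that are not valid UTF-8:
        # the text was probably not produced by this particular mistake.
        return False
    # Pure ASCII round-trips unchanged, so only flag text that changed.
    return candidate != text


if __name__ == "__main__":
    sample = "中国功夫app"
    garbled = sample.encode("utf-8").decode("latin-1")
    print(looks_like_mojibake(sample))   # False
    print(looks_like_mojibake(garbled))  # True
```

A check like this can give false positives on genuine text that happens to contain sequences such as "Ã©", so it is better suited to logging a warning than to silently rewriting the input.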

@vamseekm
Author

I forgot to supply the utf8 encoding option to the MySQL connection while fetching some Unicode text from the database, so I was essentially passing a garbled mess to langid.py and asking it to identify the language. Again, langid.py is a great tool; it was my own stupid mistake. :)
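The failure described here can be reproduced without a database. This Python sketch assumes the misconfigured connection decoded the UTF-8 bytes as Latin-1 (an assumption: the actual wrong charset depends on the server and driver settings). It shows that the Chinese text becomes a string with no Han characters at all, which plausibly explains why a language identifier reported a Latin-script language such as en, and that the original can be recovered as long as no bytes were lost.

```python
# Sketch: UTF-8 text decoded with the wrong charset.
# Latin-1 is an assumed stand-in for the connection's wrong charset;
# it maps every byte to one character, so this decode never fails.
original = "中国功夫是一门博大精深的武学艺术"

# What a client sees if the UTF-8 bytes are decoded as Latin-1:
# every 3-byte CJK character becomes three Latin-1 characters.
garbled = original.encode("utf-8").decode("latin-1")

# No Han characters survive; only U+0000..U+00FF characters remain.
has_han = any("\u4e00" <= ch <= "\u9fff" for ch in garbled)
print(has_han)                      # False
print(len(original), len(garbled))  # 16 48

# Because no bytes were lost, the damage can be undone.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)         # True
```

The real fix is on the connection side; with the MySQLdb driver, for instance, passing `charset='utf8'` to `MySQLdb.connect()` requests UTF-8 for the connection, so the text arrives intact before it ever reaches langid.py.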
