Can you tell me how to identify those 97 languages? #53

Closed
winter-loo opened this issue May 3, 2016 · 2 comments

Comments

@winter-loo

According to the README, langid.py comes pre-trained on 97 languages. How can I reproduce that result? I tried Uyghur (ug) text, but it was classified as Chinese (zh). When I tested Japanese (ja) text, it raised an error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 21: illegal multibyte sequence

So my question is: how can I reproduce your results? Thanks in advance.

@winter-loo winter-loo reopened this May 6, 2016
@winter-loo
Author

In interactive mode it works well. However, when reading from a file:

python langid.py -n < Chinese.txt

it reports the error above.

@zafercavdar

Python 2 has problems decoding non-ASCII Unicode characters. Instead of using Python 2, try running langid with Python 3.
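
The `'gbk' codec` in the traceback suggests the input was decoded with the system's default locale encoding rather than the file's own. A minimal sketch of one way to sidestep that is to read the file with an explicit encoding and pass the resulting string to `langid.classify`. It assumes the file is UTF-8, which this thread does not state; `read_text` and `classify_file` are hypothetical helper names, not part of langid.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a file as text with an explicit encoding.

    Assumption: the file is UTF-8 unless another encoding is passed in.
    This avoids falling back to the locale default (e.g. gbk).
    """
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def classify_file(path, encoding="utf-8"):
    """Classify the contents of a text file with langid."""
    # Imported lazily so read_text() works even without langid installed.
    import langid

    return langid.classify(read_text(path, encoding))
```

With langid installed, `classify_file("Chinese.txt")` should return a (language code, score) pair. If the file is not UTF-8, pass its actual encoding as the second argument.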

2 participants