Can you tell me how to identify those 97 languages? #53

winter-loo · 2016-05-03T08:42:41Z

According to the README, langid.py comes pre-trained on 97 languages. How can I reproduce that result? I tried it on UG text, but it was identified as ZH. When I tried JA text, it reported an error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 21: illegal multibyte sequence

So my question is: how can I reproduce your result? Thanks in advance.


winter-loo · 2016-05-06T08:05:11Z

In interactive mode it works well. However, when reading from a file:

python langid.py -n < Chinese.txt

it reports the error above.

zafercavdar · 2018-08-22T09:11:55Z

Python 2 has trouble decoding non-ASCII characters. Instead of Python 2, try running langid with Python 3.
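
If the error persists, another option is to avoid piping the file through stdin and decode it explicitly instead. The 'gbk' in the traceback suggests the text was decoded with the system locale's default codec rather than the file's real encoding. The sketch below assumes the input file is UTF-8 (the thread does not say which encoding it uses); `read_text` and `classify_file` are hypothetical helper names, not part of langid.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a file as text with an explicit encoding.

    Passing the encoding avoids falling back to the locale default
    (for example gbk on a Chinese-locale Windows system).
    """
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def classify_file(path, classify, encoding="utf-8"):
    """Decode the file explicitly, then hand the text to `classify`.

    `classify` is any callable that takes a string, such as
    langid.classify, which returns a (language, score) tuple.
    """
    return classify(read_text(path, encoding=encoding))


# Usage, assuming langid is installed and Chinese.txt is UTF-8 encoded:
#   import langid
#   print(classify_file("Chinese.txt", langid.classify))
```

If the file is stored in a different encoding, pass it through the `encoding` argument instead of relying on the default.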

winter-loo closed this as completed May 6, 2016

winter-loo reopened this May 6, 2016

winter-loo closed this as completed May 4, 2024
