UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 7: invalid start byte #6

Closed
dapsjj opened this issue Oct 29, 2018 · 3 comments

dapsjj commented Oct 29, 2018

I used the following code to test pyknp:

from pyknp import Juman
jumanpp = Juman()  # Default is JUMAN++, i.e. Juman(jumanpp=True). For JUMAN, use Juman(jumanpp=False)
result = jumanpp.analysis("お疲れさまです。今週のトップ報告を行います")
for mrph in result.mrph_list():  # Access each morpheme
    print("surface form:%s, reading:%s, base form:%s, POS:%s, POS subcategory:%s, conjugation type:%s, conjugation form:%s, semantic info:%s, representative notation:%s" \
            % (mrph.midasi, mrph.yomi, mrph.genkei, mrph.hinsi, mrph.bunrui, mrph.katuyou1, mrph.katuyou2, mrph.imis, mrph.repname))

The error message is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 7: invalid start byte

gackel (Contributor) commented Oct 29, 2018

Thanks for your report.
As the documentation states, the following two lines are required when you use Python 2.

# coding: utf-8
from __future__ import unicode_literals

I'll update the documentation to make this clearer.

dapsjj (Author) commented Oct 29, 2018

I only installed pyknp; I did not install JUMAN++ or KNP.
I suspect that is not sufficient. Also, is Python 3.6.4 a supported version?
Your documentation says that Python 2.7.15, 3.5.6, and 3.6.6 are supported.
Can I use JUMAN++ on Windows 10?
I am using Python 3.6.4 with Anaconda 4.3.1.

I modified the code like this:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
from pyknp import Juman
jumanpp = Juman()  # Default is JUMAN++, i.e. Juman(jumanpp=True). For JUMAN, use Juman(jumanpp=False)
result = jumanpp.analysis("お疲れさまです。今週のトップ報告を行います")
for mrph in result.mrph_list():  # Access each morpheme
    print("surface form:%s, reading:%s, base form:%s, POS:%s, POS subcategory:%s, conjugation type:%s, conjugation form:%s, semantic info:%s, representative notation:%s" \
            % (mrph.midasi, mrph.yomi, mrph.genkei, mrph.hinsi, mrph.bunrui, mrph.katuyou1, mrph.katuyou2, mrph.imis, mrph.repname))

But the error still occurs:

Traceback (most recent call last):
  File "E:/test/sub.py", line 5, in <module>
    result = jumanpp.analysis("お疲れさまです。今週のトップ報告を行います")
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\juman.py", line 87, in analysis
    return self.juman(input_str)
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\juman.py", line 75, in juman
    result = MList(self.juman_lines(input_str))
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\juman.py", line 70, in juman_lines
    return self.subprocess.query(input_str, pattern=self.pattern)
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\process.py", line 71, in query
    line = self.stdouterr.readline()[:-1].decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 7: invalid start byte

What is wrong?
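
One possible reading of the traceback, which this thread does not confirm: byte 0x82 can never start a UTF-8 sequence, but it is a common lead byte in cp932 (Shift_JIS), the encoding Japanese Windows consoles typically use. If the subprocess wrote a line in that encoding, the decode('utf-8') call in process.py would fail in exactly this way. The sketch below reproduces the failure with illustrative bytes, not the output captured in the report. The helper decode_with_fallback is hypothetical and is not part of pyknp.

```python
# Illustrative bytes only: the hiragana "ha" (U+306F) encoded as cp932.
# This is NOT the actual subprocess output from the report above.
raw = "\u306f".encode("cp932")
print(raw)


def decode_with_fallback(data, encodings=("utf-8", "cp932")):
    """Hypothetical helper: return (encoding, text) for the first encoding that decodes."""
    for enc in encodings:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError as exc:
            # For the sample bytes, UTF-8 fails because 0x82 is a continuation byte.
            print("%s failed: %s" % (enc, exc.reason))
    # Last resort: replace undecodable bytes so the caller still gets text.
    return None, data.decode("utf-8", errors="replace")


encoding, text = decode_with_fallback(raw)
# ascii() keeps the output printable on consoles that cannot render Japanese.
print(encoding, ascii(text))
```

If the bytes decode cleanly as cp932, the decoded text is worth reading, since it may be a message from the shell rather than analyzer output.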

gackel (Contributor) commented Oct 29, 2018

The code is correct, and it works in my environment.

As you say, you need to install JUMAN++ (or JUMAN).
Please install it.
FYI, the latest JUMAN++ is available here: https://github.com/ku-nlp/jumanpp

By the way, the latest version on GitHub checks whether JUMAN/KNP is installed.
You can install that version as follows:

% git clone https://github.com/ku-nlp/pyknp
% cd pyknp
% python setup.py install [--prefix=path]
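
Independently of pyknp, you can confirm that an analyzer is reachable before calling Juman(). The following is a minimal sketch using only the standard library. It is not the check that pyknp itself performs. The executable names jumanpp and juman are assumptions about how the binaries are named on your PATH, and find_analyzer is a hypothetical helper.

```python
import shutil


def find_analyzer(candidates=("jumanpp", "juman")):
    """Return (name, path) for the first candidate found on PATH, or None.

    The candidate names are assumptions; adjust them to match your installation.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


if __name__ == "__main__":
    found = find_analyzer()
    if found is None:
        print("Neither jumanpp nor juman was found on PATH; install one first.")
    else:
        print("Found %s at %s" % found)
```

If this reports that nothing was found, pyknp has no analyzer to launch, which matches the situation described above.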

dapsjj closed this as completed Oct 31, 2018