UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 7: invalid start byte #6

dapsjj · 2018-10-29T01:11:26Z

I use this code to test pyknp.

from pyknp import Juman
jumanpp = Juman()  # default is JUMAN++: Juman(jumanpp=True). if you use JUMAN, use Juman(jumanpp=False)
result = jumanpp.analysis("お疲れさまです。今週のトップ報告を行います")
for mrph in result.mrph_list():  # access each morpheme
    print("見出し:%s, 読み:%s, 原形:%s, 品詞:%s, 品詞細分類:%s, 活用型:%s, 活用形:%s, 意味情報:%s, 代表表記:%s" \
            % (mrph.midasi, mrph.yomi, mrph.genkei, mrph.hinsi, mrph.bunrui, mrph.katuyou1, mrph.katuyou2, mrph.imis, mrph.repname))

The error message is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 7: invalid start byte


gackel · 2018-10-29T03:37:38Z

Thanks for your report.
As written in the documentation, the following two lines are required when you use Python 2.

# coding: utf-8
from __future__ import unicode_literals

I'll update the documentation to make this clearer.

dapsjj · 2018-10-29T05:56:53Z

I only installed pyknp. I didn't install JUMAN++ or KNP.
I don't think this is right. Is Python 3.6.4 a supported version?
Your documentation says that Python 2.7.15, 3.5.6, and 3.6.6 are supported.
Can I use JUMAN++ on Windows 10?
I am using Python 3.6.4 with Anaconda 4.3.1.

I modified the code like this:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
from pyknp import Juman
jumanpp = Juman()  # default is JUMAN++: Juman(jumanpp=True). if you use JUMAN, use Juman(jumanpp=False)
result = jumanpp.analysis("お疲れさまです。今週のトップ報告を行います")
for mrph in result.mrph_list():  # access each morpheme
    print("見出し:%s, 読み:%s, 原形:%s, 品詞:%s, 品詞細分類:%s, 活用型:%s, 活用形:%s, 意味情報:%s, 代表表記:%s" \
            % (mrph.midasi, mrph.yomi, mrph.genkei, mrph.hinsi, mrph.bunrui, mrph.katuyou1, mrph.katuyou2, mrph.imis, mrph.repname))

But the error still occurs.

Traceback (most recent call last):
  File "E:/test/sub.py", line 5, in
    result = jumanpp.analysis("お疲れさまです。今週のトップ報告を行います")
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\juman.py", line 87, in analysis
    return self.juman(input_str)
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\juman.py", line 75, in juman
    result = MList(self.juman_lines(input_str))
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\juman.py", line 70, in juman_lines
    return self.subprocess.query(input_str, pattern=self.pattern)
  File "E:\Anaconda3\lib\site-packages\pyknp\juman\process.py", line 71, in query
    line = self.stdouterr.readline()[:-1].decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 7: invalid start byte

What is wrong?
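
For readers hitting the same traceback: byte 0x82 can never start a UTF-8 sequence, but it is a common lead byte in Shift-JIS-family encodings such as cp932, which Japanese-locale Windows consoles typically use. The sketch below reproduces the same class of error with a made-up byte string. The sample text and the `try_decode` helper are illustrative assumptions, not part of pyknp, and this does not claim to show what the subprocess actually printed in the report above.

```python
def try_decode(raw, encodings=("utf-8", "cp932")):
    """Return (encoding, text) for the first encoding that decodes raw.

    Hypothetical helper for diagnosis only; pyknp does not provide it.
    """
    for enc in encodings:
        try:
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to a lossy decode so the text can be inspected.
    return None, raw.decode("utf-8", errors="replace")


# Made-up sample: ASCII followed by hiragana, encoded as cp932.
raw = "abc はい".encode("cp932")

first_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    first_error = exc  # same error class as in the traceback above

used_encoding, text = try_decode(raw)
print(first_error)
print(used_encoding, text)
```

If the bytes decode cleanly as cp932 but not as UTF-8, the output most likely came from the Windows console or shell rather than from the analyzer itself, which would be consistent with the analyzer not being installed.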

gackel · 2018-10-29T13:45:45Z

The code is correct, and it works in my environment.

As you say, you have to install JUMAN++ (or JUMAN).
Please install it.
FYI, the latest JUMAN++ is here: https://github.com/ku-nlp/jumanpp

By the way, the latest version on GitHub checks whether JUMAN/KNP is installed.
You can install it as follows.

% git clone https://github.com/ku-nlp/pyknp
% cd pyknp
% python setup.py install [--prefix=path]
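
On a pyknp release that does not yet perform this check, a similar check can be done in user code before constructing `Juman()`. This is a minimal sketch, assuming the analyzers are exposed on PATH as the commands `jumanpp` and `juman`. Those command names and the `find_analyzer` helper are assumptions for illustration, and this is not how pyknp itself implements the check.

```python
import shutil


def find_analyzer(candidates=("jumanpp", "juman")):
    """Return (name, path) of the first candidate command on PATH, or None.

    Hypothetical helper; the command names are assumed, not taken from pyknp.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


found = find_analyzer()
if found is None:
    print("No analyzer command found on PATH; install JUMAN++ or JUMAN first.")
else:
    print("Found %s at %s" % found)
```

Running this first turns a confusing decode error into an explicit message that the external analyzer is missing.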

dapsjj closed this as completed Oct 31, 2018
