ValueError: invalid literal for int() with base 10: '!!' #35

Closed
otariidae opened this issue Sep 23, 2020 · 2 comments

@otariidae

Minimal reproduction code

from pyknp import KNP

knp = KNP()
knp.parse("!!")
Traceback (most recent call last):
  File "hoge.py", line 99, in <module>
    knp.parse("!!")
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/pyknp/knp/knp.py", line 70, in parse
    return self.parse_juman_result(juman_str, juman_format)
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/pyknp/knp/knp.py", line 97, in parse_juman_result
    return BList(knp_lines, self.pattern, juman_format)
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/pyknp/knp/blist.py", line 39, in __init__
    self.parse(spec)
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/pyknp/knp/blist.py", line 116, in parse
    synnodes = SynNodes(string)
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/pyknp/knp/syngraph.py", line 21, in __init__
    self.tagids = [int(n) for n in tagid.split(',')]
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/pyknp/knp/syngraph.py", line 21, in <listcomp>
    self.tagids = [int(n) for n in tagid.split(',')]
ValueError: invalid literal for int() with base 10: '!!'

KNP output

$ echo '!!' | jumanpp | knp -tab
# S-ID:1 KNP:5.0-165d699 DATE:2020/09/23 SCORE:-23.04048
* -1D <文頭><文末><体言><用言:判><体言止><レベル:C><区切:5-5><ID:(文末)><裸名詞><提題受:30><主節><状態述語><正規化代表表記:!!/!!><主辞代表表記:!!/!!>
+ -1D <文頭><文末><体言><用言:判><体言止><レベル:C><区切:5-5><ID:(文末)><裸名詞><提題受:30><主節><状態述語><判定詞句><名詞項候補><先行詞候補><正規化代表表記:!!/!!><主辞代表表記:!!/!!><用言代表表記:!!/!!><CF_NOT_FOUND><節-区切><節-主辞><時制:非過去><格解析結果:!!/!!:判0><標準用言代表表記:!!/!!>
!! !! !! 名詞 6 普通名詞 1 * 0 * 0 "品詞推定:名詞 疑似代表表記 代表表記:!!/!! 品詞変更:!!-!!-!!-15-1-0-0" <品詞推定:名詞><疑似代表表記><代表表記:!!/!!><正規化代表表記:!!/!!><品詞変更:!!-!!-!!-15-1-0-0-"品詞推定:名詞 疑似代表表記 代表表記:!!/!!"><品曖-その他><未知語><記英数カ><英記号><記号><名詞相当語><文頭><文末><表現文末><自立><内容語><タグ単位始><文節始><文節主辞><用言表記先頭><用言表記末尾><用言意味表記末尾>
EOS

JUMAN++ 1.02
KNP current HEAD of master ku-nlp/knp@165d699
pyknp current HEAD of master 6ba00ea
Python 3.8.5
OS Ubuntu 20.04
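
The traceback and the KNP output above suggest one plausible cause: the morpheme line itself begins with `!!` because the surface form is `!!`, so a parser that routes lines by their leading characters could mistake it for another kind of line and then call `int()` on a non-numeric field. Below is a minimal sketch of that failure mode. `classify_line`, `parse_tagids`, the `!!` marker for SynGraph node lines, and the choice of field are all assumptions made for illustration; this is not pyknp's actual code, and the morpheme line is shortened from the output above.

```python
from typing import List


def classify_line(line: str) -> str:
    """Classify one line of `knp -tab` output by its leading characters only.

    Hypothetical, simplified dispatcher; not pyknp's implementation.
    """
    if line.startswith("#"):
        return "comment"
    if line.startswith("*"):
        return "bunsetsu"
    if line.startswith("+"):
        return "tag"
    if line.startswith("!!"):
        return "synnodes"  # assumed marker for SynGraph node lines
    if line == "EOS":
        return "eos"
    return "morpheme"


def parse_tagids(field: str) -> List[int]:
    """Same expression as the failing line in the traceback."""
    return [int(n) for n in field.split(",")]


# Shortened copy of the morpheme line from the KNP output.
morpheme_line = "!! !! !! 名詞 6 普通名詞 1 * 0 * 0"

# The prefix check alone misroutes the morpheme line.
print(classify_line(morpheme_line))  # prints "synnodes"

try:
    # Which field is read is an assumption; any "!!" field fails the same way.
    parse_tagids(morpheme_line.split(" ")[1])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '!!'
```

If this is indeed the cause, a prefix check would need additional context (for example, whether SynGraph output was requested) to tell the two kinds of line apart.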

@polm

polm commented Oct 9, 2020

I ran into a similar issue using Juman++ (latest release) and pyknp from pip.

Traceback (most recent call last):
  File "./benchmark-jumanpp.py", line 10, in <module>
    for word in tok.analysis(line.strip()).mrph_list():
  File "/mnt/pool/code/tokenizer-benchmark/env/lib/python3.8/site-packages/pyknp/juman/juman.py", line 89, in analysis
    return self.juman(input_str, juman_format)
  File "/mnt/pool/code/tokenizer-benchmark/env/lib/python3.8/site-packages/pyknp/juman/juman.py", line 76, in juman
    result = MList(self.juman_lines(input_str), juman_format)
  File "/mnt/pool/code/tokenizer-benchmark/env/lib/python3.8/site-packages/pyknp/juman/mlist.py", line 29, in __init__
    mrph = Morpheme(line, mid, juman_format)
  File "/mnt/pool/code/tokenizer-benchmark/env/lib/python3.8/site-packages/pyknp/juman/morpheme.py", line 79, in __init__
    self._parse_spec(spec.strip("\n"))
  File "/mnt/pool/code/tokenizer-benchmark/env/lib/python3.8/site-packages/pyknp/juman/morpheme.py", line 142, in _parse_spec
    self.hinsi_id = int(parts[4])
ValueError: invalid literal for int() with base 10: 'input'
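
For a batch job like the benchmark above, one possible stopgap is to catch `ValueError` per input line so that a single unparsable line does not abort the whole run. The sketch below is only an illustration: `analyze_safely` and `fake_analyze` are hypothetical names, and the `analyze` argument stands for any callable that takes a string, such as the `analysis` method seen in the traceback.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def analyze_safely(
    analyze: Callable[[str], T], lines: Iterable[str]
) -> Iterator[Tuple[str, Optional[T]]]:
    """Yield (text, result) pairs; result is None when analyze raises ValueError."""
    for line in lines:
        text = line.strip()
        try:
            yield text, analyze(text)
        except ValueError:
            # Record the failing input instead of stopping the whole run.
            yield text, None


# Stand-in analyzer for demonstration only; it fails on non-numeric tokens.
def fake_analyze(text: str) -> list:
    return [int(tok) for tok in text.split()]


results = list(analyze_safely(fake_analyze, ["1 2 3\n", "input\n"]))
failed = [text for text, result in results if result is None]
print(failed)  # prints ['input']
```

This only hides the symptom; the failing inputs collected in `failed` are still worth reporting upstream.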

@nobu-g
Member

nobu-g commented Nov 24, 2020

This problem seems to be fixed now.
I tested in the following environments and confirmed that pyknp works well.

JUMAN++ 1.02 / 2.0.0-rc3
KNP current HEAD of master ku-nlp/knp@2ad4f6d / 4.2
pyknp current HEAD of master 38469c8 / latest version from pip (0.4.5)
Python 3.7.9
OS macOS Big Sur (11.0.1) / Ubuntu 20.04.1

nobu-g closed this as completed Feb 3, 2021

3 participants