Maximum byte size of input string #22

Open
aneeshp1994 opened this issue Nov 5, 2019 · 0 comments

What is the maximum byte size of the input string for the Morpheme class? I am getting the following error:

Traceback (most recent call last):
  File "generate_vectors.py", line 207, in <module>
    tokenize_text(JA_WIKI_TEXT_FILENAME, JA_WIKI_TEXT_TOKENS_FILENAME)
  File "generate_vectors.py", line 139, in tokenize_text
    tokenized_text = ' '.join(get_words(text, juman_pp=True))
  File "generate_vectors.py", line 114, in get_words
    result = jumanpp.analysis(text)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/juman.py", line 91, in analysis
    return self.juman(input_str, juman_format)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/juman.py", line 78, in juman
    result = MList(self.juman_lines(input_str), juman_format)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/mlist.py", line 29, in __init__
    mrph = Morpheme(line, mid, juman_format)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/morpheme.py", line 80, in __init__
    self._parse_spec(spec.strip("\n"))
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/morpheme.py", line 143, in _parse_spec
    self.hinsi_id = int(parts[4])
ValueError: invalid literal for int() with base 10: 'input'

I found that this error occurs because the input string is longer than the maximum allowed length. If I add print(spec) to _parse_spec in morpheme.py, I get the following string:

'InvalidParameter byte size of input string (12797) is greater than maximum allowed (4096)'

Is there a way to change the maximum length allowed?
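
Until the limit itself can be changed, one possible workaround is to split the text into chunks that stay within the reported 4096 bytes and analyze each chunk separately. The sketch below is only an illustration under some assumptions: the helper name split_by_bytes is hypothetical (it is not part of pyknp), the input is assumed to be measured as UTF-8, and sentence boundaries are assumed to be '。', '！', '？' or a newline. How the limit is counted exactly (for example, whether a trailing newline is included) is not known from the error message, so a smaller max_bytes may be safer.

```python
import re

# Limit taken from the error message quoted above.
MAX_BYTES = 4096


def split_by_bytes(text, max_bytes=MAX_BYTES, encoding="utf-8"):
    """Hypothetical helper: split text into chunks of at most max_bytes.

    Chunks end at sentence boundaries where possible (assumed to be
    '。', '！', '？' or a newline). A single sentence larger than
    max_bytes is cut between characters instead.
    """
    # Split after each sentence-ending character, keeping the delimiter.
    sentences = [s for s in re.split(r"(?<=[。！？\n])", text) if s]

    chunks = []
    current, current_size = [], 0
    for sentence in sentences:
        size = len(sentence.encode(encoding))

        if size > max_bytes:
            # Flush what has been collected so far, then hard-split
            # the oversized sentence character by character.
            if current:
                chunks.append("".join(current))
                current, current_size = [], 0
            piece, piece_size = [], 0
            for ch in sentence:
                ch_size = len(ch.encode(encoding))
                if piece and piece_size + ch_size > max_bytes:
                    chunks.append("".join(piece))
                    piece, piece_size = [], 0
                piece.append(ch)
                piece_size += ch_size
            if piece:
                chunks.append("".join(piece))
            continue

        if current_size + size > max_bytes:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(sentence)
        current_size += size

    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk could then be passed to jumanpp.analysis(chunk) as in get_words from the traceback, and the per-chunk results concatenated. Note that a hard cut inside an oversized sentence may split a word, so the tokens around that cut may differ from what a single pass would produce.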
