Japanese Blank Model Issue #3191

Open
abhinandansrivastava opened this Issue Jan 24, 2019 · 1 comment


abhinandansrivastava commented Jan 24, 2019

I am facing a Japanese NER issue: I am using a blank Japanese model with MeCab.
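For reference, this is roughly how the blank model is set up (a minimal sketch, assuming spacy 2.0.18 and mecab-python3 are installed; the sample sentence is just an example):

import spacy

# A blank Japanese pipeline; its tokenizer wraps MeCab via mecab-python3
nlp = spacy.blank("ja")
doc = nlp("これはテストです。")
print([token.text for token in doc])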

I used the ner.manual recipe:

prodigy ner.manual dataset_test_1 path/to/txt/file --label NORP,GPE,ORG,MONEY,TIME

but after running it, I got the following error:

Using 5 labels: NORP,GPE,ORG,MONEY,TIME
Added dataset dataset_test_1 to database SQLite.
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/conda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/conda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/conda3/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 178, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 55, in prodigy.core.Controller.__init__
  File "/usr/local/conda3/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
  File "cython_src/prodigy/core.pyx", line 84, in iter_tasks
SystemError: <built-in function delete_Tagger> returned a result with an error set
  • Python version used: 3.6.4
  • spaCy version used: 2.0.18
  • Prodigy version used: 1.5.1
  • SWIG == 3.0.12
  • mecab-python3 == 0.996.1

ines commented Jan 24, 2019

Copying over @honnibal's comment for context:

It looks like there's a problem with pickling the optional external library (MeCab) used for Japanese tokenization. We hadn't seen this before, but we'll definitely look into it.

In the meantime, I think the following workaround should avoid the problem. I haven't tested it myself, as I'm currently on a machine where MeCab is hard to install, so apologies if any detail is incorrect.

from spacy.lang.ja import JapaneseTokenizer
import copyreg

def pickle_ja_tokenizer(instance):
    # Rebuild the tokenizer from scratch instead of serializing the MeCab handle
    return JapaneseTokenizer, tuple()

# Register the reduction function so pickling/copying the tokenizer uses it
copyreg.pickle(JapaneseTokenizer, pickle_ja_tokenizer)

The idea here is to use the copyreg module to tell Python how to pickle (and therefore copy) the tokenizer object.
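For illustration, here's a minimal, self-contained sketch of the copyreg mechanism itself (the Unpicklable class and its handle attribute are made up for this example): it registers a reduction function that tells pickle how to rebuild an instance of a class that can't be serialized directly.

import copyreg
import pickle

class Unpicklable:
    # Stand-in (hypothetical) for a class holding a native handle pickle can't serialize
    def __init__(self):
        self.handle = lambda: None  # lambdas cannot be pickled

def reduce_unpicklable(instance):
    # On unpickling, simply build a fresh instance instead of restoring the handle
    return Unpicklable, tuple()

copyreg.pickle(Unpicklable, reduce_unpicklable)

data = pickle.dumps(Unpicklable())  # uses the registered reduction function
obj = pickle.loads(data)            # returns a new Unpicklable instance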
