nlp.to_disk() throwing TypeError: __init__() got an unexpected keyword argument 'encoding' #2810

Closed
chingan-tsc opened this issue Sep 28, 2018 · 4 comments
Labels
feat / serialize Feature: Serialization, saving and loading third-party Third-party packages and services

@chingan-tsc

How to reproduce the behaviour

I was following the example here to train my own NER model https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py and I got the following error with the stack trace:

Traceback (most recent call last):
  File "app.py", line 121, in <module>
    nlp.to_disk(output_dir)
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/language.py", line 621, in to_disk
    util.to_disk(path, serializers, {p: False for p in disable})
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/util.py", line 503, in to_disk
    writer(path / key)
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/language.py", line 609, in <lambda>
    ('tokenizer', lambda p: self.tokenizer.to_disk(p, vocab=False)),
  File "tokenizer.pyx", line 354, in spacy.tokenizer.Tokenizer.to_disk
  File "tokenizer.pyx", line 355, in spacy.tokenizer.Tokenizer.to_disk
  File "tokenizer.pyx", line 384, in spacy.tokenizer.Tokenizer.to_bytes
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/util.py", line 486, in to_bytes
    return msgpack.dumps(serialized, use_bin_type=True, encoding='utf8')
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/msgpack_numpy.py", line 196, in packb
    return Packer(**kwargs).pack(o)
TypeError: __init__() got an unexpected keyword argument 'encoding'

Your Environment

  • Operating System: Amazon Linux 2, Mac OS X 10.13.2
  • Python Version Used: 3.6
  • spaCy Version Used: 2.0.12
  • Environment Information:
    My pip list returns
certifi (2018.8.24)
chardet (3.0.4)
cymem (1.31.2)
cytoolz (0.9.0.1)
dill (0.2.8.2)
idna (2.7)
msgpack (0.5.6)
msgpack-numpy (0.4.4.1)
murmurhash (0.28.0)
numpy (1.15.2)
pip (9.0.3)
plac (0.9.6)
preshed (1.0.1)
regex (2017.4.5)
requests (2.19.1)
setuptools (39.0.1)
six (1.11.0)
spacy (2.0.12)
thinc (6.10.3)
toolz (0.9.0)
tqdm (4.26.0)
ujson (1.35)
urllib3 (1.23)
wrapt (1.10.11)

Any ideas why my to_disk() call is throwing this error?

@Bachstelze

Bachstelze commented Sep 28, 2018

I get the same error but in my case it is the spacy.displacy.render() function:
Traceback (most recent call last):
  File "serve_trees.py", line 27, in <module>
    spacy.displacy.render(doc, style='dep', jupyter=False)
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 39, in render
    parsed = [converter(doc, options) for doc in docs] if not manual else docs
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 39, in <listcomp>
    parsed = [converter(doc, options) for doc in docs] if not manual else docs
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 89, in parse_deps
    doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes())
  File "doc.pyx", line 804, in spacy.tokens.doc.Doc.to_bytes
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/util.py", line 486, in to_bytes
    return msgpack.dumps(serialized, use_bin_type=True, encoding='utf8')
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/msgpack_numpy.py", line 196, in packb
    return Packer(**kwargs).pack(o)
TypeError: __init__() got an unexpected keyword argument 'encoding'

The error seems to come from util.py, specifically from the last three commits that touch this encoding argument: https://github.com/explosion/spaCy/commits/master/spacy/util.py
The most recent commit adds the problematic encoding argument back: 6430b1f

But there is no explanation for why it was removed and then restored?!

@ines added the "third-party" (Third-party packages and services) and "feat / serialize" (Feature: Serialization, saving and loading) labels Sep 28, 2018
@honnibal
Member

honnibal commented Sep 28, 2018

Current workaround: pip install "msgpack-numpy<0.4.4.0"

The issue is that msgpack-numpy 0.4.4.1 has been released with a backwards-incompatible change: that argument was deprecated, and now throws an error.
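The failure mode can be illustrated without msgpack installed. The sketch below uses two stand-in classes, OldPacker and NewPacker, which are hypothetical and are not the real msgpack or msgpack-numpy classes. It only shows how forwarding keyword arguments unchanged, as the `Packer(**kwargs)` line in the traceback does, raises a TypeError once the constructor no longer accepts `encoding`.

```python
# Stand-in classes for illustration only; these are NOT the real msgpack classes.
class OldPacker:
    """Hypothetical packer that still accepts the `encoding` keyword."""

    def __init__(self, use_bin_type=False, encoding="utf8"):
        self.use_bin_type = use_bin_type
        self.encoding = encoding


class NewPacker:
    """Hypothetical packer after the `encoding` keyword was removed."""

    def __init__(self, use_bin_type=False):
        self.use_bin_type = use_bin_type


def build_packer(packer_cls, **kwargs):
    # Keyword arguments are forwarded unchanged to the constructor,
    # similar to the `Packer(**kwargs)` call shown in the traceback.
    return packer_cls(**kwargs)


def try_build(packer_cls, **kwargs):
    """Return "ok" on success, or the TypeError message on failure."""
    try:
        build_packer(packer_cls, **kwargs)
        return "ok"
    except TypeError as err:
        return str(err)


if __name__ == "__main__":
    print(try_build(OldPacker, use_bin_type=True, encoding="utf8"))
    print(try_build(NewPacker, use_bin_type=True, encoding="utf8"))
```

The caller did not change between the two calls; only the callee's signature did, which is why an upgrade of a dependency alone is enough to trigger the error.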

Until Thinc ships an updated release, the best solution is to pin msgpack-numpy to an earlier version, which I think needs this argument on Python 2.7.
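To check whether a version from a pip list falls inside the range excluded by the workaround, a minimal sketch follows. The helper names parse_version and is_affected are made up for this example, the threshold is taken from the "msgpack-numpy<0.4.4.0" pin above, and the parser assumes plain dotted numeric versions with at most four parts.

```python
def parse_version(version, width=4):
    """Turn a dotted numeric version such as '0.4.4.1' into a tuple of ints.

    Shorter versions are padded with zeros so that '0.4.4' compares
    equal to '0.4.4.0'. Only plain numeric versions are handled.
    """
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (width - len(parts))


# Threshold assumed from the workaround pin "msgpack-numpy<0.4.4.0".
FIRST_EXCLUDED = parse_version("0.4.4.0")


def is_affected(installed_version):
    """Return True if the version is NOT allowed by the workaround pin."""
    return parse_version(installed_version) >= FIRST_EXCLUDED


if __name__ == "__main__":
    for candidate in ("0.4.3.2", "0.4.4", "0.4.4.1"):
        print(candidate, is_affected(candidate))
```

With the version reported in the issue, is_affected("0.4.4.1") evaluates to True, so that environment would need the pin.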

@ines
Member

ines commented Dec 8, 2018

This should be fixed in the latest release of spaCy / Thinc!

For the upcoming version v2.1.0 (currently on develop and available as spacy-nightly), we've packaged our own library of serialization utilities called srsly. It bundles forks of msgpack and ujson, lets us implement fixes and improvements, ensures spaCy won't break due to third-party updates, and lets us ship wheels for the entire thing 🎉 See here for details: https://github.com/explosion/srsly

@ines closed this as completed Dec 8, 2018
@lock

lock bot commented Jan 7, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock (bot) locked this as resolved and limited conversation to collaborators Jan 7, 2019