
cPickle.PicklingError: Can't pickle <type 'ellipsis'>: attribute lookup __builtin__.ellipsis failed #30

Closed
Dieterbe opened this Issue May 10, 2011 · 10 comments

2 participants

Dieter Plaetinck Radim Řehůřek
Dieter Plaetinck

When I have an instance of SparseMatrixSimilarity and I try to save() it, I get this:

INFO:gensim.utils:saving Similarity object to 846c57f-dirty--CHNK100-EB0-FW1-FW_NA0.5-FW_NB5-M0-NFdata__bom-nerfile-withmediaobjectfragmentids-NUMBEST10-PATRN_INCL-S_L0-S_P0-SQ_K5-SQ_R1-TFIDF0_sim_dense_disk
Traceback (most recent call last):
  File "./build-models.py", line 250, in <module>
    rebuild_data_files(r, args.tag)
  File "./build-models.py", line 118, in rebuild_data_files
    sim.save(sim_filename(tag))
  File "/usr/lib/python2.7/site-packages/gensim/utils.py", line 118, in save
    pickle(self, fname)
  File "/usr/lib/python2.7/site-packages/gensim/utils.py", line 414, in pickle
    cPickle.dump(obj, fout, protocol=protocol)
cPickle.PicklingError: Can't pickle <type 'ellipsis'>: attribute lookup __builtin__.ellipsis failed

Interestingly, I have done this hundreds of times before without issues.
I wondered whether it was caused by an update to Python or numpy, but apparently not: I did upgrade scipy and python2 a week ago, but reverting those packages didn't fix it.
Tested with:

  • python-scipy 0.8.0-4
  • python-scipy 0.9.0-1
  • python2-numpy 1.5.1-2
  • python2 2.7.1-7
  • python2 2.7.1-9
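
For reference, the failure is reproducible outside gensim: Python 2's cPickle cannot serialize the Ellipsis singleton at all, so it is enough for any attribute in the object graph to hold one. A minimal stand-alone sketch (Holder is a made-up class, not a gensim type):

# Minimal reproduction (Python 2): cPickle has no support for the
# Ellipsis singleton, so any object graph containing it fails to dump.
import cPickle

class Holder(object):
    def __init__(self):
        self.marker = Ellipsis  # e.g. left over from an extended slice

try:
    cPickle.dumps(Holder(), protocol=-1)
except cPickle.PicklingError as e:
    print e  # Can't pickle <type 'ellipsis'>: attribute lookup __builtin__.ellipsis failed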
Dieter Plaetinck

I'm currently working around this issue by using http://jsonpickle.github.com/ (patch below):

diff --git a/src/gensim/utils.py b/src/gensim/utils.py
index 817f3b7..3d797a9 100644
--- a/src/gensim/utils.py
+++ b/src/gensim/utils.py
@@ -1,4 +1,4 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
 # Copyright (C) 2010 Radim Rehurek <radimrehurek@seznam.cz>
@@ -13,6 +13,7 @@ from __future__ import with_statement
 import logging
 import re
 import unicodedata
+import jsonpickle
 import cPickle
 import itertools
 from functools import wraps # for `synchronous` function lock
@@ -421,16 +422,24 @@ def chunkize(corpus, chunks, maxsize=0):
         for chunk in chunkize_serial(corpus, chunks):
             yield chunk


 def pickle(obj, fname, protocol=-1):
     """Pickle object `obj` to file `fname`."""
-    with open(fname, 'wb') as fout: # 'b' for binary, needed on Windows
-        cPickle.dump(obj, fout, protocol=protocol)
+    with open(fname, 'w') as fout:
+        fout.write(jsonpickle.encode(obj))


 def unpickle(fname):
     """Load pickled object from `fname`"""
-    return cPickle.load(open(fname, 'rb'))
+    with open(fname, 'r') as fin:
+        return jsonpickle.decode(fin.read())
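
For reference, the two jsonpickle calls in the patch round-trip an object like this (a rough sketch; Model is a hypothetical stand-in, not a gensim class):

# jsonpickle serializes arbitrary Python objects to a plain JSON string.
import jsonpickle

class Model(object):
    def __init__(self):
        self.name = 'demo'
        self.weights = [0.1, 0.2, 0.3]

s = jsonpickle.encode(Model())   # JSON text, safe to write with open(fname, 'w')
m = jsonpickle.decode(s)         # reconstructed Model instance
assert m.name == 'demo'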

Radim Řehůřek
Owner

Can jsonpickle handle very large objects (reasonably memory-efficient during save/load)? Dedan had another issue with cPickle, see #31, so perhaps completely switching from pickle to JSON would solve both at the same time...

Dieter Plaetinck

Radim, your question triggered this little experiment:
http://dieter.plaetinck.be/poor_mans_pickle_implementations_benchmark.html
I shall check out your numpy-based approach; it is probably better than my jsonpickle approach.
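
(The "numpy-based approach" presumably refers to numpy's native array persistence, numpy.save() / numpy.load(), which writes the raw array buffer without going through pickle. A minimal sketch:)

import numpy

arr = numpy.random.rand(1000, 500)   # dummy dense matrix
numpy.save('index.npy', arr)         # binary .npy dump of the raw buffer
restored = numpy.load('index.npy')
assert numpy.allclose(arr, restored)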

Radim Řehůřek
Owner

Nice! I like benchmarks :)

How about the standard json package? (simplejson in Python < 2.6)

Dieter Plaetinck

What do you mean? what about it?
the jsonpickle page says "The standard Python libraries for encoding Python into JSON, such as the stdlib’s json, simplejson, and demjson, can only handle Python primitives that have a direct JSON equivalent (e.g. dicts, lists, strings, ints, etc.). jsonpickle builds on top of these libraries"

http://jsonpickle.github.com/
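
(To illustrate the quoted limitation: the stdlib json module rejects arbitrary objects outright, which is exactly the gap jsonpickle fills. A minimal sketch:)

import json

class Foo(object):
    pass

json.dumps({'a': 1})   # fine: plain primitives only
json.dumps(Foo())      # raises TypeError: ... is not JSON serializable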

Radim Řehůřek
Owner

Oh, I didn't know it builds on json. In that case its performance is probably nearly identical, no need to test.

Btw I remembered reading about JSON speed on metaoptimize some time ago; I managed to google it up: http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/

Dieter Plaetinck

Well, the article explicitly discourages using it because it's buggy and unmaintained.

Radim Řehůřek
Owner

?? It's part of the standard Python library. You probably mean cjson.

Dieter Plaetinck

Yes, I meant cjson. Anyway, I don't feel the need to test more things right now, as the numpy native persistence approach you added is probably the best option. Or am I missing something?

Radim Řehůřek
Owner

For numpy arrays, I think you're right :) Numpy is also very actively developed/maintained, so there's a good chance potential bugs will be fixed quickly. The core numpy guys are very good engineers.

Radim Řehůřek (piskvorky) closed this September 18, 2011