use_dill does not work in Python 3.4 #5998

Closed
Zaharid opened this issue Jun 16, 2014 · 18 comments · Fixed by #6029

Comments

@Zaharid
Contributor

Zaharid commented Jun 16, 2014

When executing a command like:

client[:].use_dill()

Every parallel operation then produces a long, obscure pickle error:

def g(a,b):
    return a*b
parallel_client.client[:].apply_sync(g,2,4)
UnpicklingError                           Traceback (most recent call last)
/home/zah/sourcecode/ipython/IPython/kernel/zmq/serialize.py in unpack_apply_message(bufs, g, copy)
    178         for i in range(2):
    179             bufs[i] = bufs[i].bytes
--> 180     f = uncan(pickle.loads(bufs.pop(0)), g)
    181     info = pickle.loads(bufs.pop(0))
    182     arg_bufs, kwarg_bufs = bufs[:info['narg_bufs']], bufs[info['narg_bufs']:]
/home/zah/anaconda3/lib/python3.4/site-packages/dill/dill.py in loads(str)
    158     """unpickle an object from a string"""
    159     file = StringIO(str)
--> 160     return load(file)
    161 
    162 # def dumpzs(obj, protocol=None):
/home/zah/anaconda3/lib/python3.4/site-packages/dill/dill.py in load(file)
    148     pik = Unpickler(file)
    149     pik._main_module = _main_module
--> 150     obj = pik.load()
    151     if type(obj).__module__ == _main_module.__name__: # point obj class to main
    152         try: obj.__class__ == getattr(pik._main_module, type(obj).__name__)
/home/zah/anaconda3/lib/python3.4/pickle.py in load(self)
   1034                     raise EOFError
   1035                 assert isinstance(key, bytes_types)
-> 1036                 dispatch[key[0]](self)
   1037         except _Stop as stopinst:
   1038             return stopinst.value
/home/zah/anaconda3/lib/python3.4/pickle.py in load_global(self)
   1319 
   1320     def load_global(self):
-> 1321         module = self.readline()[:-1].decode("utf-8")
   1322         name = self.readline()[:-1].decode("utf-8")
   1323         klass = self.find_class(module, name)
/home/zah/anaconda3/lib/python3.4/pickle.py in readline(self)
    245             if data[-1] != b'\n':
    246                 raise UnpicklingError(
--> 247                     "pickle exhausted before end of frame")
    248             return data
    249         else:
UnpicklingError: pickle exhausted before end of frame

The same code works in Python 3.3.5.

@minrk
Member

minrk commented Jun 17, 2014

Sounds like a dill bug to me. Pinging @mmckerns for any insight on relevant changes in Python 3.4. Which dill version are you using?

@mmckerns
Contributor

@minrk: thanks for thinking of me for obscure pickling errors.

I haven't seen any changes that affect Python 3.4 differently from 3.3.
However, I can tell from the line numbers above that it's not the bleeding-edge version;
it might be the most recent stable version, released 6/4, or the version before that.

The only recent bugs/fixes were a fix to stdin/stdout handling on 5/28, and a bug I introduced in the prior version (0.2), released 5/17, which was patched on 5/19. Both might be relevant: the 5/28 fix concerned pipe → StringIO handling, and the 5/17 bug caused a conflict on import dill if and only if pickle had been imported first. Both are fixed in the most recent version of dill (0.2.1), released 6/4.

Aside from that, I could test the most recent dill with IPython.parallel to see if it works, but I expect you can try that too.

I don't see an error for that function using my async_map (Pool.amap) functions with the most recent dill and Python 3.4. I ran it from a file, not the interpreter. Pickling behaves differently depending on how the code is run, so details on how you ran it would help too.
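One reason pickling behaves differently depending on how code is run: the standard pickle module stores a function by reference (its module name and attribute name), not by value, so the receiving process must be able to import it under that name. A minimal standard-library sketch (dill is not used here, and operator.mul stands in for g so the example does not depend on which module it runs in):

```python
import operator
import pickle
import pickletools

# operator.mul computes a*b, like g(a, b), and is importable in any process.
data = pickle.dumps(operator.mul, protocol=pickle.DEFAULT_PROTOCOL)

# The payload holds only a reference (module name + attribute name), emitted via
# a GLOBAL or STACK_GLOBAL opcode depending on the protocol; no code is stored.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(data)]
print(opcodes)

# Unpickling re-imports the function by name, so the receiver must be able to import it.
assert pickle.loads(data) is operator.mul

# A lambda has no importable name, so by-reference pickling fails.
lambda_failed = False
try:
    pickle.dumps(lambda a, b: a * b)
except (pickle.PicklingError, AttributeError):
    lambda_failed = True
print("lambda pickling failed:", lambda_failed)
```

Functions typed at an interactive prompt live in __main__, which a separate engine process generally cannot import by name; dill is designed to fill that gap by serializing such functions by value.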

@Zaharid
Contributor Author

Zaharid commented Jun 19, 2014

The dill version that fails for me is 0.2.1, installed from pip three days ago.

@mmckerns
Contributor

I'm not having issues with my own map functions and dill in 3.4. Can you send:

  1. the version of IPython you are using, and
  2. exactly what you do to trigger the error?

It might be an issue with dill running inside IPython. Since I'm a bit of an IPython novice, can you post the exact code you ran so I can repeat it and see whether I get the error?

@Zaharid
Contributor Author

Zaharid commented Jun 19, 2014

From scratch:

conda create -n py34 python=3.4
source activate py34
# IPython version 2.1
conda install ipython-notebook
conda install pip
# dill version 0.2.1
pip install dill
ipcluster start -n4

And then I get the error above.

@mmckerns
Contributor

Aha, ipcluster start -n4 is what I was missing. I see the error now, after apply_sync is called. I can confirm that if I type what you did in your first comment, I get the same error in 3.4, even with the bleeding-edge version of dill, but not with Python 3.3. In both cases I'm using the latest MacPorts IPython.

In [1]: from IPython import parallel
In [2]: p = parallel.Client()
In [3]: p[:].use_dill()
Out[3]: <AsyncResult: use_dill>
In [4]: def g(a,c):
   ...:     return a*c
   ...: 
In [5]: p[:].apply_sync(g,2,4)
[0:apply]: 
---------------------------------------------------------------------------
UnpicklingError

I'll look into this.

@mmckerns
Contributor

So, this is a new function in pickle in Python 3.4 (the readline method of pickle's _Unframer):

def readline(self):
    if self.current_frame:
        data = self.current_frame.readline()
        if not data:
            self.current_frame = None
            return self.file_readline()
        if data[-1] != b'\n':
            raise UnpicklingError(
                "pickle exhausted before end of frame")
        return data
    else:
        return self.file_readline()

Since it's new in 3.4 and I haven't seen it throw an error before, I don't know what it does yet. But I don't like it.

In 3.3, __init__ in pickle._Unpickler looked like this:

self.readline = file.readline

but in 3.4 it's like this…

self._unframer = _Unframer(self._file_read, self._file_readline)
self.read = self._unframer.read
self.readline = self._unframer.readline

where the readline method above belongs to _Unframer, and it eventually gets hooked up to the file elsewhere:

self._file_readline = file.readline

Ok, so yuck.
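For context: the _Unframer code path only matters for protocol 4, which (per PEP 3154) wraps the opcode stream in length-prefixed FRAME blocks. The sketch below, using only the standard pickle and pickletools modules, shows that a protocol-4 pickle contains a FRAME opcode while a protocol-3 pickle does not. The helper name opcode_names is hypothetical, an illustration rather than code from pickle or dill.

```python
import pickle
import pickletools


def opcode_names(obj, protocol):
    """Hypothetical helper: opcode names in a pickle of obj at the given protocol."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


payload = {"a": [1, 2, 3], "b": "hello"}

# Protocol 4 wraps the opcodes in length-prefixed frames; _Unframer.readline()
# raises "pickle exhausted before end of frame" if a frame ends before a newline.
names4 = opcode_names(payload, 4)

# Protocol 3 has no FRAME opcode, so the framing code path is never exercised.
names3 = opcode_names(payload, 3)

print("FRAME" in names4, "FRAME" in names3)
```

This is consistent with the observation that switching to protocol 3 made the errors go away, though it does not by itself pinpoint which frame boundary the unpickler tripped over.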

@mmckerns
Contributor

@minrk: does IPython deep down do any selection of the pickling protocol?

Python 3.4 does some weird stuff, and its pickle is still unstable… I found that it can actually pickle much, much less if you use HIGHEST_PROTOCOL, which is 4. They added DEFAULT_PROTOCOL, which is new and set to 3… because 4 is unstable. So dill uses DEFAULT_PROTOCOL unless you specify that you want HIGHEST_PROTOCOL. I never do that because, as I said, 4 is unstable.

From dill:

def dump(obj, file, protocol=None, byref=False):
    """pickle an object to a file"""
    if protocol is None: protocol = DEFAULT_PROTOCOL
    pik = Pickler(file, protocol)

Then, magically, a lot of pickling errors with Python 3.4 went away.
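To check which protocol a payload was actually written with (for example, to confirm whether a serializer is using HIGHEST_PROTOCOL), you can read its header: pickles written with protocol 2 or later start with the PROTO opcode (byte 0x80) followed by the protocol number, while protocols 0 and 1 have no such header. A minimal sketch; pickle_protocol is a hypothetical helper name:

```python
import pickle
from typing import Optional

PROTO = 0x80  # opcode byte that starts every pickle written with protocol >= 2


def pickle_protocol(data: bytes) -> Optional[int]:
    """Hypothetical helper: return the protocol recorded in a pickle's PROTO header.

    Protocols 0 and 1 do not emit a PROTO opcode, so None is returned for them.
    """
    if len(data) >= 2 and data[0] == PROTO:
        return data[1]
    return None


for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    print(proto, pickle_protocol(pickle.dumps([1, 2, 3], protocol=proto)))
```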

@minrk
Member

minrk commented Jun 20, 2014

I think IPython does hardcode HIGHEST_PROTOCOL. We should probably change this to DEFAULT_PROTOCOL, when defined. Thanks.

@mmckerns
Contributor

Hey, and I finally learned how to use IPython.parallel. Yay me. :)

@mmckerns
Contributor

@minrk: I just looked into whether it'd be easy to put together a pull request for you, and I can confirm that's definitely the root cause. Everywhere you have dumps(obj, -1), replace it with dumps(obj), and anywhere dumps(obj, protocol) is used, make sure protocol=DEFAULT_PROTOCOL (see below). Surprisingly, it's used all throughout the code, so I punted. There's hardly ever a reason to use "-1", since it's the default everywhere save for Python 3.4. Gotta love the new Python.

import pickle

try:
    from pickle import DEFAULT_PROTOCOL
except ImportError:  # DEFAULT_PROTOCOL is not defined on Python 2
    DEFAULT_PROTOCOL = pickle.HIGHEST_PROTOCOL

You'll probably have to do that too, since DEFAULT_PROTOCOL is "new" in 3.4.
This should be tedious, but easy to fix. I'll pick it up if it's lingering too long.
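A sketch of the suggested fix applied to a serialization helper, assuming call sites that previously passed -1 (that is, HIGHEST_PROTOCOL) to dumps. The names PICKLE_PROTOCOL, serialize and deserialize are hypothetical and are not IPython's actual functions; getattr with a default is equivalent to the try/except fallback above.

```python
import pickle

# Use DEFAULT_PROTOCOL where it exists (Python 3), else fall back to
# HIGHEST_PROTOCOL (Python 2), mirroring the choice described for dill.
PICKLE_PROTOCOL = getattr(pickle, "DEFAULT_PROTOCOL", pickle.HIGHEST_PROTOCOL)


def serialize(obj):
    """Hypothetical helper: previously pickle.dumps(obj, -1)."""
    return pickle.dumps(obj, PICKLE_PROTOCOL)


def deserialize(data):
    """Hypothetical helper: the protocol is read from the payload itself."""
    return pickle.loads(data)


msg = {"narg_bufs": 2, "args": (2, 4)}
assert deserialize(serialize(msg)) == msg
```

No protocol argument is needed on the loading side, since every pickle of protocol 2 or later records its protocol in its header.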

@takluyver
Member

The default in Python 2 is protocol 0. Only in Python 3 is the default version 3.

@mmckerns
Contributor

@takluyver: Ah… that's riiiiiight. It's dill that always uses HIGHEST_PROTOCOL by default, save for Python 3.4. Still, it's the same fix as above, but also replacing -1 with DEFAULT_PROTOCOL.

@minrk
Member

minrk commented Jun 21, 2014

Yeah, I think it should be DEFAULT_PROTOCOL if defined, and either 2 or 3 if it isn't.

@minrk
Member

minrk commented Jun 21, 2014

or HIGHEST

@mmckerns
Contributor

@minrk: dill uses DEFAULT_PROTOCOL if it is defined, and HIGHEST_PROTOCOL if not. It's not the standard pickle choice, but it is the most common choice and a safe one to make, unless another application specifically needs or uses a lower protocol.

minrk added this to the 2.2 milestone Jun 21, 2014
@minrk
Member

minrk commented Jun 21, 2014

Makes sense. That should be what I've done in #6029.

@demmfb

demmfb commented Apr 5, 2017

Hey guys, I had been using dill for several months with Python 2.7.6, and we recently switched to 3.5.2. dill.dump works fine, but dill.load gives an error similar to the one discussed here. Nothing has changed in the data or the code; this is basically what I am doing:

dataset_rand = [train_set, valid_set, test_set]
with open('dill_test.pkl', 'wb') as f:
    dill.dump(dataset_rand, f)
# ...everything works fine until this point
dataset = 'dill_test.pkl'
datasets = dill.load(open(dataset))

The error:
Traceback (most recent call last):
  File "dill_test.py", line 96, in <module>
    test_load()
  File "dill_test.py", line 90, in test_load
    datasets = dill.load(open(dataset))
  File "/share/apps/Python-3.5.2/lib/python3.5/site-packages/dill/dill.py", line 250, in load
    obj = pik.load()
  File "/share/apps/Python-3.5.2/lib/python3.5/pickle.py", line 1038, in load
    assert isinstance(key, bytes_types)
AssertionError

Any ideas how to solve this? Thanks in advance!
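A likely cause, judging from the traceback: open(dataset) opens the file in text mode, so in Python 3 the unpickler receives str instead of bytes, which is exactly what the failing assert isinstance(key, bytes_types) check guards against. Pickle files need to be opened in binary mode for reading as well as writing. A minimal sketch using the standard pickle module so it runs without dill installed (dill.dump and dill.load take file objects the same way); the small lists are placeholders for train_set, valid_set and test_set:

```python
import os
import pickle
import tempfile

# Placeholder data standing in for [train_set, valid_set, test_set].
dataset_rand = [[1, 2, 3], [4, 5], [6]]

path = os.path.join(tempfile.mkdtemp(), "dill_test.pkl")

# Write in binary mode...
with open(path, "wb") as f:
    pickle.dump(dataset_rand, f)

# ...and read back in binary mode too: open(path) alone is text mode,
# which would hand the unpickler str instead of bytes.
with open(path, "rb") as f:
    datasets = pickle.load(f)

print(datasets == dataset_rand)
```

In Python 2 on Linux, text and binary mode return the same bytes, which may explain why the original code worked before the switch.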
