Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pickle with dill #288

Merged
merged 4 commits into from Jun 17, 2018
Merged

Replace pickle with dill #288

merged 4 commits into from Jun 17, 2018

Conversation

jakebian
Copy link
Contributor

@jakebian jakebian commented Jan 31, 2017

Dill is better at pickling (among other things) functions than pickle

Pickle is bad at pickling functions. In fact in order to pickle a function one is forced to resolve to some hack (e.g. creating a class with a __call__ property), whereas dill can pickle the function effortlessly.

This PR makes it much less tedious to write objective functions to pass into mongoExp.

- Dill is better at pickling, among other things, functions, than pickle
@anaderi
Copy link

anaderi commented May 21, 2017

really need this to pickle/unpickle instance methods

@jakebian
Copy link
Contributor Author

Ping @jaberg

worker_fn = json_call(bandit_name,
args=bandit_args,
kwargs=bandit_kwargs).evaluate
elif cmd_protocol == 'domain_attachment':
blob = ctrl.trials.attachments[cmd[1]]
try:
domain = pickle.loads(blob)
domain = dill.loads(blob)
except BaseException as e:
logger.info(
'Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Try installing dill via "pip install dill" is no longer meaningful ;-)

@j-abi
Copy link

j-abi commented Jan 18, 2018

@jaberg Not sure how stable/tested this is, but possibly a candidate for milestone https://github.com/hyperopt/hyperopt/milestone/3 ?

@jaberg jaberg added this to the 2018 release 1 milestone Jan 20, 2018
@jaberg
Copy link
Contributor

jaberg commented Jan 20, 2018

It looks like the interfaces of pickle and dill are the same.

What do you think of a try/except to load dill if it's there, otherwise use pickle?

Pickle is standard lib, and some folks might not want to bother with dill. I wonder if this choice is totally encapsulated or if it leaks up to users. I'm thinking for this sort of scientific code that lots of people hack on, it can leak, so requiring dill might be annoying/breaking for some people/applications.

@michaelmior
Copy link
Contributor

@jaberg That sounds good to me. There could be a warning if dill is unavailable so those expecting features requiring it will at least have some hint of what's going on.

@maxpumperla
Copy link
Contributor

try:
    import dill as pickler
except Exception as e:
    logger.info('Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.')
    import six.moves.cPickle as pickler

then just use pickler.loads(foobar). @jakebian would that be a good compromise, given the above discussion?

@michaelmior
Copy link
Contributor

@maxpumperla Exactly what I was thinking :)

@jakebian
Copy link
Contributor Author

jakebian commented Feb 6, 2018

Yeah good call, will update

@jaberg
Copy link
Contributor

jaberg commented Feb 7, 2018

Looking at this again, there's a few more things to think through?

Can files from dill be loaded with pickle? If yes, then let's simplify, if no, then its more complicated.

The purpose of the dump/load is to work with asynchronous searches. Asynchronous Mongo searches have a worker. That worker needs to de-serialize the fmin domain object, so it needs to call the right function. I recall there was some provision for putting info into the mongo database to tell the mongo worker how to deserialize things, but that feels like a bigger change.

A feature where we allow users to choose the serialization protocol when calling fmin should have testing, because it touches a few places (mongo, does the ipython stuff work?) and because backward compability is important too.

All this sounds like a bigger change than we were trying to do with this release, so I suggest we take it off the milestone.

@jaberg jaberg removed this from the 2018 release 1 milestone Feb 7, 2018
@michaelmior
Copy link
Contributor

@jaberg Isn't the question really whether data created with pickle can be loaded with dill? As long as things are forward compatible, I don't see a real concern. The answer to my question I believe is yes.

@jakebian
Copy link
Contributor Author

jakebian commented Feb 7, 2018

@michaelmior indeed the answer to your question is yes, I think @jaberg is concerned about backwards compatibility if I understood him correctly.

@michaelmior
Copy link
Contributor

That's what I understood as well. I'm just not sure why backwards compatibility is a concern.

@jaberg
Copy link
Contributor

jaberg commented Feb 7, 2018 via email

@michaelmior
Copy link
Contributor

@jaberg For any values which can be pickled via pickle, there's no problem. If users want to be able to serialize things that can't be serialized with pickle, they'll of course need dill on the workers as well. However, I think it's acceptable in this case to tell users they need to upgrade the workers as well. I don't think it's worth the extra effort to try to support old workers when the master is updated. If users don't want to update workers, they dont get the new features.

@jaberg
Copy link
Contributor

jaberg commented Feb 7, 2018

Thanks @michaelmior for thinking about this.

For clarity I'm thinking of hyperopt's mongo-worker, that runs this code:
https://github.com/hyperopt/hyperopt/blob/master/hyperopt/mongoexp.py#L1067

This is jogging my memory - there were a bunch of ways of de-serializing the work to do. I don't understand the info message that's already there. Would installing dill enable pickle.loads to de-serialize something serialized with dill?

@michaelmior
Copy link
Contributor

If you're asking if old versions of the mongo-worker code will be able to start deserializing things which pickle doesn't support just by installing dill, then the answer is no. But I don't see why this matters. If this scenario occurs, it means the user upgraded the master and serialized something that was not previously serializable. If they did that, it seems reasonable to me that we should just tell them to also upgrade the workers.

@jaberg
Copy link
Contributor

jaberg commented Feb 8, 2018

Sorry folks, I mis-read the PR! I was concerned that the mongo worker had not been upgraded to use dill, when in fact it has been.

@jaberg
Copy link
Contributor

jaberg commented Feb 8, 2018

Are folks okay with leaving this out of the current milestone? It seems to me that the current milestone is just house-cleaning, getting tests fixed up, getting something on pypi for python2 that works and includes changes from the last year. Relative to that, adding a dependency and changing the serialization protocol seems like it could cause more trouble than it solves.

How about let's bring in this change when we make other bigger changes related to no longer supporting python2?

@michaelmior
Copy link
Contributor

I don't think it's a big deal to leave this out, but I also think it's a pretty safe change to include and would solve some cryptic error messages that many users may run into. Assuming @jakebian is able to make the changes discussed above, I don't see a great reason not to include it.

@michaelmior
Copy link
Contributor

Note that tests were previously failing but this seemed unrelated to these changes so I triggered a rebuild and they now pass. Might still be some flakiness that needs to be sorted out.

@michaelmior
Copy link
Contributor

One other thing is that we should probably have a comment somewhere in the readme that mentions the issues you might encounter without dill and how to install it.

@jakebian
Copy link
Contributor Author

@michaelmior looks like installation is documented in the wiki not the readme, someone should go update that once this gets merged.

@michaelmior
Copy link
Contributor

@jaberg Any objections to merging this?

@ghost
Copy link

ghost commented Mar 22, 2018

+1 plz

@davejm
Copy link

davejm commented May 26, 2018

+1

@michaelmior
Copy link
Contributor

@jaberg Just another ping about whether you're okay if I go ahead and merge this. I feel confident that it's not going to cause any issues and it will solve problems people are currently having with hyperopt.

@jakebian jakebian closed this Jun 15, 2018
@jakebian jakebian reopened this Jun 15, 2018
@maxpumperla maxpumperla merged commit 4d513ba into hyperopt:master Jun 17, 2018
@maxpumperla
Copy link
Contributor

@jakebian PR LGTM, I tested locally. I think all the issues have been addressed. If @jaberg doesn't agree we could roll back. Thanks for this PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants