New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace pickle with dill #288
Conversation
- Dill is better at pickling, among other things, functions, than pickle
really need this to pickle/unpickle instance methods |
Ping @jaberg |
hyperopt/mongoexp.py
Outdated
worker_fn = json_call(bandit_name, | ||
args=bandit_args, | ||
kwargs=bandit_kwargs).evaluate | ||
elif cmd_protocol == 'domain_attachment': | ||
blob = ctrl.trials.attachments[cmd[1]] | ||
try: | ||
domain = pickle.loads(blob) | ||
domain = dill.loads(blob) | ||
except BaseException as e: | ||
logger.info( | ||
'Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Try installing dill via "pip install dill" is no longer meaningful ;-)
@jaberg Not sure how stable/tested this is, but possibly a candidate for milestone https://github.com/hyperopt/hyperopt/milestone/3 ? |
It looks like the interfaces of pickle and dill are the same. What do you think of a try/except to load dill if it's there, otherwise use pickle? Pickle is standard lib, and some folks might not want to bother with dill. I wonder if this choice is totally encapsulated or if it leaks up to users. I'm thinking for this sort of scientific code that lots of people hack on, it can leak, so requiring dill might be annoying/breaking for some people/applications. |
@jaberg That sounds good to me. There could be a warning if dill is unavailable so those expecting features requiring it will at least have some hint of what's going on. |
try:
import dill as pickler
except Exception as e:
logger.info('Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.')
import six.moves.cPickle as pickler then just use |
@maxpumperla Exactly what I was thinking :) |
Yeah good call, will update |
Looking at this again, there's a few more things to think through? Can files from dill be loaded with pickle? If yes, then let's simplify, if no, then its more complicated. The purpose of the dump/load is to work with asynchronous searches. Asynchronous Mongo searches have a worker. That worker needs to de-serialize the fmin domain object, so it needs to call the right function. I recall there was some provision for putting info into the mongo database to tell the mongo worker how to deserialize things, but that feels like a bigger change. A feature where we allow users to choose the serialization protocol when calling fmin should have testing, because it touches a few places (mongo, does the ipython stuff work?) and because backward compability is important too. All this sounds like a bigger change than we were trying to do with this release, so I suggest we take it off the milestone. |
@jaberg Isn't the question really whether data created with pickle can be loaded with dill? As long as things are forward compatible, I don't see a real concern. The answer to my question I believe is yes. |
@michaelmior indeed the answer to your question is yes, I think @jaberg is concerned about backwards compatibility if I understood him correctly. |
That's what I understood as well. I'm just not sure why backwards compatibility is a concern. |
I guess I had a number of concerns, and not super-clearly articulated.
How about this one: will the proposed change work if the mongo worker is
not modified? It seems to me the mongo-worker would try to unpickle a byte
string from dill, and fail.
…On Wed, Feb 7, 2018 at 4:17 PM, Michael Mior ***@***.***> wrote:
That's what I understood as well. I'm just not sure why backwards
compatibility is a concern.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#288 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKdDMSh3D4LOm8igJIOr-qg3nbc6ikGks5tShLggaJpZM4LyNwD>
.
|
@jaberg For any values which can be pickled via |
Thanks @michaelmior for thinking about this. For clarity I'm thinking of hyperopt's mongo-worker, that runs this code: This is jogging my memory - there were a bunch of ways of de-serializing the work to do. I don't understand the info message that's already there. Would installing dill enable pickle.loads to de-serialize something serialized with dill? |
If you're asking if old versions of the mongo-worker code will be able to start deserializing things which pickle doesn't support just by installing dill, then the answer is no. But I don't see why this matters. If this scenario occurs, it means the user upgraded the master and serialized something that was not previously serializable. If they did that, it seems reasonable to me that we should just tell them to also upgrade the workers. |
Sorry folks, I mis-read the PR! I was concerned that the mongo worker had not been upgraded to use dill, when in fact it has been. |
Are folks okay with leaving this out of the current milestone? It seems to me that the current milestone is just house-cleaning, getting tests fixed up, getting something on pypi for python2 that works and includes changes from the last year. Relative to that, adding a dependency and changing the serialization protocol seems like it could cause more trouble than it solves. How about let's bring in this change when we make other bigger changes related to no longer supporting python2? |
I don't think it's a big deal to leave this out, but I also think it's a pretty safe change to include and would solve some cryptic error messages that many users may run into. Assuming @jakebian is able to make the changes discussed above, I don't see a great reason not to include it. |
Note that tests were previously failing but this seemed unrelated to these changes so I triggered a rebuild and they now pass. Might still be some flakiness that needs to be sorted out. |
One other thing is that we should probably have a comment somewhere in the readme that mentions the issues you might encounter without dill and how to install it. |
@michaelmior looks like installation is documented in the wiki not the readme, someone should go update that once this gets merged. |
@jaberg Any objections to merging this? |
+1 plz |
+1 |
@jaberg Just another ping about whether you're okay if I go ahead and merge this. I feel confident that it's not going to cause any issues and it will solve problems people are currently having with hyperopt. |
Dill is better at pickling (among other things) functions than pickle
Pickle is bad at pickling functions. In fact in order to pickle a function one is forced to resolve to some hack (e.g. creating a class with a
__call__
property), whereas dill can pickle the function effortlessly.This PR makes it much less tedious to write objective functions to pass into mongoExp.