-
Notifications
You must be signed in to change notification settings - Fork 19.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix function serialization #8572
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for this change.
keras/utils/generic_utils.py
Outdated
@@ -197,7 +199,7 @@ def func_load(code, defaults=None, closure=None, globs=None): | |||
code, defaults, closure = code | |||
if isinstance(defaults, list): | |||
defaults = tuple(defaults) | |||
code = marshal.loads(code.encode('raw_unicode_escape')) | |||
code = marshal.loads(codecs.decode(six.binary_type(code), "base64")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, use '
as string delimiter, here and above.
@fchollet Added a test and fixed the string delimiters. |
It looks like the tests are failing: https://travis-ci.org/fchollet/keras/builds/306493889 (binary strings are not JSON-serializable). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
For a look at a test that covers functions with closure values, have a look at #8592 |
@rh314 @fchollet: This PR seems to have broken backwards compatibility w.r.t. models serialized before this change. I tried to boil this down to the following simple example:
|
This fixes an error I'd been struggling with for a while. Previously,
func_dump
was trying to decode the marshaling output using theraw_unicode_escapes
codec for portability, then encode it to get back to the original. This is the exact opposite of what you want to be doing: you want to encode the bytes returned by marshal into some portable format, then decode it back into the original bytes.As a result of this confusion, instead of (as I presume was intended) encoding the bytes as Unicode escapes, the code was searching the bytes for anything that looked like a Unicode escape and parsing it as such. This is seriously problematic, as marshaled output can contain invalid Unicode escapes, which was the problem I was having. Specifically, my marshaled output contained a path, which, since I'm on Windows, included
c:\users\
, which is an invalid Unicode escape and thus failed the decoding.Presumably, the reason this mistake was made is that marshal returns bytes, which needed to be converted to a string, and decode is the method used to do that, despite the fact that semantically decoding is the opposite of the operation that needs to be performed. This is due to the fact that, for Python, encoding is the operation of encoding a string into bytes and decoding is the operation of decoding bytes into a string. Thus, since here you want to do the opposite, and encode bytes into a string and decode a string into bytes, the default methods fight back against you.
My fix is fairly simple. I explicitly use
codecs
to force the right type of encoding/decoding to be performed, and for portability, I'm using thebase64
codec. Thus, the code takes the marshaled output, converts it to base 64 for portability, then converts it back into bytes when it needs to be loaded. This does the semantically correct thing, and solves the issue I was experiencing above.