torch.save also saves docstrings into pickle for some reason #21745
Comments
This makes our pickles big and exacerbates #21743 |
@ezyang Hello, I started working on this and on #21743. To clarify, for this issue, the docstring should not be saved at all? And for #21743 since you commented on that one as well, just need to handle the error and point the user to the docs? Although if we fix this, the other error should go away in the future, just not for older pickled files. Thanks! |
Thank you for the clarification. I will start with #21743 as it is easier to begin with. For this issue, what about explicitly removing docstrings or adding a parameter for it? The docstring should be stored in dunder doc, so a simple:

    import pickle

    def save(something, save_docstring=True):
        if save_docstring:
            return pickle.dumps(something)
        else:
            # Temporarily clear the docstring, then restore it after pickling.
            backup_docstring = something.__doc__
            something.__doc__ = None
            try:
                return pickle.dumps(something)
            finally:
                something.__doc__ = backup_docstring

should be enough. It's a bit "hacky" but we can try it like that? What do you think? |
It won't work. Pickle operates recursively and the code you wrote above would only strip |
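To illustrate the recursion point, here is a minimal sketch using plain nested dicts. The `doc` and `child` keys are hypothetical stand-ins for per-object text and nested objects, not anything torch.save actually uses. Clearing the field on the top-level object leaves the nested one untouched in the pickled bytes:

```python
import pickle

# Hypothetical nested structure; "doc" stands in for any per-object text.
root = {"doc": "root note", "child": {"doc": "child note"}}

# Strip the text from the top-level object only, as the proposal would.
root["doc"] = None
payload = pickle.dumps(root)

# The nested object is pickled recursively with its own state intact.
top_stripped = b"root note" not in payload
child_kept = b"child note" in payload
print(top_stripped, child_kept)
```

Stripping every level would need a walk over the whole object graph, which is what makes the simple approach insufficient.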
Wow you're right, didn't think about that, sorry. Another idea: per pickle doc if
EDIT: Might not be a good idea after all. After looking at this example:

    def __getstate__(self):
        # Copy the object's state from self.__dict__ which contains
        # all our instance attributes. Always use the dict.copy()
        # method to avoid modifying the original state.
        state = self.__dict__.copy()
        # Remove the unpicklable entries.
        del state['file']
        return state

It is suggested to copy the original state which can be huge in our case. :( |
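On the size concern specifically, `dict.copy()` is a shallow copy: it creates a new dict that points at the same attribute objects rather than duplicating them. Below is a small sketch with a hypothetical `Model` class (the names `weights` and `notes` are made up for illustration). It only speaks to the cost of the copy for one object, not to the other problems raised in this thread:

```python
class Model:
    """Hypothetical object with one large attribute and one to drop."""

    def __init__(self):
        self.weights = bytearray(10_000_000)  # large payload
        self.notes = "text we do not want to serialize"

    def __getstate__(self):
        # dict.copy() is shallow: the new dict holds references to the
        # same attribute objects, it does not duplicate them.
        state = self.__dict__.copy()
        del state["notes"]
        return state


model = Model()
state = model.__getstate__()  # pickle calls this hook when serializing

same_buffer = state["weights"] is model.weights
notes_dropped = "notes" not in state
original_intact = hasattr(model, "notes")
print(same_buffer, notes_dropped, original_intact)
```

The large buffer is shared between the instance and the returned state, so the extra memory is one small dict per pickled object.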
Maybe we can just |
My thoughts exactly, however we have some issues with that also:
Seems more and more like a "won't fix" to me :( |
(1) is an issue, but
We were already doing a copy, no? So it doesn't seem like a big deal to try adding it there. |
Not sure about it, the documentation says:
Doesn't mention a copy, maybe it just reads directly from I skimmed through the source and I don't see an explicit copy operation. :/ |
Oh sorry, I misunderstood, and I thought you were quoting code in our codebase. OK, so this may not be easy to fix. |
Our serialization weirdly saves the entire module code (including the docstring) and verifies that it matches the loaded instance of the
pytorch/torch/serialization.py, line 294 in df338f8
I suppose that makes it easier to reproduce results exactly, since subtly different code may affect the results, but it's not the normal pickle behavior. Also, FYI, you can use pickletools to inspect the file:

    import pickletools

    # Each call disassembles the next pickle stored in the file.
    with open('moo.pt', 'rb') as f:
        pickletools.dis(f)
        pickletools.dis(f)
        pickletools.dis(f)
        pickletools.dis(f)
        pickletools.dis(f)

Which looks like:
|
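For anyone who wants to try the pickletools approach without a saved model, here is a self-contained sketch. It writes a few pickles back to back into an in-memory stream and disassembles them one at a time. The values are placeholders chosen for illustration and do not reproduce the real torch.save layout:

```python
import io
import pickle
import pickletools

# Write three independent pickles back to back into one stream.
# These values are placeholders, not the real torch.save layout.
source_text = 'class Foo:\n    """A docstring."""'
stream = io.BytesIO()
for value in ("header", 2, {"source": source_text}):
    pickle.dump(value, stream)

total = len(stream.getvalue())
stream.seek(0)

listings = []
while stream.tell() < total:
    out = io.StringIO()
    # dis() reads one pickle and stops at its STOP opcode, leaving the
    # stream positioned at the start of the next pickle.
    pickletools.dis(stream, out=out)
    listings.append(out.getvalue())

docstring_visible = "A docstring." in listings[2]
print(len(listings), docstring_visible)
```

Any string stored in the pickle, including source text with its docstrings, shows up in readable form in the disassembly.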
@driazati So, we could fix this problem by changing that line, right? I wonder how old this code is lol. I guess there are BC concerns too. |
Yeah we could get rid of it, the code is 3 years old. Due to some other issues we were thinking of doing #26567, which could remove this functionality, and we could keep the old load in place as-is to preserve full BC. |
Steps to reproduce:
moo.pt
in your editor
Expected result: a bunch of unintelligible binary gobbledygook
Actual result: I see a docstring!!!