Fix some bugs with zipfile serialization #32244
```
@@ -4522,7 +4522,9 @@ void *mz_zip_reader_extract_file_to_heap(mz_zip_archive *pZip, const char *pFile
mz_bool mz_zip_reader_extract_to_callback(mz_zip_archive *pZip, mz_uint file_index, mz_file_write_func pCallback, void *pOpaque, mz_uint flags)
{
  int status = TINFL_STATUS_DONE;
#ifndef MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
```
This is a fix that was upstreamed in this PR
Are we going to update miniz at some point?
We could, but I think the update would just be the same as this change (plus whatever has been added since).
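The hunk above touches miniz's `MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS` guard. As far as I understand miniz, the guarded code compares the CRC-32 of the extracted bytes against the CRC stored in the archive's headers, and defining the macro compiles that comparison out. A minimal Python sketch of the same check, using the standard `zlib` and `zipfile` modules; `crc32_matches` and its `check` flag are hypothetical names for illustration, not part of miniz or PyTorch:

```python
import io
import zipfile
import zlib


def crc32_matches(data: bytes, expected_crc: int, check: bool = True) -> bool:
    """Compare the CRC-32 of extracted bytes against the stored value.

    Passing check=False skips the comparison entirely, loosely analogous
    to building miniz with MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS defined.
    (Hypothetical helper, not miniz's or PyTorch's API.)
    """
    if not check:
        return True
    # In Python 3, zlib.crc32 returns an unsigned 32-bit value, the same
    # representation zipfile uses for ZipInfo.CRC.
    return zlib.crc32(data) == expected_crc


if __name__ == "__main__":
    raw = io.BytesIO()
    with zipfile.ZipFile(raw, "w") as zf:
        zf.writestr("data.pkl", b"abc")

    with zipfile.ZipFile(raw) as zf:
        stored_crc = zf.getinfo("data.pkl").CRC  # CRC recorded in the archive
        data = zf.read("data.pkl")  # zipfile also verifies the CRC here

    corrupted = b"abd"  # one byte changed
    print(crc32_matches(data, stored_crc))                    # True
    print(crc32_matches(corrupted, stored_crc))               # False
    print(crc32_matches(corrupted, stored_crc, check=False))  # True
```

Python's own `ZipFile.read()` already performs this comparison and raises `BadZipFile` on a mismatch; the sketch repeats it by hand only to show what disabling the check skips.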
Summary:
Stacked PRs
* #32244 - Make zip serialization the default
* **#32241 - Split serialization tests to their own file**

This makes them all easier to run as a batch. This PR is just a code move / fixing up imports. There are still some serialization tests in `test_torch.py` as part of `TestDeviceType`.

https://our.intern.facebook.com/intern/diff/19415826/
Pull Request resolved: #32241
Pulled By: driazati
Differential Revision: D19415826
fbshipit-source-id: a3f6cfe1626ff2f9b9631c409bf525bd32e4639b
Not really all that familiar with this code, but nothing stands out to me as awful.
```
@@ -6,8 +6,11 @@
#include <c10/cuda/CUDAGuard.h>
#endif

// save_save is necessary since the old eager format saved storages as
```
Typo: `save_save` should be `save_size`.
The added warning is somewhat worrying, since what is suggested would break the ability to pickle tensors like other Python objects (for instance, right now I can pickle a tuple of tensors with a simple call to
Not really (feel free to open one and tag whoever), but the thinking was that PyTorch has its own serialization format that uses
@driazati Oh, right, I didn't remember that.
Summary:
Stacked PRs
* pytorch#32958 - Make zip serialization the default
* **pytorch#32244 - Fix some bugs with zipfile serialization**

It includes the following changes:
* Split up tests so that we can test both serialization methods
* Loading something within a buffer doesn't work anymore, so those tests only run on the old serialization method (it's possible, but it introduces a big slowdown since it requires a linear scan of the entire zip file to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)

https://our.intern.facebook.com/intern/diff/19418935/
Pull Request resolved: pytorch#32244
Pulled By: driazati
Reviewed By: eellison
Differential Revision: D19418935
fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
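The slowdown mentioned for loading from a buffer comes from how a zip archive is located: its end-of-central-directory (EOCD) record, which starts with the signature `PK\x05\x06`, sits at the end of the archive and records the size and offset of the central directory. A minimal sketch that searches for that record to work out where an archive embedded in a larger buffer begins. It assumes a non-Zip64 archive whose EOCD record is the last occurrence of the signature in the buffer; `find_zip_start` is a hypothetical name, not PyTorch's code:

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory "magic number"
EOCD_FORMAT = "<4s4H2LH"         # fixed-size part of the EOCD record
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def find_zip_start(buf: bytes) -> int:
    """Return the offset at which a zip archive embedded in buf begins.

    Assumes a non-Zip64 archive whose EOCD record is the last occurrence
    of the signature in buf. bytes.rfind looks for the last occurrence,
    which in the worst case means examining the whole buffer.
    """
    pos = buf.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(buf):
        raise ValueError("no end-of-central-directory record found")
    (_sig, _disk, _cd_disk, _entries_here, _entries_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, buf, pos)
    # The central directory ends where the EOCD record starts, and cd_offset
    # is measured from the first byte of the archive.
    return pos - cd_size - cd_offset


if __name__ == "__main__":
    raw = io.BytesIO()
    with zipfile.ZipFile(raw, "w") as zf:
        zf.writestr("data.pkl", b"payload")
    buf = b"\x00" * 100 + raw.getvalue()  # archive stored at offset 100
    start = find_zip_start(buf)
    print(start)  # 100
    with zipfile.ZipFile(io.BytesIO(buf[start:])) as zf:
        print(zf.read("data.pkl"))  # b'payload'
```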
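The `readinto` change can be illustrated at the Python level. A rough sketch, assuming a file-like object that may or may not provide `readinto`; `read_into_buffer` is a hypothetical helper, not PyTorch's actual code path. `readinto` writes straight into a preallocated buffer, while the fallback has to create a temporary `bytes` object with `read` and then copy it:

```python
import io


def read_into_buffer(f, n: int) -> bytearray:
    """Read exactly n bytes from the file-like object f.

    Prefers f.readinto(), which fills the destination buffer in place;
    falls back to f.read() followed by an explicit copy when the object
    has no readinto method. (Hypothetical helper for illustration.)
    """
    buf = bytearray(n)
    view = memoryview(buf)
    filled = 0
    readinto = getattr(f, "readinto", None)
    while filled < n:
        if readinto is not None:
            # Writes directly into buf through the memoryview slice.
            got = readinto(view[filled:])
        else:
            # read() allocates a temporary bytes object, which we then copy.
            chunk = f.read(n - filled)
            got = len(chunk)
            view[filled:filled + got] = chunk
        if not got:
            # No progress (0 or None): treat the stream as exhausted.
            raise EOFError(f"expected {n} bytes, got {filled}")
        filled += got
    view.release()
    return buf


if __name__ == "__main__":
    print(read_into_buffer(io.BytesIO(b"hello world"), 5))  # bytearray(b'hello')
```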
Yes, torch has its own serialization function that may work better, but that doesn't justify removing pickle support. There are a lot of reasons to still enable pickle, like
Echoing @ppwwyyxx: I frequently pass around torch storage as part of larger data structures, and insisting on pickle means any generic code that serializes those data structures will now need to import torch, even if they have no use for torch's functionality other than

Phrased another way, the fact that every other part of the Python ecosystem uses pickle for pickling means you can currently treat Python data structures as black boxes. This change would break that nice opacity. The pickle protocol is pretty powerful; there's a whole VM down there, in fact. What functionality does
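The black-box point above relies on the standard pickle protocol: a class can describe how to rebuild itself via `__reduce__`, so generic code can call `pickle.dumps` on a container without knowing, or importing, the types inside it. A minimal sketch with a hypothetical `FakeStorage` class standing in for a PyTorch storage; it is not how PyTorch actually implements pickling:

```python
import pickle


class FakeStorage:
    """Hypothetical stand-in for a tensor storage; not PyTorch's class."""

    def __init__(self, data: bytes, dtype: str = "float32"):
        self.data = bytes(data)
        self.dtype = dtype

    def __reduce__(self):
        # Tell pickle how to rebuild the object: call FakeStorage(data, dtype).
        return (FakeStorage, (self.data, self.dtype))

    def __eq__(self, other):
        return (isinstance(other, FakeStorage)
                and (self.data, self.dtype) == (other.data, other.dtype))


def roundtrip(obj):
    """Generic code: serializes any picklable structure as a black box."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    record = {"step": 3, "weights": (FakeStorage(b"\x00\x01"), FakeStorage(b"\x02", "int8"))}
    restored = roundtrip(record)
    print(restored == record)  # True
```

`roundtrip` never mentions `FakeStorage`; the object's own `__reduce__` carries all the type-specific knowledge, which is the opacity the comment is arguing to preserve.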
Being able to treat PyTorch objects as a black box is a pretty strong use case. Can one of you file a follow-up issue with the points made here to remove that warning?