Use torch.save in _StorageBase.__reduce__ #9184

Closed · wants to merge 1 commit into base: master from mrocklin:storage-serialization
Conversation

mrocklin (Contributor) commented Jul 5, 2018

Previously this used the .tolist method, which converted the
storage object into a list of Python objects, and then sent those to
pickle. For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the torch.save function to efficiently
turn the Storage object into bytes, and send those instead. This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
from_buffer method.

See #9168 for context
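The pattern described above can be sketched with plain pickle and a toy stand-in for a Storage object. All class and helper names below are illustrative, not PyTorch's actual internals: ``__reduce__`` returns a module-level rebuild function together with the bytes produced by the save path, instead of a list of Python objects.

```python
import io
import pickle

def _load_from_bytes(b):
    # Module-level so pickle can find it by name when loading.
    return ToyStorage(list(b))

class ToyStorage:
    """Toy stand-in for a Storage object (illustrative only)."""

    def __init__(self, data):
        self.data = list(data)

    def _save(self, f):
        # Stand-in for torch.save: write a compact byte representation.
        f.write(bytes(self.data))

    def __reduce__(self):
        # Serialize via the efficient byte path instead of .tolist().
        f = io.BytesIO()
        self._save(f)
        return (_load_from_bytes, (f.getvalue(),))

restored = pickle.loads(pickle.dumps(ToyStorage([1, 2, 3])))
assert restored.data == [1, 2, 3]
```

Because ``__reduce__`` hands pickle a single byte string plus a rebuild callable, the per-element overhead of pickling millions of Python numbers disappears.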

Use torch.save in _StorageBase.__reduce__
mrocklin (Contributor) commented Jul 5, 2018

FWIW, tests pass except for the MyPy check, which seems to be a stalled build:

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated
mrocklin (Contributor) commented Jul 5, 2018

The performance difference is nice. This reduces the serialization time by a factor of 38 and the size by a factor of 2.

This PR

In [1]: from torchvision.models.resnet import resnet18
   ...: import pickle
   ...: model = resnet18(pretrained=True)
   ...: 
   ...: 

In [2]: %time len(pickle.dumps(model))
CPU times: user 35.7 ms, sys: 16.3 ms, total: 52 ms
Wall time: 51.1 ms
Out[2]: 46838320

In [3]: 46838320 / 0.051 / 1e6  # MB/s
Out[3]: 918.3984313725491

0.4.0

In [1]: from torchvision.models.resnet import resnet18
   ...: import pickle
   ...: model = resnet18(pretrained=True)
   ...: 
   ...: 

In [2]: %time len(pickle.dumps(model))
CPU times: user 1.74 s, sys: 180 ms, total: 1.92 s
Wall time: 1.92 s
Out[2]: 105331304

In [3]: 105331304 / 46838320
Out[3]: 2.248827541209847

In [4]: 1.92 / 0.0511
Out[4]: 37.573385518590996
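The gap measured above comes from pickling each element as a Python object versus pickling a single byte string. A self-contained illustration of the same effect (no torch required; exact timings and sizes will vary by machine and pickle protocol):

```python
import pickle
import time

payload = bytes(range(256)) * 4_000   # ~1 MB of raw data
as_list = list(payload)               # same data as a list of Python ints

t0 = time.perf_counter()
list_pickle = pickle.dumps(as_list)   # element-by-element, like .tolist()
t_list = time.perf_counter() - t0

t0 = time.perf_counter()
bytes_pickle = pickle.dumps(payload)  # one opaque byte string
t_bytes = time.perf_counter() - t0

# The byte-string pickle is smaller and far faster to produce.
assert len(bytes_pickle) < len(list_pickle)
print(f"list:  {t_list:.4f}s, {len(list_pickle):,} bytes")
print(f"bytes: {t_bytes:.4f}s, {len(bytes_pickle):,} bytes")
```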
soumith (Member) commented Jul 6, 2018

This is great. I just want some eyes w.r.t. backward compatibility (if needed at all), so cc: @apaszke

mrocklin (Contributor) commented Jul 6, 2018

This definitely isn't backwards compatible with data serialized with pickle from previous versions. This is treating pickle more as a wire protocol than as an archival storage format (which is a general recommendation as well).

apaszke approved these changes Jul 6, 2018

apaszke (Member) commented Jul 6, 2018

It changes the format, but it won't break loading of older checkpoints (the constructor will still be able to take in lists so that's fine). Thanks a lot for the PR!


facebook-github-bot commented:

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

goodlux added a commit to goodlux/pytorch that referenced this pull request Jul 6, 2018

Use torch.save in _StorageBase.__reduce__ (#9184)
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context
Closes pytorch#9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79

@mrocklin mrocklin deleted the mrocklin:storage-serialization branch Jul 8, 2018

immerrr commented Jul 25, 2018

Got here from the blog post, great work. Just a bit of pain I'd like to share regarding my experience with backward-compatible pickles:

  • The function that's supposed to read the pickled data should be public: if it gets renamed, moved, or changed in a backward-incompatible way, there is no way to achieve backward compatibility aside from patching the pickled stream. Hence it has the same requirements as a public API as soon as a pickle containing that function is written by a released version of the package. Creating it as "hidden" by an underscore prefix can lead to a false sense of freedom when dealing with it.
  • The serialization format should be extensible: just returning a (FORMAT, VALUE) tuple, where the pickle deserializer function decides how to interpret VALUE based on FORMAT in O(1), could help a lot down the road. If that's not the case, a backward-compatible deserializer function must do isinstance checks or wait for an exception to be raised while trying one format after another.
mrocklin (Contributor) commented Jul 25, 2018

One relatively low-impact way to achieve this would be to have torch.load optionally accept byte strings.

import io

def load(x, *args, **kwargs):
    if isinstance(x, bytes):
        return load(io.BytesIO(x), *args, **kwargs)
    # normal code for handling files
    ...

mrocklin referenced this pull request Aug 2, 2018: ENH: Hyperband implementation #221 (open, 6 of 8 tasks complete)

goodlux added a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018

Use torch.save in _StorageBase.__reduce__ (#9184)