Serialization of data within a tensor is slow #9168

Closed
mrocklin opened this Issue Jul 4, 2018 · 17 comments

mrocklin (Contributor) commented Jul 4, 2018

Issue description

When naively serializing a PyTorch tensor (or model) with pickle, the process takes far longer than it should compared to NumPy serialization of the same data. It appears that at some point we convert the data into a Python list rather than returning the underlying buffers directly.

Code example

In [1]: import numpy, torch, pickle

In [2]: x = numpy.random.random((10000, 10000))

In [3]: t = torch.tensor(x)

In [4]: %time len(pickle.dumps(x)) / 1e6                    # around 1GB/s
CPU times: user 298 ms, sys: 415 ms, total: 713 ms
Wall time: 711 ms
Out[4]: 800.000162

In [5]: %time len(pickle.dumps(t)) / 1e6                    # around 50MB/s
CPU times: user 14.6 s, sys: 1.03 s, total: 15.7 s
Wall time: 15.7 s
Out[5]: 900.200098

The majority of this time is spent converting the t.storage() object into a list:

In [11]: %time _ = t.storage().tolist()
CPU times: user 12.3 s, sys: 891 ms, total: 13.2 s
Wall time: 13.2 s

Instead, we might consider passing around a NumPy array, buffer, or memoryview, each of which serializes much more quickly than converting the data to many Python objects.

In [1]: import torch

In [2]: torch.__version__
Out[2]: '0.4.0'
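
To illustrate the proposal concretely, here is a minimal sketch of a numpy-backed __reduce__; the names tensor_reduce and _rebuild_from_numpy are invented for this example and are not part of PyTorch:

import numpy, torch

# Hypothetical module-level helper so pickle can find it by name when loading.
def _rebuild_from_numpy(array):
    return torch.from_numpy(array)

# Hypothetical reduce function: hand pickle the raw NumPy buffer
# (CPU tensors only) instead of a list of Python floats.
def tensor_reduce(t):
    return (_rebuild_from_numpy, (t.numpy(),))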

soumith (Member) commented Jul 4, 2018

The standard pickler will be slow. If you use torch.save to a file or file-like object, it'll be much faster, as it goes through our custom pickling logic.

Have a look here: https://github.com/pytorch/pytorch/blob/master/torch/serialization.py#L212-L286
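
For instance, a minimal round trip through torch.save / torch.load with a plain file path (not from the thread, just for illustration):

import torch

t = torch.randn(1000, 1000)
torch.save(t, "/tmp/tensor.pt")     # goes through PyTorch's custom serialization
t2 = torch.load("/tmp/tensor.pt")
assert torch.equal(t, t2)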

mrocklin (Contributor) commented Jul 4, 2018

Right, I'm suggesting making Torch's implementation of pickle fast.

mrocklin (Contributor) commented Jul 4, 2018

Separately, what is the most efficient way to convert a torch model/tensor into a set of bytes? Pass an io.BytesIO to the save function?

soumith (Member) commented Jul 4, 2018

@mrocklin the fastest way is to do t.numpy() and use whatever you'd do to numpy arrays. We don't natively provide conversion to the PyBuffer interface.
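
A sketch of that approach as a pickle round trip on the NumPy side (assumes a CPU tensor; torch.from_numpy rebuilds the tensor when loading):

import pickle
import numpy, torch

t = torch.tensor(numpy.random.random((1000, 1000)))

payload = pickle.dumps(t.numpy())               # pickles the raw float64 buffer
t2 = torch.from_numpy(pickle.loads(payload))    # shares memory with the loaded array
assert torch.equal(t, t2)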

soumith (Member) commented Jul 4, 2018

t.numpy() is a free operation: no memcpy, not much going on except setting up some C structs.
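
A quick way to see that no copy happens (illustrative; relies on the fact that the returned array shares the CPU tensor's memory):

import torch

t = torch.zeros(3)
a = t.numpy()        # no memcpy: a is a view over t's storage
a[0] = 42.0
print(t)             # tensor([42., 0., 0.]) -- the change is visible through the tensor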

mrocklin (Contributor) commented Jul 4, 2018

I understand. Would the PyTorch community accept a PR that uses numpy within the __reduce__ methods in order to improve serialization performance of tensor and model objects with naive use of pickle?

mrocklin (Contributor) commented Jul 4, 2018

For others: it looks like using io.BytesIO gets up to about 1 GB/s.

In [1]: from torchvision.models.resnet import resnet18
   ...: model = resnet18(pretrained=True)
   ...: 
   ...: 

In [2]: import torch

In [3]: import io

In [4]: bio = io.BytesIO()

In [5]: %%time
   ...: torch.save(model, bio)
   ...: b = bio.getvalue()
   ...: 
CPU times: user 32 ms, sys: 16.4 ms, total: 48.4 ms
Wall time: 47.7 ms

In [6]: len(b) / 0.047 / 1e6 # MB/s
Out[6]: 996.619

And then we can reconstitute the model:

bio2 = io.BytesIO(b)
model2 = torch.load(bio2)

mrocklin (Contributor) commented Jul 4, 2018

For context, I maintain a parallel computing library, Dask, and users are naively passing around PyTorch objects and getting poor performance. I can special-case PyTorch models to use the trick above, but it might be a good idea to make PyTorch's general serialization solution decently fast for other libraries that run into the same problem. I think that this is probably pretty easy to do.

dask/dask-ml#281 (comment)

soumith (Member) commented Jul 4, 2018

Would the PyTorch community accept a PR that uses numpy within the __reduce__ methods in order to improve serialization performance

I'll discuss with the team and get back to you in a couple of days. We've avoided a dependence on numpy for functionality so far, but it's been a while since we discussed this.

I can special-case PyTorch models to use the trick above

For the moment, this seems like a good idea.


soumith (Member) commented Jul 4, 2018

memoryview requires us to implement the Py_buffer interface, which we haven't. Implementing the Py_buffer interface across Py2 and Py3 is really complicated (see the notes from the last time we tried doing it).


mrocklin (Contributor) commented Jul 4, 2018

Alternatively, PyTorch clearly has code to turn storage objects efficiently into bytestreams. It must do this with torch.save. Is there an internal method somewhere to turn a storage object directly into bytes and then back?

For example, a fully usable solution for pickle would be to call the torch.save code in the comment above and just return those bytes. This isn't quite as clean, but would behave well and doesn't require much work.
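
A rough sketch of that pickle-via-torch.save idea; the names storage_reduce and _load_from_bytes are invented here and are not the actual PyTorch internals:

import io
import torch

# Hypothetical module-level helper so pickle can locate it when reconstructing.
def _load_from_bytes(b):
    return torch.load(io.BytesIO(b))

def storage_reduce(storage):
    # Reuse torch.save to turn the storage into a compact byte string, then
    # return the (callable, args) pair that pickle expects from __reduce__.
    buf = io.BytesIO()
    torch.save(storage, buf)
    return (_load_from_bytes, (buf.getvalue(),))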

mrocklin added a commit to mrocklin/pytorch that referenced this issue Jul 5, 2018

Use torch.save in _StorageBase.__reduce__
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context

mrocklin (Contributor) commented Jul 5, 2018

Short term, here is a possible workaround that reuses torch.save within _StorageBase.__reduce__: #9184

mrocklin added a commit to mrocklin/pytorch that referenced this issue Jul 5, 2018

Use torch.save in _StorageBase.__reduce__
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context

facebook-github-bot added a commit that referenced this issue Jul 6, 2018

Use torch.save in _StorageBase.__reduce__ (#9184)
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context
Closes #9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79

goodlux added a commit to goodlux/pytorch that referenced this issue Jul 6, 2018

Use torch.save in _StorageBase.__reduce__ (#9184)
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context
Closes pytorch#9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79

zou3519 (Contributor) commented Jul 9, 2018

@mrocklin we'll discuss and get back to you

fmassa (Member) commented Jul 9, 2018

@zou3519 actually, I think this can be closed since #9184 was merged.

soumith closed this Jul 9, 2018

goodlux added a commit to goodlux/pytorch that referenced this issue Aug 15, 2018

Use torch.save in _StorageBase.__reduce__ (#9184)
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context
Closes pytorch#9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79