Serialization of data within a tensor is slow #9168

Closed
@mrocklin

Issue description

When naively serializing a PyTorch tensor (or model) with pickle, the process takes considerably longer than it should compared to NumPy serialization of the same data. It appears that at some point we convert the data into a Python list rather than returning the underlying buffers directly.

Code example

In [1]: import numpy, torch, pickle

In [2]: x = numpy.random.random((10000, 10000))

In [3]: t = torch.tensor(x)

In [4]: %time len(pickle.dumps(x)) / 1e6                    # around 1GB/s
CPU times: user 298 ms, sys: 415 ms, total: 713 ms
Wall time: 711 ms
Out[4]: 800.000162

In [5]: %time len(pickle.dumps(t)) / 1e6                    # around 50MB/s
CPU times: user 14.6 s, sys: 1.03 s, total: 15.7 s
Wall time: 15.7 s
Out[5]: 900.200098

The majority of this time is spent converting the t.storage() object into a list:

In [11]: %time _ = t.storage().tolist()
CPU times: user 12.3 s, sys: 891 ms, total: 13.2 s
Wall time: 13.2 s
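The cost of the per-element conversion can be seen in plain NumPy as well. The following minimal sketch (not from the original report) compares pickling an array, which serializes the raw buffer, against pickling the same data converted to a Python list, which serializes one Python float object per element:

```python
import pickle
import numpy as np

x = np.random.random(1_000_000)

buf = pickle.dumps(x)           # NumPy pickles the raw buffer directly
lst = pickle.dumps(x.tolist())  # one Python float object per element

# The per-element form is both larger and far slower to produce.
print(len(buf), len(lst))
```

The list form carries per-object pickle overhead for every element, which is the same pattern the t.storage().tolist() call above triggers.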

Instead, we might consider passing around a NumPy array, buffer, or memoryview, each of which serializes much more quickly than converting to many Python objects.
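As a rough illustration of the buffer-based approach, here is a NumPy-only sketch (an assumption for illustration, not the proposed PyTorch implementation) that serializes the raw bytes plus the minimal metadata needed to rebuild the array:

```python
import pickle
import numpy as np

x = np.random.random((100, 100))

# Ship the raw bytes plus dtype and shape; the bytes object pickles at
# full speed with no per-element conversion.
payload = pickle.dumps((x.tobytes(), x.dtype.str, x.shape))

data, dtype, shape = pickle.loads(payload)
y = np.frombuffer(data, dtype=dtype).reshape(shape)
```

A tensor's __reduce__ could follow the same shape: hand pickle a buffer-like object and reconstruct from it, rather than round-tripping through a list.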

In [1]: import torch

In [2]: torch.__version__
Out[2]: '0.4.0'
