One of the unique features of Static Runtime is the MemoryPlanner, which aggregates all the memory allocations for intermediate tensors into a single malloc and caches their TensorImpls inside Static Runtime. This speeds up inference by reducing the number of mallocs and by avoiding the cost of creating/destroying Tensor objects and the associated refcount bumps on the fly. However, the MemoryPlanner only manages intermediate tensors, which excludes model inputs and outputs. If we can extend the MemoryPlanner to also cover output tensors, we can dramatically speed up models with multiple outputs.
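To make the single-allocation idea concrete, here is a minimal standalone sketch (illustrative only, not the actual impl.cpp code) of how a planner can sum up aligned tensor sizes and hand each tensor an offset into one shared buffer:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of the MemoryPlanner's core idea: instead of one malloc
// per tensor, assign every managed tensor an aligned offset into one buffer.
struct PlannedTensor {
  size_t nbytes; // bytes the tensor needs
  size_t offset; // assigned offset into the shared buffer
};

constexpr size_t kAlignment = 64;

// Returns the total buffer size; one malloc of this size backs all tensors.
size_t assignOffsets(std::vector<PlannedTensor>& tensors) {
  size_t total = 0;
  for (auto& t : tensors) {
    t.offset = total;
    // Round each size up so every tensor starts at an aligned address.
    total += (t.nbytes + kAlignment - 1) / kAlignment * kAlignment;
  }
  return total;
}
```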
First, we'll need some bookkeeping for the output tensors:
```cpp
// (allocation size in bytes, StorageImpls backed by that region), one entry
// per managed output slot.
std::vector<std::pair<size_t, std::vector<c10::StorageImpl*>>> managed_output_storage_;
// Total size of the single buffer that backs all managed output tensors.
size_t managed_output_bytes_{0};
at::DataPtr output_buffer_; // for outputs only
```
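A hedged sketch of how these fields might be consumed when the planner allocates output memory; the method name `allocateOutputs` and the exact storage wiring are assumptions for illustration, not the actual implementation:

```cpp
#include <c10/core/Allocator.h>
#include <c10/core/StorageImpl.h>

// Hypothetical member of MemoryPlanner (the name is illustrative).
void MemoryPlanner::allocateOutputs() {
  if (managed_output_bytes_ == 0) {
    return;
  }
  // One allocation backs every managed output tensor.
  output_buffer_ =
      c10::GetAllocator(c10::DeviceType::CPU)->allocate(managed_output_bytes_);
  char* start = static_cast<char*>(output_buffer_.get());
  size_t offset = 0;
  for (auto& [nbytes, storages] : managed_output_storage_) {
    for (c10::StorageImpl* storage : storages) {
      // Storages listed together share the same slice of the buffer.
      storage->set_data_ptr_noswap(
          at::DataPtr(start + offset, c10::Device(c10::DeviceType::CPU)));
      storage->set_nbytes(nbytes);
    }
    offset += nbytes;
  }
}
```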
For implementation, see https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/static/impl.cpp
As with the intermediates, the MemoryPlanner can only manage output tensors produced by ops with out variants. Ops without out variants allocate their output tensors dynamically inside the op, so there is nothing the MemoryPlanner can do about them.
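For concreteness, the distinction looks like this at the ATen level, using at::add as the example:

```cpp
#include <ATen/ATen.h>

void example() {
  at::Tensor a = at::randn({2, 3});
  at::Tensor b = at::randn({2, 3});

  // Functional variant: at::add allocates a fresh output tensor on every
  // call, so the MemoryPlanner cannot manage that allocation.
  at::Tensor c = at::add(a, b);

  // Out variant: writes into caller-provided storage. Static Runtime can
  // point `out` at planner-managed memory and reuse it across iterations.
  at::Tensor out = at::empty({2, 3});
  at::add_out(out, a, b);
}
```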
Pay close attention to aliases. Model inputs and their aliases must be excluded, and aliases of intermediate tensors and output tensors need to be handled carefully.
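One way to express the input-exclusion check, sketched with the JIT's AliasDb (the helper and its exact use here are assumptions for illustration):

```cpp
#include <torch/csrc/jit/ir/alias_analysis.h>

// Illustrative helper: a graph output is only safe for the planner to manage
// if it cannot alias any model input, since input memory is owned by the
// caller.
bool outputIsManageable(
    torch::jit::AliasDb& alias_db,
    torch::jit::Value* output,
    const std::vector<torch::jit::Value*>& graph_inputs) {
  for (torch::jit::Value* input : graph_inputs) {
    if (alias_db.mayContainAlias(input, output)) {
      return false; // may alias a model input; exclude from planning
    }
  }
  return true;
}
```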
For testing, there are a lot of unit tests in https://github.com/pytorch/pytorch/blob/master/benchmarks/static_runtime/test_static_runtime.cc and https://github.com/pytorch/pytorch/blob/master/test/test_static_runtime.py.
cc @gmagogsfm