[StaticRuntime] Threading model #46219
Conversation
This pull request was exported from Phabricator. Differential Revision: D24237078
💊 CI failures summary and remediations. As of commit fb11ae1 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
Codecov Report

```
@@            Coverage Diff             @@
##           master   #46219      +/-   ##
==========================================
- Coverage   68.40%   68.25%   -0.15%
==========================================
  Files         411      410       -1
  Lines       54100    53531     -569
==========================================
- Hits        37008    36539     -469
+ Misses      17092    16992     -100
```

Continue to review full report at Codecov.
Force-pushed d2a68bc to 351f9ea.
Summary: Pull Request resolved: pytorch#46219

- Refactor StaticRuntime and group the common data structures, the JIT graph, and the script module into a separate struct, `InferenceModule`:

```
struct InferenceModule {
  explicit InferenceModule(const torch::jit::Module& m);
  explicit InferenceModule(std::shared_ptr<torch::jit::Graph> g);

  torch::jit::Module module;
  std::shared_ptr<torch::jit::Graph> graph;
  std::unique_ptr<c10::FunctionSchema> schema;

  std::unordered_map<Value*, size_t> value_to_reg;
  std::vector<size_t> input_regs;  // inputs to the graph
  std::vector<size_t> output_regs; // outputs of the graph
  std::vector<size_t> internals;
};
```

`InferenceModule` is stored in the PyTorchPredictor as well as in the static runtime, and is shared across threads. This is what's left inside the StaticRuntime:

```
mutable std::vector<IValue> reg_;

// The nodes we need to run
std::vector<ProcessedNode> nodes_;
```

`reg_` holds all the weights and activations; its contents differ across threads while running. `nodes_` holds the op nodes and their input/output registers, and is the same across threads for now. Since we could potentially put other stateful data structures into `ProcessedNode`, I kept `nodes_` inside the static runtime; it could easily be moved into `InferenceModule` if we decide not to put anything else into `ProcessedNode`.

- Added `StaticRuntimeOptions` so we can toggle certain optimizations on and off for testing and benchmarking; `cleanup_activations` is one example.

- Integration with PyTorchPredictor: added a lock-free stack in the PyTorchPredictor to hold all the static runtime instances. Benchmarks show that the `push` and `pop` combo takes about 80 ns, which is quite acceptable.

This diff focuses on the threading model only; benchmarks will come separately.

Reviewed By: bwasti

Differential Revision: D24237078

fbshipit-source-id: b0f8572536454247b8738a0c11f6b8b596107cc4
Force-pushed 351f9ea to fb11ae1.
This pull request has been merged in 1a3ea46.