
Conversation

hlu1
Contributor

@hlu1 hlu1 commented Oct 12, 2020

Summary:

  • Refactored StaticRuntime by grouping the common data structures, the JIT graph, and the script module into a separate struct `InferenceModule`:

```
struct InferenceModule {
  explicit InferenceModule(const torch::jit::Module& m);
  explicit InferenceModule(std::shared_ptr<torch::jit::Graph> g);
  torch::jit::Module module;
  std::shared_ptr<torch::jit::Graph> graph;
  std::unique_ptr<c10::FunctionSchema> schema;

  std::unordered_map<Value*, size_t> value_to_reg;
  std::vector<size_t> input_regs; // inputs to the graph
  std::vector<size_t> output_regs; // outputs of the graph
  std::vector<size_t> internals;
};
```

The `InferenceModule` is stored in the PyTorchPredictor as well as in the static runtime, and is shared across threads. This is what's left inside the StaticRuntime:

```
  mutable std::vector<IValue> reg_;
  // The nodes we need to run
  std::vector<ProcessedNode> nodes_;
```

`reg_` holds all the weights and activations, and differs across threads while running. `nodes_` holds the op nodes and their input/output registers, and is, for now, identical across threads. We could potentially put other stateful data structures in `ProcessedNode`, so I kept `nodes_` inside the static runtime; it could easily be moved into `InferenceModule` if we decide not to put anything else into `ProcessedNode`.

  • Added `StaticRuntimeOptions` so that individual optimizations can be toggled on/off for testing and benchmarking; `cleanup_activations` is one example.

  • Integrated with PyTorchPredictor. Added a lock-free stack to the PyTorchPredictor to hold all the static runtime instances. Benchmarks show that a `push`/`pop` pair takes about 80 ns, which is quite acceptable.

This diff focuses on the threading model only; benchmarks will follow separately.

Differential Revision: D24237078

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D24237078

@facebook-github-bot added the `oncall: jit` label Oct 12, 2020
@dr-ci

dr-ci bot commented Oct 12, 2020

💊 CI failures summary and remediations

As of commit fb11ae1 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_build (1/1)

Step: "Build"

Oct 20 05:14:23 caused by: Connection refused (os error 111)
Oct 20 05:14:23 ++++ extract_trap_cmd 
Oct 20 05:14:23 ++++ printf '%s\n' '' 
Oct 20 05:14:23 +++ printf '%s\n' cleanup 
Oct 20 05:14:23 ++ trap -- ' 
Oct 20 05:14:23 cleanup' EXIT 
Oct 20 05:14:23 ++ [[ pytorch-linux-xenial-py3-clang5-asan-build != *pytorch-win-* ]] 
Oct 20 05:14:23 ++ which sccache 
Oct 20 05:14:23 ++ sccache --stop-server 
Oct 20 05:14:23 Stopping sccache server... 
Oct 20 05:14:23 error: couldn't connect to server 
Oct 20 05:14:23 caused by: Connection refused (os error 111) 
Oct 20 05:14:23 ++ true 
Oct 20 05:14:23 ++ rm /var/lib/jenkins/sccache_error.log 
Oct 20 05:14:23 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory 
Oct 20 05:14:23 ++ true 
Oct 20 05:14:23 ++ [[ pytorch-linux-xenial-py3-clang5-asan-build == *rocm* ]] 
Oct 20 05:14:23 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Oct 20 05:14:23 ++ SCCACHE_IDLE_TIMEOUT=1200 
Oct 20 05:14:23 ++ RUST_LOG=sccache::server=error 
Oct 20 05:14:23 ++ sccache --start-server 
Oct 20 05:14:23 Starting sccache server... 

Extra GitHub checks: 1 failed



@codecov

codecov bot commented Oct 13, 2020

Codecov Report

Merging #46219 into master will decrease coverage by 0.14%.
The diff coverage is n/a.


```
@@            Coverage Diff             @@
##           master   #46219      +/-   ##
==========================================
- Coverage   68.40%   68.25%   -0.15%
==========================================
  Files         411      410       -1
  Lines       54100    53531     -569
==========================================
- Hits        37008    36539     -469
+ Misses      17092    16992     -100
```
| Impacted Files | Coverage Δ |
|---|---|
| ...enchmark/utils/valgrind_wrapper/timer_interface.py | 25.78% <0.00%> (-23.32%) ⬇️ |
| torch/utils/hooks.py | 83.33% <0.00%> (-13.89%) ⬇️ |
| torch/fx/experimental/Partitioner.py | 81.73% <0.00%> (-9.41%) ⬇️ |
| torch/utils/benchmark/utils/timer.py | 78.82% <0.00%> (-7.06%) ⬇️ |
| torch/nn/parameter.py | 89.47% <0.00%> (-4.15%) ⬇️ |
| torch/utils/benchmark/utils/common.py | 96.40% <0.00%> (-2.24%) ⬇️ |
| torch/quantization/fx/quantization_patterns.py | 94.27% <0.00%> (-2.23%) ⬇️ |
| torch/quantization/quantize.py | 91.34% <0.00%> (-1.80%) ⬇️ |
| torch/utils/checkpoint.py | 80.89% <0.00%> (-1.58%) ⬇️ |
| torch/quantization/fx/quantize.py | 94.82% <0.00%> (-1.44%) ⬇️ |
| ... and 54 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 17f8c32...fb11ae1.


@hlu1 hlu1 force-pushed the export-D24237078 branch from d2a68bc to 351f9ea Compare October 13, 2020 05:08
Summary:
Pull Request resolved: pytorch#46219

Reviewed By: bwasti

Differential Revision: D24237078

fbshipit-source-id: b0f8572536454247b8738a0c11f6b8b596107cc4

@facebook-github-bot
Contributor

This pull request has been merged in 1a3ea46.

Labels: fb-exported, Merged, oncall: jit
