Conversation

@ljk53 (Contributor) commented Sep 19, 2019

Stack from ghstack:

Summary:

  • At inference time we need to turn off autograd mode and turn on no-variable
    mode, since we strip out these modules for the inference-only mobile build.
  • Both flags are stored in thread-local variables, so we cannot simply
    set them globally once.
  • Add "autograd/grad_mode.h" header to all-in-one header 'torch/script.h'
    to reduce friction for iOS engs who might need do this manually in their
    project.
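
As a rough usage sketch (not code from this PR; it assumes the guard types torch::autograd::AutoGradMode and at::AutoNonVariableTypeMode are reachable through torch/script.h after this change, and the model path is just a placeholder):

```
#include <torch/script.h>

#include <vector>

int main() {
  // Both guards are thread-local and scoped: they only affect work done on
  // this thread while they stay alive.
  torch::autograd::AutoGradMode grad_guard(false);  // turn off autograd mode
  at::AutoNonVariableTypeMode non_var_guard(true);  // turn on no-variable mode

  // Placeholder model path; any TorchScript model exported for mobile works.
  auto module = torch::jit::load("model.pt");

  std::vector<torch::jit::IValue> inputs;
  inputs.emplace_back(torch::ones({1, 3, 224, 224}));

  at::Tensor output = module.forward(inputs).toTensor();
  (void)output;
  return 0;
}
```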

P.S. I tried to hide AutoNonVariableTypeMode in the codegen but found it's not
trivial (e.g. there are manually written parts not covered by codegen).
Might try again later.

Test Plan:

  • Integrate with Android demo app to confirm inference runs correctly.

Differential Revision: D17484259

@pytorchbot added the oncall: jit and module: android labels Sep 19, 2019
ljk53 added a commit that referenced this pull request Sep 19, 2019
ghstack-source-id: 94e1d54
Pull Request resolved: #26477
@ljk53 merged this pull request in 956b708.

ljk53 added a commit that referenced this pull request Sep 26, 2019
Summary:
- This PR attempts to address issue #26764 (`Issue 1` mentioned below).

- Current flow without USE_STATIC_DISPATCH (for server build):
```
S1. jit::load()
  a. JIT calls variable_factories.h methods to instantiate variable instances.

  b. JIT calls some ATen methods during initialization, e.g. conv_prepack, q_scale.
    b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
    b.2 ATen dispatcher dispatches to corresponding VariableType methods.
    b.3 VariableType method uses `AutoNonVariableTypeMode` guard before calling into ATen implementation, as ATen generally expects `CHECK(!is_variable())`.
    b.4 VariableType method uses `as_variable` to wrap the results.

  x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after S1.a / S1.b.

S2. module::forward()
  a. JIT interpreter calls some ATen methods (via JIT registry).
    a.1 - a.4: same as S1.b.1 - S1.b.4.

  x. Unlike S1.x, the JIT doesn't seem to expect `CHECK(is_variable())` during the entire `forward()` call.
```

- Current flow with USE_STATIC_DISPATCH (for mobile build):
```
M1. jit::load()
  a. JIT calls variable_factories.h methods to instantiate variable instances.

  b. JIT calls some ATen methods during initialization, e.g. conv_prepack, q_scale.
    b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
    b.2 ATen dispatcher dispatches to corresponding ATen implementation directly.
      // Issue 1: NO VariableType methods / `AutoNonVariableTypeMode` so `CHECK(!is_variable())` in ATen will fail!
      // (Hypothetical) Issue 2: NO `as_variable()` to wrap the result as a variable. M1.x will fail if it is ever used to check this result.

  x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after M1.a / M1.b.

M2. module::forward() // PR #26477 wraps this call with `AutoNonVariableTypeMode` guard.
  a. JIT interpreter calls some ATen methods (via JIT registry).
    a.1 same as M1.b.1, calls into register_aten_ops_xxx.cpp.
    a.2 same as M1.b.2, calls ATen implementation directly.
      // `CHECK(!is_variable())` in ATen won't fail thanks to the outer scope `AutoNonVariableTypeMode` guard.

  x. Same as above - the JIT never seems to expect `CHECK(is_variable())` during the entire `forward()` call.
```

- Wrong solution: if we wrap M1 with `AutoNonVariableTypeMode`, it will solve `Issue 1` for some models but will fail M1.x for some other models.

- Proposed solution:
I feel the root cause is that the mobile build doesn't have `VariableType` as a barrier sitting between JIT and ATen to convert between is_variable() and !is_variable().
Without `VariableType`, the best alternative place to put a barrier is M2.a.1 - `register_aten_ops_xxx.cpp`.
So this PR adds an `AutoNonVariableTypeMode` guard to register_aten_ops_xxx.cpp (for ops registered via codegen) and register_c10_ops.cpp (for ops registered via the new c10 registration API).
This PR doesn't try to address (Hypothetical) Issue 2 as I haven't seen it in practice.
PR #26477 can be replaced by this PR, but we can keep it until M2.x is no longer true.
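
As a hedged, self-contained illustration of the barrier pattern (the actual change lives in the generated register_aten_ops_xxx.cpp and in register_c10_ops.cpp, not in user code), here is roughly the same idea applied to a hand-registered op; the op name `my_ops::relu` and everything around it are made up for the example:

```
#include <torch/script.h>

// Illustrative only: mirrors the proposed barrier on a custom op so the
// snippet can compile on its own. The PR itself wraps the generated wrappers.
torch::Tensor my_relu(torch::Tensor input) {
  // Barrier: inside this scope ATen sees !is_variable(), matching its checks,
  // while the JIT outside keeps dealing with variables.
  at::AutoNonVariableTypeMode non_var_type_mode(true);
  return at::relu(input);
}

static auto registry =
    torch::RegisterOperators("my_ops::relu", &my_relu);
```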

- Ultimate solution:
Once Variable and Tensor are completely merged (#23032), the is_variable() checks can be changed to requires_grad() checks and all of these problems go away. We can clean up these hacks then.

- References:
* Effect of `AutoNonVariableTypeMode`: every `is_variable()` call within the current thread's scope returns false (see the sketch after these references):
  https://github.com/pytorch/pytorch/blob/master/c10/core/TensorImpl.h#L811

* Effect of `as_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/VariableTypeUtils.h#L159
  It calls `make_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.h#L539
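
To make the first reference concrete, a small sketch of the scoping behavior (assuming the `at::NonVariableTypeMode::is_enabled()` query that backs the `is_variable()` check at the time; this is also why the guard PR #26477 put around `forward()` cannot help code that runs earlier inside `jit::load()` or on another thread):

```
#include <torch/script.h>

#include <cassert>
#include <thread>

int main() {
  assert(!at::NonVariableTypeMode::is_enabled());    // off by default
  {
    at::AutoNonVariableTypeMode guard(true);
    assert(at::NonVariableTypeMode::is_enabled());   // on, but only in this scope...

    std::thread other([] {
      // ...and only on this thread: another thread does not see the guard.
      assert(!at::NonVariableTypeMode::is_enabled());
    });
    other.join();
  }
  assert(!at::NonVariableTypeMode::is_enabled());    // restored on scope exit
  return 0;
}
```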

Test Plan:
- Load and run MobileNetV2 fp32 & int8 models.

[ghstack-poisoned]
ljk53 added a commit that referenced this pull request Sep 26, 2019
ghstack-source-id: a9e5b2d
Pull Request resolved: #26868
ljk53 added a commit that referenced this pull request Sep 27, 2019

Summary:
- This PR, together with #26908, attempts to address issue #26764 (`Issue 1` mentioned below).

- Current flow without USE_STATIC_DISPATCH (for server build):
```
S1. jit::load()
  a. JIT calls variable_factories.h methods to instantiate variable instances.

  b. JIT calls some ATen methods during initialization, e.g. conv_prepack, q_scale.
    b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
    b.2 ATen dispatcher dispatches to corresponding VariableType methods.
    b.3 VariableType method uses `AutoNonVariableTypeMode` guard before calling into ATen implementation, as ATen generally expects `CHECK(!is_variable())`.
    b.4 VariableType method uses `as_variable` to wrap the results.

  x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after S1.a / S1.b.

S2. module::forward()
  a. JIT interpreter calls some ATen methods (via JIT registry).
    a.1 - a.4: same as S1.b.1 - S1.b.4.

  x. Unlike S1.x, the JIT doesn't seem to expect `CHECK(is_variable())` during the entire `forward()` call.
```

- Current flow with USE_STATIC_DISPATCH (for mobile build):
```
M1. jit::load()
  a. JIT calls variable_factories.h methods to instantiate variable instances.

  b. JIT calls some ATen methods during initialization, e.g. conv_prepack, q_scale.
    b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
    b.2 ATen dispatcher dispatches to corresponding ATen implementation directly.
      // Issue 1: NO VariableType methods / `AutoNonVariableTypeMode` so `CHECK(!is_variable())` in ATen will fail!
      // (Hypothetical) Issue 2: NO `as_variable()` to wrap the result as a variable. M1.x will fail if it is ever used to check this result.

  x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after M1.a / M1.b.

M2. module::forward() // PR #26477 wraps this call with `AutoNonVariableTypeMode` guard.
  a. JIT interpreter calls some ATen methods (via JIT registry).
    a.1 same as M1.b.1, calls into register_aten_ops_xxx.cpp.
    a.2 same as M1.b.2, calls ATen implementation directly.
      // `CHECK(!is_variable())` in ATen won't fail thanks to the outer scope `AutoNonVariableTypeMode` guard.

  x. Same as above - the JIT never seems to expect `CHECK(is_variable())` during the entire `forward()` call.
```

- Wrong solution: if we wrap M1 with `AutoNonVariableTypeMode`, it will solve `Issue 1` for some models but will fail M1.x for some other models.

- Proposed solution:
I feel the root cause is that the mobile build doesn't have `VariableType` as a barrier sitting between JIT and ATen to convert between is_variable() and !is_variable().

Without `VariableType`, the best alternative place to put a barrier is M1.b.2, as Edward did in #26908.

For some reason we also need to toggle the variable state for c10 ops: that is what this PR does. We haven't figured out how the non-mobile build works without this logic, so it's somewhat of a band-aid for now.

This PR doesn't try to address (Hypothetical) Issue 2 as I haven't seen it in practice. PR #26477 can be replaced by #26908 + this PR, but we can keep it until M2.x is no longer true.

- Ultimate solution:
Once Variable and Tensor are completely merged (#23032), the is_variable() checks can be changed to requires_grad() checks and all of these problems go away. We can clean up these hacks then.

- References:
* Effect of `AutoNonVariableTypeMode`: every `is_variable()` call within the current thread's scope returns false:
  https://github.com/pytorch/pytorch/blob/master/c10/core/TensorImpl.h#L811

* Effect of `as_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/VariableTypeUtils.h#L159
  It calls `make_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.h#L539

Test Plan:
- Patch PR #26908 then load and run MobileNetV2 fp32 & int8 models.

Differential Revision: [D17595179](https://our.internmc.facebook.com/intern/diff/D17595179)

[ghstack-poisoned]
ljk53 added a commit that referenced this pull request Sep 27, 2019
ghstack-source-id: e2e50bb
Pull Request resolved: #26868
facebook-github-bot pushed a commit that referenced this pull request Sep 27, 2019
Differential Revision: D17595179

Pulled By: ljk53

fbshipit-source-id: ed417ba6b696d722ea04fe18adf6b38ababa6b7c
@facebook-github-bot deleted the gh/ljk53/50/head branch October 28, 2019 22:16
pdlive215 pushed a commit to pdlive215/pytorch that referenced this pull request Nov 27, 2019