
Specialize Optional[T] to T (or subtype for Tensor) or None when executing graph #18407

Closed
t-vi wants to merge 26 commits

Conversation

@t-vi (Collaborator) commented Mar 24, 2019

This patch specializes Optional[Tensor] graph inputs to either a DimensionedTensorType (if a Tensor is passed) or NoneType. Other Optional[T] are specialized to T or None.
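
As a rough, hypothetical illustration of the intended behavior (my own sketch, not code from this PR; the exact graphs depend on the JIT executor in use), a scripted function taking Optional[Tensor] can be called once with a tensor and once with None, and the last executed optimized graph inspected to see how the optional input was handled:

```python
# Minimal sketch (not from this PR): run a scripted function with a Tensor and
# with None for its Optional[Tensor] argument and inspect the optimized graph.
import torch
from typing import Optional

@torch.jit.script
def add_maybe(x: torch.Tensor, bias: Optional[torch.Tensor]) -> torch.Tensor:
    if bias is None:
        return x
    return x + bias

x = torch.randn(2, 3)

add_maybe(x, torch.randn(2, 3))
print(torch.jit.last_executed_optimized_graph())  # call where bias was a tensor

add_maybe(x, None)
print(torch.jit.last_executed_optimized_graph())  # call where bias was None
```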

Functional changes

  • For unwrapping (checked and unchecked) we need to keep the non-optional output type, because IR code that follows the unwrap may not work with NoneType (just as it doesn't deal with Optional). While that code would not be hit during execution, it would violate the (legitimate) assumptions of the analysis passes. (See the sketch after this list.)
  • Function lookup currently does not match NoneType where an Optional is expected (I'm not entirely sure why this doesn't lead to unhappiness today, but hey). I amend this at the level of the function-matching code (operator.cpp), but see Adam's comments. We would run into trouble if we needed to select between functions whose signatures differ only in Optional types with different element types, but we would have the same problem when calling them directly, so I think this is OK.
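
For context, the sketch below (my own example under current TorchScript semantics, not code from this PR) shows the kind of user-level pattern that produces an unwrap of an Optional value in the IR; after the None check, x is used as a plain Tensor, which is why the unwrap's non-optional output type must be preserved:

```python
# Minimal sketch: x is refined/unwrapped from Optional[Tensor] to Tensor after
# the assert, so the IR downstream of the unwrap expects a Tensor-typed value.
import torch
from typing import Optional

@torch.jit.script
def unwrap_and_scale(x: Optional[torch.Tensor]) -> torch.Tensor:
    assert x is not None, "Unwrapping null optional"
    return x * 2
```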

Advantages:

  • It enables throwing away branches we can't hit. This also reduces the "blockiness" of the graph, so it may be easier to apply optimizations (e.g. fusing things inside if t is None: ... and outside the if).
  • Arguments passed to Optional[Tensor] parameters get shape information, which is very handy.
  • It gets rid of the problem that tensors passed into Optional arguments get requires_grad set erroneously (#18270), though that also affects lists, which aren't fixed here. (See the sketch after this list.)
  • Specializing Optional[List[int]] is needed for #18697.
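
As a minimal sketch of the #18270 symptom mentioned above (my own example, not a regression test from this PR): when neither input requires grad, the result of a call through an Optional[Tensor] parameter should not require grad either.

```python
# Hypothetical check for the #18270 symptom: a plain tensor passed through an
# Optional[Tensor] parameter should not cause requires_grad on the output.
import torch
from typing import Optional

@torch.jit.script
def maybe_add(x: torch.Tensor, bias: Optional[torch.Tensor]) -> torch.Tensor:
    if bias is None:
        return x
    return x + bias

out = maybe_add(torch.randn(2), torch.randn(2))  # neither input requires grad
print(out.requires_grad)  # expected: False
```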

Disadvantages:

  • We're changing typing in a more subtle way than the TensorType -> DimensionedTensorType specialization.
  • In particular, specializing to NoneType loses the type information captured in the OptionalType's element type.
In pytorch#18360, we used undefined Tensor (aka AutogradZeroTensor), but this can be error-prone when the type or value is compared to None, e.g. as seen when combined with the (not yet landed)

For this to work, we must allow None passed to functions taking Tensor?.
t-vi changed the title from "Specialize Optional (Tensor) to None when executing graph" to "[WIP] Specialize Optional (Tensor) to None when executing graph" on Mar 24, 2019
@t-vi (Collaborator, Author) commented Mar 24, 2019

Given the rather fundamental shortcomings of this approach, I'd be inclined to drop it.

facebook-github-bot added a commit that referenced this issue Mar 25, 2019
…" (#18411)

Summary:
This reverts commit 7cc7ed1.

I think it's better to sort out the issues raised in #18407 first. I'm sorry for not stopping it earlier.
Pull Request resolved: #18411

Differential Revision: D14594937

Pulled By: soumith

fbshipit-source-id: 3c90b7fa7694e2f59e55607acecde4a47af801ea
t-vi changed the title from "[WIP] Specialize Optional (Tensor) to None when executing graph" to "Specialize Optional (Tensor) to None when executing graph" on Mar 25, 2019
@eellison (Contributor) commented Apr 2, 2019

Is this the relevant PR for specializing Optional tensors as graph inputs?

@t-vi (Collaborator, Author) commented Apr 2, 2019

@eellison (Contributor) left a comment

Looks good to me, just a few minor nits. I think someone with better knowledge of the specialize_autogradzero pass should review before we land. @apaszke @zdevito

@eellison (Contributor) commented Apr 2, 2019

I'm not sure this case will necessarily be hit, but just to be safe, could you modify the operator for aten::_unwrap_optional so that it checks for the undefined-tensor case?

Edit: if we only encounter undefined tensors in the backward pass, then maybe it's not needed.

@apaszke (Contributor) left a comment

I think I finally understand where a lot of confusion is coming from, and why AD has regressed in weird ways. See my comment in argument_spec.h.

@t-vi (Collaborator, Author) commented Apr 3, 2019

So after merging master (including the ArgumentSpecCreator), I now distinguish between undefined tensor and None by having an extra flag is_none_ instead of abusing defined_. As is_tensor_ is dropped, I added a new is_nontensor_optional_ (so that zeroing the struct covers the tensor case).
I now actually handle the non-tensor cases (which used to be part of #18697) here; maybe they're better off here.

Note that before this patch, Optional[Tensor] wasn't specialized at all; now we specialize into cases for the various dimensions and None. On the other hand, Optional[Tuple[...]] and Optional[Class[...]] are only looked at in terms of None vs. value, not in terms of their contents (but that wasn't done before either, and I don't want to complicate things even more).

t-vi changed the title from "Specialize Optional (Tensor) to None when executing graph" to "Specialize Optional[T] to T or None when executing graph" on Apr 4, 2019
zdevito self-requested a review on Apr 16, 2019
@zdevito (Contributor) commented Apr 18, 2019

@t-vi I've been swamped this week, so I haven't gotten to look at this change again, but I just wanted to let you know I haven't forgotten about it and will try to get to it as soon as I can.

@zdevito (Contributor) left a comment

This looks good! Thanks for your patience. I have one minor question about the autograd-zero optional and one suggestion; otherwise this is ready to go.

case SPECIALIZE_OPTIONAL_TENSOR: {
  auto& input_type = *input_stack.back()++;
  auto is_present = spec.isPresent(optional_arg_spec_offset++);
  auto& arg = spec.tensorAt(tensor_arg_spec_offset++);
@zdevito (Contributor) commented Apr 24, 2019

It doesn't seem like it is necessary to push/pop a tensor from the tensor specializations if it wasn't present. It isn't actually used. It also means that addTensor doesn't have to have logic for serializing a tensor that is undefined.

@t-vi (Collaborator, Author) commented Apr 25, 2019

I fixed that, thanks!

auto& arg = spec.tensorAt(tensor_arg_spec_offset++);
if (!is_present) {
  result_stack.back().emplace_back(input_type);
} else if (!arg.defined()) {
@zdevito (Contributor) commented Apr 24, 2019

I am confused about this case. How are we generating Optional tensors that contain autograd zero tensors? I thought our autograd zero tensors are not marked as having an optional type?

@t-vi (Collaborator, Author) commented Apr 24, 2019

So for an earlier iteration @apaszke had the example of a function returning an Optional[Tensor]. On the other hand, if we now specialize Optional[Tensor], we might not need it...

@t-vi (Collaborator, Author) commented Apr 25, 2019

Seems OK without, so I replaced it with an assert.

and assert that Optional tensors cannot be undefined.

Based on Zach's feedback. Thank you!
@t-vi (Collaborator, Author) commented May 2, 2019

If there is anything I can do to move this forward, I'd appreciate it if you gave me a shout. Thank you!

@eellison (Contributor) left a comment

Looks like the comment has been addressed

AT_ASSERT(spec.size() == graph_inputs.size());
for (size_t i = 0; i < graph_inputs.size(); ++i) {
graph_inputs[i]->setType(spec.at(i));
ArgumentSpecCreator arg_spec_creator(*graph);
@wanchaol (Contributor) commented May 3, 2019

It seems like this switches from completeArgSpec to propagating only dimensioned tensor information rather than complete tensors. I guess shape analysis works fine in this case, but does this mean we are missing the complete tensor type information? I would imagine that if the following passes need the complete information, they could not find it.

@t-vi (Collaborator, Author) commented May 3, 2019

Wouldn't the comment and code below this get back to CompleteTensorType and propagate those?

@wanchaol (Contributor) commented May 3, 2019

Oops, you are right. I was thinking incompleteInferTypeFrom does not get the complete information.

@t-vi (Collaborator, Author) commented May 3, 2019

So I'm not super happy that we don't get a specialized graph for the failing script:

pytorch/test/test_jit.py, lines 5566 to 5571 at dcbfa67:

with self.assertRaisesRegex(RuntimeError, "Unwrapping null optional"):
    res = fn(None, 2.0)
g = torch.jit.last_executed_optimized_graph()
# FIXME: I would love to see if the graph has the right output
if g is not None:
    self.assertEqual(next(g.outputs()).type().str(), "Tensor")

@wanchaol (Contributor) left a comment

lgtm


@t-vi (Collaborator, Author) commented May 4, 2019

So now I'm happy with the PyTorch test, but the caffe JIT CI tests are acting up. :(

@t-vi (Collaborator, Author) commented May 5, 2019

So finally it's all green again. :)

@facebook-github-bot (Contributor) left a comment

@wanchaol has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) left a comment

@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison (Contributor) commented May 6, 2019

Oops, didn't see you had imported it, @wanchaol.

eellison dismissed stale reviews from zdevito and apaszke on May 6, 2019

changes addressed

@facebook-github-bot (Contributor) commented May 6, 2019

@eellison merged this pull request in 5c9ab6f.

facebook-github-bot added a commit that referenced this issue May 24, 2019
Summary:
This PR eliminates unneeded grad_sum_to_size ops and in particular speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize ops remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward pass.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script uses of _grad_sum_to_size to the new logic, but it might be better to do this incrementally anyway.
Pull Request resolved: #18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
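
To make the broadcasting point in the commit message above concrete, here is my own minimal sketch (not code from #18697): when an input already has the broadcast output's shape, its gradient needs no reduction, while a genuinely broadcast input still needs its gradient summed back to its own shape, which is the work _grad_sum_to_size represents.

```python
# Hypothetical illustration of when a gradient must be summed back to an input's
# shape (what _grad_sum_to_size does) and when no reduction is needed at all.
import torch

@torch.jit.script
def scale(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return x * w

a = torch.randn(4, 8, requires_grad=True)
b = torch.randn(4, 8, requires_grad=True)   # same shape as the output: no reduction needed
c = torch.randn(1, 8, requires_grad=True)   # broadcast along dim 0: grad is reduced back

scale(a, b).sum().backward()
print(b.grad.shape)  # torch.Size([4, 8])

scale(a, c).sum().backward()
print(c.grad.shape)  # torch.Size([1, 8]); summed back to c's shape
```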