Exposing linear layer to fuser #50856

Conversation
jjsjann123 commented on Jan 21, 2021
This PR:
- enables linear in autodiff;
- removes the Python-side control flow for linear.

(A rough sketch of the control flow being removed follows below.)
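For reference, here is a rough sketch of the kind of Python-side branching this refers to (a paraphrase of the pre-PR torch.nn.functional.linear, not code taken from this PR):

```python
import torch

def linear_with_python_control_flow(input, weight, bias=None):
    # Pre-PR style (paraphrased): the 2-D + bias case branches to addmm in
    # Python, everything else goes through matmul + add. The TorchScript
    # autodiff/fuser then has to reason about this control flow.
    if input.dim() == 2 and bias is not None:
        return torch.addmm(bias, input, weight.t())
    output = input.matmul(weight.t())
    if bias is not None:
        output += bias
    return output

def linear_as_single_op(input, weight, bias=None):
    # Post-PR style (paraphrased): a single aten::linear call that shows up
    # as one node in the graph, so it can be profiled and differentiated
    # directly by autodiff.
    return torch.nn.functional.linear(input, weight, bias)
```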
Note to self: ONNX doesn't support linear. 🤦
Hmmm, so ONNX does support linear directly. I tried it with this tutorial: https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html. Maybe it's the opset_version? I'll double-check the script.
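For reference, a minimal export check along the lines of that tutorial (the module, shapes, and opset value here are illustrative, not taken from the CI script):

```python
import io
import torch
import torch.nn as nn

# Export a model containing nn.Linear and see whether the exporter handles it
# at a given opset. Whether the trace contains aten::linear or an already
# decomposed addmm/matmul can depend on the PyTorch version and export path.
model = nn.Linear(20, 30).eval()
x = torch.randn(128, 20)

buf = io.BytesIO()
torch.onnx.export(model, x, buf, opset_version=11,
                  input_names=["input"], output_names=["output"])
print("export succeeded, %d bytes" % len(buf.getvalue()))
```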
Everything but the onnx pass looks ok to me.
I don't know how these passes are written so it would be good to have someone familiar with them review at least this part.
test/test_autograd.py (outdated)

    last_end = 0

    for event in prof.function_events:
        if event.name == 'aten::linear':
This looks very arbitrary? Why do you need to special case linear here?
If there is a good reason, it would help to have a comment here explaining it.
The test here is somewhat hard-coded: top_level_expected_events_and_shapes maps the top-level profiled kernels. Now that we expose aten::linear, the old hard-coded names/shapes are no longer top level, so we skip aten::linear in the loop. I'll add a note here.
Yea, this looks a little funky to me too, but I don't really know anything about it. cc @ilia-cher
This is very hacky and arbitrary; please make sure to fix the test the right way.
What this test does: it runs the profiler and gets a list of function (operator) events. Note that some ops are called by other ops; we then go through the events and look at the ones that don't have a parent (that is, the top-level events).
Your PR, as I understand it, changes the ops and thus changes the output of the profiler.
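A sketch of what that structure looks like from Python (attribute names such as cpu_parent and input_shapes are my assumptions about the profiler's FunctionEvent API, not something stated in this thread):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 30), nn.Linear(30, 40))
x = torch.randn(128, 20)

with torch.autograd.profiler.profile(record_shapes=True) as prof:
    model(x)

# Top-level events are the ones without a parent. With aten::linear exposed,
# it becomes a top-level event and aten::t / aten::addmm move one level down,
# which is why the hard-coded expectations in the test had to change.
top_level = [e for e in prof.function_events if e.cpu_parent is None]
print([(e.name, e.input_shapes) for e in top_level])
```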
test/test_jit.py (outdated)

    # script module
    jit_t = torch.jit.script(t)

    x = x_init.detach().clone().requires_grad_()
nit: the clone is not needed here right? (same on all the lines here)
      }
    }

    static void decomposeLinear(Block* b) {
Is this tested anywhere?
I was hitting a CI error on ONNX export, where it complains that aten::linear is not in the ONNX opset. After adding the decompose pass, the test passed.
I can add a standalone cpp test to verify the decomposition.
I don't think we want the decomposition on by default, but we might consider adding it in a way more similar to https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/decompose_ops.cpp#L195. Additionally, if there aren't any other use cases for decomposing linear (and I can't think of any off the top of my head), it might be easier to have this implemented as an ONNX decomposition. cc @BowenBao
This is inserted in PreprocessForONNX and exposed in Python via torch._C._jit_pass_onnx_preprocess.
We didn't put it in decompose_ops.cpp because we don't want the op decomposition for the fuser.
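A minimal sketch of exercising the pass from Python (this assumes _jit_pass_onnx_preprocess takes a graph and rewrites it in place, as most _jit_pass_* bindings do; the check below is illustrative, not a test from this PR):

```python
import torch
import torch.nn as nn

def has_linear(graph):
    return any(node.kind() == "aten::linear" for node in graph.nodes())

scripted = torch.jit.script(nn.Linear(20, 30).eval())
graph = scripted.graph.copy()
print("before pass:", has_linear(graph))

# The pass decomposes aten::linear into matmul/add (or addmm), so the ONNX
# exporter never has to handle aten::linear directly.
torch._C._jit_pass_onnx_preprocess(graph)
print("after pass:", has_linear(graph))
```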
Yea, I just mean the API for doing decomposition there is a little bit easier. I think this is fine for landing, someone from ONNX might want to rewrite it later.
    } // namespace

    void PreprocessForONNX(std::shared_ptr<Graph>& graph) {
      GRAPH_DUMP("priot to decompose linear", graph);
Looks like debug prints
Will update
These are fine, I think.
Everything that I can read looks good to me.
You just need someone to take a look at the jit pass.
    } // namespace

    void PreprocessForONNX(std::shared_ptr<Graph>& graph) {
      GRAPH_DEBUG("priot to decompose linear", graph);
nit: I'm not familiar with these macros. Is it ok to leave them in the release code?
Yes, it is, they don't print by default and we have them elsewhere in the code.
Great!! This LGTM, but I wanted to get input on the profiler change before landing.
test/test_autograd.py (outdated)

    print(prof.function_events)

    top_level_expected_events_and_shapes = [
    second_level_expected_events_and_shapes = [
what is "second level"?
I thought we explicitly wanted to profile the shapes in transpose and addmm. 🤦
Will update.
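To make the naming concrete, a small sketch of what "second level" would mean here (again assuming the FunctionEvent API with cpu_children; not quoting the test):

```python
import torch
import torch.nn as nn

with torch.autograd.profiler.profile(record_shapes=True) as prof:
    nn.Linear(20, 30)(torch.randn(128, 20))

# The transpose/addmm shapes are now recorded one level below aten::linear
# instead of showing up as top-level events.
for event in prof.function_events:
    if event.name == "aten::linear":
        for child in event.cpu_children:
            print(child.name, child.input_shapes)
```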
Updated per comment, @ilia-cher.
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
cc @ljk53 @bhosmer: will this break internal mobile models that have the decomposed ops manually written out to be included? I see that both vulkan and xnn_pack rewrite the ops to aten::linear, so I'm not sure. This is a good example of when it would be nice if the operators were taken from the graph instead of manually specified by users.
@eellison good question, and I'm not sure of the answer. @ljk53 @iseeyuan, do you guys know (or know who to ask)?
From offline discussion with @iseeyuan, because aten::linear has already existed in builds for a while:
Awesome that ops are selectively built now!
    ('aten::linear', [[128, 20], [30, 20], [30]]),
    ('aten::linear', [[128, 30], [40, 30], [40]])
thanks for updating
Reverting, this broke a lot of builds.
Likely related to the recently landed #49453.
This pull request has been reverted by 26f9ac9.
Errr, looks like they just forgot to add an entry in derivatives for
@jjsjann123, you can apply the following patch to solve it:

Thanks. I'm rebuilding it with the fix. @XiaobingSuper