
Conversation

eellison
Contributor

@eellison eellison commented Mar 9, 2020

torch.nn.functional.interpolate was written as a builtin op when we scripted the standard library, because it has four possible overloads. As a result, whenever we make a change to interpolate, we need to make changes in two places, and it also makes it impossible to optimize the interpolate op. The builtin is tech debt.
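For reference, a rough sketch of the kind of overload declarations this implies once interpolate lives in functional.py (illustrative only; the exact decorators and signatures may differ from the PR): size can be an int or a list of ints, and scale_factor a float or a list of floats, which is where the four overloads come from.

# Illustrative sketch only: four @torch.jit._overload stubs covering the
# int/List[int] size and float/List[float] scale_factor combinations,
# followed by a single shared implementation.
import torch
from torch import Tensor
from typing import List, Optional

@torch.jit._overload
def interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None):  # noqa: F811
    # type: (Tensor, Optional[int], Optional[float], str, Optional[bool]) -> Tensor
    pass

@torch.jit._overload
def interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None):  # noqa: F811
    # type: (Tensor, Optional[List[int]], Optional[float], str, Optional[bool]) -> Tensor
    pass

@torch.jit._overload
def interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None):  # noqa: F811
    # type: (Tensor, Optional[int], Optional[List[float]], str, Optional[bool]) -> Tensor
    pass

@torch.jit._overload
def interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None):  # noqa: F811
    # type: (Tensor, Optional[List[int]], Optional[List[float]], str, Optional[bool]) -> Tensor
    pass

def interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None):  # noqa: F811
    # Single implementation shared by all overloads; the real body lives in
    # torch/nn/functional.py and is elided here.
    ...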

I talked with @ailzhang, and the symbolic script changes are good to remove (I guess that makes a third place where we needed to re-implement interpolate).

I'm trying to get rid of unnecessary builtin operators because we're standardizing the mobile bytecode soon, so we should try to get this landed as soon as possible.

@eellison eellison requested a review from apaszke as a code owner March 9, 2020 23:12
@facebook-github-bot facebook-github-bot added the oncall: jit label Mar 9, 2020
@dr-ci
Copy link

dr-ci bot commented Mar 9, 2020

💊 CircleCI build failures summary and remediations

As of commit fd82d4e (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages (reran 2 jobs to discount flakiness):

See CircleCI build pytorch_linux_xenial_py3_clang5_mobile_custom_build_static (1/2)

Step: "Set Up CI Environment After attach_workspace" (full log | pattern match details) <confirmed not flaky by 2 failures>

E: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/stable/binary-amd64/Packages.bz2 Hash Sum mismatch
E: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/stable/binary-amd64/Packages.bz2  Hash Sum mismatch 
E: Some index files failed to download. They have been ignored, or old ones used instead. 

See CircleCI build caffe2_onnx_main_py3_6_clang7_ubuntu16_04_build (2/2)

Step: "Set Up CI Environment After attach_workspace" (full log | pattern match details) <confirmed not flaky by 2 failures>

E: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/stable/binary-amd64/Packages.bz2 Hash Sum mismatch
E: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/stable/binary-amd64/Packages.bz2  Hash Sum mismatch 
E: Some index files failed to download. They have been ignored, or old ones used instead. 


test/test_jit.py Outdated
('interpolate', torch.zeros(3, 3, 3).view(1, 1, 3, 3, 3), (2,), 'trilinear_5d', (True, 'aten::__interpolate')),
('interpolate', torch.randn(S, M, M, M, M), (None, 2.), 'trilinear_5d_with_scale', (True, 'aten::__interpolate')),
('interpolate', torch.randn(S, M, M, M, M), (4,), 'trilinear_5d_with_size', (True, 'aten::__interpolate')),
('interpolate', torch.zeros(3, 3).view(1, 1, 3, 3), (2,), 'nearest_4d', (False, 'aten::__interpolate')),
Contributor

What is the value that changed here, and why?

Contributor Author

Per @ailzhang, this can be removed.

if size is None and scale_factor is None:
    raise ValueError('either size or scale_factor should be defined')
if size is not None and scale_factor is not None:
    raise ValueError('only one of size or scale_factor should be defined')
Contributor

What happened to this error message? We should try to preserve the original behavior as much as possible.

Contributor Author

I thought they were saying the same thing, but I guess they're slightly different. Re-added.

    # type: (int, Tuple[Tensor, Optional[List[int]], Optional[float], Optional[bool]]) -> List[int]
    pass

def _interp_output_size(dim, closed_over_args):  # noqa: F811
Contributor

This function is kind of long; can you add an overload for _check_size_scale_factor as well?

Contributor Author

Because of the overload declarations, I would have to add 16 lines to separate out these 5 error-checking lines. I don't think it's worth it, but I can do it if you think it is.
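For reference, a hypothetical sketch of what that would require (names and exact signatures are illustrative, not the PR's actual code): each accepted size type needs its own @torch.jit._overload stub on top of the shared implementation.

import torch
from typing import List, Optional

@torch.jit._overload
def _check_size_scale_factor(dim, size, scale_factor):  # noqa: F811
    # type: (int, Optional[int], Optional[float]) -> None
    pass

@torch.jit._overload
def _check_size_scale_factor(dim, size, scale_factor):  # noqa: F811
    # type: (int, Optional[List[int]], Optional[float]) -> None
    pass

def _check_size_scale_factor(dim, size, scale_factor):  # noqa: F811
    # Shared implementation: the same error-checking lines quoted above.
    if size is None and scale_factor is None:
        raise ValueError('either size or scale_factor should be defined')
    if size is not None and scale_factor is not None:
        raise ValueError('only one of size or scale_factor should be defined')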

test/test_jit.py Outdated
('interpolate', torch.randn(S, M, M, M, M), (None, 2.), 'trilinear_5d_with_scale', (True, 'aten::__interpolate')),
('interpolate', torch.randn(S, M, M, M, M), (4,), 'trilinear_5d_with_size', (True, 'aten::__interpolate')),
('interpolate', torch.zeros(3, 3).view(1, 1, 3, 3), (2,), 'nearest_4d', (False, 'aten::__interpolate')),
('interpolate', torch.randn(S, S, M, M), (None, 2.), 'nearest_4d_with_scale', (False, 'aten::__interpolate')),
Contributor

False is the default value, and we no longer need to check nodes in the differentiated graph. You can safely remove the last tuple, (False, 'aten::__interpolate'), from all of these lines.
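For example, the 'nearest_4d' entry above would then become (a sketch of the suggested edit):

('interpolate', torch.zeros(3, 3).view(1, 1, 3, 3), (2,), 'nearest_4d'),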

@eellison eellison requested review from driazati and jerryzh168 March 10, 2020 22:31
// TODO: sort returns a tuple of Tensors, we have
// to extend the API to support that
// "sort",
"__interpolate",
Contributor

Could you add the corresponding ops to this list? I think it might be better to just put this change in the same PR.

Contributor Author

I don't know the difference between "single_input_call_funcs" and "single_input_aten_funcs".

Contributor Author

Where should I be putting the ops?

@jerryzh168
Contributor

jerryzh168 commented Mar 11, 2020 via email

@jerryzh168
Contributor

jerryzh168 commented Mar 11, 2020 via email

Contributor

@facebook-github-bot facebook-github-bot left a comment

@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@neginraoof
Contributor

neginraoof commented Mar 11, 2020

Hi @eellison, I'm trying to look into the changes breaking the ONNX tests.
Just testing an interpolate layer in scripting like:

@torch.jit.script_method
def forward(self, x):
    return torch.nn.functional.interpolate(x, mode=self.mode, size=self.size)

is now producing a very large IR graph.
This was previously shown in the graph as a single aten interpolate node.
Do you know if this change is expected? And does this affect performance for model scripting?

@eellison
Contributor Author

eellison commented Mar 11, 2020

Hi @eellison, I'm trying to look into the changes breaking the ONNX tests.
Just testing an interpolate layer in scripting like:

@torch.jit.script_method
def forward(self, x):
    return torch.nn.functional.interpolate(x, mode=self.mode, size=self.size)

is now producing a very large IR graph.
This was previously shown in the graph as a single aten interpolate node.
Do you know if this change is expected? And does this affect performance for model scripting?

Yes, this is expected. We previously hacked it in as a builtin node, and are now representing it as its Python code. In the short term it may be marginally slower, but in the long term it will be faster, as we do a better job of optimizing away the non-Tensor ops and potentially do codegen for the aten ops it invokes. It will also be more maintainable: tracing and scripting now create the same ops, and we do not need 4 different implementations of interpolate (register_prim_ops, functional.py, symbolic_script, onnx_export -> functional.py).

In your example, mode will pretty much always be known at compile time. Treating self.size as a known constant is also more realistic here, because with the profiling executor it will become a constant (see the sketch below).

def forward(self, x):
    return torch.nn.functional.interpolate(x, mode=self.mode, size=self.size)
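
As a rough, hypothetical illustration of that point (not code from this PR), declaring the attributes as TorchScript constants lets the compiler specialize on them, so most of the mode/size branching inside interpolate can be folded away:

import torch
import torch.nn.functional as F

class Upsample2x(torch.nn.Module):
    # Declaring these as constants lets TorchScript bake them into the graph,
    # so the branches in interpolate that switch on mode/scale_factor can be
    # constant-folded. Module and attribute names here are hypothetical.
    __constants__ = ['mode', 'scale_factor']

    def __init__(self):
        super().__init__()
        self.mode = 'nearest'
        self.scale_factor = 2.0

    def forward(self, x):
        return F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)

scripted = torch.jit.script(Upsample2x())
print(scripted.graph)  # most of the mode/size branching should fold to constants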

Contributor

@driazati driazati left a comment

TorchScript and nn changes look fine.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@neginraoof
Contributor

neginraoof commented Mar 11, 2020

@eellison
I see now. So even setting the constants in this case only cuts the graph by a few lines.
I can see the whole interpolate functional scripted into an inflated graph of about 900 nodes,
with all the If blocks and branches.
I don't know if it makes sense to export such a large graph for a single op, though, or how much this could be optimized.
Do you have a measure of how this impacts scripting perf?

@eellison
Contributor Author

eellison commented Mar 11, 2020

@eellison
I see now. So even setting the constants in this case only cuts the graph by a few lines.
I can see the whole interpolate functional scripted into an inflated graph of about 900 nodes,
with all the If blocks and branches.
Do you have a measure of how this impacts scripting perf?

It's not just a question of perf, but of maintainability. I've talked with other members of the TorchScript team and vetted this change as it pertains to scripting.

@neginraoof
Contributor

@eellison Thanks.
cc @houseroad
I'm mainly concerned about how this will impact the efficiency of the ONNX model after this change.
The export of the scripted interpolate op will be highly inefficient.
About the optimizations you mentioned: do you know if these optimizations are going to be part of the torch IR graph? And will they be visible to ONNX?

@eellison
Contributor Author

@neginraoof it doesn't affect interpolate tracing, which is the vast majority of ONNX usage. I also said, at the time interpolate script ONNX export was implemented, that we were going to remove "aten::__interpolate", and suggested that removal be a prerequisite. If there are serious concerns about torch.nn.functional.interpolate, someone can try to move the op to be a natively declared aten builtin, but the current duplication is not sustainable.

@neginraoof
Contributor

@eellison Do you know what the future optimization plan is? Are we going to have codegen for aten ops? Or is there another component optimizing the torch IR graph? I'm trying to understand whether this could be used to improve the ONNX graph as well.

@eellison
Contributor Author

@eellison Do you know what the future optimization plan is? Are we going to have codegen for aten ops? Or is there another component optimizing the torch IR graph? I'm trying to understand whether this could be used to improve the ONNX graph as well.

As it stands, the plan is just to more aggressively optimize Python idioms. I have a WIP PR that gets interpolate down to a single executed op. If someone thought it was worth the effort, they could move "interpolate" to be a native aten function, but no one up to this point has thought it worth the cost/benefit tradeoff.

@eellison
Contributor Author

For anyone investigating breakage, I opened a disabled test here: #34658

@facebook-github-bot
Contributor

@eellison merged this pull request in 514cba0.

.check("aten::max") \
.check("aten::min") \
.check("aten::mean") \
.check("aten::__interpolate") \
Contributor

@eellison we need the tests as well; actually, I think this might have broken these tests. Will sync with you tomorrow.

@houseroad houseroad mentioned this pull request Mar 13, 2020
facebook-github-bot pushed a commit that referenced this pull request Apr 15, 2020
…erpolate (#35744)

Summary:
Since aten::__interpolate is removed in #34514, we need a pass to replace the interpolate function with aten::__interpolate for ONNX export.
Pull Request resolved: #35744

Reviewed By: hl475

Differential Revision: D20907041

Pulled By: houseroad

fbshipit-source-id: f2d2cdfec47389245c50f538267124eedf682adf

Labels

Merged, oncall: jit
