Value range refinement using multi-variate expressions. #97964

ysiraichi · 2023-03-30T07:02:40Z

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]

pytorch-bot · 2023-03-30T07:02:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97964

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d98aa5e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 61af062 Pull Request resolved: #97964

ysiraichi · 2023-03-30T07:19:35Z

As is, the only difference (benchmark-wise) is that pytorch_unet goes from 9/73/73 (univariate only) to 9/70/70. Still have to run the whole benchmark set, though.

…ns." [ghstack-poisoned]

…ions." [ghstack-poisoned]

ghstack-source-id: 3af054b Pull Request resolved: #97964

lezcano

In this case, when refining the bounds, it could be the case that rhs_vr.lower/upper are variables. In that case, the refinement could then be lower = max(lower, rhs_vr.lower), right?

lezcano

At any rate, this needs some testing.

github-actions · 2023-05-30T10:33:49Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

…ns." [ghstack-poisoned]

ghstack-source-id: 54ece1d Pull Request resolved: #97964

…i-variate expressions." [ghstack-poisoned]

…lue range refinement using multi-variate expressions." [ghstack-poisoned]

ghstack-source-id: 25d88c9 Pull Request resolved: #97964

…xpressions." [ghstack-poisoned]

ghstack-source-id: a27545c Pull Request resolved: #97964

…nt using multi-variate expressions." [ghstack-poisoned]

… expressions." [ghstack-poisoned]

…)" This reverts commit 2642412. Reverted #97964 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it is breaking an internal test ([comment](#97964 (comment)))

ysiraichi · 2023-07-02T12:57:16Z

@ezyang @huydhn
Best guess I have is: multi-variate range refinement is not sound. Could you try running it while setting torch._dynamo.config.translation_validation = True? Or, is there an example I can try to reproduce the error?

…dation off on timeouts." Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

…meouts." Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. ghstack-source-id: 0e731c4 Pull Request resolved: #104464

lezcano · 2023-07-02T19:41:43Z

This implementations looks good to me, so I don't think it's "not sound" by itself.

My gut tells me that the issue is lurking in the value range analysis. I already found a number of bugs in it, so I wouldn't be entirely surprised that there were more issues lurking there.

… timeouts." Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

…uts." Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. ghstack-source-id: f2f817d Pull Request resolved: #104464

Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

ysiraichi · 2023-07-04T10:49:55Z

@lezcano I think the translation validator should be able to catch that.

…uts." Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 [ghstack-poisoned]

Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. ghstack-source-id: e8698a0 Pull Request resolved: #104464

Follow-up to PR: #97964 After the introduction of translation validation, (TV) a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. Pull Request resolved: #104464 Approved by: https://github.com/malfet

ysiraichi · 2023-07-10T15:24:28Z

@ezyang @huydhn
This is a friendly reminder. If you have some time, could you try and run the failing test with TV on?

ezyang · 2023-07-12T15:23:18Z

going to attempt this, might be tricky because i'm not sure we have z3 on our internal

ezyang · 2023-07-13T14:55:42Z

Patching this

diff --git a/fbcode/caffe2/torch/_dynamo/config.py b/fbcode/caffe2/torch/_dynamo/config.py
--- a/fbcode/caffe2/torch/_dynamo/config.py
+++ b/fbcode/caffe2/torch/_dynamo/config.py
@@ -237,7 +237,7 @@
 numpy_ndarray_as_tensor = False
 
 # Uses z3 for validating the guard optimizations transformations.
-translation_validation = False
+translation_validation = True
 
 
 def is_fbcode():

and rerunning the test, I still get the same error as before:

  File "/data/users/ezyang/fbsource/buck-out/v2/gen/fbcode/8c7462494077df89/pye/model_inventory/inside_out_tracking_model/__inside_out_tracking_model_test__/
inside_out_tracking_model_test#link-tree/torch/_dynamo/eval_frame.py", line 1012, in graph_with_interpreter                                                      return torch.fx.Interpreter(graph).run(*args)                                                                                                            
  File "/data/users/ezyang/fbsource/buck-out/v2/gen/fbcode/8c7462494077df89/pye/model_inventory/inside_out_tracking_model/__inside_out_tracking_model_test__/
inside_out_tracking_model_test#link-tree/torch/fx/interpreter.py", line 138, in run                                                                              self.env[node] = self.run_node(node)                                                                                                                     
  File "/data/users/ezyang/fbsource/buck-out/v2/gen/fbcode/8c7462494077df89/pye/model_inventory/inside_out_tracking_model/__inside_out_tracking_model_test__/
inside_out_tracking_model_test#link-tree/torch/fx/interpreter.py", line 195, in run_node                                                                         return getattr(self, n.op)(n.target, args, kwargs)                                                                                                       
  File "/data/users/ezyang/fbsource/buck-out/v2/gen/fbcode/8c7462494077df89/pye/model_inventory/inside_out_tracking_model/__inside_out_tracking_model_test__/
inside_out_tracking_model_test#link-tree/torch/fx/interpreter.py", line 267, in call_function                                                                
    return target(*args, **kwargs)                                                                                                                           
IndexError: tuple index out of range                                                                                                                         
                                                                                                                                                             
While executing %getitem_6 : [num_users=1] = call_function[target=operator.getitem](args = (%chunk, 2), kwargs = {})

which suggests that we don't get to TV (which makes sense, TV only runs all the way at the end of execution, but we're erroring midway through.

I have a live session failing, lmk what kind of debugging instrumentation you want.

ysiraichi · 2023-07-13T17:50:11Z

@ezyang

This is really odd. AFAIU, this PR modifies var_to_range which affects what guards are issued. So, if there was a problem with it, I would expect it to be executing a wrong compiled function (i.e. we should not have eliminated some guards).

Just to be completely sure, could you try commenting these lines at the end of refine_ranges and check that it passes the test?

# Updates the range and the guards corresponding to each bound of the symbol.
self.var_to_range[symbol] = ValueRanges(lower, upper)
self.var_to_guards[symbol] = (lower_guard, upper_guard)

ezyang · 2023-07-13T21:31:45Z

Applying this patch

diff --git a/fbcode/caffe2/torch/fx/experimental/symbolic_shapes.py b/fbcode/caffe2/torch/fx/experimental/symbolic_shapes.py
--- a/fbcode/caffe2/torch/fx/experimental/symbolic_shapes.py
+++ b/fbcode/caffe2/torch/fx/experimental/symbolic_shapes.py
@@ -3467,8 +3467,8 @@
                 continue
 
             # Updates the range and the guards corresponding to each bound of the symbol.
-            self.var_to_range[symbol] = ValueRanges(lower, upper)
-            self.var_to_guards[symbol] = (lower_guard, upper_guard)
+            #self.var_to_range[symbol] = ValueRanges(lower, upper)
+            #self.var_to_guards[symbol] = (lower_guard, upper_guard)
             # Clears the cache, since this update can change the result.
             self._maybe_evaluate_static.cache_clear()

and confirmed the test passes afterwards.

Trying to re-land: #97964. [ghstack-poisoned]

Trying to re-land: #97964. ghstack-source-id: 2a1b9dd Pull Request resolved: #105491

Trying to re-land: #97964. Test strategy: ``` buck2 test '@fbcode//mode/dev-nosan' fbcode//pye/model_inventory/inside_out_tracking_model:inside_out_tracking_model_test -- --exact 'pye/model_inventory/inside_out_tracking_model:inside_out_tracking_model_test - test_executorch_e2e_output_consistency_aten (pye.model_inventory.inside_out_tracking_model.InsideOutTrackingModelTest.InsideOutTrackingModelTest)' ``` Pull Request resolved: #105491 Approved by: https://github.com/ezyang

Value range refinement using multi-variate expressions.

14e5d6e

[ghstack-poisoned]

ysiraichi mentioned this pull request Mar 30, 2023

Value range refinement using uni-variate expressions. #97963

Closed

pytorch-bot bot added the release notes: fx release notes category label Mar 30, 2023

ysiraichi added a commit that referenced this pull request Mar 30, 2023

Value range refinement using multi-variate expressions.

a8d8246

ghstack-source-id: 61af062 Pull Request resolved: #97964

pytorchbot added the open source label Mar 30, 2023

ysiraichi changed the title ~~Value range refinement using multi-variate expressions.~~ [WIP] Value range refinement using multi-variate expressions. Mar 30, 2023

ysiraichi requested a review from ezyang March 30, 2023 07:17

Update on "[WIP] Value range refinement using multi-variate expressio…

6ef710b

…ns." [ghstack-poisoned]

Rebased. on "[WIP] Value range refinement using multi-variate express…

9e8e614

…ions." [ghstack-poisoned]

ysiraichi added a commit that referenced this pull request Mar 31, 2023

Value range refinement using multi-variate expressions.

19fa779

ghstack-source-id: 3af054b Pull Request resolved: #97964

lezcano reviewed Mar 31, 2023

View reviewed changes

github-actions bot added the Stale label May 30, 2023

lezcano removed the Stale label May 30, 2023

ysiraichi mentioned this pull request May 30, 2023

Translation validator for dynamo guards. #102563

Closed

Update on "[WIP] Value range refinement using multi-variate expressio…

0fee70a

…ns." [ghstack-poisoned]

ysiraichi added a commit that referenced this pull request May 30, 2023

Value range refinement using multi-variate expressions.

0cbfd4f

ghstack-source-id: 54ece1d Pull Request resolved: #97964

Fix + using sympy_interp. on "[WIP] Value range refinement using mult…

0f98cc6

…i-variate expressions." [ghstack-poisoned]

Temporarily turing translation validation on by default. on "[WIP] Va…

3069ed8

…lue range refinement using multi-variate expressions." [ghstack-poisoned]

ysiraichi added a commit that referenced this pull request Jun 2, 2023

Value range refinement using multi-variate expressions.

c011d6c

ghstack-source-id: 25d88c9 Pull Request resolved: #97964

Fix CI issues. on "[WIP] Value range refinement using multi-variate e…

adb40b0

…xpressions." [ghstack-poisoned]

ysiraichi added a commit that referenced this pull request Jun 7, 2023

Value range refinement using multi-variate expressions.

5e1b16e

ghstack-source-id: a27545c Pull Request resolved: #97964

Simplify FX graph creation + CI fixes. on "[WIP] Value range refineme…

a5d1678

…nt using multi-variate expressions." [ghstack-poisoned]

Fix lint errors. on "[WIP] Value range refinement using multi-variate…

5f58d55

… expressions." [ghstack-poisoned]

pytorchmergebot added the Reverted label Jun 30, 2023

facebook-github-bot deleted the gh/ysiraichi/54/head branch July 3, 2023 14:17

lezcano restored the gh/ysiraichi/54/head branch July 18, 2023 14:23

ysiraichi mentioned this pull request Jul 18, 2023

Reland: Value range refinement using multi-variate expressions #105491

Closed

ysiraichi added a commit that referenced this pull request Jul 18, 2023

Reland: Value range refinement using multi-variate expressions

573ee69

Trying to re-land: #97964. [ghstack-poisoned]

ysiraichi added a commit that referenced this pull request Jul 18, 2023

Reland: Value range refinement using multi-variate expressions

71b6799

Trying to re-land: #97964. ghstack-source-id: 2a1b9dd Pull Request resolved: #105491

facebook-github-bot deleted the gh/ysiraichi/54/head branch July 19, 2023 14:16

Value range refinement using multi-variate expressions. #97964

Value range refinement using multi-variate expressions. #97964

Uh oh!

Conversation

ysiraichi commented Mar 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Mar 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97964

✅ No Failures

Uh oh!

ysiraichi commented Mar 30, 2023

Uh oh!

lezcano left a comment

Choose a reason for hiding this comment

Uh oh!

lezcano left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented May 30, 2023

Uh oh!

ysiraichi commented Jul 2, 2023

Uh oh!

lezcano commented Jul 2, 2023

Uh oh!

ysiraichi commented Jul 4, 2023

Uh oh!

ysiraichi commented Jul 10, 2023

Uh oh!

ezyang commented Jul 12, 2023

Uh oh!

ezyang commented Jul 13, 2023

Uh oh!

ysiraichi commented Jul 13, 2023

Uh oh!

ezyang commented Jul 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

ysiraichi commented Mar 30, 2023 •

edited

Loading

pytorch-bot bot commented Mar 30, 2023 •

edited

Loading