Fix size_hint call sites failing on unbacked SymInts #110520

aakhundov · 2023-10-04T15:16:36Z

Stack from ghstack (oldest at bottom):

-> Fix size_hint call sites failing on unbacked SymInts #110520

Summary: Unbacked SymInts can't get a sizevars.size_hint due to being data-dependent. #109893 has added a new fallback parameter to sizevars.size_hint to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]

pytorch-bot · 2023-10-04T15:16:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110520

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

MacOS outage (2023-10-13)

✅ No Failures

As of commit 960577b with merge base 898482f ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler [ghstack-poisoned]

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: c3044c0 Pull Request resolved: #110520

eellison

We should be constraining the hint with the value range if it exists, cc @ezyang.

ezyang · 2023-10-04T17:32:48Z

torch/_inductor/dependencies.py

@@ -49,7 +49,7 @@ def numbytes_hint(self):
            for var, size in zip(self.var_names, self.size):
                if var in vars:
                    numel = numel * size
-        return V.graph.sizevars.size_hint(numel) * get_dtype_size(
+        return V.graph.sizevars.size_hint(numel, fallback=8192) * get_dtype_size(


I don't think this is sound, see

def can_inplace(self, read_dep: dependencies.MemoryDep): if self.get_aliases() or self.is_template(): return False if read_dep.name not in self.scheduler.name_to_node: # don't allow reuse of an 'input' buffer, we don't own it # (would this have been fixed if I tracked mutations properly above?) return False if not isinstance( self.node, (torch._inductor.ir.AllReduce, torch._inductor.ir.InPlaceHint) ): # TODO make this a property of the IR return False if len(self.read_writes.writes) == 1: write_dep = next(iter(self.read_writes.writes)) return read_dep.numbytes_hint() == write_dep.numbytes_hint() return False

We must know for sure that the sizes are the same to inplace, so we shouldn't fallback, we should reject inplacing (or maybe do it symbolically?)

Is the problem you're referring to specifically related to can_inplace (in which case we can, indeed, reject inplacing in can_inplace in the face of unbacked SymInts)? Or are there other cases where using the fallback=8192 inside numbytes_hint can cause problems (in which case we can make fallback=None a parameter of numbytes_hint, propagate it to size_hint, and set it to 8192 only from unproblematic places)?

can_inplace is what needs fixing. But it looks like this function is actually the sole user of numbytes_hint, soooooo

ezyang · 2023-10-04T17:33:58Z

torch/_inductor/ir.py

@@ -1769,7 +1769,7 @@ def _dynamic_reshape_indexer(old_size, new_size):
        """
        Perform a reshape entirely by modifying indexing math
        """
-        size_hint = V.graph.sizevars.size_hint
+        size_hint = partial(V.graph.sizevars.size_hint, fallback=8192)


This looks pretty sus. Can we just disable dynamic reshape indexing when things are unbacked?

Could you please clarify what "disable" means in this context? I.e., I see dynamic_reshape_indexer being used in a few places in the IR like this:

https://github.com/pytorch/pytorch/blob/main/torch/_inductor/ir.py?#L996

or here

https://github.com/pytorch/pytorch/blob/main/torch/_inductor/ir.py?#L1705

and there doesn't seem to be obvious ways to get around those calls. Or should we invoke some other (static) reshape indexer within dynamic_reshape_indexer if we see unbacked SymInts?

Hmm, I am not too sure. @jansel any ideas?

The dynamic_reshape_indexer is used to implement views.

I think we could just call .realize() as a fallback if there are unbacked symints.

ezyang

The latter two changes look OK, but the first two seem trouble.

aakhundov · 2023-10-04T18:44:51Z

We should be constraining the hint with the value range if it exists, cc @ezyang.

@eellison do you think this is universal enough to put it into size_hits (e.g., by clipping the fallback by min / max if those exist)?

eellison · 2023-10-04T21:36:23Z

@aakhundov yea you could clip it within the size hint api

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler [ghstack-poisoned]

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 5459567 Pull Request resolved: #110520

aakhundov · 2023-10-09T16:28:27Z

@ezyang @eellison Updates since the last time:

Clip the fallback (int) value in the size_hint(expr, fallback) by the value range for the expr if available.
Ditch the fallback in numbytes_hint and fix the call sites in the scheduler.py instead.
Ditch the fallback in _dynamic_reshape_indexer and realize the input if there are unbacked SymInts in input or output size of the view op.
Replace the fallback in expand by a check for x.get_size() (an outer one around the check recently added for sizes).

The only fallback remaining is in the mm_config.

Please take a look. Thanks!

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler [ghstack-poisoned]

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fa53b59 Pull Request resolved: #110520

ezyang · 2023-10-09T19:43:14Z

torch/_inductor/ir.py

+            if unbacked_symbols_in_sizes and (not is_contiguous_storage_and_layout(x)):
+                # realize x; otherwise, the dynamic_reshape_indexer below will fail
+                # due to the size_hint's inability to process unbacked SymInts
+                x.realize()


@Chillee and @aakhundov discussed this in chat, the thinking is that realize should be initially contiguous as long as no one makes it noncontiguous later.

ezyang · 2023-10-09T19:45:20Z

torch/_inductor/scheduler.py

+                return False
+            if write_dep.has_unbacked_symbols():
+                # can't inplace, as can't guarantee the exact size match
+                return False


We can probably make this a little better by doing an exact symbolic match, but this is fine for now.

Actually, if we take symbolic product of the read_dep and write_dep shapes with unbacked SymInts in them, and those two symbolic products turn out to be equal, this means that the buffer sizes are provably the same and we can allow in-placing? I this is the case, I think, it's not too hard to do.

On the other hand, I have doubts about the current model for backed SymInts. We're relying on the size hints of the numels, but size hints being the same doesn't necessarily mean that the shapes are the same? I guess, matching symbolic numels would be more reliable?

Actually, if we take symbolic product of the read_dep and write_dep shapes with unbacked SymInts in them, and those two symbolic products turn out to be equal, this means that the buffer sizes are provably the same and we can allow in-placing? I this is the case, I think, it's not too hard to do.

Exactly.

On the other hand, I have doubts about the current model for backed SymInts. We're relying on the size hints of the numels, but size hints being the same doesn't necessarily mean that the shapes are the same? I guess, matching symbolic numels would be more reliable?

Yeah, this seems a bit sus. Suppose that I have s0 and s1, and they happen to be the same this run. Why would reusing the buffer be OK if s0 and s1 later vary? Maybe we can construct a test case that forces a failure here...

ezyang · 2023-10-09T19:45:53Z

torch/_inductor/scheduler.py

@@ -1624,6 +1629,9 @@ def score_fusion_memory(self, node1, node2):
        common_memory_deps = (node1.read_writes.reads | node1.read_writes.writes) & (
            node2.read_writes.reads | node2.read_writes.writes
        )
+        common_memory_deps = {
+            dep for dep in common_memory_deps if not dep.has_unbacked_symbols()


For this one, wouldn't you prefer to use a fallback heuristic number?

I'm somewhat hesitant to use a heuristic number here: if the number is too large, this may lead to overoptimistic saving estimation; if too small, this likely won't make a substantial difference. So my thinking was to (somewhat conservatively) drop the dependencies with unbacked SymInts in them, as no reasonably close estimation can be done (like with backed SymInts). Does this make sense?

ezyang · 2023-10-09T19:47:26Z

torch/_inductor/lowering.py

+        # this cannot be done directly as below as we'll choke on the size_hint
+        # here
+        if x_size_product > 0 and not any(
+            V.graph.sizevars.shape_env.is_unbacked_symint(s) for s in sizes


Is this test redundant now?

I think, x.get_size() and sizes can generally be different in the expand op, with unbacked SymInts being present in any of them, but not the other? That's why I kept both tests.

you are right

ezyang · 2023-10-09T19:48:15Z

There are some suboptimal bits, but nothing that looks wrong. Thanks for doing this.

It would be really good to have tests for this, because this is from some internal model and we really don't want our only test coverage for this to be when we actually run it for real.

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler [ghstack-poisoned]

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 5608514 Pull Request resolved: #110520

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler [ghstack-poisoned]

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: ca1940f Pull Request resolved: #110520

aakhundov · 2023-10-14T08:07:38Z

@pytorchbot merge

pytorchmergebot · 2023-10-14T08:09:36Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. pytorch#109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: pytorch#110520 Approved by: https://github.com/jansel, https://github.com/ezyang

fix for #143498 We were incorrectly using contiguous strides for a non-contiguous tensor. There are two separate causes: 1. #110520 made it so we turn Views contiguous with unbacked symints becuase `dynamic_reshape_indexer below will fail due to the size_hint's inability to process unbacked SymInts`. Seems like we should fix. Regardless - it will make the input contiguous if input is unbacked to workaround this. 2. We weren't actually making it contiguous! I filed an issue for this here: #145561. This is still worth landing as a fix, even though we should those issues. Pull Request resolved: #145548 Approved by: https://github.com/desertfire

github-actions bot added module: inductor ciflow/inductor labels Oct 4, 2023

aakhundov requested review from jansel, ezyang and eellison October 4, 2023 15:16

aakhundov added topic: not user facing topic category ciflow/trunk Trigger trunk jobs on your pull request labels Oct 4, 2023

jansel approved these changes Oct 4, 2023

View reviewed changes

eellison reviewed Oct 4, 2023

View reviewed changes

ezyang reviewed Oct 4, 2023

View reviewed changes

ezyang requested changes Oct 4, 2023

View reviewed changes

pytorch deleted a comment from t10-13rocket Oct 5, 2023

aakhundov requested review from ezyang and eellison October 9, 2023 16:28

aakhundov changed the title ~~Add more size_hint fallbacks for unbacked SymInts~~ Fix size_hint call sites failing on unbacked SymInts Oct 9, 2023

ezyang reviewed Oct 9, 2023

View reviewed changes

ezyang approved these changes Oct 13, 2023

View reviewed changes

pytorchmergebot added the merging label Oct 14, 2023

pytorchmergebot added Merged and removed merging labels Oct 14, 2023

pytorchmergebot closed this in d7317d8 Oct 14, 2023

facebook-github-bot deleted the gh/aakhundov/1/head branch October 17, 2023 14:24

eellison mentioned this pull request Jan 24, 2025

fix unbacked + view incorrectness #145548

Closed

Fix size_hint call sites failing on unbacked SymInts #110520

Fix size_hint call sites failing on unbacked SymInts #110520

Uh oh!

Conversation

aakhundov commented Oct 4, 2023 • edited by pytorch-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Oct 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110520

❗ 1 Active SEVs

✅ No Failures

Uh oh!

eellison left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ezyang Oct 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aakhundov Oct 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ezyang left a comment

Choose a reason for hiding this comment

Uh oh!

aakhundov commented Oct 4, 2023

Uh oh!

eellison commented Oct 4, 2023

Uh oh!

aakhundov commented Oct 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ezyang commented Oct 9, 2023

Uh oh!

aakhundov commented Oct 14, 2023

Uh oh!

pytorchmergebot commented Oct 14, 2023

Merge started

Uh oh!

Uh oh!

aakhundov commented Oct 4, 2023 •

edited by pytorch-bot bot

Loading

pytorch-bot bot commented Oct 4, 2023 •

edited

Loading

ezyang Oct 5, 2023 •

edited

Loading

aakhundov Oct 4, 2023 •

edited

Loading

aakhundov commented Oct 9, 2023 •

edited

Loading