[inductor] Implement bucketize() for dependencies.py #105102

davidberard98 · 2023-07-13T00:09:28Z

Stack from ghstack (oldest at bottom):

-> [inductor] Implement bucketize() for dependencies.py #105102

dependencies.py tracks reads and writes, which are used to identify dependencies between buffers: if buffer X reads buffer Y, then X depends on Y. ops.bucketize() reads from an offsets tensor, so dependencies.py should record that read. Because bucketize performs a binary search over the offsets tensor, the dependency is marked as a StarDep to indicate that the entire tensor is needed.

Use case: we find that jagged tensor dense_to_jagged ops, which use bucketize() to map jagged indices to dense indices, perform better if the bucketize() kernel is separated from the gather kernel. Previously, because bucketize() wasn't marked as reading anything, it would always get inlined.
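To make the read-tracking idea concrete, here is a minimal sketch in plain Python. The `MemoryDep` and `StarDep` classes are simplified stand-ins, not the real classes from torch/_inductor/dependencies.py, and `depends_on` and the buffer names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryDep:
    """Toy stand-in: an access to `name` at a known index expression."""
    name: str
    index: str


@dataclass(frozen=True)
class StarDep:
    """Toy stand-in: a dependency on the whole buffer, with no index info."""
    name: str


def depends_on(reads_by_buffer):
    """Map each buffer to the set of buffer names it reads from."""
    return {
        buf: {dep.name for dep in reads}
        for buf, reads in reads_by_buffer.items()
    }


# Hypothetical buffers: the bucketize result reads all of `offsets`,
# and the gather reads the bucketize result element-wise.
reads_by_buffer = {
    "bucketized": {MemoryDep("values", "x0"), StarDep("offsets")},
    "gathered": {MemoryDep("bucketized", "x0")},
}
graph = depends_on(reads_by_buffer)
print(graph)
```

If the `StarDep("offsets")` entry were missing, nothing in this toy graph would say that the bucketize result depends on `offsets`, which mirrors the situation described above.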

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov

Differential Revision: D47422704

pytorch-bot · 2023-07-13T00:09:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/105102

❌ 2 New Failures, 2 Unrelated Failures

As of commit d458577:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was already failing on the merge base e68cf02:

👉 Rebase onto the `viable/strict` branch to avoid these failures

cuda11.8-py3.10-gcc7-sm86 / test (inductor_torchbench_dynamic, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh)

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

cuda11.8-py3.10-gcc7-sm86 / test (inductor_torchbench, 1, 1, linux.g5.4xlarge.nvidia.gpu, unstable) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

davidberard98 · 2023-07-13T00:15:23Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

aakhundov · 2023-07-13T10:27:07Z

torch/_inductor/codegen/triton.py

@@ -2409,7 +2410,7 @@ def candidate_tilings(node):
        deps = [
            dep
            for dep in itertools.chain(rw.reads, rw.writes)
-            if dep.name not in V.graph.removed_buffers
+            if dep.name not in V.graph.removed_buffers and isinstance(dep, MemoryDep)


Curious, why do we want to ignore dep types other than MemoryDep here?

Practically, because later we use dep.index, which doesn't exist on StarDep. Conceptually, my understanding is that StarDep accesses also won't contribute tiling suggestions, since they carry no indexing information.

Added a comment.
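A small sketch of why the isinstance filter is needed, using toy dataclasses rather than the real Inductor types. The buffer names and index strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryDep:
    """Toy stand-in with an index expression."""
    name: str
    index: str


@dataclass(frozen=True)
class StarDep:
    """Toy stand-in with no index attribute at all."""
    name: str


deps = [
    MemoryDep("buf0", "x0 + 8*x1"),
    StarDep("offsets"),
    MemoryDep("buf1", "x1"),
]
removed_buffers = {"buf1"}

# Without the isinstance check, StarDep("offsets") would reach `dep.index`
# below and raise AttributeError.
indexed = [
    dep
    for dep in deps
    if dep.name not in removed_buffers and isinstance(dep, MemoryDep)
]
indices = [dep.index for dep in indexed]
print(indices)
```

Only the surviving `MemoryDep` contributes an index expression; the `StarDep` is skipped rather than crashing the later `dep.index` access.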

@EikanWang

eellison

nice! one question about StarDep vs MemDep

eellison · 2023-07-13T18:03:47Z

torch/_inductor/ir.py

            reads = [
                sympy_subs(
                    r.index, {v: sympy.Integer(0) for v in reduction_vars if v != 0}
                )
                for r in reads
+                if isinstance(r, dependencies.MemoryDep)


What prompted this change, just curious? I guess, as above, you got an error because StarDep doesn't have an index.

Yeah, that's why. I guess previously, ComputedBuffers never had StarDeps.

eellison · 2023-07-13T18:09:45Z

torch/_inductor/dependencies.py

+        indexing_dtype: torch.dtype,
+        right: bool,
+    ):
+        self._reads.add(StarDep(offsets_name))


Why not:

self._writes.add(MemoryDep(offsets_name, *self.canonicalize(offsets_size))) ?

AFAIK, the index field is intended to describe the indices that are accessed during the read. Here, the binary search takes the raw offsets_ptr and searches over the offsets, so it does need the entire tensor, and we can't know in advance which indices it will touch because that is data-dependent.

offsets_size also isn't the index we're accessing; it's the full size of the tensor.

But I could be wrong about this interpretation. If so, LMK.
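The data-dependence argument can be illustrated with a plain-Python binary search that records which positions it reads. This is only a sketch: `bucketize_probes` is a hypothetical helper, not Inductor's bucketize kernel, and it follows `bisect.bisect_left` semantics as an assumption.

```python
import bisect


def bucketize_probes(offsets, value):
    """Binary search (bisect_left-style) that records which indices it reads."""
    lo, hi = 0, len(offsets)
    probed = []
    while lo < hi:
        mid = (lo + hi) // 2
        probed.append(mid)
        if offsets[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    return lo, probed


offsets = [0, 3, 7, 12, 20, 31, 45, 60]
low_bucket, low_probes = bucketize_probes(offsets, 2)
high_bucket, high_probes = bucketize_probes(offsets, 50)

# Cross-check the bucket indices against the standard library.
assert low_bucket == bisect.bisect_left(offsets, 2)
assert high_bucket == bisect.bisect_left(offsets, 50)

print(low_probes)   # [4, 2, 1, 0]
print(high_probes)  # [4, 6, 7]
```

The two searches read different positions of the same `offsets` list, so no single static index expression describes the access pattern. That is the case a whole-buffer dependency is meant to cover.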

eellison · 2023-07-13T21:35:23Z

test/inductor/test_dependencies.py

+        super().tearDown()
+
+    @unittest.skipIf(not HAS_CUDA, "CUDA-only test")
+    def test_bucketize_dependencies(self):


Maybe worth adding a test, using run_and_get_code, that checks the fusion you want actually occurs.

Good idea. I added self.assertEqual(torch._inductor.metrics.generated_kernel_count, 1) to a bucketize -> add test in test_torchinductor.py.

davidberard98 · 2023-07-14T17:12:32Z

@pytorchbot merge

voznesenskym · 2023-07-17T01:02:46Z

@pytorchbot revert -m "Regresses perf all over the place" -c nosignal

pytorchmergebot · 2023-07-17T01:04:33Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2023-07-17T01:04:40Z

Can't revert PR that was landed via phabricator as D47422704. Please revert by going to the internal diff and clicking Unland.

facebook-github-bot · 2023-07-17T01:20:41Z

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

pytorchmergebot · 2023-07-17T01:22:13Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2023-07-17T01:22:23Z

@davidberard98 your PR has been successfully reverted.

This reverts commit cff5d6a. Reverted #105102 on behalf of https://github.com/facebook-github-bot due to "Diff reverted internally" (see the comment on #105102).

davidberard98 · 2023-07-17T04:35:59Z

Whoops, looks like I was reusing a single itertools.chain(rw.reads, rw.writes) iterator in triton.py. On the second use the iterator was already exhausted, so the deps list came out empty.

Fixed by constructing a fresh itertools.chain for each use.

Verified with gluon_inception_v3: before the fix the speedup regressed from ~2x to 0.9x; after the fix we don't see the regression.
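The exhaustion bug described above is easy to reproduce in isolation. The snippet below is a standalone illustration with made-up buffer names, not the code from triton.py.

```python
import itertools

reads = ["buf0", "buf1"]
writes = ["buf2"]

# Buggy pattern: one chain object shared by two consumers.
shared = itertools.chain(reads, writes)
first_pass = [name for name in shared]
second_pass = [name for name in shared]  # iterator is already exhausted

# Fixed pattern: build a fresh chain for each use.
first_fixed = list(itertools.chain(reads, writes))
second_fixed = list(itertools.chain(reads, writes))

print(first_pass)    # ['buf0', 'buf1', 'buf2']
print(second_pass)   # []
print(second_fixed)  # ['buf0', 'buf1', 'buf2']
```

`itertools.chain` returns a one-shot iterator, so the second comprehension silently sees nothing instead of raising an error, which is why a problem like this can surface only as a performance regression.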

davidberard98 · 2023-07-17T19:13:13Z

@pytorchbot merge

pytorchmergebot · 2023-07-17T19:14:54Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

davidberard98 · 2023-07-18T17:11:38Z

fyi, verified no regression on the dashboard from the fixed version of this PR:

https://hud.pytorch.org/benchmark/compilers?startTime=Tue%2C%2011%20Jul%202023%2017%3A07%3A27%20GMT&stopTime=Tue%2C%2018%20Jul%202023%2017%3A07%3A27%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/davidberard98/206/orig&lCommit=719b8524a3442f4341782233284b81f4ee5b2dfa&rBranch=davidberard98-206-base&rCommit=e68cf02420b852c3bf278fe9d1f47d6b02bae0f0

github-actions bot added module: inductor ciflow/inductor labels Jul 13, 2023

davidberard98 marked this pull request as ready for review July 13, 2023 04:44

davidberard98 requested review from jansel, eellison and aakhundov July 13, 2023 04:44

aakhundov reviewed Jul 13, 2023

eellison reviewed Jul 13, 2023

eellison approved these changes Jul 13, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 14, 2023

pytorchmergebot removed the merging label Jul 14, 2023

pytorchmergebot closed this in cff5d6a Jul 14, 2023

voznesenskym added a commit that referenced this pull request Jul 15, 2023

Revert "[inductor] Implement bucketize() for dependencies.py (#105102)"

1c7c5b2

This reverts commit cff5d6a. [ghstack-poisoned]

voznesenskym added a commit that referenced this pull request Jul 15, 2023

Revert "[inductor] Implement bucketize() for dependencies.py (#105102)"

f5e94de

This reverts commit cff5d6a. ghstack-source-id: 5477531 Pull Request resolved: #105276

pytorch deleted a comment from pytorch-bot bot Jul 17, 2023

pytorchmergebot added the Reverted label Jul 17, 2023

davidberard98 reopened this Jul 17, 2023

eellison approved these changes Jul 17, 2023

pytorchmergebot added the merging label Jul 17, 2023

pytorchmergebot removed the merging label Jul 17, 2023

pytorchmergebot closed this in 28d018d Jul 17, 2023

facebook-github-bot deleted the gh/davidberard98/206/head branch July 21, 2023 14:17

davidberard98 mentioned this pull request Jul 27, 2023

[inductor] realize boundaries in bucketize() lowering #106107

Closed
