[halide-backend] Dimension-based indexing #129026

Closed
wants to merge 16 commits

Conversation

jansel (Contributor) commented Jun 19, 2024

Stack from ghstack (oldest at bottom):

Prior to this, the generated Halide code was a rather literal translation of the Triton code, with XBLOCK/YBLOCK/RBLOCK and 1D inputs. Halide prefers explicit dimensions, and the 1D indexing triggers a lot of bugs and performance issues. This PR infers dimensions and changes the indexing in the generated code.

Before

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        xindex = hl.Var('xindex')
        rindex = hl.Var('rindex')
        r1 = rindex
        x0 = xindex
        idom = hl.RDom([hl.Range(0, 16), hl.Range(0, 32)])
        odom = hl.RDom([hl.Range(0, 16)])
        rdom = hl.RDom([hl.Range(0, 32)])
        xindex_idom = idom.x
        xindex_odom = odom.x
        rindex_idom = idom.y
        r1_idom = rindex_idom
        x0_idom = xindex_idom
        x0_odom = xindex_odom
        tmp0 = hl.Func('tmp0')
        tmp0[rindex, xindex] = in_ptr0[r1 + (32*x0)]
        tmp1 = hl.Func('tmp1')
        tmp1[xindex] = hl.maximum(rdom, tmp0[rdom, xindex])
        tmp2 = hl.Func('tmp2')
        tmp2[rindex, xindex] = tmp0[rindex, xindex] - tmp1[xindex]
        tmp3 = hl.Func('tmp3')
        tmp3[rindex, xindex] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[rindex, xindex])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[rindex, xindex])
        tmp4 = hl.Func('tmp4')
        tmp4[xindex] = hl.sum(rdom, tmp3[rdom, xindex])
        tmp5 = hl.Func('tmp5')
        tmp5[rindex, xindex] = tmp3[rindex, xindex] / tmp4[xindex]
        out_ptr3_i0 = hl.Var('out_ptr3_i0')
        out_ptr3_i1 = hl.Var('out_ptr3_i1')
        out_ptr3[out_ptr3_i0, out_ptr3_i1] = hl.cast(out_ptr3.type(), tmp5[out_ptr3_i0, out_ptr3_i1])

        assert g.using_autoscheduler()
        in_ptr0.set_estimates([hl.Range(0, 512)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])

After

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 2)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        h0 = hl.Var('h0')
        h1 = hl.Var('h1')
        rdom = hl.RDom([hl.Range(0, 32)])
        hr1 = rdom[0]
        tmp0 = hl.Func('tmp0')
        tmp0[h0, h1] = in_ptr0[h0, h1,]
        tmp1 = hl.Func('tmp1')
        tmp1[h1] = hl.maximum(rdom, tmp0[hr1, h1])
        tmp2 = hl.Func('tmp2')
        tmp2[h0, h1] = tmp0[h0, h1] - tmp1[h1]
        tmp3 = hl.Func('tmp3')
        tmp3[h0, h1] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[h0, h1])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[h0, h1])
        tmp4 = hl.Func('tmp4')
        tmp4[h1] = hl.sum(rdom, tmp3[hr1, h1])
        tmp5 = hl.Func('tmp5')
        tmp5[h0, h1] = tmp3[h0, h1] / tmp4[h1]
        out_ptr3[h0, h1,] = hl.cast(hl.Float(32), tmp5[h0, h1])

        assert g.using_autoscheduler()
        in_ptr0.dim(0).set_min(0)
        in_ptr0.dim(0).set_stride(1)
        in_ptr0.dim(0).set_extent(32)
        in_ptr0.dim(1).set_min(0)
        in_ptr0.dim(1).set_stride(32)
        in_ptr0.dim(1).set_extent(16)
        in_ptr0.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
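
To make this concrete, here is a simplified sketch (not the PR's actual implementation) of how per-dimension (extent, stride) pairs can be recovered from a flat index expression such as the r1 + (32*x0) load in the "Before" version; the result matches the set_stride/set_extent calls emitted in the "After" version. The sympy-based loop is illustrative only.

# Illustrative only: infer (extent, stride) per dimension from a flat index
# expression with known iteration ranges.
import sympy

r1, x0 = sympy.symbols("r1 x0")
index = r1 + 32 * x0
extents = {r1: 32, x0: 16}

dims = []
for var, extent in extents.items():
    stride = int(index.coeff(var))  # the variable's coefficient is its stride
    dims.append((extent, stride))

dims.sort(key=lambda d: d[1])  # innermost dimension (smallest stride) first
print(dims)  # [(32, 1), (16, 32)]: dim(0) extent 32 / stride 1, dim(1) extent 16 / stride 32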

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot bot commented Jun 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129026

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ddfa273 with merge base bc8883a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

OnlyFor pushed a commit to OnlyFor/pytorch that referenced this pull request Jun 21, 2024

def __init__(self, expr, size, stride):
    super().__init__()
    if V.graph.sizevars.statically_known_leq(stride, 0):
Contributor:

hmm, when do we get a negative stride?

Contributor Author (jansel):

x[::-1]
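
For illustration only (not from the PR): a reversing slice produces a negative stride, shown here with NumPy since it exposes strides on views directly.

import numpy as np

x = np.arange(8, dtype=np.float32)  # strides: (4,)
y = x[::-1]                         # a view over the same data, walked backwards
print(x.strides, y.strides)         # (4,) (-4,)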

torch/_inductor/codegen/halide.py (resolved review thread)
)
eq = V.graph.sizevars.statically_known_equals
lt = V.graph.sizevars.statically_known_lt
size_hint = functools.partial(V.graph.sizevars.size_hint, fallback=inf)
Contributor:

Should we use an integer rather than a float for the fallback value?

Contributor Author (jansel):

I want it to go last, and there is no such thing as a max int in Python.

Contributor:

For our purposes, int64 max would work, right?

Contributor Author (jansel):

Yeah I suppose, seems like a style preference.
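
For context, a minimal sketch of the two fallback choices being discussed; size_hint_of is a stand-in helper (not Inductor's actual API), and either fallback makes dimensions with unknown sizes sort last.

import functools
import sympy

INT64_MAX = 2**63 - 1

def size_hint_of(size, fallback):
    try:
        return int(size)
    except TypeError:  # symbolic size with no concrete value
        return fallback

size_hint = functools.partial(size_hint_of, fallback=float("inf"))  # as in the PR
# size_hint = functools.partial(size_hint_of, fallback=INT64_MAX)   # the reviewer's alternative

s0 = sympy.Symbol("s0")  # stands in for a symbolic size
print(sorted([16, s0, 32], key=size_hint))  # [16, 32, s0]: the unknown size goes last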

OnlyFor pushed a commit to OnlyFor/pytorch that referenced this pull request Jun 23, 2024
Comment on lines +1415 to +1426
try:
    code.writeline(
        f"{arg.name}.dim({i}).set_stride({int(dim.stride)})"
    )
except TypeError:
    pass  # not integer
try:
    code.writeline(
        f"{arg.name}.dim({i}).set_extent({int(dim.size)})"
    )
except TypeError:
    pass  # not integer
Contributor:

Could query is_integer to avoid the try/except

Contributor Author (jansel):

It might be a regular int (not sympy)
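
For illustration only (plain sympy, not the PR's code): int() succeeds on both plain ints and concrete sympy values and raises TypeError only for symbolic expressions, which is what the try/except above relies on; a plain Python int (before 3.12) also has no is_integer method to query.

import sympy

for stride in (32, sympy.Integer(32), sympy.Symbol("s0")):
    try:
        print(int(stride))  # 32, 32, then TypeError for the symbol
    except TypeError:
        print(f"skipping symbolic stride {stride}")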

dtype = V.graph.get_dtype(name)
if dtype in (torch.float16, torch.bfloat16):
    dtype = torch.float32
Contributor:

nit: factor out to dtype_to_compute_dtype, similar to the Triton codegen?
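
One possible shape for that helper (hypothetical name and placement, sketching the suggestion rather than quoting either codegen):

import torch

def dtype_to_compute_dtype(dtype: torch.dtype) -> torch.dtype:
    # Hypothetical helper: compute half-precision values in float32,
    # matching the inline dtype promotion shown above.
    if dtype in (torch.float16, torch.bfloat16):
        return torch.float32
    return dtype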

all_used_symbols.update(super().prepare_indexing(index).free_symbols)

had_fallback = False
for tree in reversed(self.range_trees):
Contributor:

nit: maybe factor this out into a helper function

pytorchmergebot pushed a commit that referenced this pull request Jun 29, 2024
In theory Halide doesn't need the split reduction stuff we do for Triton since it can generate multiple kernels.

Pull Request resolved: #129320
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026, #127506, #129036
pytorchmergebot pushed a commit that referenced this pull request Jun 29, 2024
Currently using this for some by-hand hacking, but might need to implement our own scheduler later.

Pull Request resolved: #129321
Approved by: https://github.com/shunting314
ghstack dependencies: #126417, #129025, #129026, #127506, #129036, #129320