docs/api/kernel.md
+6 lines changed: 6 additions & 0 deletions
@@ -90,6 +90,8 @@ result = bound_static(torch.randn(100, 50)) # Must be exactly [100, 50]
```{warning}
Helion shape-specializes kernels by default (`static_shapes=True`) for the best performance. Bound kernels and caches require tensors with the exact same shapes and strides as the examples you compile against. Set `static_shapes=False` if you need the same compiled kernel to serve many shapes.
+ With dynamic shapes (`static_shapes=False`), Helion also specializes on whether a tensor dimension is 0 or 1 and on whether a tensor needs 64-bit indexing (more than ``2**31 - 1`` elements).
+ This 64-bit indexing specialization can be avoided by setting `index_dtype=torch.int64`.
```
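
The 64-bit indexing condition stated in the warning can be illustrated with a minimal sketch. The helper name `needs_int64_indexing` is hypothetical and is not part of Helion's API; the code only encodes the documented threshold of more than `2**31 - 1` elements and makes no claim about how Helion performs the check internally.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def needs_int64_indexing(shape: tuple[int, ...]) -> bool:
    """Hypothetical helper: True when the element count exceeds the int32 limit."""
    return math.prod(shape) > INT32_MAX


print(needs_int64_indexing((100, 50)))        # False: 5,000 elements
print(needs_int64_indexing((2**31 - 1,)))     # False: exactly at the limit
print(needs_int64_indexing((2**16, 2**15)))   # True: 2**31 elements
```

Under this reading, a kernel that may see inputs on both sides of the threshold would be specialized twice unless `index_dtype=torch.int64` is set.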
### BoundKernel Methods
@@ -139,6 +141,10 @@ Kernels are automatically cached based on:
By default (`static_shapes=True`), Helion treats shapes and strides as compile-time constants, baking them into generated Triton code for the best performance. To reuse a single compiled kernel across size variations, set `static_shapes=False`, which instead buckets each dimension as `{0, 1, ≥2}` and allows more inputs to share the same cache entry.
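
As a rough illustration of the `{0, 1, ≥2}` bucketing described above, the sketch below maps input shapes to a cache key. The names `bucket_dim` and `dynamic_cache_key` are hypothetical, and the key layout is an assumption made for illustration; this is not Helion's actual implementation.

```python
def bucket_dim(size: int) -> int:
    """Collapse a dimension size into bucket 0, 1, or 2 (2 stands for >= 2)."""
    return min(size, 2)


def dynamic_cache_key(*shapes: tuple[int, ...]) -> tuple[tuple[int, ...], ...]:
    """Hypothetical cache key: one tuple of bucketed dimensions per input tensor."""
    return tuple(tuple(bucket_dim(d) for d in shape) for shape in shapes)


# Shapes that differ only in dimensions >= 2 share a key.
assert dynamic_cache_key((100, 50)) == dynamic_cache_key((7, 3))
# A size-1 (or size-0) dimension lands in a different bucket.
assert dynamic_cache_key((100, 1)) != dynamic_cache_key((100, 50))

print(dynamic_cache_key((100, 50), (1, 0)))  # ((2, 2), (1, 0))
```

Treating sizes 0 and 1 separately is a common choice because those sizes often change broadcasting or emptiness behavior, whereas all larger sizes can share one generic code path.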
+
```{note}
+ Dynamic buckets also track whether any tensor exceeds the ``torch.int32`` indexing limit, so cache entries diverge as soon as large inputs appear. If your deployment regularly reaches that regime, pin ``index_dtype=torch.int64`` on the kernel to avoid cache misses or indexing-limit errors.
|``HELION_INDEX_DTYPE``|``index_dtype``| Choose the default index dtype (accepts any ``torch.<dtype>`` name, e.g. ``int64``). |
+ |``HELION_INDEX_DTYPE``|``index_dtype``| Choose the index dtype (accepts any ``torch.<dtype>`` name, e.g. ``int64``), or set to ``auto`` (or leave unset) to let Helion pick ``int32`` or ``int64`` based on input sizes. |
|``HELION_STATIC_SHAPES``|``static_shapes``| Set to ``0``/``false`` to disable global static shape specialization. |
|``HELION_PERSISTENT_RESERVED_SMS``|``persistent_reserved_sms``| Reserve this many streaming multiprocessors when launching persistent kernels (``0`` uses all available SMs). |
|``HELION_FORCE_AUTOTUNE``|``force_autotune``| Force the autotuner to run even when explicit configs are provided. |