From 151a97e31ccfbaffa5b828992ff4dd29693d5523 Mon Sep 17 00:00:00 2001
From: Jason Ansel
Date: Wed, 15 Oct 2025 14:40:03 -0700
Subject: [PATCH] Update static_shapes docs

stack-info: PR: https://github.com/pytorch/helion/pull/951, branch: jansel/stack/196
---
 docs/api/kernel.md            |  8 ++++++--
 docs/api/settings.md          |  2 +-
 docs/deployment_autotuning.md | 31 +++++++++++++++++--------------
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/docs/api/kernel.md b/docs/api/kernel.md
index 652dc2d49..b8fe90c67 100644
--- a/docs/api/kernel.md
+++ b/docs/api/kernel.md
@@ -88,6 +88,10 @@ bound_static = shape_specialized_kernel.bind((torch.randn(100, 50),))
 result = bound_static(torch.randn(100, 50))  # Must be exactly [100, 50]
 ```
 
+```{warning}
+Helion shape-specializes kernels by default (`static_shapes=True`) for the best performance. Bound kernels and caches require tensors with exactly the same shapes and strides as the examples you compile against. Set `static_shapes=False` if you need the same compiled kernel to serve many shapes.
+```
+
 ### BoundKernel Methods
 
 The returned BoundKernel has these methods:
@@ -131,9 +135,9 @@ print(triton_code)
 Kernels are automatically cached based on:
 
 - **Argument types** (dtype, device)
-- **Tensor shapes** (when using `static_shapes=True`)
+- **Tensor shapes** (default: `static_shapes=True`)
 
-By default (`static_shapes=False`), kernels only specialize on basic shape categories (0, 1, or ≥2 per dimension) rather than exact shapes, allowing the same compiled kernel to handle different tensor sizes efficiently.
+By default (`static_shapes=True`), Helion treats shapes and strides as compile-time constants, baking them into the generated Triton code for the best performance. To reuse a single compiled kernel across size variations, set `static_shapes=False`, which instead buckets each dimension size into `{0, 1, ≥2}` and allows more inputs to share the same cache entry.
 
 ```python
 # These create separate cache entries
diff --git a/docs/api/settings.md b/docs/api/settings.md
index aad01b9f3..13945adfc 100644
--- a/docs/api/settings.md
+++ b/docs/api/settings.md
@@ -98,7 +98,7 @@ with helion.set_default_settings(
 
 .. autoattribute:: Settings.static_shapes
 
-   When enabled, tensor shapes are treated as compile-time constants for optimization. Default is ``False``.
+   When enabled, tensor shapes are treated as compile-time constants for optimization. Default is ``True``. Set this to ``False`` if you need a single compiled kernel instance to serve many shape variants.
 ```
 
 ### Autotuning Settings
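A minimal sketch of the two modes described above, assuming the public `helion.kernel` decorator and `helion.language` tiling API (`add_static` and `add_dynamic` are hypothetical names, not from this patch):

```python
import torch
import helion
import helion.language as hl


# Default behavior (static_shapes=True): the exact shapes/strides of the
# first inputs are baked into the generated Triton code, so each new
# shape/stride signature compiles a separate specialization.
@helion.kernel()
def add_static(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out


# Opt out when one compiled kernel must serve many sizes: each dimension
# is bucketed as {0, 1, >=2}, so sizes within a bucket share a cache entry.
@helion.kernel(static_shapes=False)
def add_dynamic(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out
```

Calling `add_static` at `[1024, 1024]` and then `[2048, 2048]` compiles two specializations, while `add_dynamic` can serve both from one cache entry because every dimension stays in the `≥2` bucket.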
diff --git a/docs/deployment_autotuning.md b/docs/deployment_autotuning.md
index 8b254669c..0ec5a6b37 100644
--- a/docs/deployment_autotuning.md
+++ b/docs/deployment_autotuning.md
@@ -146,13 +146,16 @@ config and selecting the fastest.
 A key detail here is controlling the specialization key, which
 determines when to re-benchmark. Options include:
 
-- **Default (dynamic shapes):** we reuse the timing result as long as
-tensor dtypes and device types stay constant. Shape changes only trigger
-a re-selection when a dimension size crosses the buckets `{0, 1, ≥2}`.
+- **Default (`static_shapes=True`):** Helion shape-specializes on the exact
+  shape/stride signature, rerunning the selection whenever those shapes
+  differ. This delivers the best per-shape performance but requires all calls
+  to match the example shapes exactly.
 
-- **`static_shapes=True`:** add this setting to the decorator to specialize
-on the exact shape/stride signature, rerunning the selection whenever
-those shapes differ.
+- **`static_shapes=False`:** switch to bucketed dynamic shapes. Helion
+  reuses timing results as long as tensor dtypes and device types stay
+  constant. Shape changes only trigger a re-selection when a dimension
+  size crosses the buckets `{0, 1, ≥2}`. Use this when you need one
+  compiled kernel to handle many input sizes.
 
 - **Custom keys:** pass `key=` to group calls however you like. This
 custom key is in addition to the above.
@@ -197,15 +200,15 @@ input types. You can pre-compile as many configs as you need using
 `BoundKernel.compile_config`.
 
 **Warning:** `kernel.bind()` specializes, and the result will only work
 with the same input types you passed.
 
-- With `static_shapes=False` (default) it will specialize on the input
-dtypes, device types, and whether each dynamic dimension falls into the
-0, 1, or ≥2 bucket. Python types are also specialized. For dimensions
-that can vary across those buckets, supply representative inputs ≥2
-to avoid excessive specialization.
+- With `static_shapes=True` (default), the bound kernel works only for the
+exact shape/stride signature of the example inputs. The generated code
+has shapes baked in, which often provides a performance boost.
 
-- With `static_shapes=True` the bound kernel only works for the exact
-shape/stride signature of the example inputs. The generated code will
-have shapes baked in, which often provides a performance boost.
+- With `static_shapes=False`, it specializes on the input dtypes,
+device types, and whether each dynamic dimension falls into the 0, 1,
+or ≥2 bucket. Python types are also specialized. For dimensions that
+can vary across those buckets, supply representative inputs ≥2 to avoid
+excessive specialization.
 
 If you need to support multiple input types, bind multiple times with
 representative inputs.
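The binding guidance in the final hunk can be sketched as follows, continuing the hypothetical kernels from the sketch above. `bind` takes a tuple of example arguments as shown in the kernel.md hunk; the `compile_config` call follows the description in the text, and the `Config` block sizes are placeholders rather than tuned choices:

```python
import torch
import helion

# static_shapes=True (default): bind once per exact shape/stride signature.
# This bound kernel accepts only [1024, 1024] float32 CUDA inputs.
example = (
    torch.randn(1024, 1024, device="cuda"),
    torch.randn(1024, 1024, device="cuda"),
)
bound_static = add_static.bind(example)

# Optionally pre-compile a specific config ahead of time via
# BoundKernel.compile_config (block sizes here are illustrative).
bound_static.compile_config(helion.Config(block_sizes=[64, 64]))

# static_shapes=False: representative inputs with every dimension >= 2
# yield one binding that serves all sizes in the >= 2 buckets.
representative = (
    torch.randn(64, 64, device="cuda"),
    torch.randn(64, 64, device="cuda"),
)
bound_dynamic = add_dynamic.bind(representative)
```

Binding at `[64, 64]` rather than, say, `[1, 64]` keeps both dimensions in the `≥2` bucket and avoids the excessive specialization the patch warns about.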