
Benchmark CI crashing due to RuntimeError: Triton Error [CUDA]: misaligned address after autotuning finds the best config #630

@yf225

Description


Example job: https://github.com/pytorch/helion/actions/runs/17839047302/job/50723523136

 64%|██████▍   | 9/14 [29:20<19:36, 235.22s/it]W0918 21:09:35.299000 51876 torch/_dynamo/utils.py:1915] ChromiumEventLogger: Start event not in stack, ignoring

...

One can hardcode the best config and skip autotuning with:
    @helion.kernel(config=helion.Config(block_sizes=[1], indexing='block_ptr', num_stages=7, num_warps=16, pid_type='flat', range_flattens=[None], range_multi_buffers=[None], range_num_stages=[0], range_unroll_factors=[0], range_warp_specializes=[], reduction_loops=[32]))


 64%|██████▍   | 9/14 [33:15<18:28, 221.72s/it]
Caught exception, terminating early with partial results
Traceback (most recent call last):
  File "/__w/helion/helion/benchmarks/tritonbench/tritonbench/utils/triton_op.py", line 979, in run
    y_vals: Dict[str, BenchmarkOperatorMetrics] = functools.reduce(
                                                  ^^^^^^^^^^^^^^^^^
  File "/__w/helion/helion/benchmarks/tritonbench/tritonbench/utils/triton_op.py", line 964, in _reduce_benchmarks
    acc[bm_name] = self._do_bench(
                   ^^^^^^^^^^^^^^^
  File "/__w/helion/helion/benchmarks/tritonbench/tritonbench/utils/triton_op.py", line 1358, in _do_bench
    metrics.latency = do_bench_wrapper(
                      ^^^^^^^^^^^^^^^^^
  File "/__w/helion/helion/benchmarks/tritonbench/tritonbench/components/do_bench/run.py", line 413, in do_bench_wrapper
    raise e
  File "/__w/helion/helion/benchmarks/tritonbench/tritonbench/components/do_bench/run.py", line 403, in do_bench_wrapper
    times=bench_fn(
          ^^^^^^^^^
  File "/__w/helion/helion/benchmarks/tritonbench/tritonbench/components/do_bench/run.py", line 192, in _do_bench_profiler
    estimate_ms = benchmarker.benchmark_gpu(fn, estimation_iters=5, benchmark_iters=10)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/helion/helion/.venv/lib/python3.12/site-packages/torch/_inductor/runtime/benchmarking.py", line 39, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/helion/helion/.venv/lib/python3.12/site-packages/torch/_inductor/runtime/benchmarking.py", line 250, in benchmark_gpu
    _callable()
  File "/__w/helion/helion/examples/sum.py", line 64, in compute_sum
    result = sum_kernel(x_2d)
             ^^^^^^^^^^^^^^^^
  File "/__w/helion/helion/helion/runtime/kernel.py", line 285, in __call__
    return self.bind(args)(*args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/helion/helion/helion/runtime/kernel.py", line 608, in __call__
    return self._run(*args)
           ^^^^^^^^^^^^^^^^
  File "/tmp/torchinductor_root/62/c62kstawhc3f3u5h2njxdpek2fgehlo5thwlidup324nfw7zmbuj.py", line 31, in sum_kernel
    _launcher(_helion_sum_kernel, (1,), x, out, _REDUCTION_BLOCK_1, num_warps=16, num_stages=7)
  File "/__w/helion/helion/helion/runtime/__init__.py", line 63, in default_launcher
    return triton_kernel.run(
           ^^^^^^^^^^^^^^^^^^
  File "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/runtime/jit.py", line 699, in run
    kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
  File "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/driver.py", line 712, in __call__
    self.launch(gridX, gridY, gridZ, stream, function, self.launch_cooperative_grid, self.launch_pdl,
RuntimeError: Triton Error [CUDA]: misaligned address

The crash happens after autotuning has finished (i.e. after the "One can hardcode the best config ..." message is printed), so the failing launch is the post-autotune benchmark run, not a tuning trial. Likely an exception from the selected config is leaking out of autotuning somehow.
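To illustrate the suspected failure mode, here is a minimal pure-Python sketch (not Helion's actual autotuner; `autotune` and `flaky_run` are hypothetical names): if launch errors are swallowed per-config during the search, a config that only faults intermittently can still be selected as "best" and then crash on the subsequent benchmark run.

```python
# Sketch of the hypothesis: tuning-time errors are caught and the config is
# skipped, but the winning config can still fault on a later re-run.

def autotune(configs, run):
    """Pick the fastest config, silently skipping any that raise."""
    best, best_time = None, float("inf")
    for cfg in configs:
        try:
            t = run(cfg)          # a launch error here is caught...
        except RuntimeError:
            continue              # ...and the config is simply skipped
        if t < best_time:
            best, best_time = cfg, t
    return best

calls = {"count": 0}

def flaky_run(cfg):
    # Config "b" succeeds during tuning but faults afterwards, mimicking
    # a kernel launch that only hits the misaligned address later.
    calls["count"] += 1
    if cfg == "b" and calls["count"] > 2:
        raise RuntimeError("Triton Error [CUDA]: misaligned address")
    return {"a": 2.0, "b": 1.0}[cfg]

best = autotune(["a", "b"], flaky_run)   # selects "b" (faster during tuning)
try:
    flaky_run(best)                      # post-autotune run: now it faults
except RuntimeError as e:
    print(f"benchmark run failed with: {e}")
```

If this is what is happening, a fix might re-validate the chosen config with an extra launch before autotuning returns, or surface the error with the config attached instead of letting it escape into the benchmark harness.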
