[BUG] Using both reduction and atomic operations along with autotune makes incorrect results #4217

Kitsunetic · 2024-06-26T22:31:32Z

Environment

Triton version: 2.3.1
PyTorch version: 2.3.0
CUDA version: 12.1

Issue Description

When a kernel uses both reduction operations and atomic operations together with triton.autotune, the output is incorrect the first time a new input shape is encountered.

Reproduction Code

import torch as th
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 256}),
        triton.Config({"BLOCK_SIZE": 128}),
        triton.Config({"BLOCK_SIZE": 64}),
        triton.Config({"BLOCK_SIZE": 32}),
    ],
    key=["N"],
)
@triton.jit
def vector_sum_kernel(x_ptr, y_ptr, N, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(0)
    offs_n = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offs_n, mask=offs_n < N, other=0.0)

    y = tl.sum(x)
    tl.atomic_add(y_ptr, y)


def vector_sum(x):
    assert x.ndim == 1

    N = x.size(0)
    y = x.new_zeros(1)
    grid = lambda meta: ((triton.cdiv(N, meta["BLOCK_SIZE"]),))
    vector_sum_kernel[grid](x, y, N)
    return y

# Test cases
x = th.rand(120, device="cuda")

print(vector_sum(x))  # tensor([79561.5703], device='cuda:6'), incorrect
print(vector_sum(x))  # tensor([62.5476], device='cuda:6'), correct
print(vector_sum(x))  # tensor([62.5476], device='cuda:6'), correct
print(vector_sum(x))  # tensor([62.5476], device='cuda:6'), correct

Conclusion

The first call to vector_sum(x) produces an incorrect result of tensor([79561.5703], device='cuda:6').
Subsequent calls to vector_sum(x) produce correct results.

The issue also occurs with other reduction functions such as tl.max and other atomic functions such as tl.atomic_max.
However, using only a reduction or only an atomic operation does not trigger the issue.
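
One way this pattern can arise is sketched below as a toy model in plain Python. It is not Triton's implementation. It assumes that an autotuner times each config by launching the kernel several times against the caller's own output buffer, so an accumulating kernel keeps adding to that buffer during benchmarking. `ToyAutotuner`, `reps`, and `reset_output` are hypothetical names made up for the illustration.

```python
# Toy model only: NOT Triton's autotuner, just an illustration of how repeated
# benchmark launches can pollute an output that the kernel accumulates into.

def atomic_add_kernel(x, y, block_size):
    """Stand-in for the Triton kernel: each block adds its partial sum to y[0]."""
    for start in range(0, len(x), block_size):
        y[0] += sum(x[start:start + block_size])


class ToyAutotuner:
    def __init__(self, kernel, block_sizes, reps=3, reset_output=False):
        self.kernel = kernel
        self.block_sizes = block_sizes
        self.reps = reps                  # hypothetical launches per config
        self.reset_output = reset_output  # hypothetical: zero y after tuning
        self.best = {}                    # chosen config, keyed by input length

    def __call__(self, x, y):
        key = len(x)
        if key not in self.best:
            # Benchmark phase: every launch writes into the real output buffer.
            for block_size in self.block_sizes:
                for _ in range(self.reps):
                    self.kernel(x, y, block_size)
            self.best[key] = self.block_sizes[0]  # pretend the first config won
            if self.reset_output:
                y[0] = 0.0
        # The launch whose result the caller actually wants.
        self.kernel(x, y, self.best[key])


def vector_sum(tuner, x):
    y = [0.0]
    tuner(x, y)
    return y[0]


x = [1.0] * 120
leaky = ToyAutotuner(atomic_add_kernel, [256, 128, 64, 32])
print(vector_sum(leaky, x))  # 1560.0: 4 configs * 3 reps + 1 real launch = 13 sums
print(vector_sum(leaky, x))  # 120.0: config is cached, so only one launch

fixed = ToyAutotuner(atomic_add_kernel, [256, 128, 64, 32], reset_output=True)
print(vector_sum(fixed, x))  # 120.0 on the first call as well
```

In the report above, the first result is roughly 1272 times the correct sum, which is at least consistent with the output having accumulated over many benchmark launches.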


Jokeren · 2024-06-26T22:36:35Z

The conclusion might be wrong. I think it might be a problem in the user code instead of the autotuner. Try reset_to_zero=["y_ptr"].

Kitsunetic · 2024-06-26T22:41:23Z

Oh, I think I didn't read the documentation carefully. Thank you!

Kitsunetic closed this as completed Jun 26, 2024
