Conversation

@karthickai (Contributor) commented Sep 25, 2025:

Stacked PRs:

Fix hl.rand to use tile-specific offsets instead of fixed offsets, ensuring unique random numbers per tile.
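For context, a minimal use of hl.rand inside an hl.tile loop, adapted from the example kernel later in this thread (the decorator and imports are standard Helion boilerplate; treat the details as illustrative):

    import torch
    import helion
    import helion.language as hl

    @helion.kernel
    def rand_kernel(x: torch.Tensor) -> torch.Tensor:
        output = torch.zeros_like(x)
        (m,) = x.shape
        for (tile_m,) in hl.tile([m]):
            # Before this fix, every tile reused the same fixed offsets,
            # so all tiles drew identical random numbers.
            output[tile_m] = hl.rand([tile_m], seed=42)
        return output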

@karthickai requested a review from @jansel on September 25, 2025.
    env = CompileEnvironment.current()
    for size in fake_value.size():
        block_id = env.get_block_id(size)
        if block_id is not None:
jansel (Contributor):
What happens if block_id is None?

karthickai (Author):

I've added validation that raises an error ("use hl.rand inside hl.tile loops") when block_id is None.
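A minimal sketch of that validation, using the error message quoted later in this thread (the plain ValueError is an assumption; Helion may use its own exception type):

    block_id = env.get_block_id(size)
    if block_id is None:
        raise ValueError(
            "Use hl.rand() inside hl.tile() loops with tile variables."
        )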

    offs_expr = (
        f"({offset_var} + tl.arange(0, {numel})).reshape({shape_str})"
    )
    break
jansel (Contributor):
A break here seems incorrect; will we just ignore the rest of the shape?

Suppose you have [BLOCK0, BLOCK1] (a 2D random number)?

karthickai (Author):
Thanks again. I've updated the multi-dimensional cases to compute linear offsets in row-major order, so each tile gets unique random numbers:

    1D: tl.rand(seed, indices_0.reshape([_BLOCK_SIZE_0]))
    2D: tl.rand(seed, (offset_0 * _BLOCK_SIZE_1 + offset_1 + tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1)).reshape([_BLOCK_SIZE_0, _BLOCK_SIZE_1]))
    3D: tl.rand(seed, (offset_0 * (_BLOCK_SIZE_1 * _BLOCK_SIZE_2) + offset_1 * _BLOCK_SIZE_2 + offset_2 + tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1 * _BLOCK_SIZE_2)).reshape([_BLOCK_SIZE_0, _BLOCK_SIZE_1, _BLOCK_SIZE_2]))

jansel (Contributor):
This doesn't look right. Changing block sizes will change RNG values.
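To make the concern concrete: by my reading of the 2D expression above, element (row, col) ends up with Philox offset row * _BLOCK_SIZE_1 + col, a function of the block size rather than of the tensor's own shape (illustrative Python, not Helion internals):

    def philox_offset(row: int, col: int, block_size_1: int) -> int:
        # Simplification of: offset_0 * _BLOCK_SIZE_1 + offset_1 + arange(...)
        return row * block_size_1 + col

    print(philox_offset(1, 3, block_size_1=4))  # 7
    print(philox_offset(1, 3, block_size_1=8))  # 11: same element, different RNG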

@jansel (Contributor) left a comment:

Another simpler approach here would be to have the user provide offsets. Though that API isn't as nice.

jansel (Contributor):
test_random.py

karthickai (Author):

Moved the test cases to test_random.py.

test/test_rng.py (outdated):

    self.assertTrue(torch.all(output >= 0.0), "All values should be >= 0")
    self.assertTrue(torch.all(output < 1.0), "All values should be < 1")

    self.assertIn("tl.rand(seed, indices_0.reshape([_BLOCK_SIZE_0]))", code3)
jansel (Contributor):
Use assertExpectedJournal so I can see the full output code (throughout).

karthickai (Author):
I've added assertExpectedJournal in the test cases.

    next_block_id = env.get_block_id(next_size)

    if next_block_id is not None:
        stride_components.append(f"_BLOCK_SIZE_{next_block_id}")
jansel (Contributor):
Query variable names through the API; don't regenerate them.

karthickai (Author):
I've updated the code.


    for dim_idx, size in enumerate(tensor_shape):
        block_id = env.get_block_id(size)
        if block_id is not None:
jansel (Contributor):
What happens if block_id is None? You handle that case above, but not here.

karthickai (Author):
Updated the code to properly handle the case where block_id is None.

"Use hl.rand() inside hl.tile() loops with tile variables."
)

numel = " * ".join(shape_str.strip("[]").split(","))
jansel (Contributor):
Why strip []?

karthickai (Author):
I used strip to parse the shape string for numel (not a proper way to compute numel). In the updated code I've used proper APIs to build the tl.arange and AST expressions.
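For reference, the fragility is easy to demonstrate (runnable Python, reusing the exact expression from the diff context above):

    shape_str = "[_BLOCK_SIZE_0, _BLOCK_SIZE_1]"
    numel = " * ".join(shape_str.strip("[]").split(","))
    print(numel)  # "_BLOCK_SIZE_0 *  _BLOCK_SIZE_1" (doubled space, no validation)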

test/test_rng.py (outdated):

    "tl.rand(seed, ("
    "offset_0 * _BLOCK_SIZE_1 + "
    "offset_1 + "
    "tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1)"
jansel (Contributor) commented Sep 27, 2025:

This code still doesn't look correct to me. You fixed the issue where you were providing duplicate random values for each block, but now we have the problem that the RNG is not deterministic: if you change the block size, the random values change. I'd suggest adding some tests for:

  1. Change the block sizes of a kernel and assert that the RNG output stays the same (sketched below)
  2. Sort the RNG output by value and assert you have roughly O(n) unique values
  3. Test the case where non-block sizes are passed in
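A hedged sketch of tests (1) and (2): the kernel body mirrors the example shown later in this thread, while helion.Config(block_sizes=...) and the rest of the harness are assumptions about Helion's public API, not the tests that actually landed:

    import torch
    import helion
    import helion.language as hl

    def make_kernel(block_size: int):
        # Same kernel, parameterized only by block size.
        @helion.kernel(config=helion.Config(block_sizes=[block_size]))
        def rand_kernel(x: torch.Tensor) -> torch.Tensor:
            output = torch.zeros_like(x)
            (m,) = x.shape
            for (tile_m,) in hl.tile([m]):
                output[tile_m] = hl.rand([tile_m], seed=42)
            return output
        return rand_kernel

    x = torch.zeros(1024, device="cuda")
    out_a = make_kernel(32)(x)
    out_b = make_kernel(64)(x)
    torch.testing.assert_close(out_a, out_b)  # (1) RNG unchanged across block sizes
    assert out_a.unique().numel() > 0.9 * x.numel()  # (2) roughly O(n) unique values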

karthickai (Author):
Thanks @jansel, you're correct: the current implementation was sensitive to block-size changes. I've updated the code to use global indices instead of block indices for deterministic RNG per element, and I've handled non-tiled inputs to hl.rand. I also added the mentioned test cases.
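As a sketch of the idea (illustrative names, not Helion internals): the Philox offset becomes the element's global linear index over the full tensor, so it no longer depends on how the iteration space is tiled.

    def global_philox_offset(row: int, col: int, total_cols: int) -> int:
        # Row-major index over the whole tensor, independent of block size.
        return row * total_cols + col

    # Element (1, 3) of an 8-column tensor gets offset 11 under any tiling:
    assert global_philox_offset(1, 3, total_cols=8) == 11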

    Args:
        shape: A list of sizes
        seed: int seed for the random number generator
        dtype: currently only float32 supported
jansel (Contributor):
Remove this arg if only one value is supported.

karthickai (Author):
Thank you, I've removed the dtype arg.

Comment on lines 30 to 37
    The main propose of ``hl.rand`` is to explicitly pass a seed arg for deterministic
    randomness in helion kernels, whereas ``torch.rand_like`` doesn't take seed arg
    (though it can seeded globally)`. ``hl.rand`` lower to ``tl.rand(seed, offset)`` with ``offset``
    built from a linear range over the allocation and reshaped to the given shape.
    Note:
        Only use within ``hl.tile()`` loops for creating local tensors.
        For host allocations, use ``torch.rand()``.
jansel (Contributor):

Suggested change (replacing the quoted docstring above with):
    hl.rand provides a Philox-based pseudorandom number generator (PRNG) that operates independently of PyTorch's global random seed. Instead, it requires an explicit seed argument. Offsets are derived from the full logical sizes of the tiles specified in the shape argument.

karthickai (Author):
I've updated the description.

    Args:
        shape: A list of sizes
        seed: int seed for the random number generator
jansel (Contributor):
Suggested change:

    - seed: int seed for the random number generator
    + seed: A single element int64 tensor or int literal
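Under the suggested wording, both forms would be accepted inside a tile loop (usage sketch; the surrounding kernel is assumed):

    output[tile_m] = hl.rand([tile_m], seed=42)  # int literal
    output[tile_m] = hl.rand([tile_m], seed=torch.tensor(42, dtype=torch.int64))  # single-element int64 tensor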

    @_decorators.api(tiles_as_sizes=True)
    def rand(
        shape: list[object],
        seed: int,
jansel (Contributor):
Suggested change:

    - seed: int,
    + seed: int | torch.Tensor,

    output = torch.zeros_like(x)
    (m,) = x.shape
    for (tile_m,) in hl.tile([m]):
        output[tile_m] = hl.rand([tile_m], seed=seed)
jansel (Contributor):
seed undefined?

karthickai (Author):
Thanks, I've changed it to a static value.

    block_id = env.get_block_id(tensor_shape[i])
    if block_id is not None:
        rdim_name = f"_RDIM_SIZE_{block_id}"
        if rdim_name in rdim_args:
jansel (Contributor):
This isn't the right way to detect an rdim; look at indexing_strategy.py for examples.

karthickai (Author):
Thanks, I replaced the rdim detection with env.allocate_reduction_dimension, which handles both creating a new rdim and reusing an existing one.

    if block_id is not None:
        rdim_name = f"_RDIM_SIZE_{block_id}"
        if rdim_name in rdim_args:
            index_vars.append(f"tl.arange(0, {rdim_name})")
jansel (Contributor):
Add a test for rolled reductions; I don't think this will work in that case.

karthickai (Author):
Added test_hl_rand_rolled_reductions(), which tests an identical kernel with reduction_loops=[None] vs [64]. After updating the logic per your feedback, this test case passes.

Comment on lines 121 to 125
    if symbol_idx < len(symbol_args):
        size_names.append(symbol_args[symbol_idx])
        symbol_idx += 1
    else:
        size_names.append(str(tensor_shape[i]))
jansel (Contributor):
This doesn't sound correct. You are ignoring the actual symbol associated with the block and just using the order in which they appear in the function args.

Add a test where you mix up the order.

karthickai (Author):
I now use block_info.size to get the actual tensor size symbol instead of relying on argument order. Added test_hl_rand_mixed_argument_order, which tests kernels with different tile argument orders but the same hl.rand calls.

Comment on lines 128 to 133
    available_rdims = [name for name in rdim_args if name not in used_rdims]
    if available_rdims:
        rdim_name = available_rdims[0]
        index_vars.append(f"tl.arange(0, {rdim_name})")
        size_names.append(rdim_name)
        used_rdims.add(rdim_name)
jansel (Contributor):
This doesn't look correct. You are ignoring the actual value the user passed and just assuming it matches the reduction args to the kernel.

karthickai (Author):
Thanks, I fixed it with env.allocate_reduction_dimension, which now uses the actual user-passed size value from the tensor shape.

Comment on lines 143 to 147
    broadcast_slices = []
    for i in range(ndim):
        slice_parts = ["None"] * ndim
        slice_parts[i] = ":"
        broadcast_slices.append(f"[{', '.join(slice_parts)}]")
jansel (Contributor):
We should have a helper in indexing_strategy for broadcasting.

karthickai (Author):
The existing get_broadcast_str(stack_shape, subscript_shape) takes two shapes and generates paired broadcast strings for stack-based operations, so I added a get_element_broadcast_slice(dim_index, total_dims) method to StackIndexingStrategy. It generates an individual broadcasting pattern such as [:, None, None] for a single dimension within one tensor, used in our multi-dimensional stride calculations.
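A sketch of that helper, inferred from the broadcast_slices loop in the diff context above (the actual Helion implementation may differ):

    class StackIndexingStrategy:  # illustrative stub
        @staticmethod
        def get_element_broadcast_slice(dim_index: int, total_dims: int) -> str:
            """Return a subscript like "[:, None, None]" keeping only dim_index."""
            parts = ["None"] * total_dims
            parts[dim_index] = ":"
            return f"[{', '.join(parts)}]"

    print(StackIndexingStrategy.get_element_broadcast_slice(0, 3))  # [:, None, None]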

    broadcast_slice = StackIndexingStrategy.get_element_broadcast_slice(i, ndim)
    broadcasted_index = f"{index_vars[i]}{broadcast_slice}"
    if i < ndim - 1:
        stride_expr = " * ".join(size_names[i + 1 :])
jansel (Contributor):
Suggested change:

    - stride_expr = " * ".join(size_names[i + 1 :])
    + stride_expr = " * ".join(map("({})".format, size_names[i + 1 :]))

@karthickai merged commit 9c9eea4 into main on Oct 7, 2025. 13 checks passed.