Fix hl.rand to use tile specific offsets instead of fixed offsets, ensure unique random num per tile #685
Conversation
Force-pushed from 50c5645 to e311a10
Force-pushed from e311a10 to 521dcc7
Force-pushed from 521dcc7 to a876487
env = CompileEnvironment.current()
for size in fake_value.size():
    block_id = env.get_block_id(size)
    if block_id is not None:
Reviewer: What happens if block_id is None?
Author: I've added validation that raises an error ("use hl.rand inside hl.tile loops") when block_id is None.
helion/language/random_ops.py (outdated):
offs_expr = (
    f"({offset_var} + tl.arange(0, {numel})).reshape({shape_str})"
)
break
Reviewer: A break here seems incorrect; will we just ignore the rest of the shape? Suppose you have [BLOCK0, BLOCK1] (a 2D random number)?
Author: Thanks again. I've updated the multi-dimensional cases to compute linear offsets in row-major order, so each tile gets unique random numbers:
1D: tl.rand(seed, indices_0.reshape([_BLOCK_SIZE_0]))
2D: tl.rand(seed, (offset_0 * _BLOCK_SIZE_1 + offset_1 + tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1)).reshape([_BLOCK_SIZE_0, _BLOCK_SIZE_1]))
3D: tl.rand(seed, (offset_0 * (_BLOCK_SIZE_1 * _BLOCK_SIZE_2) + offset_1 * _BLOCK_SIZE_2 + offset_2 + tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1 * _BLOCK_SIZE_2)).reshape([_BLOCK_SIZE_0, _BLOCK_SIZE_1, _BLOCK_SIZE_2]))
Reviewer: This doesn't look right. Changing block sizes will change RNG values.
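To make the reviewer's point concrete (this worked example is ours, not from the PR): under the 2D formula, the element at global position (r, c), sitting at in-tile position (i, j) of the tile starting at (offset_0, offset_1), gets RNG offset offset_0 * B1 + offset_1 + (i * B1 + j) = r * B1 + c. That depends on the block size B1, not on the tensor's actual column count, so re-tuning block sizes reshuffles the random values.

# Illustration (ours): the 2D formula's offset for global element (r, c)
# collapses to r * B1 + c, a function of the block size B1.
def rng_offset(r: int, c: int, block0: int, block1: int) -> int:
    offset_0 = (r // block0) * block0  # tile start along rows
    offset_1 = (c // block1) * block1  # tile start along columns
    i, j = r - offset_0, c - offset_1  # position within the tile
    return offset_0 * block1 + offset_1 + (i * block1 + j)

# Same element, different block sizes -> different offset -> different value.
assert rng_offset(5, 3, block0=4, block1=4) != rng_offset(5, 3, block0=8, block1=8)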
Force-pushed from a876487 to ef0a1a6
Reviewer: Another, simpler approach here would be to have the user provide offsets, though that API isn't as nice.
Reviewer: test_random.py?
Author: Moved the test cases to test_random.py.
test/test_rng.py (outdated):
self.assertTrue(torch.all(output >= 0.0), "All values should be >= 0")
self.assertTrue(torch.all(output < 1.0), "All values should be < 1")

self.assertIn("tl.rand(seed, indices_0.reshape([_BLOCK_SIZE_0]))", code3)
Reviewer: Use assertExpectedJournal so I can see the full output code (throughout).
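For reference, a minimal sketch of how that might look in a test (ours; it assumes the code_and_output helper and the journal-based assertExpectedJournal used elsewhere in helion's test suite):

code, result = code_and_output(rand_kernel, (x,))
self.assertExpectedJournal(code)  # records and diffs the full generated Triton source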
Author: I've added assertExpectedJournal to the test cases.
helion/language/random_ops.py (outdated):
next_block_id = env.get_block_id(next_size)

if next_block_id is not None:
    stride_components.append(f"_BLOCK_SIZE_{next_block_id}")
Reviewer: Query variable names through the API; don't regenerate them.
Author: I've updated the code.
helion/language/random_ops.py (outdated):
for dim_idx, size in enumerate(tensor_shape):
    block_id = env.get_block_id(size)
    if block_id is not None:
Reviewer: What happens if block_id is None? You handle that case above, but not here.
Author: I've updated the code to properly handle the case where block_id is None.
helion/language/random_ops.py (outdated):
"Use hl.rand() inside hl.tile() loops with tile variables." | ||
) | ||
|
||
numel = " * ".join(shape_str.strip("[]").split(",")) |
Reviewer: Why strip the []?
Author: I used strip to parse the shape string for numel, which is not a proper way to compute numel. In the updated code I use the proper APIs for tl.arange and for building the AST expression.
test/test_rng.py (outdated):
"tl.rand(seed, (" | ||
"offset_0 * _BLOCK_SIZE_1 + " | ||
"offset_1 + " | ||
"tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1)" |
Reviewer: This code still doesn't look correct to me. You fixed the issue of providing duplicate random values for each block, but now RNG is not deterministic: if you change the block size, the random values change. I'd suggest adding tests that:
- Change the block sizes of a kernel and assert that the RNG output stays the same
- Sort the RNG output by value and assert you have roughly O(n) unique values
- Cover the case where non-block sizes are passed in
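A minimal sketch of the first suggested test (ours, not the PR's code; helion.Config(block_sizes=...) and the kernel shape follow common helion usage and are assumptions):

import torch
import helion
import helion.language as hl

def _make_kernel(block_size: int):
    # Pin the tile size so two different tilings can be compared directly.
    @helion.kernel(config=helion.Config(block_sizes=[block_size]))
    def rand_kernel(x: torch.Tensor) -> torch.Tensor:
        out = torch.zeros_like(x)
        (m,) = x.shape
        for (tile_m,) in hl.tile([m]):
            out[tile_m] = hl.rand([tile_m], seed=42)
        return out
    return rand_kernel

def test_block_size_invariance():
    x = torch.empty(1024, device="cuda")
    a = _make_kernel(32)(x)
    b = _make_kernel(128)(x)
    torch.testing.assert_close(a, b)  # deterministic RNG: the tiling must not matter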
Author: Thanks @jansel, you're correct; the previous implementation was sensitive to block-size changes. I've updated the code to use global indices instead of block indices, giving deterministic RNG per element, and handled non-tiled inputs to hl.rand. I also added the suggested test cases.
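As we read the fix (our sketch, not the PR's implementation): each element's offset is computed from its global index and the tensor's full sizes, so block sizes drop out entirely:

# Sketch (ours): with global indices, element (r, c) of a full (m, n) tensor
# always receives linear offset r * n + c, regardless of tiling.
def global_rng_offset(r: int, c: int, n: int) -> int:
    return r * n + c

assert global_rng_offset(5, 3, n=100) == 503  # the same under any block sizes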
Force-pushed from ef0a1a6 to 29ceafb
helion/language/random_ops.py (outdated):
Args:
    shape: A list of sizes
    seed: int seed for the random number generator
    dtype: currently only float32 supported
Reviewer: Remove this arg if only one value is supported.
Author: Thank you, I've removed the dtype arg.
helion/language/random_ops.py (outdated):
The main propose of ``hl.rand`` is to explicitly pass a seed arg for deterministic
randomness in helion kernels, whereas ``torch.rand_like`` doesn't take seed arg
(though it can seeded globally)`. ``hl.rand`` lower to ``tl.rand(seed, offset)`` with ``offset``
built from a linear range over the allocation and reshaped to the given shape.
Note:
    Only use within ``hl.tile()`` loops for creating local tensors.
    For host allocations, use ``torch.rand()``.
Reviewer: Suggested rewrite of this passage:
    hl.rand provides a Philox-based pseudorandom number generator (PRNG) that operates independently of PyTorch's global random seed. Instead, it requires an explicit seed argument. Offsets are derived from the full logical sizes of the tiles specified in the shape argument.
Author: I've updated the description.
helion/language/random_ops.py (outdated):
Args:
    shape: A list of sizes
    seed: int seed for the random number generator
Reviewer: Suggested change:
    seed: A single element int64 tensor or int literal
helion/language/random_ops.py (outdated):
@_decorators.api(tiles_as_sizes=True)
def rand(
    shape: list[object],
    seed: int,
Reviewer: Suggested change:
    seed: int | torch.Tensor,
helion/language/random_ops.py (outdated):
output = torch.zeros_like(x)
(m,) = x.shape
for (tile_m,) in hl.tile([m]):
    output[tile_m] = hl.rand([tile_m], seed=seed)
Reviewer: seed undefined?
Author: Thanks, I've changed it to a static seed.
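The docstring example presumably now reads something like the following (our sketch; the kernel name and the literal 42 are placeholders, not the PR's code):

@helion.kernel()
def rand_example(x: torch.Tensor) -> torch.Tensor:
    output = torch.zeros_like(x)
    (m,) = x.shape
    for (tile_m,) in hl.tile([m]):
        output[tile_m] = hl.rand([tile_m], seed=42)  # static seed literal
    return output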
helion/language/random_ops.py (outdated):
block_id = env.get_block_id(tensor_shape[i])
if block_id is not None:
    rdim_name = f"_RDIM_SIZE_{block_id}"
    if rdim_name in rdim_args:
Reviewer: This isn't the right way to detect an rdim; look at indexing_strategy.py for examples.
Author: Thanks, I replaced the rdim detection with env.allocate_reduction_dimension, which handles both creating a new rdim and reusing an existing one.
helion/language/random_ops.py (outdated):
if block_id is not None:
    rdim_name = f"_RDIM_SIZE_{block_id}"
    if rdim_name in rdim_args:
        index_vars.append(f"tl.arange(0, {rdim_name})")
Reviewer: Add a test for rolled reductions; I don't think this will work in that case.
Author: Added test_hl_rand_rolled_reductions(), which runs an identical kernel with reduction_loops=[None] vs [64]. After updating the logic per your feedback, this test passes.
helion/language/random_ops.py (outdated):
if symbol_idx < len(symbol_args):
    size_names.append(symbol_args[symbol_idx])
    symbol_idx += 1
else:
    size_names.append(str(tensor_shape[i]))
Reviewer: This doesn't sound correct. You are ignoring the actual symbol associated with the block and just using the order in which the sizes appear in the function args. Add a test where you mix up the order.
Author: I now use block_info.size to get the actual tensor size symbol instead of relying on argument order, and added test_hl_rand_mixed_argument_order, which runs kernels with different tile argument orders but the same hl.rand calls.
helion/language/random_ops.py (outdated):
available_rdims = [name for name in rdim_args if name not in used_rdims]
if available_rdims:
    rdim_name = available_rdims[0]
    index_vars.append(f"tl.arange(0, {rdim_name})")
    size_names.append(rdim_name)
    used_rdims.add(rdim_name)
Reviewer: This doesn't look correct. You are ignoring the actual value the user passed and just assuming it matches the reduction args to the kernel.
Author: Thanks, I fixed this with env.allocate_reduction_dimension; it now uses the actual user-passed size value from the tensor shape.
helion/language/random_ops.py (outdated):
broadcast_slices = []
for i in range(ndim):
    slice_parts = ["None"] * ndim
    slice_parts[i] = ":"
    broadcast_slices.append(f"[{', '.join(slice_parts)}]")
Reviewer: We should have a helper in indexing_strategy for broadcasting.
Author: The existing get_broadcast_str(stack_shape, subscript_shape) takes two shapes and generates paired broadcast strings for stack-based operations, so I added a get_element_broadcast_slice(dim_index, total_dims) method to StackIndexingStrategy. It generates an individual broadcasting pattern such as [:, None, None] for a single dimension within one tensor, used in our multi-dimensional stride calculations.
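Judging from the inline code it replaced, the new helper is presumably equivalent to this reconstruction (ours):

@staticmethod
def get_element_broadcast_slice(dim_index: int, total_dims: int) -> str:
    # e.g. (0, 3) -> "[:, None, None]": keep dim_index, broadcast the rest
    slice_parts = ["None"] * total_dims
    slice_parts[dim_index] = ":"
    return f"[{', '.join(slice_parts)}]"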
Force-pushed from 29ceafb to c8772b7
Force-pushed from c8772b7 to d4fdeae
helion/language/random_ops.py (outdated):
broadcast_slice = StackIndexingStrategy.get_element_broadcast_slice(i, ndim)
broadcasted_index = f"{index_vars[i]}{broadcast_slice}"
if i < ndim - 1:
    stride_expr = " * ".join(size_names[i + 1 :])
Reviewer: Suggested change:
    stride_expr = " * ".join(map("({})".format, size_names[i + 1 :]))
Force-pushed from d4fdeae to 939a7c6
Stacked PRs:
Fix hl.rand to use tile specific offsets instead of fixed offsets, ensure unique random num per tile