Conversation

@karthickai (Contributor) commented Sep 25, 2025:

Stacked PRs:

Fix hl.rand to use tile-specific offsets instead of fixed offsets, ensuring unique random numbers per tile.
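For context, a minimal use of hl.rand inside an hl.tile loop, adapted from the example kernel later in this thread (the decorator and imports are standard Helion boilerplate; treat the details as illustrative):

    import torch
    import helion
    import helion.language as hl

    @helion.kernel
    def rand_kernel(x: torch.Tensor) -> torch.Tensor:
        output = torch.zeros_like(x)
        (m,) = x.shape
        for (tile_m,) in hl.tile([m]):
            # Before this fix, every tile reused the same fixed offsets,
            # so all tiles drew identical random numbers.
            output[tile_m] = hl.rand([tile_m], seed=42)
        return output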

@karthickai requested a review from @jansel on September 25, 2025.
    env = CompileEnvironment.current()
    for size in fake_value.size():
        block_id = env.get_block_id(size)
        if block_id is not None:
jansel (Contributor):
What happens if block_id is None?

karthickai (Author):

I've added validation that raises an error ("use hl.rand inside hl.tile loops") when block_id is None.
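A minimal sketch of that validation, using the error message quoted later in this thread (the plain ValueError is an assumption; Helion may use its own exception type):

    block_id = env.get_block_id(size)
    if block_id is None:
        raise ValueError(
            "Use hl.rand() inside hl.tile() loops with tile variables."
        )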

    offs_expr = (
        f"({offset_var} + tl.arange(0, {numel})).reshape({shape_str})"
    )
    break
jansel (Contributor):
A break here seems incorrect; will we just ignore the rest of the shape?

Suppose you have [BLOCK0, BLOCK1] (a 2D random number)?

karthickai (Author):
Thanks again. I've updated the multi-dimensional cases to compute linear offsets in row-major order, so each tile gets unique random numbers:

    1D: tl.rand(seed, indices_0.reshape([_BLOCK_SIZE_0]))
    2D: tl.rand(seed, (offset_0 * _BLOCK_SIZE_1 + offset_1 + tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1)).reshape([_BLOCK_SIZE_0, _BLOCK_SIZE_1]))
    3D: tl.rand(seed, (offset_0 * (_BLOCK_SIZE_1 * _BLOCK_SIZE_2) + offset_1 * _BLOCK_SIZE_2 + offset_2 + tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1 * _BLOCK_SIZE_2)).reshape([_BLOCK_SIZE_0, _BLOCK_SIZE_1, _BLOCK_SIZE_2]))

jansel (Contributor):
This doesn't look right. Changing block sizes will change RNG values.
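To make the concern concrete: by my reading of the 2D expression above, element (row, col) ends up with Philox offset row * _BLOCK_SIZE_1 + col, a function of the block size rather than of the tensor's own shape (illustrative Python, not Helion internals):

    def philox_offset(row: int, col: int, block_size_1: int) -> int:
        # Simplification of: offset_0 * _BLOCK_SIZE_1 + offset_1 + arange(...)
        return row * block_size_1 + col

    print(philox_offset(1, 3, block_size_1=4))  # 7
    print(philox_offset(1, 3, block_size_1=8))  # 11: same element, different RNG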

@jansel (Contributor) left a comment:

Another simpler approach here would be to have the user provide offsets. Though that API isn't as nice.

jansel (Contributor):
test_random.py

karthickai (Author):

Moved the test cases to test_random.py.

test/test_rng.py (outdated):

    self.assertTrue(torch.all(output >= 0.0), "All values should be >= 0")
    self.assertTrue(torch.all(output < 1.0), "All values should be < 1")

    self.assertIn("tl.rand(seed, indices_0.reshape([_BLOCK_SIZE_0]))", code3)
jansel (Contributor):
Use assertExpectedJournal so I can see the full output code (throughout).

karthickai (Author):
I've added assertExpectedJournal in the test cases.

    next_block_id = env.get_block_id(next_size)

    if next_block_id is not None:
        stride_components.append(f"_BLOCK_SIZE_{next_block_id}")
jansel (Contributor):
Query variable names through the API; don't regenerate them.

karthickai (Author):
I've updated the code.


    for dim_idx, size in enumerate(tensor_shape):
        block_id = env.get_block_id(size)
        if block_id is not None:
jansel (Contributor):
What happens if block_id is None? You handle that case above, but not here.

karthickai (Author):
Updated the code to properly handle the case where block_id is None.

"Use hl.rand() inside hl.tile() loops with tile variables."
)

numel = " * ".join(shape_str.strip("[]").split(","))
jansel (Contributor):
Why strip []?

karthickai (Author):
I used strip to parse the shape string for numel (not a proper way to compute numel). In the updated code I've used proper APIs to build the tl.arange and AST expressions.
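For reference, the fragility is easy to demonstrate (runnable Python, reusing the exact expression from the diff context above):

    shape_str = "[_BLOCK_SIZE_0, _BLOCK_SIZE_1]"
    numel = " * ".join(shape_str.strip("[]").split(","))
    print(numel)  # "_BLOCK_SIZE_0 *  _BLOCK_SIZE_1" (doubled space, no validation)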

test/test_rng.py (outdated):

    "tl.rand(seed, ("
    "offset_0 * _BLOCK_SIZE_1 + "
    "offset_1 + "
    "tl.arange(0, _BLOCK_SIZE_0 * _BLOCK_SIZE_1)"
jansel (Contributor) commented Sep 27, 2025:

This code still doesn't look correct to me. You fixed the issue where you were providing duplicate random values for each block, but now we have the problem that the RNG is not deterministic: if you change the block size, the random values change. I'd suggest adding some tests for:

  1. Change the block sizes of a kernel and assert that the RNG output stays the same (sketched below)
  2. Sort the RNG output by value and assert you have roughly O(n) unique values
  3. Test the case where non-block sizes are passed in
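A hedged sketch of tests (1) and (2): the kernel body mirrors the example shown later in this thread, while helion.Config(block_sizes=...) and the rest of the harness are assumptions about Helion's public API, not the tests that actually landed:

    import torch
    import helion
    import helion.language as hl

    def make_kernel(block_size: int):
        # Same kernel, parameterized only by block size.
        @helion.kernel(config=helion.Config(block_sizes=[block_size]))
        def rand_kernel(x: torch.Tensor) -> torch.Tensor:
            output = torch.zeros_like(x)
            (m,) = x.shape
            for (tile_m,) in hl.tile([m]):
                output[tile_m] = hl.rand([tile_m], seed=42)
            return output
        return rand_kernel

    x = torch.zeros(1024, device="cuda")
    out_a = make_kernel(32)(x)
    out_b = make_kernel(64)(x)
    torch.testing.assert_close(out_a, out_b)  # (1) RNG unchanged across block sizes
    assert out_a.unique().numel() > 0.9 * x.numel()  # (2) roughly O(n) unique values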

karthickai (Author):
Thanks @jansel, you're correct: the current implementation was sensitive to block-size changes. I've updated the code to use global indices instead of block indices for deterministic RNG per element, and I've handled non-tiled inputs to hl.rand. I also added the mentioned test cases.
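As a sketch of the idea (illustrative names, not Helion internals): the Philox offset becomes the element's global linear index over the full tensor, so it no longer depends on how the iteration space is tiled.

    def global_philox_offset(row: int, col: int, total_cols: int) -> int:
        # Row-major index over the whole tensor, independent of block size.
        return row * total_cols + col

    # Element (1, 3) of an 8-column tensor gets offset 11 under any tiling:
    assert global_philox_offset(1, 3, total_cols=8) == 11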

    Args:
        shape: A list of sizes
        seed: int seed for the random number generator
        dtype: currently only float32 supported
jansel (Contributor):
Remove this arg if only one value is supported.

karthickai (Author):
Thank you, I've removed the dtype arg.

Comment on lines 30 to 37
    The main propose of ``hl.rand`` is to explicitly pass a seed arg for deterministic
    randomness in helion kernels, whereas ``torch.rand_like`` doesn't take seed arg
    (though it can seeded globally)`. ``hl.rand`` lower to ``tl.rand(seed, offset)`` with ``offset``
    built from a linear range over the allocation and reshaped to the given shape.
    Note:
        Only use within ``hl.tile()`` loops for creating local tensors.
        For host allocations, use ``torch.rand()``.
jansel (Contributor):

Suggested change (replacing the quoted docstring above with):
    hl.rand provides a Philox-based pseudorandom number generator (PRNG) that operates independently of PyTorch's global random seed. Instead, it requires an explicit seed argument. Offsets are derived from the full logical sizes of the tiles specified in the shape argument.

karthickai (Author):
I've updated the description.

    Args:
        shape: A list of sizes
        seed: int seed for the random number generator
jansel (Contributor):
Suggested change:

    - seed: int seed for the random number generator
    + seed: A single element int64 tensor or int literal
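Under the suggested wording, both forms would be accepted inside a tile loop (usage sketch; the surrounding kernel is assumed):

    output[tile_m] = hl.rand([tile_m], seed=42)  # int literal
    output[tile_m] = hl.rand([tile_m], seed=torch.tensor(42, dtype=torch.int64))  # single-element int64 tensor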

    @_decorators.api(tiles_as_sizes=True)
    def rand(
        shape: list[object],
        seed: int,
jansel (Contributor):
Suggested change:

    - seed: int,
    + seed: int | torch.Tensor,

    output = torch.zeros_like(x)
    (m,) = x.shape
    for (tile_m,) in hl.tile([m]):
        output[tile_m] = hl.rand([tile_m], seed=seed)
jansel (Contributor):
seed undefined?

karthickai (Author):
Thanks, I've changed it to a static value.

    block_id = env.get_block_id(tensor_shape[i])
    if block_id is not None:
        rdim_name = f"_RDIM_SIZE_{block_id}"
        if rdim_name in rdim_args:
jansel (Contributor):
This isn't the right way to detect an rdim; look at indexing_strategy.py for examples.

karthickai (Author):
Thanks, I replaced the rdim detection with env.allocate_reduction_dimension, which handles both creating a new rdim and reusing an existing one.

    if block_id is not None:
        rdim_name = f"_RDIM_SIZE_{block_id}"
        if rdim_name in rdim_args:
            index_vars.append(f"tl.arange(0, {rdim_name})")
jansel (Contributor):
Add a test for rolled reductions; I don't think this will work in that case.

karthickai (Author):
Added test_hl_rand_rolled_reductions(), which tests an identical kernel with reduction_loops=[None] vs [64]. After updating the logic per your feedback, this test case passes.

Comment on lines 121 to 125
    if symbol_idx < len(symbol_args):
        size_names.append(symbol_args[symbol_idx])
        symbol_idx += 1
    else:
        size_names.append(str(tensor_shape[i]))
jansel (Contributor):
This doesn't sound correct. You are ignoring the actual symbol associated with the block and just using the order in which they appear in the function args.

Add a test where you mix up the order.

karthickai (Author):
I now use block_info.size to get the actual tensor size symbol instead of relying on argument order. Added test_hl_rand_mixed_argument_order, which tests kernels with different tile argument orders but the same hl.rand calls.

Comment on lines 128 to 133
    available_rdims = [name for name in rdim_args if name not in used_rdims]
    if available_rdims:
        rdim_name = available_rdims[0]
        index_vars.append(f"tl.arange(0, {rdim_name})")
        size_names.append(rdim_name)
        used_rdims.add(rdim_name)
jansel (Contributor):
This doesn't look correct. You are ignoring the actual value the user passed and just assuming it matches the reduction args to the kernel.

karthickai (Author):
Thanks, I fixed it with env.allocate_reduction_dimension, which now uses the actual user-passed size value from the tensor shape.

Comment on lines 143 to 147
    broadcast_slices = []
    for i in range(ndim):
        slice_parts = ["None"] * ndim
        slice_parts[i] = ":"
        broadcast_slices.append(f"[{', '.join(slice_parts)}]")
jansel (Contributor):
We should have a helper in indexing_strategy for broadcasting.

karthickai (Author):
The existing get_broadcast_str(stack_shape, subscript_shape) takes two shapes and generates paired broadcast strings for stack-based operations, so I added a get_element_broadcast_slice(dim_index, total_dims) method to StackIndexingStrategy. It generates an individual broadcasting pattern such as [:, None, None] for a single dimension within one tensor, used in our multi-dimensional stride calculations.
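A sketch of that helper, inferred from the broadcast_slices loop in the diff context above (the actual Helion implementation may differ):

    class StackIndexingStrategy:  # illustrative stub
        @staticmethod
        def get_element_broadcast_slice(dim_index: int, total_dims: int) -> str:
            """Return a subscript like "[:, None, None]" keeping only dim_index."""
            parts = ["None"] * total_dims
            parts[dim_index] = ":"
            return f"[{', '.join(parts)}]"

    print(StackIndexingStrategy.get_element_broadcast_slice(0, 3))  # [:, None, None]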

    broadcast_slice = StackIndexingStrategy.get_element_broadcast_slice(i, ndim)
    broadcasted_index = f"{index_vars[i]}{broadcast_slice}"
    if i < ndim - 1:
        stride_expr = " * ".join(size_names[i + 1 :])
jansel (Contributor):
Suggested change:

    - stride_expr = " * ".join(size_names[i + 1 :])
    + stride_expr = " * ".join(map("({})".format, size_names[i + 1 :]))

@karthickai merged commit 9c9eea4 into main on Oct 7, 2025. 13 checks passed.