Add GRPO loss example #1063
Conversation
jansel left a comment:
Thanks for the contribution! Can you add a test for this similar to the tests for the other examples?
Force-pushed from d3277a5 to 048ecf1.
Yeah @jansel, I added them just now, with changes in both the .expected file and the examples!
@oulgen I have removed the redundant hardware-specific configs. Is it OK to merge?
Force-pushed from ec54cdb to 97393e9.
Looks like many of the tests are failing. Can we change the test to run faster, perhaps with smaller inputs?
Overview
This PR adds a complete example implementation of Group Relative Policy Optimization (GRPO) loss using Helion kernels, including:
Benchmarks
=== Timing (median ms) ===
PyTorch Forward:  5.286 ms
PyTorch Backward: 14.828 ms
Helion Forward:   1.053 ms (5.02x vs PyTorch)
Helion Backward:  2.461 ms (6.03x vs PyTorch)

=== Throughput ===
PyTorch Fwd tokens/s:  3099305.7
PyTorch Bwd tokens/s:  1104966.6
Helion Fwd tokens/s:  15564766.3
Helion Bwd tokens/s:   6658797.2
Motivation
GRPO stabilizes RLHF-style policy optimization by clipping the policy-ratio update, with optional KL regularization against a reference model. This example demonstrates how to express a numerically stable, fused GRPO loss in Helion and how to integrate Helion kernels into an end-to-end PyTorch workflow.
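For readers unfamiliar with the loss itself, a minimal plain-PyTorch reference of the per-token GRPO objective might look like the sketch below. This is not the Helion kernel from this PR; the function name, the `eps`/`beta` defaults, and the choice of the k3 KL estimator are illustrative assumptions.

```python
import torch

def grpo_loss_reference(logp, old_logp, ref_logp, advantages, eps=0.2, beta=0.04):
    """Per-token GRPO loss sketch (hypothetical reference, not the PR's kernel).

    Combines a PPO-style clipped surrogate with a KL penalty against a
    reference policy, using the k3 KL estimator.
    """
    # Importance ratio between current and behavior policy.
    ratio = torch.exp(logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (clipped) surrogate, negated so we can minimize it.
    policy_loss = -torch.minimum(unclipped, clipped)
    # k3 KL estimator: exp(x) - x - 1 with x = ref_logp - logp; always >= 0.
    kl = torch.exp(ref_logp - logp) - (ref_logp - logp) - 1.0
    return policy_loss + beta * kl
```

When all three log-probability tensors coincide, the ratio is 1 and the KL term vanishes, so the loss reduces to the negated advantages, which is a handy sanity check when wiring up a fused kernel against a reference implementation.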