
[RLlib] Add IS APPO metrics to torch learner#63675

Merged
ArturNiederfahrenhorst merged 4 commits into ray-project:master from ArturNiederfahrenhorst:appometrics
Jun 3, 2026


@ArturNiederfahrenhorst
Contributor

Description

The old stack (for example, the loss defined in our APPO TF policy) had importance sampling (IS) metrics to track off-policyness.
This PR introduces the same two metrics, mean_IS and var_IS, to the new stack's torch learner.

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
ArturNiederfahrenhorst requested a review from a team as a code owner May 27, 2026 20:05
ArturNiederfahrenhorst added the rllib (RLlib related issues) and rllib-algorithms (An RLlib algorithm/Trainer is not learning.) labels May 27, 2026
Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request ports the mean_IS and var_IS diagnostics for the importance sampling (IS) ratio from the old TensorFlow policy to the PyTorch APPO learner. The reviewer suggested using torch.square instead of torch.pow(..., 2.0) for better performance and idiomatic PyTorch code.
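As a rough illustration of what these two diagnostics measure, here is a minimal sketch in plain Python (not the PyTorch code from this PR). It assumes the IS ratio is exp(target log-prob minus behaviour log-prob) per sample and that var_IS is the population variance of those ratios; the function name `is_ratio_stats` and its arguments are hypothetical, and the actual learner may clip or mask the ratios differently.

```python
import math


def is_ratio_stats(target_logp, behaviour_logp):
    """Return (mean, variance) of per-sample importance sampling ratios.

    Hypothetical helper for illustration only; not part of RLlib.
    """
    # Assumed definition: ratio = pi_target(a|s) / pi_behaviour(a|s),
    # computed from log-probabilities for numerical convenience.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    mean = sum(ratios) / len(ratios)
    # Population variance: mean of squared deviations from the mean.
    var = sum((r - mean) ** 2 for r in ratios) / len(ratios)
    return mean, var


# On-policy data: identical log-probs give a ratio of 1 for every sample.
on_policy = is_ratio_stats([-1.0, -2.0, -0.5], [-1.0, -2.0, -0.5])

# Off-policy data: one action is twice as likely under the target policy,
# the other half as likely, so the ratios are roughly 2.0 and 0.5.
off_policy = is_ratio_stats(
    [-1.0, -2.0],
    [-1.0 - math.log(2.0), -2.0 + math.log(2.0)],
)
```

Under these assumptions, a mean near 1 with a variance near 0 indicates nearly on-policy data, while growing variance signals that the sampled batches were collected by a policy that has drifted from the one being trained.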

Comment thread rllib/algorithms/appo/torch/appo_torch_learner.py
ArturNiederfahrenhorst and others added 2 commits May 28, 2026 02:09
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Member

pseudo-rnd-thoughts left a comment


LGTM

ArturNiederfahrenhorst added the go (add ONLY when ready to merge, run all tests) label Jun 3, 2026
ArturNiederfahrenhorst merged commit 260908e into ray-project:master Jun 3, 2026
8 checks passed
rueian pushed a commit to rueian/ray that referenced this pull request Jun 4, 2026

Labels

go (add ONLY when ready to merge, run all tests)
rllib (RLlib related issues)
rllib-algorithms (An RLlib algorithm/Trainer is not learning.)


2 participants