[RLlib] Add IS APPO metrics to torch learner by ArturNiederfahrenhorst · Pull Request #63675 · ray-project/ray

ArturNiederfahrenhorst · 2026-05-27T20:05:49Z

Description

The old stack (for example, the loss defined in our APPO TF policy) had importance sampling (IS) metrics to track how off-policy the training data is.
This PR introduces the same two metrics to the new stack.

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>

gemini-code-assist

Code Review

This pull request ports the mean_IS and var_IS diagnostics for the importance sampling (IS) ratio from the old TensorFlow policy to the PyTorch APPO learner. The reviewer suggested using torch.square instead of torch.pow(..., 2.0) for better performance and idiomatic PyTorch code.
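As a rough illustration of the two diagnostics named in the review, the sketch below computes the mean and variance of the IS ratio from log-probabilities, using torch.square as suggested. It is not the code merged in this PR: the helper name is_ratio_metrics, its arguments, the optional mask, and the use of the population (biased) variance are all assumptions made for the example.

```python
import torch


def is_ratio_metrics(target_logp, behaviour_logp, mask=None):
    """Return (mean_IS, var_IS) of the importance sampling ratio.

    Hypothetical helper for illustration only; names and masking are
    assumptions, not the learner's actual implementation.
    """
    # IS ratio pi_target(a|s) / pi_behaviour(a|s), computed from log-probs.
    is_ratio = torch.exp(target_logp - behaviour_logp)
    if mask is not None:
        # Assumed convention: a boolean mask that drops padded timesteps.
        is_ratio = is_ratio[mask]
    mean_is = is_ratio.mean()
    # torch.square(x) replaces torch.pow(x, 2.0), as the review suggested.
    var_is = torch.square(is_ratio - mean_is).mean()
    return mean_is, var_is


behaviour_logp = torch.log(torch.tensor([0.5, 0.25, 0.25]))

# Fully on-policy data: every ratio is 1, so the variance is 0.
on_policy = is_ratio_metrics(behaviour_logp, behaviour_logp)

# Off-policy data: the ratios are 0.5, 2.0 and 1.0.
target_logp = torch.log(torch.tensor([0.25, 0.5, 0.25]))
off_policy = is_ratio_metrics(target_logp, behaviour_logp)
```

A mean near 1 with a small variance suggests the sampled data is close to on-policy; a growing variance indicates that the behaviour and target policies are drifting apart.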

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>

pseudo-rnd-thoughts

LGTM

add metrics

7412896

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>

ArturNiederfahrenhorst requested a review from a team as a code owner May 27, 2026 20:05

ArturNiederfahrenhorst added the rllib (RLlib related issues) and rllib-algorithms (An RLlib algorithm/Trainer is not learning.) labels May 27, 2026

gemini-code-assist Bot reviewed May 27, 2026

Comment thread rllib/algorithms/appo/torch/appo_torch_learner.py

ArturNiederfahrenhorst and others added 2 commits May 28, 2026 02:09

Update rllib/algorithms/appo/torch/appo_torch_learner.py

b2f0f2c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>

polish

7faced0

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>

pseudo-rnd-thoughts approved these changes Jun 1, 2026

Merge branch 'master' into appometrics

aa00d73

ArturNiederfahrenhorst added the go (add ONLY when ready to merge, run all tests) label Jun 3, 2026

ArturNiederfahrenhorst merged commit 260908e into ray-project:master Jun 3, 2026
8 checks passed

ArturNiederfahrenhorst merged 4 commits into ray-project:master from ArturNiederfahrenhorst:appometrics
