
[RLlib] Fix MB MPO #39654

Merged — 2 commits, Sep 15, 2023

Conversation

ArturNiederfahrenhorst (Contributor)

Why are these changes needed?

#38321 raises the issue that the MB MPO example does not work.
This PR fixes multiple issues within MBMPO that have gone undetected in the past:

  • When initializing the loss, we use a batch size of 32, which is incompatible with the MAML loss function.
  • gymnasium has changed over time: the bare Pendulum and CartPole envs that we wrap no longer have any logic that truncates or terminates episodes by default. The relevant wrappers are only added when the envs are created via gym.make(...). Our old wrapping technique therefore caused MBMPO's initial sample collection to run endlessly.
  • Later, when MBMPO uses a dynamics model to predict future observations, the dynamics model likewise lacks a truncation or termination mechanism. This PR adds one to the envs. I'm not sure how this was handled in the past.
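The gymnasium behavior described above — a bare env that never ends an episode on its own, with truncation only supplied by a wrapper that `gym.make(...)` adds — can be illustrated with a minimal standalone sketch. Both classes here are hypothetical stand-ins (no gymnasium dependency), mimicking the `TimeLimit`-style wrapping the PR relies on:

```python
class BareEnv:
    """Hypothetical stand-in for a bare env: never terminates or truncates."""

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action
        # A bare env returns terminated=False, truncated=False forever,
        # so an unwrapped rollout loop would collect endlessly.
        return self.state, 0.0, False, False


class TimeLimit:
    """Sketch of the episode-length wrapper that env creation normally adds:
    flag truncation once max_episode_steps have elapsed."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated


env = TimeLimit(BareEnv(), max_episode_steps=5)
env.reset()
for _ in range(5):
    obs, reward, terminated, truncated = env.step(1.0)
# After 5 steps the wrapper sets truncated=True, ending the rollout.
```

Without the wrapper (or an equivalent mechanism inside the dynamics model's imagined rollouts), the collection loop has no stopping signal, which matches the endless-collection symptom the PR fixes.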

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>

@sven1977 sven1977 left a comment


Nice fix! Thanks @ArturNiederfahrenhorst for digging into this.

@sven1977 sven1977 merged commit 586f1b5 into ray-project:master Sep 15, 2023
38 of 41 checks passed
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
Signed-off-by: Victor <vctr.y.m@example.com>
MrSVF commented Oct 18, 2023

Dear @ArturNiederfahrenhorst, thank you for your pull request! The mbmpo algorithm runs correctly now, but unfortunately it does not train properly. Could you analyze this issue #40400 and give your recommendation, why this is happening and how to correct this behavior?


3 participants