[RLlib] Fix RNN learning for tf-eager/tf2.x. #11720

sven1977 · 2020-10-30T13:44:30Z

Until now, learning a policy with an RNN model did not work when using framework=[tf2|tfe], and nobody had noticed.

This PR fixes this issue.
It also unifies TorchPolicy's compute_gradients and learn_on_batch methods, so that both go through the same gradient-computation code path.
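
To illustrate the idea of routing both methods through one gradient computation, here is a minimal sketch in plain Python. It is not RLlib's actual TorchPolicy code: the class `ToyPolicy`, the helper `_compute_grad_and_stats`, the batch keys `obs` and `targets`, and the squared-error loss are all hypothetical stand-ins chosen so the example runs without any dependencies.

```python
from typing import Dict, List, Tuple

Batch = Dict[str, List[float]]


class ToyPolicy:
    """Hypothetical stand-in for a policy; NOT RLlib's TorchPolicy."""

    def __init__(self, weight: float = 0.0, lr: float = 0.1):
        self.weight = weight  # single scalar "model" parameter
        self.lr = lr          # learning rate used by apply_gradients

    def _compute_grad_and_stats(
            self, batch: Batch) -> Tuple[float, Dict[str, float]]:
        # The one place where loss and gradient are computed.
        # Assumed loss: mean squared error of weight * obs vs. targets.
        xs, ys = batch["obs"], batch["targets"]
        n = len(xs)
        errors = [self.weight * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        grad = sum(2.0 * e * x for e, x in zip(errors, xs)) / n
        return grad, {"loss": loss}

    def compute_gradients(
            self, batch: Batch) -> Tuple[float, Dict[str, float]]:
        # Returns the gradient without changing the parameter.
        return self._compute_grad_and_stats(batch)

    def apply_gradients(self, grad: float) -> None:
        # Plain gradient-descent step.
        self.weight -= self.lr * grad

    def learn_on_batch(self, batch: Batch) -> Dict[str, float]:
        # Same helper as compute_gradients, followed by the update.
        grad, stats = self._compute_grad_and_stats(batch)
        self.apply_gradients(grad)
        return stats
```

With this layout, calling `compute_gradients` followed by `apply_gradients` leaves the policy in the same state as a single `learn_on_batch` call on the same batch, which is the property the unification aims for.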

Why are these changes needed?

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

- fix A3C

sven1977 added 3 commits October 30, 2020 14:40

WIP.

5b26223

Fix.

a991869

LINT.

14cc799

sven1977 requested review from michaelzhiluo and ericl and removed request for ericl October 30, 2020 19:14

sven1977 assigned michaelzhiluo Oct 30, 2020

sven1977 changed the title ~~[WIP RLlib] Fix RNN learning for tf-eager/tf2.x.~~ [RLlib] Fix RNN learning for tf-eager/tf2.x. Oct 30, 2020

sven1977 added 3 commits October 31, 2020 13:26

WIP.

0405552

WIP.

4ddb792

Fix.

1e026fa

michaelzhiluo approved these changes Nov 1, 2020

sven1977 added 2 commits November 1, 2020 22:17

Fix.

8838179

- Add DDPPO regression test

2670e8b

- fix A3C

sven1977 merged commit 54d85a6 into ray-project:master Nov 2, 2020

sven1977 deleted the fix_torch_tf_eager_compute_grads_for_rnns branch June 2, 2023 20:12