
[RLlib] Allow for more than 2^31 policy timesteps. #11301

Merged

Conversation

sven1977
Contributor

@sven1977 sven1977 commented Oct 9, 2020

RLlib currently crashes when `[policy].global_timestep` reaches 2^31 because the corresponding tensors use dtype=int32.
This PR fixes this issue.

Closes #10810
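As an illustrative sketch (not the PR's actual diff), the failure mode can be reproduced with NumPy: a 32-bit counter silently wraps to a negative value once it passes 2^31 - 1, while a 64-bit counter keeps counting.

```python
import numpy as np

# A timestep counter stored as int32 overflows at 2^31 - 1
# (its maximum representable value) and wraps to -2^31.
ts32 = np.array([2**31 - 1], dtype=np.int32)
ts32 += 1  # silent wraparound, no exception raised

# Widening the counter to int64 avoids the overflow.
ts64 = np.array([2**31 - 1], dtype=np.int64)
ts64 += 1  # continues counting past 2^31

print(ts32[0])  # -2147483648
print(ts64[0])  # 2147483648
```

A negative `global_timestep` then breaks any downstream logic (e.g. schedules or exploration annealing) that assumes a monotonically increasing, non-negative step count.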

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Oct 12, 2020
Contributor

@ericl ericl left a comment
Nice test!

@ericl ericl merged commit 8ea1bc5 into ray-project:master Oct 12, 2020
@sven1977 sven1977 deleted the issue_10810_go_beoynd_2pow31_timesteps branch January 18, 2021 13:04
Successfully merging this pull request may close these issues.

[rllib] How to train beyond 2^31 timesteps?