[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) #23735
Conversation
rllib/agents/a3c/a2c.py
Outdated
grad, info = self.workers.local_worker().compute_gradients(
    train_batch, single_agent=True
)
self._microbatches.append((grad, train_batch.count))
What happens if microbatch_size is 8 and train_batch.count comes out at 32? Can this happen in some cases?
We have similar while-loop logic in a couple of training-iteration fns.
In this case it would lead to a miscalculation of num_microbatches and, obviously, to much larger microbatches than configured?
Yes, good point. I'll add a config check for these calculations. ...
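A minimal sketch of what such a config check could look like (hypothetical function and parameter names, not RLlib's actual validation code): reject configs where the microbatch size does not evenly divide the train batch size, so num_microbatches never gets miscalculated.

```python
def validate_microbatch_config(train_batch_size: int, microbatch_size: int) -> None:
    """Hypothetical sketch of the config check discussed above.

    Ensures `microbatch_size` evenly divides `train_batch_size`, so the
    number of microbatches per update is a whole number and a single
    sampled batch can never overshoot a microbatch boundary.
    """
    if microbatch_size is None:
        return  # Microbatching disabled; nothing to check.
    if microbatch_size > train_batch_size:
        raise ValueError(
            f"microbatch_size ({microbatch_size}) must be <= "
            f"train_batch_size ({train_batch_size})!"
        )
    if train_batch_size % microbatch_size != 0:
        raise ValueError(
            f"train_batch_size ({train_batch_size}) must be divisible by "
            f"microbatch_size ({microbatch_size})!"
        )
```

With this check in place, a config like `train_batch_size=32, microbatch_size=8` passes (4 microbatches per update), while `microbatch_size=5` fails fast at setup time instead of silently producing oversized microbatches.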
rllib/agents/a3c/a2c.py
Outdated
# Accumulate gradients.
acc_gradients = None
sum_count = 0
for grad, count in self._microbatches:
I think it's a great idea to compute the gradients right after collecting the samples, like it's done in the original paper, but why don't we accumulate them on the go as well? That would save us some space and maybe a pinch of runtime (?) here, and the original paper also looks like it accumulates on the run.
You are right. This should give us a tiny performance improvement. ... Will change.
fixed
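The change discussed above can be sketched roughly as follows (a simplified illustration with plain Python lists standing in for per-weight gradient tensors, not RLlib's actual code): instead of appending each microbatch's gradients to a list and summing them afterwards, each new gradient is folded into a running accumulator right after `compute_gradients()` returns, so no microbatch list is kept around.

```python
def accumulate(acc, grad):
    """Fold one microbatch's gradients into the running accumulator.

    `acc` is None before the first microbatch; afterwards it holds the
    element-wise sum of all gradients seen so far.
    """
    if acc is None:
        return list(grad)
    return [a + g for a, g in zip(acc, grad)]

# Simulated stream of (gradients, sample_count) pairs, as would come out
# of compute_gradients() for each microbatch.
microbatch_stream = [([1.0, 2.0], 8), ([0.5, 0.5], 8)]

acc_gradients = None
sum_count = 0
for grad, count in microbatch_stream:
    # Accumulate on the go: no self._microbatches list is needed.
    acc_gradients = accumulate(acc_gradients, grad)
    sum_count += count
```

After the loop, `acc_gradients` holds the summed gradients and `sum_count` the total sample count, exactly what the deferred summation loop produced, but with O(1) rather than O(num_microbatches) storage.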
I only have some minor questions 👍
Hey @ArturNiederfahrenhorst, fixed the mentioned items. Please take another look. Thanks!
lgtm
Why are these changes needed?
Implemented A2C using the new training_iteration API.
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.