
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) #23735

Merged
merged 13 commits into from
Apr 15, 2022

Conversation

sven1977
Contributor

@sven1977 sven1977 commented Apr 6, 2022

Implemented A2C using the new training_iteration API.

  • Set _disable_execution_plan_api=True by default for this algo.
  • Reinstated the microbatch learning test case for A2C on CartPole.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

rllib/BUILD
grad, info = self.workers.local_worker().compute_gradients(
    train_batch, single_agent=True
)
self._microbatches.append((grad, train_batch.count))
Contributor

What happens if `microbatch_size` is 8 but `train_batch.count` comes out at 32? Can this happen in some cases?
We have similar while-loop logic in a couple of training-iteration fns.
In that case it would lead to a miscalculation of `num_microbatches` and, obviously, much larger microbatches than configured.

Contributor Author

Yes, good point. I'll add a config check for these calculations. ...
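A config check along the lines discussed could look like the sketch below. This is a hypothetical illustration, not the validation code that actually landed in the PR; the key names `microbatch_size` and `train_batch_size` are taken from A2C's config, while the helper name `validate_microbatch_config` is invented here.

```python
def validate_microbatch_config(config):
    """Hypothetical sketch: validate microbatch settings and return
    the number of microbatches per update."""
    micro = config.get("microbatch_size")
    train = config["train_batch_size"]
    if micro is not None:
        # Require an exact multiple so `num_microbatches` is well defined
        # and no microbatch silently grows beyond the configured size.
        if train % micro != 0:
            raise ValueError(
                f"`train_batch_size` ({train}) must be a multiple of "
                f"`microbatch_size` ({micro})!"
            )
        return train // micro
    # No microbatching configured -> one "microbatch" per train batch.
    return 1

# Example: train_batch_size=32, microbatch_size=8 -> 4 microbatches.
```

Failing fast at config-validation time avoids the silent miscalculation raised above, where an oversized `train_batch.count` would otherwise just produce fewer, larger microbatches.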

# Accumulate gradients.
acc_gradients = None
sum_count = 0
for grad, count in self._microbatches:
Contributor

I think it's a great idea to compute the gradients right after collecting the samples, as it's done in the original paper, but why don't we accumulate them on the go as well? That would save us some space and maybe a pinch of runtime (?), and the original paper also looks like it accumulates on the run.

Contributor Author

You are right. This should give us a tiny performance improvement. ... Will change.
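The suggested change can be sketched as follows. This is a minimal illustration (not the PR's actual code), representing gradients as flat lists of floats: instead of storing every `(grad, count)` pair in `self._microbatches` and summing at the end, only a running sum and count are kept.

```python
def accumulate(acc, grads, total_count, count):
    """Fold one microbatch's gradients into the running sum.

    Keeps O(1) state in the number of microbatches, vs. O(n) for
    storing every (grad, count) pair in a list.
    """
    if acc is None:
        # First microbatch: start the running sum with a copy.
        acc = list(grads)
    else:
        # Elementwise in-place add of this microbatch's gradients.
        for i, g in enumerate(grads):
            acc[i] += g
    return acc, total_count + count

# Usage: accumulate on the go as microbatches arrive.
acc, total = None, 0
for grads, count in [([1.0, 2.0], 8), ([3.0, 4.0], 8)]:
    acc, total = accumulate(acc, grads, total, count)
# acc == [4.0, 6.0], total == 16
```

In the actual implementation the gradients would be framework tensors (and averaging by `total` would follow before applying), but the space/runtime argument from the comment above is the same.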

Contributor Author

fixed

Contributor

@ArturNiederfahrenhorst left a comment


I only have some minor questions 👍

@sven1977 sven1977 requested a review from smorad as a code owner April 15, 2022 11:17
@sven1977
Contributor Author

Hey @ArturNiederfahrenhorst , fixed the mentioned items. Please take another look. Thanks!

Contributor

@ArturNiederfahrenhorst left a comment


lgtm

@sven1977 sven1977 merged commit 92781c6 into ray-project:master Apr 15, 2022
3 participants