[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) #23735
Conversation
rllib/agents/a3c/a2c.py
Outdated
grad, info = self.workers.local_worker().compute_gradients(
    train_batch, single_agent=True
)
self._microbatches.append((grad, train_batch.count))
What happens if microbatch_size is 8 and train_batch.count comes out at 32? Can this happen in some cases?
We have similar while-loop logic in a couple of training-iteration fns.
In this case it would lead to a miscalculation of num_microbatches and, obviously, to much larger microbatches than configured?
Yes, good point. I'll add a config check for these calculations. ...
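A minimal sketch of what such a config check could look like (hypothetical function and parameter names, not RLlib's actual validation code): reject configs where the microbatch size does not evenly divide the train batch size, so num_microbatches never gets miscalculated.

```python
def validate_microbatch_config(train_batch_size: int, microbatch_size: int) -> None:
    """Hypothetical sketch of the config check discussed above.

    Ensures `microbatch_size` evenly divides `train_batch_size`, so the
    number of microbatches per update is a whole number and a single
    sampled batch can never overshoot a microbatch boundary.
    """
    if microbatch_size is None:
        return  # Microbatching disabled; nothing to check.
    if microbatch_size > train_batch_size:
        raise ValueError(
            f"microbatch_size ({microbatch_size}) must be <= "
            f"train_batch_size ({train_batch_size})!"
        )
    if train_batch_size % microbatch_size != 0:
        raise ValueError(
            f"train_batch_size ({train_batch_size}) must be divisible by "
            f"microbatch_size ({microbatch_size})!"
        )
```

With this check in place, a config like `train_batch_size=32, microbatch_size=8` passes (4 microbatches per update), while `microbatch_size=5` fails fast at setup time instead of silently producing oversized microbatches.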
rllib/agents/a3c/a2c.py
Outdated
# Accumulate gradients.
acc_gradients = None
sum_count = 0
for grad, count in self._microbatches:
I think it's a great idea to compute the gradients right after collecting the samples, like it's done in the original paper, but why don't we accumulate them on the go as well? That would save us some space and maybe a pinch of runtime (?) here, and the original paper also looks like it accumulates on the run.
You are right. This should give us a tiny performance improvement. ... Will change.
fixed
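The change discussed above can be sketched roughly as follows (a simplified illustration with plain Python lists standing in for per-weight gradient tensors, not RLlib's actual code): instead of appending each microbatch's gradients to a list and summing them afterwards, each new gradient is folded into a running accumulator right after `compute_gradients()` returns, so no microbatch list is kept around.

```python
def accumulate(acc, grad):
    """Fold one microbatch's gradients into the running accumulator.

    `acc` is None before the first microbatch; afterwards it holds the
    element-wise sum of all gradients seen so far.
    """
    if acc is None:
        return list(grad)
    return [a + g for a, g in zip(acc, grad)]

# Simulated stream of (gradients, sample_count) pairs, as would come out
# of compute_gradients() for each microbatch.
microbatch_stream = [([1.0, 2.0], 8), ([0.5, 0.5], 8)]

acc_gradients = None
sum_count = 0
for grad, count in microbatch_stream:
    # Accumulate on the go: no self._microbatches list is needed.
    acc_gradients = accumulate(acc_gradients, grad)
    sum_count += count
```

After the loop, `acc_gradients` holds the summed gradients and `sum_count` the total sample count, exactly what the deferred summation loop produced, but with O(1) rather than O(num_microbatches) storage.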
I only have some minor questions 👍
Hey @ArturNiederfahrenhorst, fixed the mentioned items. Please take another look. Thanks!
lgtm
Why are these changes needed?
Implemented A2C using the new training_iteration API.
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.