[RLlib] Issue #21671: Handle callbacks and model metrics for TorchPolicy while using multi-GPU optimizers #21697

Conversation
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Yeah, this seems reasonable. We are basically not calling the `on_learn_on_batch()` callbacks in `learn_on_loaded_batch()`.
This PR makes sense, but we will need to take the reproduction script and turn it into a test. I can help with this, as it's non-trivial to reason about our CI. We could probably do this with the fake GPU towers, right @sven1977? We also need to extend this to TensorFlow. Thanks for contributing this @XuehaiPan!
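For reference, a rough sketch of what such a fake-GPU CI test could look like; the class names, config values, and the assertion below are assumptions for illustration, not the actual test added to the repo:

```python
import unittest

import ray
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.agents.ppo import PPOTrainer


class AssertingCallbacks(DefaultCallbacks):
    """Records a marker metric every time on_learn_on_batch() fires."""

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        result["callback_ok"] = 1.0


class TestMultiGPUCallbacks(unittest.TestCase):
    def test_on_learn_on_batch_called_with_fake_gpus(self):
        ray.init(num_cpus=2)
        trainer = PPOTrainer(
            env="CartPole-v0",
            config={
                "framework": "torch",
                "num_gpus": 2,        # take the multi-GPU optimizer path
                "_fake_gpus": True,   # CPU-only fake GPU towers for CI
                "callbacks": AssertingCallbacks,
                "num_workers": 0,
                "rollout_fragment_length": 64,
                "train_batch_size": 128,
                "sgd_minibatch_size": 64,
            },
        )
        results = trainer.train()
        # Without the fix, the marker set in on_learn_on_batch() never
        # shows up in the results on the multi-GPU path.
        self.assertIn("callback_ok", str(results))
        trainer.stop()
        ray.shutdown()


if __name__ == "__main__":
    unittest.main()
```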
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
@sven1977 I have a question: I was trying to see how to make this work for TF multi-GPU experiments. I have little experience with TF, so I was hoping you could answer a few things. What's the best way to go about calling the `on_learn_on_batch()` callback there? In the single-node TF case, we keep a reference to the sample batch that's going to be passed to the callback. In the multi-device case, we store this in the multi-GPU tower. We need to be able to get data out of the multi-GPU tower in the form of a sample batch, and only then will we be able to support this change for TensorFlow multi-GPU. I'm not sure how to go about doing that. Another approach here is to store the entire batch in a variable, like we do in the single-device case, and then pass that batch to the callback.
Hey @XuehaiPan, sorry for the delay and thanks for this great fix! I must have missed this PR.
Let's get this merged! :)

@avnishn, you are right: for tf, things unfortunately work slightly differently, as e.g. the stats-fn is called on each individual tower batch (for torch, it's only called once with the entire batch, which is much cleaner). I was hoping that our ongoing "ray train" integration would make these problems all go away. Let's get this merged here first for torch, then we can fix tf static graph later (tf2 currently does NOT support multi-GPU!).
[RLlib] Issue #21671: Handle callbacks and model metrics for `TorchPolicy` while using multi-GPU optimizers (ray-project#21697)
Why are these changes needed?
In:

- ray/rllib/agents/trainer.py, lines 1314 to 1321 (at 82103bf)
- ray/rllib/agents/trainer.py, lines 1346 to 1356 (at 82103bf)
Unless `config["simple_optimizer"]` is specified, the multi-GPU optimizer will be used when training RL policies on GPU (even with only 1 GPU). That means we will use `multi_gpu_train_one_step` rather than `train_one_step` when training with GPU(s).

In `train_one_step`, we call `policy.learn_on_batch`, which handles `policy.callbacks.on_learn_on_batch()` and `policy.model.metrics()`.

In `multi_gpu_train_one_step`, we call `policy.learn_on_loaded_batch`, but neither `policy.callbacks.on_learn_on_batch()` nor `policy.model.metrics()` will be called. This leads to issue "[Bug] RLLib custom model metrics `policy.model.metrics()` are not logged when training with GPUs" #21671.
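For illustration, a minimal reproduction sketch of the behavior described above; the callback class, metric name, and config values are assumptions, not the original script from #21671 (and it exercises only the callback path, not a custom model's `metrics()`):

```python
import ray
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.agents.ppo import PPOTrainer


class CountingCallbacks(DefaultCallbacks):
    """Adds a custom metric from on_learn_on_batch()."""

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # With the bug, this callback is never invoked when the
        # multi-GPU optimizer (learn_on_loaded_batch) path is taken.
        result["num_learn_on_batch_calls"] = (
            result.get("num_learn_on_batch_calls", 0) + 1
        )


if __name__ == "__main__":
    ray.init()
    trainer = PPOTrainer(
        env="CartPole-v0",
        config={
            "framework": "torch",
            "callbacks": CountingCallbacks,
            # Any num_gpus >= 1 (without simple_optimizer) triggers
            # multi_gpu_train_one_step; _fake_gpus lets this run on CPU.
            "num_gpus": 1,
            "_fake_gpus": True,
        },
    )
    print(trainer.train())
```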
Related issue number

#21671 (This PR only fixes `TorchPolicy`.)

Checks
I've run `scripts/format.sh` to lint the changes in this PR.