[RLlib] Issue #21671: Handle callbacks and model metrics for TorchPolicy while using multi-GPU optimizers #21697

Conversation
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Yeah, this seems reasonable. We are basically not calling the `on_learn_on_batch()` callbacks in `learn_on_loaded_batch()`.
This PR makes sense, but we will need to take the reproduction script and turn it into a test. I can help with this, as it's non-trivial to reason about our CI. We could probably do this with the fake GPU towers, right @sven1977? We also need to extend this to TensorFlow. Thanks for contributing this @XuehaiPan!
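For reference, a rough sketch of what such a fake-GPU CI test could look like; the class names, config values, and the assertion below are assumptions for illustration, not the actual test added to the repo:

```python
import unittest

import ray
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.agents.ppo import PPOTrainer


class AssertingCallbacks(DefaultCallbacks):
    """Records a marker metric every time on_learn_on_batch() fires."""

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        result["callback_ok"] = 1.0


class TestMultiGPUCallbacks(unittest.TestCase):
    def test_on_learn_on_batch_called_with_fake_gpus(self):
        ray.init(num_cpus=2)
        trainer = PPOTrainer(
            env="CartPole-v0",
            config={
                "framework": "torch",
                "num_gpus": 2,        # take the multi-GPU optimizer path
                "_fake_gpus": True,   # CPU-only fake GPU towers for CI
                "callbacks": AssertingCallbacks,
                "num_workers": 0,
                "rollout_fragment_length": 64,
                "train_batch_size": 128,
                "sgd_minibatch_size": 64,
            },
        )
        results = trainer.train()
        # Without the fix, the marker set in on_learn_on_batch() never
        # shows up in the results on the multi-GPU path.
        self.assertIn("callback_ok", str(results))
        trainer.stop()
        ray.shutdown()


if __name__ == "__main__":
    unittest.main()
```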
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
@sven1977 I have a question: I was trying to see how to make this work for TF multi-GPU experiments. I have little experience with TF, so I was hoping you could answer a few things. What's the best way to go about calling the `on_learn_on_batch()` callback there? In the single-node TF case, we keep a reference to the sample batch that's going to be passed to the callback. In the multi-device case, we store this in the multi-GPU tower. We need to be able to get data out of the multi-GPU tower in the form of a sample batch, and only then will we be able to support this change for TensorFlow multi-GPU. I'm not sure how to go about doing that. Another approach here is to store the entire batch in a variable, like we do in the single-device case, and then pass that batch to the callback.
Hey @XuehaiPan, sorry for the delay and thanks for this great fix! I must have missed this PR.
Let's get this merged! :)

@avnishn, you are right: for tf, things unfortunately work slightly differently, as e.g. the stats-fn is called on each individual tower batch (for torch, it's only called once with the entire batch, which is much cleaner). I was hoping that our ongoing "ray train" integration would make these problems all go away. Let's get this merged here first for torch, then we can fix tf static graph later (tf2 currently does NOT support multi-GPU!).
[RLlib] Issue #21671: Handle callbacks and model metrics for `TorchPolicy` while using multi-GPU optimizers (ray-project#21697)
Why are these changes needed?
In:

- ray/rllib/agents/trainer.py, lines 1314 to 1321 (at 82103bf)
- ray/rllib/agents/trainer.py, lines 1346 to 1356 (at 82103bf)
Unless `config["simple_optimizer"]` is specified, the multi-GPU optimizer will be used when training RL policies on GPU (even with only 1 GPU). That means we will use `multi_gpu_train_one_step` rather than `train_one_step` when training with GPU(s).

In `train_one_step`, we call `policy.learn_on_batch`, which handles `policy.callbacks.on_learn_on_batch()` and `policy.model.metrics()`.

In `multi_gpu_train_one_step`, we call `policy.learn_on_loaded_batch`, but neither `policy.callbacks.on_learn_on_batch()` nor `policy.model.metrics()` will be called. This leads to issue "[Bug] RLLib custom model metrics `policy.model.metrics()` are not logged when training with GPUs" #21671.
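For illustration, a minimal reproduction sketch of the behavior described above; the callback class, metric name, and config values are assumptions, not the original script from #21671 (and it exercises only the callback path, not a custom model's `metrics()`):

```python
import ray
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.agents.ppo import PPOTrainer


class CountingCallbacks(DefaultCallbacks):
    """Adds a custom metric from on_learn_on_batch()."""

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # With the bug, this callback is never invoked when the
        # multi-GPU optimizer (learn_on_loaded_batch) path is taken.
        result["num_learn_on_batch_calls"] = (
            result.get("num_learn_on_batch_calls", 0) + 1
        )


if __name__ == "__main__":
    ray.init()
    trainer = PPOTrainer(
        env="CartPole-v0",
        config={
            "framework": "torch",
            "callbacks": CountingCallbacks,
            # Any num_gpus >= 1 (without simple_optimizer) triggers
            # multi_gpu_train_one_step; _fake_gpus lets this run on CPU.
            "num_gpus": 1,
            "_fake_gpus": True,
        },
    )
    print(trainer.train())
```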
Related issue number

#21671 (This PR only fixes `TorchPolicy`.)

Checks
I've run `scripts/format.sh` to lint the changes in this PR.