[Feature] Add QMix, VDN and IQL support to DQN trainer #3694
Conversation
```python
self.log_rewards = log_rewards
self.log_observations = log_observations
```

```python
if self.mixing_strategy in ("qmix", "vdn"):
```
```python
if self.mixing_strategy in ("qmix", "vdn"):  # "iql" missing
    self.register_op("batch_process", self._aggregate_agent_rewards)
```
No mixing for IQL: each agent is trained independently, so the aggregation op is intentionally not registered.
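A minimal sketch of the registration logic under discussion (class and method names here are stand-ins, not the actual trainer API): the reward-aggregation hook is only registered when a mixer combines per-agent values (QMIX/VDN), while IQL is deliberately excluded.

```python
# Hypothetical sketch: the op is registered only for mixing strategies.
class MultiAgentDQNTrainerSketch:
    def __init__(self, mixing_strategy="iql"):
        self.mixing_strategy = mixing_strategy
        self.ops = []

    def register_op(self, stage, fn):
        # record (stage, callable) pairs, mimicking trainer hook registration
        self.ops.append((stage, fn))

    def _aggregate_agent_rewards(self, batch):
        return batch  # placeholder for the real aggregation

    def setup(self):
        # "iql" is deliberately absent: no mixing for IQL
        if self.mixing_strategy in ("qmix", "vdn"):
            self.register_op("batch_process", self._aggregate_agent_rewards)
```

With `mixing_strategy="iql"`, `setup()` registers nothing; with `"qmix"` or `"vdn"` it registers the single batch-process hook.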
```python
source_key = ("next", "agents", key[-1] if isinstance(key, tuple) else key)
value = batch.get(source_key, None)
if value is not None:
    batch.set(next_key, value.mean(-2))
```
`batch.set(next_key, value.mean(-2))`
dim -2 is the agent dimension
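As a toy illustration of what the mean over dim -2 does, here is a plain-Python analogue (no torch) for rewards shaped `[time, n_agents, 1]`: averaging over the second-to-last dimension collapses the agent axis, leaving one shared reward per step.

```python
# Toy analogue of value.mean(-2) for a [T, A, 1] nested list.
rewards = [
    [[1.0], [3.0]],   # step 0: agent rewards 1.0 and 3.0
    [[2.0], [4.0]],   # step 1: agent rewards 2.0 and 4.0
]

def mean_over_agent_dim(batch):
    # average across agents (the -2 axis), keeping the trailing feature axis
    return [
        [sum(agent[k] for agent in step) / len(step) for k in range(len(step[0]))]
        for step in batch
    ]

print(mean_over_agent_dim(rewards))  # [[2.0], [3.0]]
```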
```python
spec = spec.clone()
if "action" not in spec.keys():
    spec["action"] = None
if action_key not in spec.keys(True, True):
```
The spec logic is inverted and the nested key wrapping is wrong.
The else branch fires when the key already exists, and it replaces the whole spec with a new Composite wrapping the old one, which is backwards. Also, `Composite({("agents", "action"): spec})` won't create proper nesting for tuple keys.
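To illustrate the nesting issue, here is a plain-dict sketch (dicts standing in for torchrl's `Composite`; `nest_key` is a hypothetical helper, not library API): a tuple key must be unrolled into nested mappings rather than used as a literal key.

```python
# Hypothetical helper: unroll a (possibly tuple) key into nested mappings.
def nest_key(key, value):
    if isinstance(key, tuple):
        out = value
        for part in reversed(key):  # build from the innermost level outward
            out = {part: out}
        return out
    return {key: value}

print(nest_key(("agents", "action"), "spec"))  # {'agents': {'action': 'spec'}}
print(nest_key("action", "spec"))              # {'action': 'spec'}
```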
Sorry, I didn't get you.
If the value keys aren't passed explicitly, they default to the flat `"action_value"` / `"chosen_action_value"` instead of `("agents", "action_value")` / `("agents", "chosen_action_value")`. This breaks `QMixerLoss`. The value key defaults should mirror `action_key`'s namespace.
Where do they default to flat?
tbh not really a big issue; I was just thinking of the situation where we don't check whether `action_key` is nested, so the defaults always resolve to flat strings.
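The suggested fix can be sketched as a small helper (`default_value_keys` is hypothetical, for illustration only): derive the default value keys from `action_key`'s namespace instead of hard-coding flat strings.

```python
# Hypothetical helper: default value keys mirror action_key's namespace.
def default_value_keys(action_key):
    if isinstance(action_key, tuple):
        prefix = action_key[:-1]  # e.g. ("agents",)
        return prefix + ("action_value",), prefix + ("chosen_action_value",)
    return "action_value", "chosen_action_value"

print(default_value_keys(("agents", "action")))
# (('agents', 'action_value'), ('agents', 'chosen_action_value'))
print(default_value_keys("action"))
# ('action_value', 'chosen_action_value')
```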
In tests, `QValueActor` does not accept `out_keys`, but `ModelConfig` (the parent of `QValueModelConfig`) defaults `out_keys=None`, which Hydra forwards as a kwarg. Drop it in `_make_qvalue_model`. Also add the missing `depth=2` to the `MLPConfig` calls in `test_qvalue_model_config` and `test_ppo_trainer_config`: passing `num_cells` as an int requires `depth`.
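A minimal sketch of the workaround described above, with stand-in names (`make_qvalue_actor` and `FakeQValueActor` are hypothetical; the real code lives in `_make_qvalue_model`): strip the kwarg the target constructor does not accept before instantiating.

```python
# Hypothetical sketch: drop out_keys before forwarding kwargs, since the
# target constructor (like QValueActor) does not accept it.
def make_qvalue_actor(actor_cls, **kwargs):
    kwargs.pop("out_keys", None)  # forwarded by Hydra as None; not accepted
    return actor_cls(**kwargs)

class FakeQValueActor:
    def __init__(self, module=None):
        self.module = module

actor = make_qvalue_actor(FakeQValueActor, module="net", out_keys=None)
print(actor.module)  # net
```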
Local ufmt run with the newer black 26.x kept some constructs unchanged that CI's pinned `black==22.3.0` (per `.pre-commit-config.yaml`) wants to re-shape: an `MLPConfig` call collapses to one line, and a nested ternary splits across three. Re-formatted with the CI-pinned versions to keep the lint hook clean across the merge with main (which picked up additional unformatted code from pytorch#3694 since the previous push). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
- `DQNTrainer` to support custom key and potential reward aggregation
- `QValueActor`

Motivation and Context
This change is required for better trainer support of multi-agent algorithms.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!