[Feature] Add QMix, VDN and IQL support to DQN trainer #3694
Conversation
```python
self.log_rewards = log_rewards
self.log_observations = log_observations
```

```python
if self.mixing_strategy in ("qmix", "vdn"):
```
```python
if self.mixing_strategy in ("qmix", "vdn"):  # "iql" missing
    self.register_op("batch_process", self._aggregate_agent_rewards)
```
No mixing for IQL: each agent is trained independently, so the aggregation op is intentionally not registered.
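A minimal sketch of the registration logic under discussion (class and method names here are stand-ins, not the actual trainer API): the reward-aggregation hook is only registered when a mixer combines per-agent values (QMIX/VDN), while IQL is deliberately excluded.

```python
# Hypothetical sketch: the op is registered only for mixing strategies.
class MultiAgentDQNTrainerSketch:
    def __init__(self, mixing_strategy="iql"):
        self.mixing_strategy = mixing_strategy
        self.ops = []

    def register_op(self, stage, fn):
        # record (stage, callable) pairs, mimicking trainer hook registration
        self.ops.append((stage, fn))

    def _aggregate_agent_rewards(self, batch):
        return batch  # placeholder for the real aggregation

    def setup(self):
        # "iql" is deliberately absent: no mixing for IQL
        if self.mixing_strategy in ("qmix", "vdn"):
            self.register_op("batch_process", self._aggregate_agent_rewards)
```

With `mixing_strategy="iql"`, `setup()` registers nothing; with `"qmix"` or `"vdn"` it registers the single batch-process hook.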
```python
source_key = ("next", "agents", key[-1] if isinstance(key, tuple) else key)
value = batch.get(source_key, None)
if value is not None:
    batch.set(next_key, value.mean(-2))
```
`batch.set(next_key, value.mean(-2))`
dim -2 is the agent dimension
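As a toy illustration of what the mean over dim -2 does, here is a plain-Python analogue (no torch) for rewards shaped `[time, n_agents, 1]`: averaging over the second-to-last dimension collapses the agent axis, leaving one shared reward per step.

```python
# Toy analogue of value.mean(-2) for a [T, A, 1] nested list.
rewards = [
    [[1.0], [3.0]],   # step 0: agent rewards 1.0 and 3.0
    [[2.0], [4.0]],   # step 1: agent rewards 2.0 and 4.0
]

def mean_over_agent_dim(batch):
    # average across agents (the -2 axis), keeping the trailing feature axis
    return [
        [sum(agent[k] for agent in step) / len(step) for k in range(len(step[0]))]
        for step in batch
    ]

print(mean_over_agent_dim(rewards))  # [[2.0], [3.0]]
```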
```python
spec = spec.clone()
if "action" not in spec.keys():
    spec["action"] = None
if action_key not in spec.keys(True, True):
```
The spec logic is inverted and the nested key wrapping is wrong.
The else branch fires when the key already exists, and it replaces the whole spec with a new Composite wrapping the old one, which is backwards. Also, `Composite({("agents", "action"): spec})` won't create proper nesting for tuple keys.
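To illustrate the nesting issue, here is a plain-dict sketch (dicts standing in for torchrl's `Composite`; `nest_key` is a hypothetical helper, not library API): a tuple key must be unrolled into nested mappings rather than used as a literal key.

```python
# Hypothetical helper: unroll a (possibly tuple) key into nested mappings.
def nest_key(key, value):
    if isinstance(key, tuple):
        out = value
        for part in reversed(key):  # build from the innermost level outward
            out = {part: out}
        return out
    return {key: value}

print(nest_key(("agents", "action"), "spec"))  # {'agents': {'action': 'spec'}}
print(nest_key("action", "spec"))              # {'action': 'spec'}
```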
Sorry, I didn't get you.
If the value keys aren't passed explicitly, they default to the flat `"action_value"` / `"chosen_action_value"` instead of `("agents", "action_value")` / `("agents", "chosen_action_value")`. This breaks `QMixerLoss`. The value key defaults should mirror `action_key`'s namespace.
Where do they default to flat?
tbh not really a big issue; I was just thinking of the situation where we don't check whether `action_key` is nested, so the defaults always resolve to flat strings.
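The suggested fix can be sketched as a small helper (`default_value_keys` is hypothetical, for illustration only): derive the default value keys from `action_key`'s namespace instead of hard-coding flat strings.

```python
# Hypothetical helper: default value keys mirror action_key's namespace.
def default_value_keys(action_key):
    if isinstance(action_key, tuple):
        prefix = action_key[:-1]  # e.g. ("agents",)
        return prefix + ("action_value",), prefix + ("chosen_action_value",)
    return "action_value", "chosen_action_value"

print(default_value_keys(("agents", "action")))
# (('agents', 'action_value'), ('agents', 'chosen_action_value'))
print(default_value_keys("action"))
# ('action_value', 'chosen_action_value')
```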
In tests, `QValueActor` does not accept `out_keys`, but `ModelConfig` (the parent of `QValueModelConfig`) defaults `out_keys=None`, which Hydra forwards as a kwarg. Drop it in `_make_qvalue_model`. Also add the missing `depth=2` to the `MLPConfig` calls in `test_qvalue_model_config` and `test_ppo_trainer_config`: passing `num_cells` as an int requires `depth`.
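A minimal sketch of the workaround described above, with stand-in names (`make_qvalue_actor` and `FakeQValueActor` are hypothetical; the real code lives in `_make_qvalue_model`): strip the kwarg the target constructor does not accept before instantiating.

```python
# Hypothetical sketch: drop out_keys before forwarding kwargs, since the
# target constructor (like QValueActor) does not accept it.
def make_qvalue_actor(actor_cls, **kwargs):
    kwargs.pop("out_keys", None)  # forwarded by Hydra as None; not accepted
    return actor_cls(**kwargs)

class FakeQValueActor:
    def __init__(self, module=None):
        self.module = module

actor = make_qvalue_actor(FakeQValueActor, module="net", out_keys=None)
print(actor.module)  # net
```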
Local ufmt run with the newer black 26.x kept some constructs unchanged that CI's pinned `black==22.3.0` (per `.pre-commit-config.yaml`) wants to re-shape: an `MLPConfig` call collapses to one line, and a nested ternary splits across three. Re-formatted with the CI-pinned versions to keep the lint hook clean across the merge with main (which picked up additional unformatted code from pytorch#3694 since the previous push). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
- `DQNTrainer` to support custom key and potential reward aggregation
- `QValueActor`

Motivation and Context
This change is required for better trainer support of multi-agent algorithms.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!