
Fix unused parameters in attention layers #462

Merged
jpata merged 3 commits into jpata:main from erwulff:self-attention-fix
Mar 20, 2026
Conversation

@erwulff
Collaborator

@erwulff erwulff commented Mar 19, 2026

This PR fixes a critical bug in mlpf/model/mlpf.py where the output of the FFN in PreLnSelfAttentionLayer was computed but never added to the residual connection.
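For context, a minimal sketch of the pre-LN pattern is below (illustrative names, not the actual mlpf.py implementation): in the buggy version the FFN sub-block computed ffn_out but returned x without adding it back, so the FFN and its LayerNorm never influenced the loss.

    import torch.nn as nn

    class PreLnSelfAttentionSketch(nn.Module):
        # Illustrative pre-LN block; layer names here are hypothetical, not the mlpf.py ones.
        def __init__(self, dim, num_heads=8, ffn_mult=4):
            super().__init__()
            self.ln1 = nn.LayerNorm(dim)
            self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.ln2 = nn.LayerNorm(dim)
            self.ffn = nn.Sequential(
                nn.Linear(dim, ffn_mult * dim), nn.GELU(), nn.Linear(ffn_mult * dim, dim)
            )

        def forward(self, x, key_padding_mask=None):
            # attention sub-block: pre-norm, attend, add residual
            residual = x
            h = self.ln1(x)
            attn_out, _ = self.mha(h, h, h, key_padding_mask=key_padding_mask)
            x = residual + attn_out
            # FFN sub-block: the bug computed ffn_out but returned x without the line below,
            # leaving ln2 and ffn without gradients
            residual = x
            ffn_out = self.ffn(self.ln2(x))
            x = residual + ffn_out  # the missing residual update restored by this PR
            return x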

This was discovered when running model training with Ray Train, which failed with the following error:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by 
making sure all `forward` function outputs participate in calculating loss. 
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 7: 102 103 104 105 106 107 114 115 116 117 118 119

This probably appears only when running with Ray Train, and not with bare DDP, because Ray Train enforces stricter checks by default.

The FFN block and second LayerNorm were effectively unused parameters, causing RuntimeError: Expected to have finished reduction... failures during distributed training (DDP).
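For posterity, two ways to surface this class of problem are sketched below, assuming hypothetical model, batch, compute_loss, and local_rank names: a local check for parameters whose .grad stays None after a backward pass, and the find_unused_parameters flag that the DDP error message mentions, which tolerates (and helps locate) such parameters at some overhead. The proper fix remains making every parameter contribute to the loss.

    # Local diagnostic (no DDP needed): after one backward pass, list parameters that
    # never received a gradient. `model`, `batch`, and `compute_loss` are placeholders.
    loss = compute_loss(model(batch), batch)
    loss.backward()
    unused = [n for n, p in model.named_parameters() if p.requires_grad and p.grad is None]
    print("parameters without gradients:", unused)

    # DDP workaround mentioned in the error message; `local_rank` comes from the launcher.
    import torch
    ddp_model = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=[local_rank],
        find_unused_parameters=True,  # detect and skip unused params instead of erroring
    )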

TODO:

  • Run a quick before/after validation (1 GPU, 1 hour is enough) and post the train and val losses.

@erwulff erwulff changed the title fix: self-attention layer missing residual connection Fix Unused Parameters in Attention Layers Mar 19, 2026
@erwulff erwulff changed the title Fix Unused Parameters in Attention Layers Fix unused parameters in attention layers Mar 19, 2026
@erwulff erwulff marked this pull request as ready for review March 19, 2026 16:32
Copilot AI review requested due to automatic review settings March 19, 2026 16:32

Copilot AI left a comment


Pull request overview

Fixes a training-time bug in PreLnSelfAttentionLayer where the FFN output was computed but never applied to the residual stream, which could leave FFN/LN parameters unused and trigger stricter DDP/Ray Train unused-parameter failures.

Changes:

  • Add the missing residual update x = residual + ffn_out after the FFN block in PreLnSelfAttentionLayer.forward.
Comments suppressed due to low confidence (1)

mlpf/model/mlpf.py:319

  • save_attention branch can raise runtime errors for attention_type == LINEAR: att_mat is never defined in the LINEAR path, and self.mha.in_proj_weight doesn't exist on LinearAttention. Consider guarding this block with self.attention_type != AttentionType.LINEAR (and/or handling LinearAttention separately), and ensure att_mat is always defined before use.
        if not self.use_simplified_attention and self.save_attention:
            np.savez(
                open("{}/attn_{}_{}.npz".format(self.outdir, self.name, self.att_mat_idx), "wb"),
                att=att_mat,
                x=x.detach().cpu().numpy(),
                in_proj_weight=self.mha.in_proj_weight.detach().cpu().numpy(),
            )
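A possible shape for that guard, following the suggestion above (a sketch only; AttentionType.LINEAR and self.attention_type are taken from the review comment and not verified against the file):

        if (
            not self.use_simplified_attention
            and self.save_attention
            and self.attention_type != AttentionType.LINEAR  # LinearAttention has no att_mat / in_proj_weight
        ):
            np.savez(
                open("{}/attn_{}_{}.npz".format(self.outdir, self.name, self.att_mat_idx), "wb"),
                att=att_mat,
                x=x.detach().cpu().numpy(),
                in_proj_weight=self.mha.in_proj_weight.detach().cpu().numpy(),
            )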


Comment thread: mlpf/model/mlpf.py
@jpata
Owner

jpata commented Mar 19, 2026

oh, good catch! can you just run a quick before/after training, like 1h each, and post the losses for posterity?

@erwulff
Collaborator Author

erwulff commented Mar 20, 2026

oh, good catch! can you just run a quick before/after training, like 1h each, and post the losses for posterity?

Sure! I ran a slightly longer test than you suggested. 4h on 8xH100. Losses are slightly lower after the fix.

Figure (step_train_loss_Total vs. step): total training loss before and after the fix; after the fix in purple, before the fix in yellow.

@jpata
Owner

jpata commented Mar 20, 2026

That's awesome! Glad you spotted the issue!

@jpata jpata merged commit 69b9178 into jpata:main Mar 20, 2026
2 checks passed
erwulff added a commit to erwulff/particleflow that referenced this pull request Mar 23, 2026
* fix: self-attention layer missing residual connection

* disable automatic metric logging in Comet ML

* use mlpf_config instead of args in distributed_ray.py