DPT implementation contains unused parameters #30633

Closed
ducha-aiki opened this issue May 3, 2024 · 5 comments
ducha-aiki commented May 3, 2024

System Info

Kind of irrelevant, but:

- `transformers` version: 4.40.0
- Platform: macOS-14.4.1-arm64-arm-64bit
- Python version: 3.9.16
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.22.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.0 (False)
- Tensorflow version (GPU?): 2.13.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: n/a
- Using distributed or parallel set-up in script?: yes

Who can help?

The first (zeroth) fusion layer's residual_layer1 is never used, which causes issues such as the following when running with DDP:

Parameters which did not receive grad for rank 3: neck.fusion_stage.layers.0.residual_layer1.convolution2.bias, 
neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias, 
neck.fusion_stage.layers.0.residual_layer1.convolution1.weight

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Reproduction

We take the code from the DPT doc page,
https://huggingface.co/docs/transformers/main/en/model_doc/dpt

run a forward-backward pass, and check for unused parameters:

import torch
from transformers import Dinov2Config, DPTConfig, DPTForDepthEstimation

# initialize with a Transformer-based backbone such as DINOv2
# in that case, we also specify `reshape_hidden_states=False` to get feature maps of shape (batch_size, num_channels, height, width)
backbone_config = Dinov2Config.from_pretrained("facebook/dinov2-base", out_features=["stage1", "stage2", "stage3", "stage4"], reshape_hidden_states=False)

config = DPTConfig(backbone_config=backbone_config)
model = DPTForDepthEstimation(config=config)

out = model(torch.rand(1, 3, 512, 512))
loss = out.predicted_depth.mean()
loss.backward()
for n, p in model.named_parameters():
    if p.grad is None:
        if 'backbone' in n: continue # part of backbone is not used and that is fine
        print(f"found unused param, {n}")

Result:

found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution1.weight
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution2.weight
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution2.bias

This prevents DDP training. To fix it, one should add the following line to the DPT model:

self.neck.fusion_stage.layers[0].residual_layer1 = None
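Until such a change lands in transformers, the same assignment can also be applied as a user-side workaround on an already-constructed model; a minimal sketch, reusing the model from the reproduction above and repeating its gradient check:

# Workaround sketch: drop the never-used residual block from the instantiated model
# so it no longer appears in named_parameters() and DDP stops expecting a grad for it.
model.neck.fusion_stage.layers[0].residual_layer1 = None

out = model(torch.rand(1, 3, 512, 512))
out.predicted_depth.mean().backward()
unused = [n for n, p in model.named_parameters() if p.grad is None and 'backbone' not in n]
print(unused)  # expected to be empty after the workaround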

Here is the same fix in mmsegmentation:

https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/decode_heads/dpt_head.py#L271

self.fusion_blocks[0].res_conv_unit1 = None

I can submit a PR that makes this fix.

Expected behavior

The model should not have unused parameters.

qubvel added the Vision label May 3, 2024
qubvel commented May 3, 2024

Hi @ducha-aiki, thanks for reporting!

You are right, it looks like we can safely delete layers[0].residual_layer1 from DPTFeatureFusionStage because it's never used.

Would you mind sharing why this prevents DDP training?

ducha-aiki (Author) commented

@qubvel I believe I shared this above:

Parameters which did not receive grad for rank 3: neck.fusion_stage.layers.0.residual_layer1.convolution2.bias, neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias, neck.fusion_stage.layers.0.residual_layer1.convolution1.weight

That is a quote from the crash message I get when running with accelerate for multi-GPU training and specifying ddp_find_unused_parameters=False in the Trainer.
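For context, a minimal sketch of the kind of setup that triggers the crash; output_dir and train_dataset are placeholders, while ddp_find_unused_parameters is the standard TrainingArguments field mentioned above:

from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="dpt-depth",            # placeholder output directory
    ddp_find_unused_parameters=False,  # DDP then requires a grad for every registered parameter
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)  # train_dataset: placeholder
trainer.train()  # crashes on multi-GPU because fusion layer 0's residual_layer1 never receives a grad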

qubvel commented May 3, 2024

Thank you, I missed it 🙂 I am trying to understand why the backbone's unused weights are not blocking while the neck's are. Did you try training with the fix?
Anyway, if this solves the issue it is worth a PR.

ducha-aiki (Author) commented

@qubvel good point about the backbone. Probably because I trained with a frozen backbone, which is fairly common.
As for removing the unused params from the backbone, that would probably require too many changes.
I will open a PR then, thanks.
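For reference, freezing the backbone as mentioned above can be done by disabling gradients on the parameters whose names carry the backbone prefix (the same prefix filtered in the reproduction script); a minimal sketch:

# Freeze every backbone parameter so DDP does not expect grads for them.
for name, param in model.named_parameters():
    if 'backbone' in name:
        param.requires_grad_(False)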


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
