DPT implementation contains unused parameters #30633

Closed
ducha-aiki opened this issue May 3, 2024 · 5 comments
ducha-aiki commented May 3, 2024

System Info

Kind of irrelevant, but:

- `transformers` version: 4.40.0
- Platform: macOS-14.4.1-arm64-arm-64bit
- Python version: 3.9.16
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.22.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.0 (False)
- Tensorflow version (GPU?): 2.13.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: n/a
- Using distributed or parallel set-up in script?: yes

Who can help?

The first (zeroth) fusion layer's residual_layer1 is never used, which causes issues such as the following when running with DDP:

Parameters which did not receive grad for rank 3: neck.fusion_stage.layers.0.residual_layer1.convolution2.bias, 
neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias, 
neck.fusion_stage.layers.0.residual_layer1.convolution1.weight

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Reproduction

We take the code from the DPT doc page,
https://huggingface.co/docs/transformers/main/en/model_doc/dpt

run a forward-backward pass, and check for unused parameters:

import torch
from transformers import Dinov2Config, DPTConfig, DPTForDepthEstimation

# initialize with a Transformer-based backbone such as DINOv2
# in that case, we also specify `reshape_hidden_states=False` to get feature maps of shape (batch_size, num_channels, height, width)
backbone_config = Dinov2Config.from_pretrained("facebook/dinov2-base", out_features=["stage1", "stage2", "stage3", "stage4"], reshape_hidden_states=False)

config = DPTConfig(backbone_config=backbone_config)
model = DPTForDepthEstimation(config=config)

out = model(torch.rand(1, 3, 512, 512))
loss = out.predicted_depth.mean()
loss.backward()
for n, p in model.named_parameters():
    if p.grad is None:
        if 'backbone' in n: continue # part of backbone is not used and that is fine
        print(f"found unused param, {n}")

Result:

found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution1.weight
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution2.weight
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution2.bias

This prevents DDP training. To fix it, one should add the following line to the DPT model:

self.neck.fusion_stage.layers[0].residual_layer1 = None
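Until such a change lands in transformers, the same assignment can also be applied as a user-side workaround on an already-constructed model; a minimal sketch, reusing the model from the reproduction above and repeating its gradient check:

# Workaround sketch: drop the never-used residual block from the instantiated model
# so it no longer appears in named_parameters() and DDP stops expecting a grad for it.
model.neck.fusion_stage.layers[0].residual_layer1 = None

out = model(torch.rand(1, 3, 512, 512))
out.predicted_depth.mean().backward()
unused = [n for n, p in model.named_parameters() if p.grad is None and 'backbone' not in n]
print(unused)  # expected to be empty after the workaround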

Here is the same fix in mmsegmentation:

https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/decode_heads/dpt_head.py#L271

self.fusion_blocks[0].res_conv_unit1 = None

I can submit a PR that makes this fix.

Expected behavior

The model should not have unused parameters.

qubvel added the Vision label May 3, 2024
qubvel commented May 3, 2024

Hi @ducha-aiki, thanks for reporting!

You are right, it looks like we can safely delete layers[0].residual_layer1 from DPTFeatureFusionStage because it's never used.

Would you mind sharing why this prevents DDP training?

ducha-aiki (Author) commented

@qubvel I believe I shared this above:

Parameters which did not receive grad for rank 3: neck.fusion_stage.layers.0.residual_layer1.convolution2.bias, neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias, neck.fusion_stage.layers.0.residual_layer1.convolution1.weight

That is a quote from the crash message I get when running with accelerate for multi-GPU training and specifying ddp_find_unused_parameters=False in the Trainer.
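For context, a minimal sketch of the kind of setup that triggers the crash; output_dir and train_dataset are placeholders, while ddp_find_unused_parameters is the standard TrainingArguments field mentioned above:

from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="dpt-depth",            # placeholder output directory
    ddp_find_unused_parameters=False,  # DDP then requires a grad for every registered parameter
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)  # train_dataset: placeholder
trainer.train()  # crashes on multi-GPU because fusion layer 0's residual_layer1 never receives a grad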

qubvel commented May 3, 2024

Thank you, I missed it 🙂 I am trying to understand why the backbone's unused weights are not blocking while the neck's are. Did you try training with the fix?
Anyway, if this solves the issue it is worth a PR.

ducha-aiki (Author) commented

@qubvel good point about the backbone. Probably because I trained with a frozen backbone, which is fairly common.
As for removing the unused params from the backbone, that would probably require too many changes.
I will open a PR then, thanks.
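For reference, freezing the backbone as mentioned above can be done by disabling gradients on the parameters whose names carry the backbone prefix (the same prefix filtered in the reproduction script); a minimal sketch:

# Freeze every backbone parameter so DDP does not expect grads for them.
for name, param in model.named_parameters():
    if 'backbone' in name:
        param.requires_grad_(False)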


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
