Mismatch in loading model #21

Closed
munirfarzeen opened this issue Dec 22, 2020 · 11 comments

@munirfarzeen

munirfarzeen commented Dec 22, 2020

Hi,
@jackroos I am trying to run the code using the provided model weights. I used r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage-checkpoint.pth and the corresponding config.
When loading the model, it reports a shape mismatch in the transformer.
Is the checkpoint correct?

@azamshoaib

@likui01 @jackroos @daijifeng001 Hi, I am having the same problem. Kindly help me in this regard. Thank you

@jackroos
Member

Could you provide the full command you ran? @likui01 @ayberksener
And are you using the latest code and the latest model?

@munirfarzeen
Author

@jackroos
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./config/r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage.sh

resuming from the provided checkpoint.

@jackroos
Member

@likui01 What is the detailed error message?

@munirfarzeen
Author

@jackroos
RuntimeError: Error(s) in loading state_dict for DeformableDETR:
size mismatch for class_embed.0.weight: copying a param with shape torch.Size([91, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
size mismatch for class_embed.0.bias: copying a param with shape torch.Size([91]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for class_embed.1.weight: copying a param with shape torch.Size([91, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
size mismatch for class_embed.1.bias: copying a param with shape torch.Size([91]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for class_embed.2.weight: copying a param with shape torch.Size([91, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
size mismatch for class_embed.2.bias: copying a param with shape torch.Size([91]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for class_embed.3.weight: copying a param with shape torch.Size([91, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
size mismatch for class_embed.3.bias: copying a param with shape torch.Size([91]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for class_embed.4.weight: copying a param with shape torch.Size([91, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
size mismatch for class_embed.4.bias: copying a param with shape torch.Size([91]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for class_embed.5.weight: copying a param with shape torch.Size([91, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
size mismatch for class_embed.5.bias: copying a param with shape torch.Size([91]) from checkpoint, the shape in current model is torch.Size([4]).

@jackroos
Member

Did you change any code? It seems you changed num_classes to 4 here, but it should be 91 (set here).
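
For illustration, here is a minimal sketch (not code from this repository) of why those shapes disagree: the class_embed head is a linear layer whose output dimension is num_classes, so weights saved for 91 COCO classes cannot be strictly copied into a head built for 4 classes.

```python
import torch.nn as nn

# Minimal illustration (not code from this repo): the classification head is a
# linear layer whose output size follows num_classes, so its weight shape
# changes with the dataset.
hidden_dim = 256
coco_head = nn.Linear(hidden_dim, 91)    # head shape in the released COCO checkpoint
custom_head = nn.Linear(hidden_dim, 4)   # head shape in a model built with num_classes=4

print(coco_head.weight.shape)    # torch.Size([91, 256])
print(custom_head.weight.shape)  # torch.Size([4, 256])
# Strictly loading the 91-class weights into the 4-class layer raises exactly
# the size-mismatch error shown above.
```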

@munirfarzeen
Author

I have 4 classes, so I changed it to 4.

@jackroos
Member

jackroos commented Dec 30, 2020

The checkpoint is trained on COCO detection. If you want to train on your custom dataset, you should train it by yourself and you don't need to resume the checkpoint. Thanks!

@1757525671 mentioned this issue Aug 24, 2021 (closed)
@ducvuuit

ducvuuit commented Sep 10, 2021

Hi @likui01 @azamshoaib, has anyone fixed the mismatch problem?

@amirhesamyazdi

> The checkpoint is trained on COCO detection. If you want to train on your custom dataset, you should train it by yourself and you don't need to resume the checkpoint. Thanks!

In DETR you can resume (transfer learn) from any checkpoint and still change the number of classes. That is an obvious requirement for supporting transfer learning. If D-DETR doesn't support that, something is wrong with it. Also, the shape mismatch might not be the only problem here. Could you please follow up on this?

@nwoyecid

nwoyecid commented Mar 2, 2023

You can change line 239 of main.py from:

`missing_keys, unexpected_keys = model_without_ddp.load_state_dict(checkpoint['model'], strict=False)`

to

`missing_keys, unexpected_keys = model_without_ddp.load_state_dict({k: v for k, v in checkpoint['model'].items() if "class_embed" not in k}, strict=False)`
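
Spelled out a bit more, the same idea looks like the sketch below. It assumes the standard training-script setup, where model_without_ddp is the already-built model and the checkpoint stores its weights under the 'model' key; the file name follows this thread, and the variable names are illustrative.

```python
import torch

# Load the released COCO checkpoint (file name taken from this thread).
checkpoint = torch.load(
    'r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage-checkpoint.pth',
    map_location='cpu',
)

# Drop the classification head, whose shape depends on num_classes, and load
# everything else non-strictly; the new head keeps its random initialization
# and is trained on the custom dataset.
filtered_state = {
    k: v for k, v in checkpoint['model'].items() if 'class_embed' not in k
}
missing_keys, unexpected_keys = model_without_ddp.load_state_dict(
    filtered_state, strict=False
)
print('missing keys:', missing_keys)  # expect only the class_embed.* entries here
```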
