Key mismatch while loading the model? #28

Open
Jumabek opened this issue May 31, 2018 · 17 comments

@Jumabek

Jumabek commented May 31, 2018

I am having an issue loading the trained checkpoint into the FPNSSD512 model.
How can I fix it?

RuntimeError: Error(s) in loading state_dict for FPNSSD512:
	Missing key(s) in state_dict: "fpn.conv1.weight", "fpn.bn1.running_var", "fpn.bn1.bias", "fpn.bn1.running_mean", "fpn.bn1.weight", "fpn.layer1.0.conv1.weight", "fpn.layer1.0.bn1.running_var", "fpn.layer1.0.bn1.bias", "fpn.layer1.0.bn1.running_mean", "fpn.layer1.0.bn1.weight", "fpn.layer1.0.conv2.weight", "fpn.layer1.0.bn2.running_var", "fpn.layer1.0.bn2.bias", 

        Unexpected key(s) in state_dict: "module.fpn.conv1.weight", "module.fpn.bn1.weight", "module.fpn.bn1.bias", "module.fpn.bn1.running_mean", "module.fpn.bn1.running_var", "module.fpn.layer1.0.conv1.weight"
@Jumabek

Jumabek commented May 31, 2018

Adding the following code before loading the checkpoint solved the issue:

if device == 'cuda':
    net = torch.nn.DataParallel(net)
    cudnn.benchmark = True

@Jumabek Jumabek closed this as completed May 31, 2018
@ahkarami

Dear @Jumabek,
I am also running into the issue you reported.
My script looks something like this:

import torch
import torch.backends.cudnn as cudnn
from models.fpnssd.net import FPNSSD512


# Print the PyTorch Version:
print(torch.__version__)  # 0.4.0


# *************** Parameters **************** #
# Check use GPU or not
use_gpu = torch.cuda.is_available()  # use GPU
if use_gpu:
    device = torch.device("cuda:0")  
else:
    device = torch.device("cpu")


# ** Loading Pre-Trained Weights:
net = FPNSSD512(num_classes=20).to(device)
net = torch.nn.DataParallel(net)
cudnn.benchmark = True
# download pre-trained weights from:
# https://drive.google.com/open?id=1yy_kUnm_hZR3uk9yLcaQSMwxVn7wApTU
net.load_state_dict(torch.load('./fpnssd512_20_trained.pth'))
net.eval()

However, I get the same error you reported. Could you please help me resolve it?

@ahkarami

Dear @kuangliu,
Would you please answer my above question?

@Jumabek

Jumabek commented Jul 3, 2018

@ahkarami sorry for the late reply.
I do not fully understand the issue,
but can you run the code below?
I added net = torch.nn.DataParallel(net) after loading the model.

import torch
import torch.backends.cudnn as cudnn
from models.fpnssd.net import FPNSSD512


# Print the PyTorch Version:
print(torch.__version__)  # 0.4.0


# *************** Parameters **************** #
# Check use GPU or not
use_gpu = torch.cuda.is_available()  # use GPU
if use_gpu:
    device = torch.device("cuda:0")  
else:
    device = torch.device("cpu")


# ** Loading Pre-Trained Weights:
net = FPNSSD512(num_classes=20).to(device)
cudnn.benchmark = True
# download pre-trained weights from:
# https://drive.google.com/open?id=1yy_kUnm_hZR3uk9yLcaQSMwxVn7wApTU
net.load_state_dict(torch.load('./fpnssd512_20_trained.pth'))
net = torch.nn.DataParallel(net)
net.eval()

@Jumabek Jumabek reopened this Jul 3, 2018
@ahkarami

ahkarami commented Jul 4, 2018

Dear @Jumabek,
Thank you for your reply, and sorry for the inconvenience. I tested the script you recommended, but unfortunately the error remains. The error is:

Traceback (most recent call last):
  File "/home/user/TorchCV/Attempt1.py", line 54, in <module>
    net.load_state_dict(torch.load('./fpnssd512_20_trained.pth'))
  File "/opt/pytorch4/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for FPNSSD512:
	Missing key(s) in state_dict: "fpn.conv1.weight", "fpn.bn1.running_mean", "fpn.bn1.running_var", ...
	Unexpected key(s) in state_dict: "extractor.conv1.weight", "extractor.bn1.weight", "extractor.bn1.bias", ....

Process finished with exit code 1

It is worth noting that I tested the above code on a system with just one GTX 1080 Ti GPU (with CUDA 9.0 and cuDNN 7).
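
A quick way to see what this traceback is reporting is to compare the two sets of key names directly. The following is only a minimal sketch: `diff_state_dict_keys` is a hypothetical helper (not part of PyTorch or this repository), and the key lists are the few names visible in the traceback above, not the full contents of the model or the checkpoint.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, in the spirit of a strict load.

    missing:    names the model expects but the checkpoint does not provide
    unexpected: names the checkpoint provides but the model does not have
    """
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    model_set, checkpoint_set = set(model_keys), set(checkpoint_keys)
    missing = [k for k in model_keys if k not in checkpoint_set]
    unexpected = [k for k in checkpoint_keys if k not in model_set]
    return missing, unexpected


# Only the names shown in the traceback; the real lists are much longer.
model_keys = ["fpn.conv1.weight", "fpn.bn1.running_mean", "fpn.bn1.running_var"]
checkpoint_keys = ["extractor.conv1.weight", "extractor.bn1.weight", "extractor.bn1.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With PyTorch installed, the same helper could be fed `net.state_dict().keys()` and the keys of the object returned by `torch.load(...)`, assuming the file stores a plain state dict. Here no name overlaps at all, which points to more than a prefix problem.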

@dearleiii

Hi, I followed your code and it seems to have solved the unexpected key issue for me.
But I'm wondering why the error occurs in the first place.
Why does DataParallel help to solve it?
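
A likely explanation for the first error in this thread: torch.nn.DataParallel keeps the wrapped network as a submodule named `module`, so every key in its state dict gains a `module.` prefix. A checkpoint saved from a wrapped model therefore only matches a wrapped model. Wrapping before loading is one fix; another is to strip the prefix from the checkpoint keys. The sketch below shows the second option on a plain dictionary. `strip_prefix` is a hypothetical helper, and the string values are placeholders standing in for tensors.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Key names come from the first error message; values are placeholders, not tensors.
checkpoint = OrderedDict([
    ("module.fpn.conv1.weight", "w0"),
    ("module.fpn.bn1.weight", "w1"),
])

cleaned = strip_prefix(checkpoint)
print(list(cleaned))  # ['fpn.conv1.weight', 'fpn.bn1.weight']
```

The cleaned dictionary could then be passed to `load_state_dict` on the unwrapped model. This only addresses the `module.` prefix; it does not help when the remaining names differ as well.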

@zacario-li

@ahkarami I ran into the same issue as you. Have you fixed it yet?

@ahkarami

Dear @zacario-li,
Unfortunately, I couldn't resolve the issue. I can train and test the model on my own GPU (i.e., my trained models are correct), but the released pre-trained model has the above issue. I think the problem is related to the fact that the pre-trained model was trained on a multi-GPU machine, whereas we want to use it on a machine with just one GPU. In that case, calling torch.nn.DataParallel(net) should fix the problem, but as we saw, it does not.

@root-master

If you want to load the weights after wrapping the model in DataParallel, use:
net.module.load_state_dict(pretrained_weights)
If you want to load the weights before wrapping it in DataParallel, use:
net.load_state_dict(pretrained_weights)

@silkylove

Dear @ahkarami ,
I think the pretrained fpnssd model provided by @kuangliu does not match /models/fpnssd/net.py. Actually, he said that he just replaced vgg16 with fpn50 in ssd512, which is /models/ssd/net.py. So you cannot use the model created by /models/fpnssd/net.py to load the weights for /models/ssd/net.py, because the keys do not match.
The way to use his pretrained model is to train his ssd512 model with fpn50, not the fpnssd512 model in /models/fpnssd/net.py.
Also, it seems that he did not put all of his examples in this GitHub repository, or he deleted something before pushing.
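
One way to check which network definition a checkpoint was saved from is to count its keys by top-level submodule name. This is only a rough sketch: `top_level_names` is a hypothetical helper, and the key lists are limited to the names quoted in the traceback earlier in this thread.

```python
from collections import Counter


def top_level_names(keys):
    """Count keys by the part of the name before the first dot."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Names quoted in the traceback; the real checkpoint and model have many more.
checkpoint_keys = ["extractor.conv1.weight", "extractor.bn1.weight", "extractor.bn1.bias"]
model_keys = ["fpn.conv1.weight", "fpn.bn1.running_mean", "fpn.bn1.running_var"]

print("checkpoint:", dict(top_level_names(checkpoint_keys)))
print("model:", dict(top_level_names(model_keys)))
```

If the two summaries share no top-level name, as with `extractor` versus `fpn` here, the checkpoint most likely belongs to a different model class, and renaming prefixes alone is unlikely to be enough.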

@ahkarami

Dear @silkylove,
Thank you very much for the useful information. Were you able to load and use his pre-trained network?
If so, could you please share the loading code?

@silkylove

Dear @ahkarami ,
OK, I will release the code once I get performance similar to his pretrained fpnssd512 model.

@ahkarami

Thank you very much @silkylove.

@silkylove

silkylove commented Oct 20, 2018

@ahkarami
Please check my code.
https://github.com/silkylove/ObjectDetection/tree/master/example/fpnssd
I also uploaded the training log for Adam with 100 epochs, which reaches 73.95 mAP so far. I am now training with SGD for 200 epochs, which I think will reach a higher mAP; I will release that training log later.
Also, you can uncomment this line in eval.py https://github.com/silkylove/ObjectDetection/blob/master/example/fpnssd/eval.py#L25 to get his pretrained model's performance (about 56 mAP). Make sure not to use DataParallel.

@ahkarami

Dear @silkylove,
Thank you very much for your time. The code you implemented and modified is really valuable. It would also be great if you could upload your pre-trained model (e.g., to Google Drive).

@silkylove

@ahkarami
I uploaded the SGD training and eval logs. With SGD, I can only get around 76% mAP for now. The pretrained model is here.

@ahkarami

@silkylove,
Thank you very much.


6 participants