[Error] RIDE Expert Assignment Module Training (Stage 2) #20

Closed
abababa-ai opened this issue Jul 14, 2021 · 2 comments

Comments

@abababa-ai

Thanks for your great work. I get an error when I try to use a model from the model zoo on the iNaturalist dataset. The model uses a ResNet50 backbone with 4 experts and distillation. When I use it as the pretrained checkpoint to initialize the model for stage 2, it produces the errors below. It seems that the parameters of some layers are not loaded correctly.
My command line is:
python train.py -c "configs/config_iNaturalist_resnet50_ride_ea.json" -r afs/RIDE/RIDE_model/imagenet_4experts_distill/checkpoint-epoch5.pth --reduce_dimension 1 --num_experts 4

The reported errors are as follows:

Loading checkpoint: ./RIDE/RIDE_model/imagenet_4experts_distill/checkpoint-epoch5.pth ...
Traceback (most recent call last):
  File "/root/workspace/env_run/utils/util.py", line 59, in load_state_dict
    own_state["module."+name].copy_(param)
RuntimeError: The size of tensor a (64) must match the size of tensor b (128) at non-singleton dimension 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main(config)
  File "train.py", line 73, in main
    lr_scheduler=lr_scheduler)
  File "/root/workspace/env_run/trainer/trainer.py", line 14, in __init__
    super().__init__(model, criterion, metric_ftns, optimizer, config)
  File "/root/workspace/env_run/base/base_trainer.py", line 59, in __init__
    self._resume_checkpoint(config.resume, state_dict_only=state_dict_only)
  File "/root/workspace/env_run/base/base_trainer.py", line 211, in _resume_checkpoint
    load_state_dict(self.model, state_dict)
  File "/root/workspace/env_run/utils/util.py", line 63, in load_state_dict
    print("Error in copying parameter {}, source shape: {}, destination shape: {}".format(name, param.shape, own_state[name].shape))
KeyError: 'backbone.layer1.0.conv1.weight'
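
A note on reading this log: the first exception (the RuntimeError about sizes 64 and 128) is the actual failure. The KeyError that follows is raised while the error message is being formatted, because the copy used the key "module." + name while the message looks up the bare name. The sketch below shows one way such a lookup could tolerate both forms. It is not the repository's code: `resolve_key` and `describe_mismatch` are hypothetical helpers, `own_state` is assumed here to map parameter names to shape tuples, and the shapes in the usage lines are made up for illustration.

```python
def resolve_key(own_state, name, prefix="module."):
    """Return the key under which `name` is stored in `own_state`, or None.

    Tries the bare name first, then the name with `prefix` prepended.
    """
    for candidate in (name, prefix + name):
        if candidate in own_state:
            return candidate
    return None


def describe_mismatch(own_state, name, src_shape):
    """Build a readable report line without raising KeyError."""
    key = resolve_key(own_state, name)
    if key is None:
        return f"{name}: not present in the model"
    return (
        f"{name}: source shape {tuple(src_shape)}, "
        f"destination shape {tuple(own_state[key])}"
    )


# Illustrative shapes only; they are not taken from the real model or checkpoint.
own_state = {"module.backbone.layer1.0.conv1.weight": (128, 64, 1, 1)}
print(describe_mismatch(own_state, "backbone.layer1.0.conv1.weight", (64, 64, 1, 1)))
print(describe_mismatch(own_state, "backbone.fc.weight", (10, 64)))
```

With a lookup like this, the shape report would be printed and the original size error would stay visible instead of being followed by a second exception.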

@TonyLianLong
Collaborator

There is a size mismatch between the models. As your command shows, you are trying to load an ImageNet checkpoint, but the config is for iNaturalist. Please check that the model you are building and the checkpoint you are loading match.

The error:

The size of tensor a (64) must match the size of tensor b (128) at non-singleton dimension 0
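
A quick way to confirm this kind of mismatch before training is to compare parameter shapes between the model and the checkpoint. The following is a minimal sketch, not part of RIDE: `normalize` and `compare_shapes` are hypothetical helpers that work on plain dictionaries mapping parameter names to shape tuples, and the names and shapes in the usage lines are invented for illustration.

```python
def normalize(name, prefix="module."):
    """Drop a leading DataParallel-style prefix so both sides use the same keys."""
    return name[len(prefix):] if name.startswith(prefix) else name


def compare_shapes(model_shapes, ckpt_shapes):
    """Compare two {parameter name: shape} mappings.

    Returns (mismatched, missing, unexpected):
      mismatched: {name: (checkpoint shape, model shape)} for shared names that differ
      missing:    names the model has but the checkpoint lacks
      unexpected: names the checkpoint has but the model lacks
    """
    model = {normalize(k): tuple(v) for k, v in model_shapes.items()}
    ckpt = {normalize(k): tuple(v) for k, v in ckpt_shapes.items()}
    mismatched = {
        k: (ckpt[k], model[k])
        for k in ckpt
        if k in model and ckpt[k] != model[k]
    }
    missing = sorted(k for k in model if k not in ckpt)
    unexpected = sorted(k for k in ckpt if k not in model)
    return mismatched, missing, unexpected


# Invented example values, for illustration only.
model_shapes = {"module.a.weight": (128, 64), "module.b.bias": (10,)}
ckpt_shapes = {"a.weight": (64, 64), "c.bias": (5,)}
mismatched, missing, unexpected = compare_shapes(model_shapes, ckpt_shapes)
for name, (src, dst) in mismatched.items():
    print(f"{name}: checkpoint {src} vs model {dst}")
```

With PyTorch, the model side could be collected as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. How the weights are stored inside the checkpoint file (for example under which key) depends on how it was saved, so that part is left as an assumption to check against the actual file.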

@abababa-ai
Author

How embarrassing. That was a careless mistake on my part.
