
how to train this model #2

Closed
Fly-dream12 opened this issue Nov 11, 2019 · 9 comments

@Fly-dream12

I have a question. I am trying to train a model on my own dataset, which is in VOC format, but when I start training I hit a strange problem: after the pretrained model is loaded, the code hangs at 'Loading pretrained weights from ./data/pretrained_model/vgg16_caffe.pth' without any error. I wonder whether it is actually training correctly. I am eager to get your help.

Thanks
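
A minimal sketch for narrowing down a hang like this (the paths below are placeholders, not from this repo): check that every image referenced by the VOC-style split actually opens, since a silent stall right after the pretrained weights are loaded can simply be a stuck or very slow data pipeline.

import os
from PIL import Image

voc_root = "data/VOCdevkit2007/VOC2007"  # placeholder: root of your VOC-format dataset
split_file = os.path.join(voc_root, "ImageSets", "Main", "trainval.txt")

with open(split_file) as f:
    ids = [line.strip() for line in f if line.strip()]

for img_id in ids:
    path = os.path.join(voc_root, "JPEGImages", img_id + ".jpg")
    try:
        with Image.open(path) as im:
            im.verify()  # cheap integrity check; does not decode the full image
    except Exception as exc:
        print("bad or missing image:", path, exc)

print("checked", len(ids), "images")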

@harsh-99
Owner

Hi,
Did you try to train the model on any of the provided datasets with the VGG backbone? I think it would be better to try that once, just to make sure things work on your system, as I have not faced this problem so far.
Apart from that, we have provided directions for training on a new dataset. One addition to the README: update both lib/model/utils/parser_func.py and lib/model/utils/parser_func_multi.py. I will integrate the code so that from now onwards only one update is required.
I hope the above helps; if it doesn't, let me know.
Thanks
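
As a rough sketch of the kind of entry meant above (the argument names and the "my_voc" key are placeholders; the real parser_func.py / parser_func_multi.py may use different names), both files map a --dataset value to imdb names roughly like this:

import argparse

def set_dataset_args(args):
    # Each supported --dataset value maps to the imdb names used for training/validation.
    if args.dataset == "pascal_voc":
        args.imdb_name = "voc_2007_trainval"
        args.imdbval_name = "voc_2007_test"
    elif args.dataset == "my_voc":  # the new branch added for your own VOC-format data
        args.imdb_name = "my_voc_2007_trainval"
        args.imdbval_name = "my_voc_2007_test"
    else:
        raise ValueError("unknown dataset: %s" % args.dataset)
    return args

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="my_voc")
    print(vars(set_dataset_args(parser.parse_args())))

Until the two files are merged, the same branch has to be added in both.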

@Fly-dream12
Author

I have changed the corresponding paths in those functions and made sure they are correct. Maybe training is just too slow, so nothing appeared. Since I have not downloaded any of the provided datasets, could you give a link to one of them so I can try?

@Fly-dream12
Author

I continued to debug this project and found that the images are not loaded correctly. Then I encountered another problem in RCNN_roi_crop during the forward pass of faster_rcnn_SCL, at this line:
pooled_feat = self.RCNN_roi_crop(base_feat, Variable(grid_yx).detach())

The error is:
torch.FatalError: aborting at /data/ztc/jinke/faster-rcnn.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c:49
Maybe I did not compile the C files correctly. Could you help? @harsh-99

@harsh-99
Owner

I believe you have not compiled all the files. Please make sure you have the correct CUDA and PyTorch versions and then run:
cd lib
sh make.sh
If you get some error while compiling, let me know.
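
Once make.sh finishes, a quick check like the following (the import path is an assumption based on the usual faster-rcnn.pytorch layout; run it with lib/ on PYTHONPATH) can confirm the roi_crop extension was actually built for your CUDA/PyTorch setup:

import torch

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
try:
    # Assumed module path; adjust if this repo lays out lib/ differently.
    from model.roi_crop.modules.roi_crop import _RoICrop
    print("roi_crop extension imported successfully")
except Exception as exc:
    print("roi_crop extension not usable:", exc)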

@Fly-dream12
Author

Now I can train the model, but the rpn_cls loss becomes NaN at epoch 1, iteration 100/10000. By the way, I have decreased the learning rate to 0.0002. What could the problem be? @harsh-99

@harsh-99
Owner

That usually occurs when the labelled dataset has some bounding boxes with negative coordinates. A few threads describe the same problem when training the object detection module.
Please refer to this:
jwyang/faster-rcnn.pytorch#136
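
A standalone sketch for finding such boxes before training (the annotation directory is a placeholder; VOC's 1-based coordinate convention is assumed, since loaders typically subtract 1 and turn a 0 into -1):

import glob
import xml.etree.ElementTree as ET

ann_dir = "data/VOCdevkit2007/VOC2007/Annotations"  # placeholder path

for xml_path in glob.glob(ann_dir + "/*.xml"):
    for obj in ET.parse(xml_path).findall("object"):
        bb = obj.find("bndbox")
        x1, y1 = float(bb.find("xmin").text), float(bb.find("ymin").text)
        x2, y2 = float(bb.find("xmax").text), float(bb.find("ymax").text)
        if x1 < 1 or y1 < 1 or x2 <= x1 or y2 <= y1:
            print("suspicious box in", xml_path, (x1, y1, x2, y2))

Boxes flagged here are the ones the thread above suggests clipping or dropping before they reach the RPN targets.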

@Fly-dream12
Author

Maybe the corresponding code should be added in lib/dataset/pascal_voc.py:
if x1 < 0 or y1 < 0:
    continue
if abs(x1 - x2) <= 100 or abs(y1 - y2) <= 100:
    continue
Moreover, should the learning rate be adjusted?

@harsh-99
Owner

Hi,
I have never faced any problems because of the learning rate; I have initialised lr in the range of 1e-2 to 1e-4. About adding those lines: since that check is dataset-dependent, I don't think it is necessary for the datasets this code is written for. I have not faced any such problems with them, and anyone following the same instructions should not run into this either.

In case you have any other issue, please let me know; otherwise it would be great if you could close this issue.

@Fly-dream12
Author

When I have trained a model and begin to test, the checkpoint can't be loaded correctly:
RuntimeError: Error(s) in loading state_dict for vgg16:
Missing key(s) in state_dict: "netD.conv1.weight", "netD.bn1.weight", "netD.bn1.bias", "netD.bn1.running_mean", "netD.bn1.running_var", "netD.conv2.weight", "netD.bn2.weight", "netD.bn2.bias", "netD.bn2.running_mean", "netD.bn2.running_var", "netD.conv3.weight", "netD.bn3.weight", "netD.bn3.bias", "netD.bn3.running_mean", "netD.bn3.running_var", "netD.fc.weight", "netD.fc.bias", "netD_pixel.conv1.weight", "netD_pixel.conv2.weight", "netD_pixel.conv3.weight".
Unexpected key(s) in state_dict: "netD_img.conv_image.weight", "netD_img.conv_image.bias", "netD_img.bn_image.weight", "netD_img.bn_image.bias", "netD_img.bn_image.running_mean", "netD_img.bn_image.running_var", "netD_img.bn_image.num_batches_tracked", "netD_img.fc_1_image.weight", "netD_img.fc_1_image.bias", "netD_img.bn_2.weight", "netD_img.bn_2.bias", "netD_img.bn_2.running_mean", "netD_img.bn_2.running_var", "netD_img.bn_2.num_batches_tracked", "netD_inst.fc_1_inst.weight", "netD_inst.fc_1_inst.bias", "netD_inst.fc_2_inst.weight", "netD_inst.fc_2_inst.bias", "netD_inst.bn.weight", "netD_inst.bn.bias", "netD_inst.bn.running_mean", "netD_inst.bn.running_var", "netD_inst.bn.num_batches_tracked".

What could be the reason for this? @harsh-99
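
The missing netD.* keys versus the unexpected netD_img.* / netD_inst.* keys suggest the checkpoint was saved from a model with different discriminator modules than the one the test script builds, i.e. the training and testing runs are not using the same model variant or flags. A hedged diagnostic (the checkpoint path and the 'model' key are assumptions) is to list the top-level module names stored in the checkpoint and compare them with the model constructed at test time:

import torch

ckpt_path = "models/vgg16/your_checkpoint.pth"  # placeholder path
ckpt = torch.load(ckpt_path, map_location="cpu")
# Some checkpoints wrap the weights in a dict under "model"; plain state_dicts do not.
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
prefixes = sorted({key.split(".")[0] for key in state.keys()})
print("top-level modules saved in the checkpoint:", prefixes)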
