
how to train this model #2

Closed
Fly-dream12 opened this issue Nov 11, 2019 · 9 comments

@Fly-dream12

I have a question. I am trying to train a model on my own dataset, which is in VOC format, but when I start training I hit a strange problem: after the pretrained model is loaded, the code hangs at 'Loading pretrained weights from ./data/pretrained_model/vgg16_caffe.pth' without any error. I wonder whether it is actually training correctly. I am eager to get your help.

Thanks
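
A minimal sketch for narrowing down a hang like this (the paths below are placeholders, not from this repo): check that every image referenced by the VOC-style split actually opens, since a silent stall right after the pretrained weights are loaded can simply be a stuck or very slow data pipeline.

import os
from PIL import Image

voc_root = "data/VOCdevkit2007/VOC2007"  # placeholder: root of your VOC-format dataset
split_file = os.path.join(voc_root, "ImageSets", "Main", "trainval.txt")

with open(split_file) as f:
    ids = [line.strip() for line in f if line.strip()]

for img_id in ids:
    path = os.path.join(voc_root, "JPEGImages", img_id + ".jpg")
    try:
        with Image.open(path) as im:
            im.verify()  # cheap integrity check; does not decode the full image
    except Exception as exc:
        print("bad or missing image:", path, exc)

print("checked", len(ids), "images")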

@harsh-99
Owner

Hi,
Did you try to train the model on any of the provided datasets with the VGG backbone? I think it would be better to try that once, just to make sure things work on your system, as I have not faced this problem so far.
Apart from that, we have provided directions for training on a new dataset. One addition to the README: update both lib/model/utils/parser_func.py and lib/model/utils/parser_func_multi.py. I will integrate the code so that from now onwards only one update is required.
I hope the above helps; if it doesn't, let me know.
Thanks
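
As a rough sketch of the kind of entry meant above (the argument names and the "my_voc" key are placeholders; the real parser_func.py / parser_func_multi.py may use different names), both files map a --dataset value to imdb names roughly like this:

import argparse

def set_dataset_args(args):
    # Each supported --dataset value maps to the imdb names used for training/validation.
    if args.dataset == "pascal_voc":
        args.imdb_name = "voc_2007_trainval"
        args.imdbval_name = "voc_2007_test"
    elif args.dataset == "my_voc":  # the new branch added for your own VOC-format data
        args.imdb_name = "my_voc_2007_trainval"
        args.imdbval_name = "my_voc_2007_test"
    else:
        raise ValueError("unknown dataset: %s" % args.dataset)
    return args

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="my_voc")
    print(vars(set_dataset_args(parser.parse_args())))

Until the two files are merged, the same branch has to be added in both.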

@Fly-dream12
Author

I have changed the corresponding paths in those functions and made sure they are correct. Maybe training is just too slow, so nothing appeared. Since I have not downloaded any of the provided datasets, could you give a link to one of them so I can try?

@Fly-dream12
Author

I continued to debug this project and found that the images are not loaded correctly. Then I encountered another problem in RCNN_roi_crop during the forward pass of faster_rcnn_SCL, at this line:
pooled_feat = self.RCNN_roi_crop(base_feat, Variable(grid_yx).detach())

The error is:
torch.FatalError: aborting at /data/ztc/jinke/faster-rcnn.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c:49
Maybe I did not compile the C files correctly. Could you help? @harsh-99

@harsh-99
Owner

I believe you have not compiled all the files. Please make sure you have the correct CUDA and PyTorch versions and then run:
cd lib
sh make.sh
If you get some error while compiling, let me know.
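
Once make.sh finishes, a quick check like the following (the import path is an assumption based on the usual faster-rcnn.pytorch layout; run it with lib/ on PYTHONPATH) can confirm the roi_crop extension was actually built for your CUDA/PyTorch setup:

import torch

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
try:
    # Assumed module path; adjust if this repo lays out lib/ differently.
    from model.roi_crop.modules.roi_crop import _RoICrop
    print("roi_crop extension imported successfully")
except Exception as exc:
    print("roi_crop extension not usable:", exc)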

@Fly-dream12
Author

Now I can train the model, but the rpn_cls loss becomes NaN at epoch 1, iteration 100/10000. By the way, I have decreased the learning rate to 0.0002. What could the problem be? @harsh-99

@harsh-99
Owner

That usually occurs when the labelled dataset has some bounding boxes with negative coordinates. A few threads describe the same problem when training the object detection module.
Please refer to this:
jwyang/faster-rcnn.pytorch#136
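
A standalone sketch for finding such boxes before training (the annotation directory is a placeholder; VOC's 1-based coordinate convention is assumed, since loaders typically subtract 1 and turn a 0 into -1):

import glob
import xml.etree.ElementTree as ET

ann_dir = "data/VOCdevkit2007/VOC2007/Annotations"  # placeholder path

for xml_path in glob.glob(ann_dir + "/*.xml"):
    for obj in ET.parse(xml_path).findall("object"):
        bb = obj.find("bndbox")
        x1, y1 = float(bb.find("xmin").text), float(bb.find("ymin").text)
        x2, y2 = float(bb.find("xmax").text), float(bb.find("ymax").text)
        if x1 < 1 or y1 < 1 or x2 <= x1 or y2 <= y1:
            print("suspicious box in", xml_path, (x1, y1, x2, y2))

Boxes flagged here are the ones the thread above suggests clipping or dropping before they reach the RPN targets.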

@Fly-dream12
Author

Maybe the corresponding code should be added in lib/dataset/pascal_voc.py:
if x1 < 0 or y1 < 0:
    continue
if abs(x1 - x2) <= 100 or abs(y1 - y2) <= 100:
    continue
Moreover, should the learning rate be adjusted?

@harsh-99
Owner

Hi,
I have never faced any problems because of the learning rate; I have initialised lr in the range of 1e-2 to 1e-4. About adding those lines: since that check is dataset-dependent, I don't think it is necessary for the datasets this code is written for. I have not faced any such problems with them, and anyone following the same instructions should not run into this either.

In case you have any other issue, please let me know; otherwise it would be great if you could close this issue.

@Fly-dream12
Author

When I have trained a model and begin to test, the checkpoint can't be loaded correctly:
RuntimeError: Error(s) in loading state_dict for vgg16:
Missing key(s) in state_dict: "netD.conv1.weight", "netD.bn1.weight", "netD.bn1.bias", "netD.bn1.running_mean", "netD.bn1.running_var", "netD.conv2.weight", "netD.bn2.weight", "netD.bn2.bias", "netD.bn2.running_mean", "netD.bn2.running_var", "netD.conv3.weight", "netD.bn3.weight", "netD.bn3.bias", "netD.bn3.running_mean", "netD.bn3.running_var", "netD.fc.weight", "netD.fc.bias", "netD_pixel.conv1.weight", "netD_pixel.conv2.weight", "netD_pixel.conv3.weight".
Unexpected key(s) in state_dict: "netD_img.conv_image.weight", "netD_img.conv_image.bias", "netD_img.bn_image.weight", "netD_img.bn_image.bias", "netD_img.bn_image.running_mean", "netD_img.bn_image.running_var", "netD_img.bn_image.num_batches_tracked", "netD_img.fc_1_image.weight", "netD_img.fc_1_image.bias", "netD_img.bn_2.weight", "netD_img.bn_2.bias", "netD_img.bn_2.running_mean", "netD_img.bn_2.running_var", "netD_img.bn_2.num_batches_tracked", "netD_inst.fc_1_inst.weight", "netD_inst.fc_1_inst.bias", "netD_inst.fc_2_inst.weight", "netD_inst.fc_2_inst.bias", "netD_inst.bn.weight", "netD_inst.bn.bias", "netD_inst.bn.running_mean", "netD_inst.bn.running_var", "netD_inst.bn.num_batches_tracked".

What could be the reason for this? @harsh-99
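
The missing netD.* keys versus the unexpected netD_img.* / netD_inst.* keys suggest the checkpoint was saved from a model with different discriminator modules than the one the test script builds, i.e. the training and testing runs are not using the same model variant or flags. A hedged diagnostic (the checkpoint path and the 'model' key are assumptions) is to list the top-level module names stored in the checkpoint and compare them with the model constructed at test time:

import torch

ckpt_path = "models/vgg16/your_checkpoint.pth"  # placeholder path
ckpt = torch.load(ckpt_path, map_location="cpu")
# Some checkpoints wrap the weights in a dict under "model"; plain state_dicts do not.
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
prefixes = sorted({key.split(".")[0] for key in state.keys()})
print("top-level modules saved in the checkpoint:", prefixes)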
