RPN regression loss suddenly becomes NaN #193
Comments
I met a similar problem here: trained on KITTI pedestrian detection, converted to VOC format.
@ZHOUXINWEN @rxqy Have you solved this problem? I am stuck on the same problem. Thanks!
This should help you. It's to do with your annotations, and possibly the "-1" offset applied when feeding in the annotations in the Pascal VOC dataset script.
Thank you @Worulz. I have found this solution and fixed my problem. I am training this Faster R-CNN on a pedestrian dataset; once I adopted the restriction on box sizes, the NaN problem disappeared.
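For illustration, a minimal sketch of the kind of box-size restriction described above (the function and constant names are hypothetical, not from the repo):

```python
# Hypothetical sketch: drop ground-truth boxes whose width or height is
# below a threshold while loading annotations.
MIN_BOX_SIZE = 20  # pixels; tune for the dataset

def filter_small_boxes(boxes, min_size=MIN_BOX_SIZE):
    """Keep only (xmin, ymin, xmax, ymax) boxes at least min_size wide and tall."""
    return [(x1, y1, x2, y2) for (x1, y1, x2, y2) in boxes
            if (x2 - x1) >= min_size and (y2 - y1) >= min_size]
```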
@xiaomengyc You just need to change this line.
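The exact snippet the comment pointed to did not survive here; presumably it is the coordinate parsing in lib/datasets/pascal_voc.py, where the commonly suggested fix is to drop the "-1" offset mentioned above (treat this reconstruction as an assumption):

```python
# In lib/datasets/pascal_voc.py, the loader subtracts 1 to make pixel
# indexes 0-based, which can yield negative or degenerate boxes on
# annotations that are already 0-based:
#   x1 = float(bbox.find('xmin').text) - 1  # original
# The suggested change is to drop the "-1":
x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text)
y2 = float(bbox.find('ymax').text)
```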
@xiaomengyc From your description, it seems that's because the RPN couldn't propose very small anchors? How about setting __C.ANCHOR_SCALES in config.py to smaller values, e.g. [1, 2, 3] (corresponding to 16, 32, and 48 pixels)?
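A sketch of that edit, assuming the default in this repo's config.py is [8, 16, 32] (i.e. 128-, 256-, and 512-pixel anchors at the backbone's feature stride of 16):

```python
# Assumed default: __C.ANCHOR_SCALES = [8, 16, 32]
# Smaller scales let the RPN propose anchors closer to small ground-truth boxes:
__C.ANCHOR_SCALES = [1, 2, 3]  # anchors of 16, 32, 48 pixels at stride 16
```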
@askerlee As I understand it, ANCHOR_SCALES should be set with respect to the scale of the ground-truth bboxes. According to my experiments, loading pre-trained weights, e.g. from a Faster R-CNN trained on COCO, can also avoid the NaN problem without filtering out small bboxes.
@xiaomengyc I've also met the same problem, but the culprits are ground-truth bboxes of sizes around 50x10. So I applied your trick and filtered these bboxes out. I guess the NaN appears because the proposals are so much bigger than the ground-truth bboxes that big rpn_box losses are incurred.
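To make that magnitude concrete, a small worked example (the anchor and box sizes below are illustrative assumptions, and the RPN's actual loss uses a sigma-scaled smooth L1, so the exact numbers differ):

```python
import math

# A 128x128 anchor matched to a 50x10 ground-truth box gives large
# log-space regression targets, hence a large per-coordinate loss.
anchor_w, anchor_h = 128.0, 128.0
gt_w, gt_h = 50.0, 10.0

tw = math.log(gt_w / anchor_w)   # ~ -0.94
th = math.log(gt_h / anchor_h)   # ~ -2.55

def smooth_l1(x, beta=1.0):
    """Smooth L1 per coordinate: quadratic near zero, linear beyond beta."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

print(smooth_l1(tw), smooth_l1(th))  # ~0.44 and ~2.05: the height term dominates
```

Large targets mean large gradients, and at a high learning rate a few such boxes can push the regression weights until the loss diverges to NaN.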
@askerlee It might be.
@askerlee Hi, how did you solve your problem? I met the same problem: the ground-truth bboxes in my dataset are small, around 20x20. I get no NaN loss in the first epoch, but the loss becomes NaN from the second epoch on. Can you tell me your solution? Thanks.
@Tianlock Do you have both large and very small bounding boxes? You could always crop the image down to the feature areas first, then run the algorithm on top.
@Tianlock I fixed it by filtering out bboxes smaller than 20x20. You could set the filtering threshold to, say, 15x15 if 20x20 filters out too many useful bboxes. You could also try reducing the learning rate at the same time.
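A hedged sketch of those two knobs as config.py edits (the learning-rate key follows the py-faster-rcnn convention and may differ in your checkout; MIN_BOX_SIZE refers to the hypothetical filter sketched earlier in the thread):

```python
# Lower the base learning rate, down from the 1.00e-03 seen in the training log:
__C.TRAIN.LEARNING_RATE = 1e-4

# Loosen the hypothetical box filter if 20x20 drops too many useful boxes:
MIN_BOX_SIZE = 15
```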
@Tianlock It's just for the dataset annotations.
Hi, I get …
Try to replace the …
Yes, it works! Thanks. Although I am still getting the NaN loss, thanks anyway :D
When I use this code to train on a custom dataset (Pascal VOC format), the RPN loss always turns to NaN after several dozen iterations.
I have already excluded the possibilities of coordinates outside the image resolution, xmin = xmax, and ymin = ymax.
[session 1][epoch 1][iter 12/4500] loss: 1.1964, lr: 1.00e-03
fg/bg=(16/496), time cost: 0.503772
rpn_cls: 0.1663, rpn_box: 0.0488, rcnn_cls: 0.9381, rcnn_box 0.0433
[session 1][epoch 1][iter 13/4500] loss: 0.8909, lr: 1.00e-03
fg/bg=(12/500), time cost: 0.516370
rpn_cls: 0.1984, rpn_box: 0.0421, rcnn_cls: 0.6251, rcnn_box 0.0254
[session 1][epoch 1][iter 14/4500] loss: 1.1052, lr: 1.00e-03
fg/bg=(20/492), time cost: 0.490039
rpn_cls: 0.1901, rpn_box: 0.0351, rcnn_cls: 0.8329, rcnn_box 0.0469
[session 1][epoch 1][iter 15/4500] loss: nan, lr: 1.00e-03
fg/bg=(6/506), time cost: 0.530968
rpn_cls: 0.1404, rpn_box: nan, rcnn_cls: 0.2575, rcnn_box 0.0102
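For anyone who wants to rule out bad annotations first, a hypothetical standalone check over VOC XML files (the Annotations/ path is a placeholder) that flags the degenerate and out-of-range boxes suspected in this thread:

```python
import glob
import xml.etree.ElementTree as ET

for path in glob.glob("Annotations/*.xml"):
    root = ET.parse(path).getroot()
    W = int(root.find("size/width").text)
    H = int(root.find("size/height").text)
    for obj in root.findall("object"):
        b = obj.find("bndbox")
        x1, y1, x2, y2 = (float(b.find(t).text)
                          for t in ("xmin", "ymin", "xmax", "ymax"))
        # Flag non-positive width/height or coordinates outside the image.
        if x2 <= x1 or y2 <= y1 or x1 < 0 or y1 < 0 or x2 > W or y2 > H:
            print(f"{path}: bad box ({x1}, {y1}, {x2}, {y2}), image {W}x{H}")
```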