Py-Faster-Rcnn using Resnet #122
Comments
Experiment with different batch sizes in the yml file or in config.py |
ResNet 101 on Caffe requires >10G of GPU memory during training for an input at VGA resolution (640*480), even when you fix all conv1_x/conv2_x/conv3_x layers. A couple of memory optimizations can be done easily, though.
A more fundamental solution is to allow Caffe to reuse the gradients (diff) for each blob. One can safely overwrite the diff of a blob once the weights of all layers that use the blob have been updated. That is the way to train ResNet 101 with a batch size of 32 on 12G of memory, as mentioned in the original paper. |
@happyharrycn I found something interesting when training resnet + faster rcnn on my own dataset. If I fix all batchNorm+scale layers on conv1 ~ conv4 and only allow updates on the conv layers, the resulting model is far from what the paper claims. If I allow batchNorm+scale to update, it gets much better performance (close to vgg16). But faster rcnn only uses one image per batch, so it is not supposed to update batchNorm+scale properly. What am I missing? |
@kl2005ad ResNet for detection has been reproduced at multiple sites, and I am not sure why you are getting worse performance based on your description here. The last time I tried on VOC, it worked better than they claimed in the paper :) Here are some implementation details I used for training.
|
@kl2005ad @happyharrycn Hi! I am trying to train resnet50 with faster-rcnn, and I got a very low result on voc2007, about 0.47, even lower than the ZF model's 0.62. What result did you get on voc2007? I didn't freeze the batchNorm+Scale layers. Is my solver correct? Thank you very much! |
I was getting mAP ~0.73-0.74 on VOC07 test when using ResNet101 (trained on VOC07 trainval) with 60K iterations. Training details can be found in my previous post in this thread. From a quick look at your solver file, I think you probably had too many iterations (500K is way too much). |
@happyharrycn Thank you~ I was getting mAP 0.65 on VOC2007 using ResNet50 with 70k iterations, following your implementation details to fix all batchNorm+scale and conv_x. I will try ResNet101. It seems BN must be fixed when fine-tuning, right? |
@banxiaduhuo, maybe a BatchNorm layer acts as a Scale layer during test time, and we can merge two consecutive Scale layers into one. |
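The merge suggested above can be checked numerically. This is a minimal sketch, assuming the usual test-time BatchNorm form y = (x - mean) / sqrt(var + eps) followed by a Scale layer z = gamma * y + beta. The function names are hypothetical, and the per-channel statistics are assumed to be the final ones (any stored moving-average factor already divided out).

```python
import math


def fold_batchnorm_into_scale(mean, var, gamma, beta, eps=1e-5):
    """Fold test-time BatchNorm + Scale into one per-channel affine map.

    Returns lists (a, b) such that, for channel c,
    a[c] * x + b[c] == gamma[c] * (x - mean[c]) / sqrt(var[c] + eps) + beta[c].
    All inputs are per-channel lists of equal length (an assumption of this sketch).
    """
    a, b = [], []
    for m, v, g, bt in zip(mean, var, gamma, beta):
        inv_std = 1.0 / math.sqrt(v + eps)
        a.append(g * inv_std)            # merged multiplier
        b.append(bt - g * m * inv_std)   # merged bias
    return a, b


def batchnorm_then_scale(x, m, v, g, bt, eps=1e-5):
    """Reference: apply the two layers one after the other for a single value."""
    return g * (x - m) / math.sqrt(v + eps) + bt


# Made-up statistics for two channels, for illustration only.
mean, var = [0.5, -1.0], [4.0, 0.25]
gamma, beta = [2.0, 0.5], [0.1, -0.3]
a, b = fold_batchnorm_into_scale(mean, var, gamma, beta)

for c in range(2):
    for x in (-3.0, 0.0, 1.7):
        merged = a[c] * x + b[c]
        reference = batchnorm_then_scale(x, mean[c], var[c], gamma[c], beta[c])
        assert abs(merged - reference) < 1e-9
```

Because the merged layer is a single multiply-add per element, it needs no separate BatchNorm output blob, which may be part of why merged models use less memory at test time.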
@happyharrycn I am trying to fine-tune resnet-50 and faster-rcnn on the COCO dataset as mentioned in Kaiming's paper, using a learning rate of 0.001 for 240K iterations and 0.0001 for the next 80K iterations (using the provided end2end training). |
@ice-pice I think 240K + 80K is actually not enough iterations for training on COCO. I have used 500K iterations for 120K images (COCO train + val). Have you tried to keep the training running for the full 320K iterations and check whether the AP keeps decreasing after 150K? |
I want to point out that my 320K iterations process 320K images. In the 500K iterations you mentioned, do you process 500K images or 500*8K? Thanks! |
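The schedule described above (0.001 for 240K iterations, then 0.0001 for the next 80K) corresponds to Caffe's `step` learning-rate policy, lr = base_lr * gamma^floor(iter / stepsize). This is a minimal sketch; gamma = 0.1 and stepsize = 240000 are inferred from the comment, not taken from the actual solver file.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=240000):
    """Learning rate under a 'step' policy: drop by `gamma` every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)


# Print the rate at a few points of a 320K-iteration run.
for it in (0, 239999, 240000, 319999):
    print(it, step_lr(it))
```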
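The distinction raised here (iterations versus images actually processed) is simple arithmetic. This is a small sketch, assuming a 120K-image training set as in the previous comment, and reading "500*8K" as 8 images per iteration; both are assumptions for illustration, not confirmed settings.

```python
def epochs_seen(iterations, images_per_iteration, dataset_size):
    """Number of passes over the dataset after the given number of iterations."""
    return iterations * images_per_iteration / dataset_size


DATASET_SIZE = 120000  # assumed: COCO train + val, as quoted in the thread

# 320K iterations at 1 image per iteration
print(epochs_seen(320000, 1, DATASET_SIZE))
# 500K iterations at 1 image per iteration
print(epochs_seen(500000, 1, DATASET_SIZE))
# 500K iterations at 8 images per iteration (hypothetical multi-image setup)
print(epochs_seen(500000, 8, DATASET_SIZE))
```

Under these assumptions the three runs differ by more than an order of magnitude in passes over the data, so iteration counts alone are not comparable.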
@ice-pice What is your training speed? Could you share your log file so I can see the change of loss? |
@CrossLee1 From my observation, the change in loss does not reflect convergence in this case because of the mini-batch size of 1. After 50K iterations or so, the loss value fluctuates within the same interval until 320K. I am still making changes, and if I reach the baseline, I can share the prototxt with you if you'd like. |
@ice-pice Wishing you good results~ |
@ice-pice |
How do I solve this? |
@CrossLee1 With resnet50 + py-faster-rcnn, I am able to achieve 45% mAP. |
@ice-pice |
@sarkeribrahim `Build the Cython modules cd $FRCN_ROOT/lib |
@CrossLee1 My resnet-50 + faster-rcnn prototxt. |
@ice-pice |
@happyharrycn can you share your train&test prototxt of "resnet + faster rcnn"? |
@ice-pice The ResNet50 and ResNet101 models I trained both reach close to 44.4 mAP. How about yours? |
@banxiaduhuo I did implement the box refinement strategy, and it gave me a 1.3% boost compared to the 2% mentioned in the paper. I did not get a chance to try the other 2 strategies. |
@yjn870 : You need to remove the topmost data layer, as it is not required while testing. Remove all the layers that are unnecessary while testing. |
Hello. I am trying to run Resnet101 with Faster RCNN on an AWS 4GB K520 GPU. I realized that this GPU memory won't be enough and got the same error. I wanted to ask whether an AWS g2.8xlarge instance (4 GPUs with 4GB each) would do the job, and whether anyone has tried that. Thanks. Any help will be appreciated. |
Hi everyone. I am having trouble with the make file and with finding ./tools/demo.py. For the make file, it says no targets found. I installed it in cuda/fast-rcnn/lib and cuda. When I try to run ./tools/demo.py, it says no directory found. |
@ice-pice @banxiaduhuo can you share how you do bbox refinement? |
@happyharrycn @banxiaduhuo I tried resnet50 with 80k iterations and got 73.59% mAP on pascal voc 07. |
Did anyone try ResNet-50 or higher depth with COCO classes using py-faster-rcnn? |
I tried the ResNet-50 prototxt from ice-pice and the ResNet-50 prototxt from siddharthm83, with base_lr=0.001 (step=300000) and total_iters=490000. However, I only get mAP 0.265 (IoU=0.5) on COCO. Does anyone have a ResNet-50 + py-faster-rcnn model pretrained on COCO? @ice-pice @banxiaduhuo I cannot reproduce your result. Could you help me? |
@spandanagella I released an implementation (prototxt file and model weights) of ResNet-101 based faster-rcnn; check this repo |
@Eniac-Xie Thanks. I am looking for ResNet model trained on COCO object categories. Looks like you have resnet based faster-rcnn for just PASCAL-VOC. |
@KeyKy @ice-pice @banxiaduhuo @Eniac-Xie @zimenglan-sysu-512 I'm trying to train the ResNet-50 model on PASCAL VOC 2007 trainval dataset. I've followed the comments in issue #62. So, I'm using this command to start the training
I'm using the solver/train prototxt files from @twtygqyy repo However, I'm getting this error:
I'm on the latest commit of Pardon my lack of knowledge, but would you guys mind helping me resolve this error, please? Appreciate it. Thanks. |
make it |
Here is the caffe-fast-rcnn with upstream caffe https://github.com/twtygqyy/caffe-fast-rcnn-upstream |
@onkarganjewar @twtygqyy @Eniac-Xie @happyharrycn I tried R-FCN + ResNet-101 (from jifeng dai, Orpine https://github.com/Orpine/py-R-FCN). Why does R-FCN-ohem take 5G of GPU memory, while faster-rcnn+resnet-101-bn-scale-merged-ohem (from @Eniac-Xie) takes 11G? I don't know what makes the difference. Can somebody please help? |
@646677064 Because of the fully connected layers; they hold most of the parameters. |
I'm using matlab 2017a student version, gpu: gtx 1060 (6 gb) |
@646677064 I have tried faster-rcnn+resnet-50-bn-scale-merged-ohem (from @Eniac-Xie), but there is an error when I run "./experiments/scripts/faster_rcnn_end2end.sh 0 ResNet-50 pascal_voc", like this: Do you know how to solve it? Can you share your faster-rcnn+resnet-101-bn-scale-merged-ohem (from @Eniac-Xie) test.prototxt file with me? Thank you very much!!! |
Have you found the reason for the slow training problem? I ran into the same issue: about 4s/iter on a Titan X. @CrossLee1 |
Same thing happened to me! Any idea yet? @nnop |
Excuse me. After training my own model, I ran demo.py with it to detect objects in images. When the image is very large (5000 x 3000 pixels), the output is entirely white, including the image itself. If the image is not too large, there is no problem. What is the reason? |
Based on #62
I am trying to train my own dataset using resnet+py-faster-rcnn (using @siddharthm83 train.txt). I am getting the following error.
I0321 07:29:44.037149 1892 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/resnet.caffemodel
I0321 07:29:44.240974 1892 net.cpp:816] Ignoring source layer fc1000
I0321 07:29:44.241065 1892 net.cpp:816] Ignoring source layer prob
Solving...
F0321 07:29:45.412804 1892 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
I am using an AWS instance. I was able to train resnet-50 (without fast-rcnn) on the same instance with the same dataset, but when I tried using py-faster-rcnn, I got this error. I know this error could be due to insufficient memory, so I changed the batch size in deploy.prototxt (iter_size: 1), but I am still getting the error. Can someone help me out?