This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Benchmark for deeper models #64

Closed
li-js opened this issue May 22, 2018 · 12 comments

Comments

@li-js

li-js commented May 22, 2018

Thanks for sharing the great code!

I can also get similar AP for both box and segm with R-50-FPN model, as confirmed in Issue #24.

I am wondering if there are benchmark results for deeper models like R-101-FPN. On my side, the results for R-101-FPN are not as good as the ones in Detectron. Did you reproduce Detectron's performance (box AP 40, segm AP 35.9) for R-101-FPN, @roytseng-tw @Rizhiy?

@roytseng-tw
Owner

I haven't tried training from scratch with the R-101 backbone. What are your results with R-101-FPN, and how did you train it (command, number of GPUs)?

@fitsumreda

fitsumreda commented May 22, 2018

@li-js @roytseng-tw @Rizhiy
My runs still do not reproduce the latest benchmarks from @roytseng-tw.
I used commit ab028df with PyTorch 0.3.0.post4; I believe this is the second-to-last commit.

I tried three experiments: two runs on 4 GPUs and one run on 8 GPUs.
All evaluation results below are obtained using ckpt/model_step89999.pth

  • 4 GPUs: python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --nw 16 --use_tfboard

    • I got bbox mAP 0.3587 and mask mAP 0.3243
    • I got bbox mAP 0.3592 and mask mAP 0.3243
  • 8 GPUs: python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --nw 16 --use_tfboard

    • I got bbox mAP 0.3604 and mask mAP 0.3263

Note: the code does produce the expected numbers when I evaluate using Detectron checkpoints, so I think tools/test_net.py is fine.

Evaluation command
python3 tools/test_net.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_ckpt /path/to/checkpoint/model_step89999.pth --multi-gpu-testing

Any thoughts?

@li-js
Author

li-js commented May 22, 2018

For Res101-FPN, it is not really training from scratch, since the ImageNet-pretrained weights from Caffe are loaded.

For the settings, I used 4 GPUs (GeForce GTX 1080 Ti) with Python 3, PyTorch 0.3.1.post2, and CUDA 8.0.

I have two sets of results.
Set 1:
NUM_GPUS: 4
MAX_ITER: 360k (STEPS adjusted accordingly)
BASE_LR: 0.01
IM_PER_GPU: 1
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN_2x_[modified to use 4 gpus as above].yaml

Results: Seg AP: 0.333, Box AP: 0.369 on last step.

Set 2:
I used iter_size=2 to increase the effective batch size with the same config.
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN_2x_[modified to use 4 gpus as above].yaml --iter_size 2
I noted that MAX_ITER is automatically scaled down to 180k.

Results: Seg AP: 0.336, Box AP: 0.368 on last step.

The results are similar to R-50-FPN.
Any help is appreciated @roytseng-tw

@fitsumreda

@li-js could you share your settings for R-50-FPN that reproduced the desired numbers?

@li-js
Author

li-js commented May 22, 2018

@fitsumreda Sure
I only used two GPUs with two images per GPU, with BASE_LR 0.005 and a total of 360k iterations.
Other settings are the same as in e2e_mask_rcnn_R-50-FPN_1x.yaml, and train_net_step.py was used. Surprisingly, I got 34.1 Seg AP and 37.9 Box AP.
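For context, these settings are consistent with the linear scaling rule that Detectron-style schedules use: relative to the default 8-GPU setup (16 images per batch, BASE_LR 0.02, 90k iterations for a 1x schedule), shrinking the batch to 4 scales the LR down by 4x and the iteration count up by 4x. A sketch of the arithmetic (the 16 / 0.02 / 90k defaults are the standard Detectron R-50-FPN 1x values; `scale_schedule` is an illustrative helper, not this repo's actual scaling code):

```python
# Linear scaling rule: if the effective batch size changes by a factor k,
# scale the learning rate by k and the iteration count by 1/k so the
# total number of images seen stays the same.
def scale_schedule(base_lr, max_iter, default_bs, new_bs):
    k = new_bs / default_bs
    return base_lr * k, int(max_iter / k)

# Detectron R-50-FPN 1x defaults: batch size 16, BASE_LR 0.02, 90k iters.
lr, iters = scale_schedule(0.02, 90_000, default_bs=16, new_bs=4)
print(lr, iters)  # -> 0.005 360000, matching the settings reported above
```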

@fitsumreda

Thank you so much, @li-js !

@roytseng-tw
Owner

roytseng-tw commented May 22, 2018

@li-js Did you modify NUM_GPUS in the config file? If so, do not. I have already emphasized that in the README. Maybe I should make it clearer.

@li-js
Author

li-js commented May 22, 2018

I did modify the NUM_GPUS to be 4 in my case. Thanks for pointing it out.

So if I only have 4 GPUs and each GPU can only hold 1 image, what is the suggested training schedule? Since MAX_ITER and BASE_LR will be adjusted automatically, am I right to just use the cfg file here unchanged and run the following command?

python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 4

And the following for 4 GPUs where each GPU can hold 2 images:
python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8

Correct me if I am wrong.

@roytseng-tw
Owner

Yes, you are correct. 😃
Moreover, you can use --iter_size X to mimic a bigger batch size if you wish.
And if possible, I think it's better to keep IMS_PER_BATCH the same, 2 for most cases.
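For anyone unfamiliar with what --iter_size emulates: it is gradient accumulation, i.e. summing gradients over several small batches before taking one optimizer step, which mimics training with a larger batch. A minimal PyTorch sketch (`train_step` and the toy model are illustrative, not this repo's code):

```python
import torch

def train_step(model, optimizer, loss_fn, batches, iter_size=2):
    """One effective update accumulated over `iter_size` small batches."""
    optimizer.zero_grad()
    total_loss = 0.0
    for inputs, targets in batches[:iter_size]:
        # Divide by iter_size so the accumulated gradient averages
        # over all images, as a single large batch would.
        loss = loss_fn(model(inputs), targets) / iter_size
        loss.backward()  # gradients accumulate into .grad buffers
        total_loss += loss.item()
    optimizer.step()  # one update using the accumulated gradient
    return total_loss

# Toy usage: two small batches behave like one batch of double the size.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(2)]
print(train_step(model, opt, torch.nn.functional.mse_loss, batches, iter_size=2))
```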

@li-js
Author

li-js commented May 22, 2018

Thanks, closing here. In official Detectron, the ResNeXt-series backbones all use 1 image per batch due to memory constraints, yet they still perform even better than the R-101 series.

Still looking forward to a benchmark on R-101-FPN/ResNext-series if anyone successfully reproduces the results. 💯

@li-js li-js closed this as completed May 22, 2018
@li-js
Author

li-js commented May 30, 2018

@roytseng-tw With your suggestions, I trained with:

python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8 --iter_size 2

without changing the config file. I got better performance (AP seg 34.5, AP det 38.5), but it still does not match official Detectron's AP det 40, AP seg 35.9.

Any suggestions are appreciated.

@li-js li-js reopened this May 30, 2018
@roytseng-tw
Owner

In my experience, these numbers may be reasonable. When I trained e2e_mask_rcnn_R-50-FPN_2x.yaml before, I always got numbers lower than Detectron's. However, as reported by you and others in the issues, your scores match or even beat Detectron's. So I think it's just some randomness in the training of deep neural networks that leads to these performance differences.
