This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Benchmark for deeper models #64

Closed
li-js opened this issue May 22, 2018 · 12 comments

Comments

@li-js

li-js commented May 22, 2018

Thanks for sharing the great code!

I can also get similar AP for both box and segm with R-50-FPN model, as confirmed in Issue #24.

I am wondering if there are benchmark results for deeper models like R-101-FPN. On my side, the results for R-101-FPN are not as good as the ones in Detectron. Did you reproduce Detectron's performance (box AP 40, segm AP 35.9) for R-101-FPN, @roytseng-tw @Rizhiy?

@roytseng-tw
Owner

I haven't tried training from scratch with the R-101 backbone. What are your results with R-101-FPN, and how did you train it (command, number of GPUs)?

@fitsumreda

fitsumreda commented May 22, 2018

@li-js @roytseng-tw @Rizhiy
My runs still do not reproduce the latest benchmarks from @roytseng-tw.
I used commit ab028df with PyTorch 0.3.0.post4; I believe this is the second-to-last commit.

I tried three experiments: two runs on 4 GPUs and one run on 8 GPUs.
All evaluation results below are obtained using ckpt/model_step89999.pth

  • 4 GPUs: python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --nw 16 --use_tfboard

    • I got bbox mAP 0.3587 and mask mAP 0.3243
    • I got bbox mAP 0.3592 and mask mAP 0.3243
  • 8 GPUs: python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --nw 16 --use_tfboard

    • I got bbox mAP 0.3604 and mask mAP 0.3263

Note: the code does produce the expected numbers when I evaluate using Detectron checkpoints, so I think tools/test_net.py is fine.

Evaluation command
python3 tools/test_net.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_ckpt /path/to/checkpoint/model_step89999.pth --multi-gpu-testing

Any thoughts?

@li-js
Author

li-js commented May 22, 2018

For Res101-FPN, it is not really training from scratch, since the ImageNet-pretrained weights from Caffe are loaded.

For the settings, I used 4 GPUs (GeForce GTX 1080 Ti) with Python 3, PyTorch 0.3.1.post2, and CUDA 8.0.

I have two sets of results.
Set 1:
NUM_GPUS: 4
MAX_ITER: 360k (STEPS adjusted accordingly)
BASE_LR: 0.01
IM_PER_GPU: 1
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN_2x_[modified to use 4 gpus as above].yaml

Results: Seg AP: 0.333, Box AP: 0.369 on last step.

Set 2:
I used iter_size=2 to increase the effective batch size with the same config.
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN_2x_[modified to use 4 gpus as above].yaml --iter_size 2
I noted that MAX_ITER is automatically scaled down to 180k.

Results: Seg AP: 0.336, Box AP: 0.368 on last step.

The results are similar to R-50-FPN.
Any help is appreciated @roytseng-tw

@fitsumreda

@li-js could you share your settings for R-50-FPN that reproduced the desired numbers?

@li-js
Author

li-js commented May 22, 2018

@fitsumreda Sure
I only used two GPUs with two images per GPU, with BASE_LR 0.005 and a total of 360k iterations.
Other settings are the same as in e2e_mask_rcnn_R-50-FPN_1x.yaml, and train_net_step.py was used. Surprisingly, I got 34.1 Seg AP and 37.9 Box AP.
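For context, these settings are consistent with the linear scaling rule that Detectron-style schedules use: relative to the default 8-GPU setup (16 images per batch, BASE_LR 0.02, 90k iterations for a 1x schedule), shrinking the batch to 4 scales the LR down by 4x and the iteration count up by 4x. A sketch of the arithmetic (the 16 / 0.02 / 90k defaults are the standard Detectron R-50-FPN 1x values; `scale_schedule` is an illustrative helper, not this repo's actual scaling code):

```python
# Linear scaling rule: if the effective batch size changes by a factor k,
# scale the learning rate by k and the iteration count by 1/k so the
# total number of images seen stays the same.
def scale_schedule(base_lr, max_iter, default_bs, new_bs):
    k = new_bs / default_bs
    return base_lr * k, int(max_iter / k)

# Detectron R-50-FPN 1x defaults: batch size 16, BASE_LR 0.02, 90k iters.
lr, iters = scale_schedule(0.02, 90_000, default_bs=16, new_bs=4)
print(lr, iters)  # -> 0.005 360000, matching the settings reported above
```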

@fitsumreda

Thank you so much, @li-js !

@roytseng-tw
Owner

roytseng-tw commented May 22, 2018

@li-js Did you modify NUM_GPUS in the config file? If so, do not. I have already emphasized that in the README. Maybe I should make it clearer.

@li-js
Author

li-js commented May 22, 2018

I did modify the NUM_GPUS to be 4 in my case. Thanks for pointing it out.

So if I only have 4 GPUs and each GPU can only hold 1 image, what is the suggested training schedule? Since MAX_ITER and BASE_LR will be adjusted automatically, am I right to just use the cfg file here unchanged and run the following command?

python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 4

And the following for 4 GPUs where each GPU can hold 2 images:
python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8

Correct me if I am wrong.

@roytseng-tw
Owner

Yes, you are correct. 😃
Moreover, you can use --iter_size X to mimic a bigger batch size if you wish.
And if possible, I think it's better to keep IMS_PER_BATCH the same, 2 for most cases.
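For anyone unfamiliar with what --iter_size emulates: it is gradient accumulation, i.e. summing gradients over several small batches before taking one optimizer step, which mimics training with a larger batch. A minimal PyTorch sketch (`train_step` and the toy model are illustrative, not this repo's code):

```python
import torch

def train_step(model, optimizer, loss_fn, batches, iter_size=2):
    """One effective update accumulated over `iter_size` small batches."""
    optimizer.zero_grad()
    total_loss = 0.0
    for inputs, targets in batches[:iter_size]:
        # Divide by iter_size so the accumulated gradient averages
        # over all images, as a single large batch would.
        loss = loss_fn(model(inputs), targets) / iter_size
        loss.backward()  # gradients accumulate into .grad buffers
        total_loss += loss.item()
    optimizer.step()  # one update using the accumulated gradient
    return total_loss

# Toy usage: two small batches behave like one batch of double the size.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(2)]
print(train_step(model, opt, torch.nn.functional.mse_loss, batches, iter_size=2))
```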

@li-js
Author

li-js commented May 22, 2018

Thanks, closing here. In official Detectron, the ResNeXt-series backbones all use 1 image per batch due to memory constraints, yet they still perform even better than the R-101 series.

Still looking forward to a benchmark on R-101-FPN/ResNext-series if anyone successfully reproduces the results. 💯

@li-js li-js closed this as completed May 22, 2018
@li-js
Author

li-js commented May 30, 2018

@roytseng-tw With your suggestions, I trained with:

python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8 --iter_size 2

without changing the config file. I got better performance (AP seg 34.5, AP det 38.5), but it still does not match official Detectron's AP det 40, AP seg 35.9.

Any suggestions are appreciated.

@li-js li-js reopened this May 30, 2018
@roytseng-tw
Owner

In my experience, these numbers may be reasonable. When I trained e2e_mask_rcnn_R-50-FPN_2x.yaml before, I always got numbers lower than Detectron's. However, as reported by you and others in the issues, your scores match or even beat Detectron's. So I think it's just some randomness in the training of deep neural networks that leads to these performance differences.
