CAM mIoU #13

Closed
arnike opened this issue Feb 16, 2019 · 8 comments

@arnike

arnike commented Feb 16, 2019

Hi Jiwoon,

thanks for sharing this nice work!
I'm trying to generate the CAMs with a model I trained myself, but the mIoU I get is quite low, 42.28 (on PASCAL VOC2012 / train).
I first ran

python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network network.resnet38_cls --voc12_root ./data --weights weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.pth --wt_dec 5e-4

and then generated the CAMs with

python3 infer_cls.py --infer_list voc12/train.txt --voc12_root ./data --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred ./cams

Is there anything amiss? To get the 48% mIoU reported in the paper, should I apply dCRF? I don't apply it here.

More details, in case it helps:

Class             #   IoU  Precision  Recall
background    10581  71.6       83.4    83.9
aeroplane       586  37.8       40.8    93.8
bicycle         485  43.6       50.3    86.1
bird            698  34.1       39.6    86.1
boat            460  28.1       35.1    79.0
bottle          651  27.0       31.9    81.9
bus             385  61.3       77.2    79.3
car            1079  39.9       50.1    81.1
cat            1000  43.8       73.3    57.9
chair          1063  34.0       44.1    68.3
cow             262  42.2       54.8    78.0
diningtable     520  39.2       54.6    65.4
dog            1176  41.9       66.4    63.4
horse           444  40.4       58.4    65.6
motorbike       481  51.9       62.8    83.0
person         3876  37.2       50.5    66.7
potted-plant    485  35.0       45.8    76.8
sheep           299  43.8       50.6    86.4
sofa            474  43.5       65.9    60.0
train           499  52.6       65.9    79.6
tv/monitor      548  38.9       45.3    87.4
ambiguous       330   0.0        0.0     0.0

mIoU: 42.28 (background included)
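
As a quick sanity check on the arithmetic, the reported figure is the plain average of the 21 per-class IoUs in the table (background included, "ambiguous" excluded). The snippet below is only a sketch that re-enters the numbers from the table; the variable names are illustrative and not taken from the repository.

```python
# Per-class IoU values copied from the table above (background + 20 classes).
# The "ambiguous" row is left out of the average.
ious = {
    "background": 71.6, "aeroplane": 37.8, "bicycle": 43.6, "bird": 34.1,
    "boat": 28.1, "bottle": 27.0, "bus": 61.3, "car": 39.9, "cat": 43.8,
    "chair": 34.0, "cow": 42.2, "diningtable": 39.2, "dog": 41.9,
    "horse": 40.4, "motorbike": 51.9, "person": 37.2, "potted-plant": 35.0,
    "sheep": 43.8, "sofa": 43.5, "train": 52.6, "tv/monitor": 38.9,
}

# Unweighted mean over classes, i.e. every class counts equally.
miou = sum(ious.values()) / len(ious)
print(f"mIoU: {miou:.2f}")
```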

Best,
Nikita

@jiwoon-ahn
Owner

Hi @arnike,
You don't need to apply dCRF to get 48% mIoU. It seems you evaluated on the PASCAL VOC 2012 + SBD augmented dataset, which is not the setting reported in the paper. Could you try again on the PASCAL VOC 2012 train set only? (The dataset should contain fewer than 1500 images.)

@arnike
Author

arnike commented Feb 19, 2019

The numbers are indeed from VOC + SBD trainset. On VOC only (1464 images) I get mIoU: 38.44.
Are the models also trained on VOC only in Table 1? I trained on VOC+SBD. I also wondered about the way BG is computed in infer_cls.py, which is different from Eq. 2. I then tried Eq. 2 instead, with alpha=16, but got similar results.

Some more details on training, if this can help. These are the last training steps:

Iter: 9550/ 9915 Loss:0.0229 imps:3.8 Fin:Fri Feb 15 05:23:17 2019 lr: 0.0051
Iter: 9600/ 9915 Loss:0.0261 imps:3.8 Fin:Fri Feb 15 05:23:05 2019 lr: 0.0045
Iter: 9650/ 9915 Loss:0.0271 imps:3.8 Fin:Fri Feb 15 05:22:52 2019 lr: 0.0038
Iter: 9700/ 9915 Loss:0.0262 imps:3.8 Fin:Fri Feb 15 05:22:40 2019 lr: 0.0032
Iter: 9750/ 9915 Loss:0.0285 imps:3.8 Fin:Fri Feb 15 05:22:28 2019 lr: 0.0025
Iter: 9800/ 9915 Loss:0.0289 imps:3.8 Fin:Fri Feb 15 05:22:16 2019 lr: 0.0018
Iter: 9850/ 9915 Loss:0.0220 imps:3.8 Fin:Fri Feb 15 05:22:04 2019 lr: 0.0011
Iter: 9900/ 9915 Loss:0.0270 imps:3.8 Fin:Fri Feb 15 05:21:52 2019 lr: 0.0003

validating ... loss: 0.04670578511431813

The classification mAP reached is around 93.0 %.

Thanks @jiwoon-ahn for any tips.

@XZNWU

XZNWU commented Feb 21, 2019

I don't know how to compute the mIoU. Could you share your mIoU code, if that is convenient for you? Thank you! @arnike

@jiwoon-ahn
Owner

I'm not sure... The loss and the mAP you got seem fine to me.

  • The network is trained with PASCAL VOC 2012 train + SBD dataset, but it is evaluated on PASCAL VOC 2012 train set.
  • We didn't explain how to measure the performance of CAMs as it is trivial, and thresholding the bottom 20% is a common practice. But as you mentioned, using alpha=16 can give similar results.
  • The accuracy of CAMs can vary depending upon the threshold. I think it is worth trying to adjust it.
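
The two background rules discussed above can be sketched as follows. This is a minimal illustration, not the code from `infer_cls.py`: it assumes the CAMs are already normalised to [0, 1] per class and zeroed for classes absent from the image, it reads "thresholding the bottom 20%" as a constant background score of 0.2, and it assumes Eq. 2 has the form (1 - max_c M_c)^alpha. The function names are hypothetical.

```python
import numpy as np

def labels_fixed_threshold(norm_cam, threshold=0.2):
    """Background gets a constant score; a pixel becomes foreground only if
    some class activation exceeds that score. norm_cam has shape (C, H, W)."""
    bg = np.full((1,) + norm_cam.shape[1:], threshold, dtype=norm_cam.dtype)
    # Index 0 is background, indices 1..C are the foreground classes.
    return np.argmax(np.concatenate([bg, norm_cam], axis=0), axis=0)

def labels_power_background(norm_cam, alpha=16):
    """Background score assumed to be (1 - max_c M_c) ** alpha; a larger
    alpha suppresses the background and lets more pixels become foreground."""
    bg = (1.0 - norm_cam.max(axis=0, keepdims=True)) ** alpha
    return np.argmax(np.concatenate([bg, norm_cam], axis=0), axis=0)

# Toy example: 2 classes, a 1x3 image.
norm_cam = np.array([[[0.9, 0.10, 0.0]],
                     [[0.0, 0.15, 0.5]]])
print(labels_fixed_threshold(norm_cam, threshold=0.2))
print(labels_power_background(norm_cam, alpha=16))
```

Sweeping `threshold` (or `alpha`) and re-evaluating the mIoU is one way to check how sensitive the CAM accuracy is to this choice.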

@arnike
Author

arnike commented Feb 22, 2019

Thanks @jiwoon-ahn for the tips!
I just realised that I might not have been computing the IoUs correctly (thanks @xiangzhang06 for the hint :-). I computed IoU image-wise, but VOC's way is to accumulate the statistics dataset-wise (i.e. count TPs, FPs and FNs in a confusion matrix over the whole dataset and compute the Jaccard index from that). If I do this, I get mIoU 46.72 / 46.80 on train / val, which matches the score on val but is still 1.3 points off on train.
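
The difference between the two ways of averaging can be sketched like this. It is a minimal NumPy illustration and not the VOC devkit code: the function names are hypothetical, and it assumes label maps are integer arrays with 21 classes and 255 marking ignored ("ambiguous") pixels.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes=21, ignore_label=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    valid = gt != ignore_label  # skip ambiguous pixels
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_from_confusion(conf):
    """Per-class Jaccard index TP / (TP + FP + FN); NaN if the class never occurs."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)

def dataset_miou(pairs, num_classes=21, ignore_label=255):
    """VOC-style: accumulate one confusion matrix over all images, then take IoU."""
    total = np.zeros((num_classes, num_classes), dtype=np.int64)
    for gt, pred in pairs:
        total += confusion_matrix(gt, pred, num_classes, ignore_label)
    return np.nanmean(iou_from_confusion(total))

def imagewise_miou(pairs, num_classes=21, ignore_label=255):
    """Image-wise variant: IoU per image, averaged per class, then over classes."""
    per_image = np.stack([
        iou_from_confusion(confusion_matrix(gt, pred, num_classes, ignore_label))
        for gt, pred in pairs
    ])
    return np.nanmean(np.nanmean(per_image, axis=0))

# Toy example with 2 classes and two 2x2 images.
pairs = [
    (np.array([[0, 0], [1, 1]]), np.array([[0, 1], [1, 1]])),
    (np.array([[0, 0], [0, 0]]), np.array([[0, 0], [0, 1]])),
]
print(dataset_miou(pairs, num_classes=2), imagewise_miou(pairs, num_classes=2))
```

On the toy pair the two averages already disagree, because a single false-positive pixel in an image without the class drives that image's IoU for the class to zero, while dataset-wise accumulation weighs it against all true positives.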

@jiwoon-ahn Do you think this difference is due to ending up in different local minima? I didn't experiment with it, but I presume there can be tangible fluctuations in IoU (± 1%) between different runs of SGD; after all, the net is not trained to do segmentation.

@xiangzhang06 Your question is off-topic, but do take a look at VOC devkit VOCcode/VOCevalseg.m.

@XZNWU

XZNWU commented Feb 25, 2019

Thanks @arnike

@jiwoon-ahn
Owner

@arnike, Glad you solved the problem!
I agree with you. As far as I know, all existing weakly supervised techniques using class attention share this issue, as there is no ground truth to guide the network.

@arnike
Author

arnike commented Feb 25, 2019

@jiwoon-ahn Thanks, I'll keep that in mind ;)

arnike closed this as completed Feb 25, 2019