
How to use OHEM loss function? #15

Closed
qijiezhao opened this issue Dec 26, 2018 · 12 comments
Labels
result reproduced The results are well reproduced

Comments

@qijiezhao

qijiezhao commented Dec 26, 2018

Hi, Zilong:

Thanks for your contribution to this amazing repo! I really appreciate it!
By the way, I want to reproduce the highest score in your paper: 81.4 on the Cityscapes test set (with 4 V100 GPUs).
I have already reproduced the default-setting single-scale result on the val set: 79.7 on my machine.
To reach 81.4, as far as I know, besides the default settings, the OHEM loss function and multi-scale inference should also be used. But when I retrain the code with --ohem True, the evaluation results drop a lot, by about 10 points.
So my question is: what else should I modify to achieve the result reported in the original paper?

Thanks a lot for your help!

@speedinghzl
Owner

@qijiezhao Thanks for your attention. I have updated run_local.sh. The hyperparameters of OHEM are --ohem-thres 0.7 --ohem-keep 100000.
To achieve 81.4 on the test set, you need to train the model on both the training and validation sets for 100000 iterations, then finetune that model with a low learning rate and frozen sync BN.
Also, can you report your result in the Reproduce Performance Discussion along with your environment information? Thank you very much.
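For readers wondering what these two flags control: --ohem-thres is the probability threshold below which a pixel counts as a hard example, and --ohem-keep is the minimum number of pixels retained in the loss. Below is a minimal NumPy sketch of that selection logic. This is illustrative only, with hypothetical function and variable names; it is not the repo's actual PyTorch implementation.

```python
import numpy as np

def ohem_cross_entropy(probs, labels, thres=0.7, min_keep=100000, ignore_label=255):
    """Illustrative OHEM cross-entropy over per-pixel predictions.

    probs:  (N, C, H, W) softmax probabilities
    labels: (N, H, W) integer ground-truth labels
    """
    n, c, h, w = probs.shape
    flat_labels = labels.reshape(-1)
    flat_probs = probs.transpose(0, 2, 3, 1).reshape(-1, c)

    valid = flat_labels != ignore_label
    # probability the model assigns to the ground-truth class at each pixel
    gt_prob = flat_probs[np.arange(flat_labels.size), np.where(valid, flat_labels, 0)]
    gt_prob = np.where(valid, gt_prob, 2.0)  # ignored pixels sort last

    # a pixel is "hard" if its ground-truth probability is below thres;
    # always keep at least min_keep pixels (capped at the valid-pixel count)
    n_hard = int((gt_prob < thres).sum())
    keep = max(min(min_keep, int(valid.sum())), n_hard)

    hardest = np.argsort(gt_prob)[:keep]  # lowest-confidence pixels first
    losses = -np.log(np.clip(gt_prob[hardest], 1e-12, None))
    return losses.mean()
```

Intuitively, this is why a bad threshold can hurt: if too few pixels are hard, the min_keep floor backfills with easier pixels, but an overly aggressive selection can starve the loss of well-classified context.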

@qijiezhao
Author

Yeah, I reproduced the results of PSPNet, FCN_NonLocal, Deeplabv3, and CCNet with a 769x769 input size (within ±0.2%). I think these steps are not that troublesome.

@speedinghzl
Owner

Great! Does FCN_NonLocal use ResNet-101 as the backbone? Can you tell me the performance of FCN_NonLocal? I don't have a V100 to run FCN_NonLocal with ResNet-101.

@qijiezhao
Author

qijiezhao commented Dec 26, 2018

> Great! Does FCN_NonLocal use Resnet-101 as the backbone? Can you tell me the performance of FCN_NonLocal? I don't have V100 to run FCN_NonLocal with Resnet-101.

Yes, FCN_NonLocal with ResNet-101. I have tuned a lot of different settings to maximize the performance; the highest is around 79.97 (val set, single scale), although with many more parameters than CCNet. The tricks include tuning the key channels (pruning), reducing unnecessary channels, etc.

In addition, if I use the segmentation toolbox code to run the OHEM loss function, is the modification also --ohem-thres 0.7 --ohem-keep 100000?

@speedinghzl
Owner

OK, thanks for the information.
Yes, the segmentation toolbox can also use the same OHEM parameters: --ohem-thres 0.7 --ohem-keep 100000.

@qijiezhao
Author

qijiezhao commented Dec 26, 2018

Thanks for your help; my command is:
python train.py --data-dir /path/to/my/data_dir/ --random-mirror --random-scale --restore-from ./dataset/resnet101-imagenet.pth --gpu 0,1 --learning-rate 0.01 --weight-decay 0.0005 --batch-size 8 --num-steps 40000 --ohem True --ohem-thres 0.7 --ohem-keep 100000
This is for training only on the train set. If I get an improvement, I will try adding the val set and training for more iterations.
I will report my final OHEM results here when I get them.

@speedinghzl speedinghzl added the result reproduced The results are well reproduced label Dec 26, 2018
@qijiezhao
Author

qijiezhao commented Jan 4, 2019

Hi, Zilong:

I have evaluated plain FCN-non-local with OHEM and it improves by 0.7 points over training without OHEM. However, when I switch to my own method, it gets no improvement; I think my method may conflict with OHEM. Have you ever met this problem when designing CCNet?

I would like to keep in long-term communication with you about semantic segmentation. Could you add me as a WeChat friend? My ID is zhaoqijie8356. Thanks.

@lxtGH

lxtGH commented Jan 6, 2019

Hi, I also ran into the same problem: using OHEM gave no improvement for my own method. It seems that OHEM may be sensitive to the network architecture.

@EthanZhangYi

Hi, @qijiezhao
Can you report the results of PSPNet, FCN_NonLocal, Deeplabv3, and CCNet with a 769x769 input size, trained on the Cityscapes train set and evaluated on the val set? Also, can you report your environment info here?

Thanks

@qijiezhao
Author

> Hi, @qijiezhao
> Can you report the results of PSPNet, FCN_NonLocal, Deeplabv3 and CCNet with 769x769 input size, trained on Cityscapes train set and evaluated on val set. Moreover, can you report your environment info here?
>
> Thanks

Sure.
PSPNet: 78.6
Deeplabv3: 78.8
CCNet: 79.8
FCN_NonLocal: 79.2

The env is: 2x V100 GPUs, PyTorch 0.4.1, 4 images per GPU.

@EthanZhangYi

@qijiezhao Thanks very much.

@zhangpj

zhangpj commented Nov 1, 2020

> @qijiezhao Thanks for your attention. I have updated run_local.sh. The hyperparameters of OHEM are --ohem-thres 0.7 --ohem-keep 100000.
> To achieve the 81.4 on the testing set, you need to train the model with both the training set and validation set for 100000 iterations, then you need to finetune this model with a low learning rate and frozen sync bn.
> Besides, Can you report your result in Reproduce Performance Discussion with your env information? Thank you very much.

@speedinghzl Can you share the exact learning rate and the number of training steps for finetuning this model?
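For what it's worth, "frozen sync BN" is commonly achieved in PyTorch by putting the BatchNorm layers in eval mode and disabling gradients on their affine parameters. The sketch below uses vanilla BatchNorm2d as a generic stand-in for the repo's synchronized BN; the exact finetuning learning rate and step count are not stated in this thread, and the helper name is mine.

```python
import torch.nn as nn

def freeze_bn(model: nn.Module) -> nn.Module:
    """Freeze every BatchNorm layer: stop updating its running
    statistics (eval mode) and its affine weight/bias (no grad)."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()                      # running mean/var no longer update
            for p in m.parameters():
                p.requires_grad = False   # weight/bias no longer train
    return model
```

Note that a later `model.train()` call flips BN layers back into training mode, so `freeze_bn` would need to be re-applied after each such call, and the optimizer should be built over `filter(lambda p: p.requires_grad, model.parameters())`.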
