
Reproduce Performance Discussion #4

Closed

EthanZhangYi opened this issue Dec 3, 2018 · 30 comments

@EthanZhangYi

Thanks for the nice work.
However, after downloading the code and training the model, I could not reproduce the results reported in the paper.

Setting

Dataset: Cityscapes
Train with the train split (2975 images).
Evaluate with the val split.
Follow all details in this repo.
Train models with different max_iterations (60000 is the default setting in this repo).

Results in paper

[screenshot of the results table in the paper]

Results

| model | Max Iter | mIoU |
| --- | --- | --- |
| Resnet101-RCCA(R=2) | 40000 | 75.85% |
| Resnet101-RCCA(R=2) | 60000 | 76.81% |
| Resnet101-RCCA(R=2) | 100000 | 76.36% |
| Resnet101-PSP | 40000 | 76.92% |
| Resnet101-PSP | 60000 | 76.85% |
| Resnet101-PSP | 100000 | 76.90% |

Env

pytorch 0.4.0
torchvision 0.2.1
4*TITAN XP

Are there any tricks in the implementation?

@speedinghzl
Owner

speedinghzl commented Dec 3, 2018

@EthanZhangYi Thanks for your attention and experiments. No, this code with the default settings should achieve >79% performance on the val set. I will upload the training log and details of the experimental environment later on. I hope this helps you reproduce the result.

@speedinghzl
Owner

speedinghzl commented Dec 4, 2018

Python 3.6.4
PyTorch 0.4.0
Cuda 8.0
4*TITAN XP
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)

The training log is here.
Hope it helps.

@HqWei

HqWei commented Dec 4, 2018

[screenshot]
Why is it 4x2 in your log file but 3x2 in my test code?
[screenshot]

@EthanZhangYi
Author

EthanZhangYi commented Dec 4, 2018

@HqWei
Here: https://github.com/speedinghzl/CCNet/blob/master/evaluate.py#L94
With overlap = 1/3, the stride will be 513, not 769.
I guess you are using Python 2, where 1/3 evaluates to 0; it should be overlap = 1.0/3.0.
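For anyone hitting the same issue, here is a minimal sketch of how the integer-division bug changes the sliding-window stride (the 769 tile size matches the repo's default; the exact expression in evaluate.py may differ slightly):

```python
import math

tile_size = 769  # default crop size for sliding-window evaluation

# Under Python 2, 1/3 is integer division and evaluates to 0;
# 1.0/3.0 gives the intended 0.333... (Python 3's 1/3 already does).
overlap_bad = 1 // 3     # mimics Python 2's `1/3`, i.e. 0
overlap_good = 1.0 / 3.0

# Stride between adjacent tiles: advance by (1 - overlap) of a tile.
stride_bad = int(math.ceil(tile_size * (1 - overlap_bad)))    # 769 (no overlap)
stride_good = int(math.ceil(tile_size * (1 - overlap_good)))  # 513

print(stride_bad, stride_good)  # 769 513
```

This would also explain the 4x2 vs 3x2 grids in the screenshots above: with zero overlap, fewer tiles are needed to cover a 2048x1024 Cityscapes image.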

@EthanZhangYi
Author

EthanZhangYi commented Dec 4, 2018

@speedinghzl
Thanks for your reply.
Can you provide the model you trained? It would help me find the reason for the reproduction failure.

I also found that the log you supplied does NOT match this repo. Can you retrain the model exactly with this repo?

@speedinghzl
Owner

This repo was transplanted from the cluster version with minor changes (e.g. module names) that should not affect performance. I will check the differences between the two versions and run this repo. Thanks.

@EthanZhangYi
Author

@speedinghzl Thanks for your reply.
I hope you will release a model trained exactly with this repo.
Thanks

@HqWei

HqWei commented Dec 5, 2018

@EthanZhangYi

> Here: https://github.com/speedinghzl/CCNet/blob/master/evaluate.py#L94
> With overlap = 1/3, the stride will be 513, not 769.
> I guess you are using Python 2, where 1/3 evaluates to 0; it should be overlap = 1.0/3.0.

This is indeed the reason. You are so great.

@lxtGH

lxtGH commented Dec 10, 2018

@EthanZhangYi Hi! Thanks for sharing your results. Did you use OHEM or the aux loss during training?

@EthanZhangYi
Author

@lxtGH I used the default settings of this repo.
The aux loss is used with a weight of 0.4. Code
OHEM is not used, since it is set to false by default. Code
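For context, here is a minimal sketch of how a 0.4-weighted auxiliary loss is typically combined with the main loss (illustrative names, not the repo's exact code; ignore_index=255 is the usual Cityscapes ignore label):

```python
import torch.nn.functional as F

def seg_loss(main_logits, aux_logits, target, aux_weight=0.4):
    # The main head supervises the final prediction; the auxiliary head taps
    # an intermediate layer and is down-weighted so it only aids training.
    main = F.cross_entropy(main_logits, target, ignore_index=255)
    aux = F.cross_entropy(aux_logits, target, ignore_index=255)
    return main + aux_weight * aux
```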

@speedinghzl
Owner

speedinghzl commented Dec 11, 2018

@EthanZhangYi @lxtGH @HqWei Hi, the trained models are available now. You can find the download links in the README.

@EthanZhangYi
Author

@speedinghzl Thanks for your work.
I will update my result later in this issue.

@mingminzhen

@speedinghzl I tried to evaluate with your uploaded model, and there is an error:

	Missing key(s) in state_dict: "head.conv1a.0.weight", "head.conv1a.1.weight", "head.conv1a.1.bias", "head.conv1a.1.running_mean", "head.conv1a.1.running_var", "head.conv1b.0.weight", "head.conv1b.1.weight", "head.conv1b.1.bias", "head.conv1b.1.running_mean", "head.conv1b.1.running_var". 
	Unexpected key(s) in state_dict: "head.conva.0.weight", "head.conva.1.weight", "head.conva.1.bias", "head.conva.1.running_mean", "head.conva.1.running_var", "head.convb.0.weight", "head.convb.1.weight", "head.convb.1.bias", "head.convb.1.running_mean", "head.convb.1.running_var". 
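The missing/unexpected pairs are just renamed head layers (conva -> conv1a, convb -> conv1b). Updating to the latest code is the real fix; until then, one workaround sketch is to remap the keys before loading (assuming the layers are otherwise identical; the checkpoint path and `model` are placeholders):

```python
import torch

state = torch.load('CS_scenes_60000.pth', map_location='cpu')  # placeholder path
remap = {'head.conva.': 'head.conv1a.', 'head.convb.': 'head.conv1b.'}

fixed = {}
for key, value in state.items():
    # Rename checkpoint keys to match what the local model expects.
    for old, new in remap.items():
        if key.startswith(old):
            key = new + key[len(old):]
            break
    fixed[key] = value

model.load_state_dict(fixed)  # `model` is your constructed network
```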

@sydney0zq

@EthanZhangYi @lxtGH @HqWei @mingminzhen
Hi all,
I also tested the pretrained models provided by @speedinghzl, and my results show they indeed achieve the performance reported in the paper.


My machine configurations:
- Python3
- Titan V, CUDA9.2, cuDNN 7.3.1
- PyTorch 0.4.1
My results:
Command: python3 evaluate.py --restore-from CS_scenes_60000-r1-77.9.pth --gpu 3 --recurrence 1
R=1, on Cityscapes validation(500 images), meanIOU: 0.779
>> {"meanIU": 0.7791398882487588, "IU_array": [0.9799122768870089, 0.8424094220308053, 0.9277934880461202, 0.5701159431855094, 0.617929964374685, 0.660018266818399, 0.7194770509831437, 0.795321347281428, 0.9269646529314325, 0.6450280156149721, 0.9472130399256627, 0.8332049467725182, 0.6567972419293739, 0.9545253561399396, 0.7739813937785753, 0.8347423192411296, 0.7033894978466324, 0.6280275186127603, 0.7868061343263207]}

Command: python3 evaluate.py --restore-from CS_scenes_60000-r2-79.7.pth --gpu 3 --recurrence 2
R=2, on Cityscapes validation(500 images), meanIOU: 0.797
>> {'meanIU': 0.7974327312413085, 'IU_array': array([0.9819191 , 0.85220207, 0.9335091 , 0.63494449, 0.64130006,0.67873142, 0.73857849, 0.82006464, 0.92934097, 0.65219507, 0.95098094, 0.84491222, 0.68060879, 0.95816941, 0.82334046, 0.85702593, 0.67736114, 0.69248874, 0.80354884])}

@EthanZhangYi
Author

EthanZhangYi commented Dec 11, 2018

@sydney0zq @speedinghzl @lxtGH @HqWei @mingminzhen
Hi, I also updated my code and tested the models supplied in the README, and got the same results as @sydney0zq.

@mingminzhen You need to update your code.

Now I am training the model with the latest code, and will report my results later.

Regards

@HqWei

HqWei commented Dec 11, 2018

@EthanZhangYi What changes were made in this update?

@mingminzhen

@EthanZhangYi @speedinghzl Thanks. The updated code indeed gives the same performance.

@speedinghzl
Owner

You are welcome to share your reproduced performance here.

@speedinghzl changed the title from "Can NOT reproduce the result is paper" to "Reproduce Performance Discussion" on Dec 12, 2018
@mingminzhen

@speedinghzl @EthanZhangYi I used 4 NVIDIA P100s to train CCNet with the default settings. The test mIoU on the val dataset is only 78.58%.

@speedinghzl
Owner

Please refer to the README. You can run it multiple times to achieve better performance. You are welcome to provide a solution that stabilizes the result.
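One common, if only partial, way to reduce run-to-run variance is fixing the random seeds; a sketch (note that some multi-GPU/cuDNN ops remain nondeterministic even then):

```python
import random
import numpy as np
import torch

def set_seed(seed=123):
    # Seed every RNG the training pipeline touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # slower, more reproducible
    torch.backends.cudnn.benchmark = False
```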

@John1231983

Hi. Can you guess at a reason why the performance is not stable? Generative models are normally unstable to train, but I think segmentation should be stable.

@EthanZhangYi
Author

EthanZhangYi commented Dec 13, 2018

@sydney0zq @speedinghzl @lxtGH @HqWei @mingminzhen
I trained models with the latest code, and here are the results.

| model | ID | mIoU |
| --- | --- | --- |
| Resnet101-RCCA(R=2) | 1 | 78.32% |
| Resnet101-RCCA(R=2) | 2 | 78.89% |
| Resnet101-RCCA(R=2) | 3 | 77.05% |
| Resnet101-PSP | 1 | 77.95% |
| Resnet101-PSP | 2 | 78.30% |
| Resnet101-PSP | 3 | 78.47% |

The result of PSPNet in the paper is 78.5%, which is well reproduced.
The result of CCNet in the paper is 79.8%, which is NOT reproduced.

Checklist:

- Reproduce the result of CCNet.
- Find the reason for the reproduction failure at the very start of this issue.
- The performance is not stable. (I guess the reason is that the IoUs for classes with few samples, such as bus and train, are not stable, since the classes are heavily imbalanced; see the sketch below.)
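As a quick sanity check on that last guess, a tiny worked example of how much a single rare class can move the mean over the 19 Cityscapes classes:

```python
# A 10-point IoU swing on a single rare class (e.g. bus or train)
# shifts mIoU by 0.10 / 19 ~= 0.53 points, so two or three such
# swings already cover gaps like 77.05% vs 78.89%.
ious = [0.78] * 19
base = sum(ious) / len(ious)          # 0.7800
ious[15] -= 0.10                      # one rare class drops 10 points
print(base - sum(ious) / len(ious))   # ~0.0053, i.e. ~0.53 mIoU points
```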

Env

Python version : 3.6.3
Pytorch Version : 0.4.0
Cuda : 9.0
Cudnn : 7.0
Nccl: 2.1.15
GCC : 4.8.5
GPU : Titan XP

@John1231983

@Ethan: The unstable performance does not come from class imbalance, because all methods are applied to the same dataset. I guess it comes from the attention. I used attention and got the same problem. If you print the attention, you can see that almost all values become zero, with only some reaching 0.2 or 0.3. This makes dead features that never learn. Please print it and confirm my point.
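A quick way to check this is to print attention statistics during training; a minimal sketch, assuming you can hook the attention map tensor out of the model (called `attn` here; any shape works):

```python
import torch

def attention_stats(attn: torch.Tensor, eps: float = 1e-3):
    # Report what fraction of attention weights are (near) zero, plus the
    # largest and mean weights, to spot the "dead attention" pattern above.
    near_zero = (attn.abs() < eps).float().mean().item()
    print(f'near-zero: {near_zero:.1%}  max: {attn.max().item():.3f}  '
          f'mean: {attn.mean().item():.4f}')
```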

@EthanZhangYi
Author

@John1231983
Thanks for your reply. I think attention may be one reason for the unstable results. However, even other methods such as PSPNet, which has no attention mechanism, give unstable results.

@speedinghzl
Owner

@EthanZhangYi @mingminzhen @John1231983 @sydney0zq Nice discussion!
Could you also indicate your experimental environment when reporting reproduced performance, including the Python version, PyTorch version, CUDA, cuDNN, GCC, and GPU type? Thanks!

@EthanZhangYi
Author

@speedinghzl Thanks for the reply. I've updated the env info.

@lxtGH

lxtGH commented Dec 25, 2018

@EthanZhangYi Does ID mean you tried three times with the same default settings? I tried this code with PSPNet and got the same result as your ID 1. Did you use single-scale crop testing?

@speedinghzl
Owner

speedinghzl commented Dec 26, 2018

Hi everybody, someone has reproduced the performance of CCNet on Cityscapes and COCO. You can find the discussions in the issues "How to use OHEM loss function?" and "Param Initialization". Cheers!

@lzrobots

Hi, I noticed that CCNet uses 60000 iterations while PSP and DeepLabv3 (in your implementation repo) use 40000 iterations. Can we say that more iterations improve performance?

@speedinghzl
Owner

speedinghzl commented Jan 13, 2019

In the paper, we train CCNet, PSP, and DeepLabv3 with the same 60K iterations.

I have released two repositories about segmentation. In the toolbox (another repo), PSP and DeepLabv3 are trained with 40K iterations. In short, CCNet and the toolbox are different repositories; please do not mix them up.
