
Performance of resnet101 #25

Closed · kleinzcy opened this issue Sep 4, 2020 · 4 comments

@kleinzcy commented Sep 4, 2020

Hi authors, thanks for your nice paper and code!

Recently I retrained the resnet101 model with your code, but my results are not as good as those reported in the paper. I have read the existing issues but did not find any helpful information.

My environment: Ubuntu 16.04, CUDA 10.1, PyTorch 1.3.0, four TITAN Xp GPUs.

My results (NoBRS, last checkpoint, NFL):

[screenshot: results table]

Results after f-BRS-B:
[screenshot: results table]

Also, the training curves look strange:

[screenshots: training loss curves]

The training loss is growing or stays constant (it changes only slightly). Do you have any idea why?

Thanks.

@kleinzcy (Author)

When I retrained the resnet101 model on a GTX 1080 Ti, the results were as good as those reported in the paper, and even better.

[screenshots: results tables]

The NoC metric is sensitive.
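
For reference, NoC is the mean number of clicks needed to reach a target IoU (capped at some maximum number of clicks), so an image whose IoU lands just under the threshold costs a whole extra click (or hits the cap) and shifts the average. A rough sketch of the metric (my own illustration, not the repository's evaluation code; the function name and defaults are hypothetical):

```python
# Hypothetical illustration of why NoC (mean Number of Clicks to reach a
# target IoU) is sensitive: an IoU just below the threshold adds a whole
# extra click, or the click cap, to that image's count.
def noc(per_click_ious, target_iou=0.90, max_clicks=20):
    # per_click_ious: one list per image with the IoU after each click
    total = 0
    for ious in per_click_ious:
        clicks = max_clicks  # cap if the target IoU is never reached
        for i, iou in enumerate(ious, start=1):
            if iou >= target_iou:
                clicks = i
                break
        total += clicks
    return total / len(per_click_ious)
```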

@MaitaYuki

@kleinzcy, I hope you don't mind me asking a question, since I encountered the same problem you mentioned at the beginning: during training, the training and validation losses do not change much from epoch 1 to epoch 120, and fluctuate around 0.3, as shown below:

(INFO) 2021-02-09 06:50:46: Epoch 99, training loss 0.319589: 96%|########################## | 512/531 [05:30<00:12, 1.56it/s]
(INFO) 2021-02-09 06:50:51: Epoch 99, training loss 0.319661: 98%|##########################4| 520/531 [05:35<00:07, 1.56it/s]
(INFO) 2021-02-09 06:50:57: Epoch 99, training loss 0.319605: 99%|##########################8| 528/531 [05:40<00:01, 1.55it/s]
(INFO) 2021-02-09 06:50:58: Save checkpoint to experiments/sbd/r34_dh128/008_first-try/checkpoints/last_checkpoint.pth
(INFO) 2021-02-09 06:51:02: Epoch 99, validation loss: 0.339068: 7%|#6 | 12/178 [00:03<00:32, 5.03it/s]
(INFO) 2021-02-09 06:51:07: Epoch 99, validation loss: 0.341119: 21%|#####3 | 38/178 [00:08<00:27, 5.15it/s]
(INFO) 2021-02-09 06:51:12: Epoch 99, validation loss: 0.341862: 36%|########9 | 64/178 [00:13<00:22, 5.14it/s]

When you reported that your results were as good as in the paper and even better, did you observe the loss drop well below 0.3? How did you solve the problem? Thanks.

@kleinzcy (Author)

@MaitaYuki Sorry for the late reply, I was on a long holiday.

The loss fluctuates around 0.3 because of the normalized focal loss (NFL). You can look at its formulation.
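
A minimal sketch of a normalized focal loss (my own illustration, assuming a binary per-pixel setting; the exact normalization used in this repository may differ) shows why the reported value stays on roughly the same scale instead of decaying toward zero the way plain cross-entropy does: the focal weights are renormalized on every batch, so the loss is always dominated by the currently hardest pixels.

```python
# Minimal sketch (not the repository's exact implementation) of a normalized
# focal loss for binary per-pixel segmentation. The focal weights
# (1 - p_t)^gamma are divided by their mean over each image, so the loss
# value stays on a similar scale even as predictions improve.
import torch

def normalized_focal_loss(logits, targets, gamma=2.0, eps=1e-8):
    # logits, targets: tensors of shape (N, 1, H, W); targets in {0, 1}
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)   # probability of the true class
    weight = (1 - p_t) ** gamma                   # focal weight per pixel
    # Normalize the weights so their per-image mean is 1 (the "normalized"
    # part); the exact normalization is in the repo's NFL implementation.
    weight = weight / (weight.mean(dim=(1, 2, 3), keepdim=True) + eps)
    loss = -weight * torch.log(p_t + eps)         # weighted cross-entropy
    return loss.mean()
```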

@Looottch

@kleinzcy @MaitaYuki Do you remember how much time 120 epochs took (and how many GPUs you used)?
