CIFAR validation loss decreases then increases after learning rate change #56

Open
Sirius083 opened this issue May 6, 2019 · 4 comments


Sirius083 commented May 6, 2019

Hello, I have a question about training DenseNet: the validation loss drops sharply and then increases after the learning rate changes from 0.1 to 0.01.
I trained DenseNet (depth 40, k = 12) on CIFAR-100 using the TensorFlow implementation:
https://github.com/YixuanLi/densenet-tensorflow
I only modified the code to follow your data augmentation step (subtract the channel mean, then divide by the std).
However, the validation loss looks weird (see figure). I have the following two questions:

[Figure: cifar100_d_40_k_12]
(1) Did you encounter the same problem when training on the CIFAR-100 dataset? (Or could it be an error in the TensorFlow implementation?)
(2) Does your validation loss include the L2 loss term?
I ask because the validation error itself looks fine (25.53%, 1.1% higher than in the paper).
Thanks in advance.
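
For readers following along, the normalization step mentioned above (subtract the channel mean, then divide by the std) might look roughly like the sketch below. It uses NumPy rather than TensorFlow, and the helper names `channel_stats` and `normalize` as well as the random stand-in data are assumptions for illustration, not code from either repository.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std over a batch shaped (N, H, W, C)."""
    mean = images.mean(axis=(0, 1, 2))
    std = images.std(axis=(0, 1, 2))
    return mean, std

def normalize(images, mean, std):
    """Subtract the channel mean, then divide by the channel std."""
    return (images - mean) / std

# Random stand-in data with CIFAR-like shape; not real CIFAR images.
rng = np.random.default_rng(0)
train = rng.uniform(0.0, 255.0, size=(8, 32, 32, 3)).astype(np.float32)

# Statistics come from the training set and are reused for validation data.
mean, std = channel_stats(train)
train_norm = normalize(train, mean, std)
```

The same `mean` and `std` would normally be applied to the validation images as well, so both splits share one input scale.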

@liuzhuang13
Owner

  1. We didn't encounter the same problem when training on CIFAR-100. I think something similar happened on SVHN, but the increase was not as significant as this.
  2. Our validation loss does not include L2 loss.

Thanks
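
To make the distinction in point 2 concrete, here is a minimal NumPy sketch of a reported loss without the L2 term versus a training objective with it. The helper names, the toy logits and weights, and the weight decay value of 1e-4 are assumptions for illustration; the convention `weight_decay * sum(w ** 2)` is one common choice and may differ from what either implementation uses.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def l2_penalty(weights, weight_decay):
    """weight_decay times the sum of squared weights (one common convention)."""
    return float(weight_decay * sum((w ** 2).sum() for w in weights))

# Toy values, purely hypothetical.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
weights = [np.ones((3, 3)), np.ones(3)]

# Data term only: comparable to a validation loss that excludes L2.
data_loss = cross_entropy(logits, labels)

# Data term plus regularizer: what the optimizer would minimize in training.
train_objective = data_loss + l2_penalty(weights, weight_decay=1e-4)
```

If one implementation logs `train_objective` and the other logs `data_loss`, the two curves are offset by the penalty term and are not directly comparable.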

@liuzhuang13
Owner

Did you use data augmentation (padding, cropping, and flipping the input images)? If you didn't, the model could overfit the training set and the test loss could increase.
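
As a rough illustration of the padding, cropping, and flipping mentioned above, a per-image augmentation could be sketched as follows. The function name `augment`, the 4-pixel zero padding, and the 0.5 flip probability are assumptions here, not a copy of the repository's code.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Zero-pad, take a random crop of the original size, then maybe flip."""
    h, w, _ = image.shape
    # Pad height and width only; the channel axis is left untouched.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    # Pick the top-left corner of the crop uniformly among valid offsets.
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    crop = padded[top:top + h, left:left + w]
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    return crop

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3)).astype(np.float32)  # stand-in image
augmented = augment(image, rng)
```

Augmentation of this kind is applied to training images only; validation images are passed through unchanged apart from normalization.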

@Sirius083
Author

Thanks for replying. I am using exactly the same data augmentation as your code (a single floating-point mean and standard deviation per channel, 4-pixel padding, flipping). I will further compare the PyTorch and TensorFlow code to check for differences. Thanks a lot.

@Sirius083
Author

> Did you use data augmentation (padding, cropping, and flipping the input images)? If you didn't, the model could overfit the training set and the test loss could increase.

Thanks for your reply. I finally found that the TensorFlow implementation did not apply L2 regularization to the batch normalization parameters. After adding it, the error rate decreased by 0.5%, which is a substantial improvement. Thanks a lot.
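
The difference described above can be sketched as a choice of which parameters enter the L2 sum. The parameter names, shapes, and the `"/bn/"` naming rule below are hypothetical and only illustrate the idea; a real TensorFlow or PyTorch model would expose its variables differently.

```python
import numpy as np

def l2_sum(params, include_bn):
    """Sum of squared values over the selected parameters."""
    total = 0.0
    for name, value in params.items():
        is_bn = "/bn/" in name  # hypothetical naming rule for batch norm variables
        if is_bn and not include_bn:
            continue
        total += float((value ** 2).sum())
    return total

# Hypothetical parameters for one conv layer and its batch norm layer.
params = {
    "block1/conv/kernel": np.full((3, 3, 3, 12), 0.1),
    "block1/bn/gamma": np.ones(12),
    "block1/bn/beta": np.zeros(12),
}

without_bn = l2_sum(params, include_bn=False)  # conv kernel only
with_bn = l2_sum(params, include_bn=True)      # kernel plus gamma and beta
```

Multiplying either sum by the weight decay coefficient gives the penalty added to the training loss, so including the batch normalization scale and shift changes both the objective and the gradients those parameters receive.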
