May I ask you a question about precise-bn ? #7

Closed
CoinCheung opened this issue Oct 18, 2019 · 6 comments
Comments

@CoinCheung

Hi,

Thanks for releasing this helpful codebase. After reading the precise-bn part, I have a question:
Do I need to enlarge the batch size when I do precise BN?
I noticed that BN performs poorly when the batch size is small. People say this is due to noisy estimation of the running mean/var of the BN layers: when the batch size is small, each batch's mean/var is noisier, which leads to a poor estimate of the running mean/var. So, to cope with the small-batch problem, do I need to enlarge the batch size when I use precise BN, or is there another explanation for the BN performance drop?
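To make the "noisier per-batch mean/var" point concrete, here is a minimal sketch using only the Python standard library. It is not code from this repository: the function name `batch_stat_spread` and the choice of standard-normal "activations" are assumptions made purely for illustration. It measures how much the per-batch mean and variance fluctuate from batch to batch for a small and a large batch size.

```python
import random
import statistics


def batch_stat_spread(batch_size, num_batches=2000, seed=0):
    """Return (std of per-batch means, std of per-batch variances).

    Hypothetical helper: the activations are simulated as standard-normal
    samples, which is an assumption for illustration only.
    """
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(num_batches):
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(batch))
        # Population (biased) variance of this one batch.
        variances.append(statistics.pvariance(batch))
    return statistics.pstdev(means), statistics.pstdev(variances)


small = batch_stat_spread(batch_size=2)
large = batch_stat_spread(batch_size=64)
print("batch size 2 :", small)
print("batch size 64:", large)
```

Under these assumptions, both spreads should come out clearly larger for batch size 2 than for batch size 64, which is the noise the question refers to.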

@ppwwyyxx
Contributor

A very small batch size is in general not good for BatchNorm, regardless of whether precise-BN is used or not.

@CoinCheung
Author

Do I have a chance of achieving a good result if I train the model with a small batch size and then use a large batch size to do precise BN?

@haooooooqi
Contributor

That probably would not work. Precise BN computes the “precise” running mean and std, and these depend on the current state of the weights.
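A toy sketch of why the statistics depend on the weights, again using only the standard library. Everything here is an assumption for illustration: `recompute_bn_stats` is a hypothetical stand-in for a precise-BN pass (it averages per-batch mean and variance with the weights frozen), and the "layer" is a single scalar `weight * x + bias`. It is not the implementation in this codebase.

```python
import random
import statistics


def recompute_bn_stats(weight, bias, data, batch_size):
    """Hypothetical stand-in for a precise-BN pass.

    With the weights frozen, average the per-batch mean and variance of the
    pre-BN activation weight * x + bias over all batches.
    """
    batch_means, batch_vars = [], []
    for start in range(0, len(data), batch_size):
        acts = [weight * x + bias for x in data[start:start + batch_size]]
        batch_means.append(statistics.fmean(acts))
        batch_vars.append(statistics.pvariance(acts))
    return statistics.fmean(batch_means), statistics.fmean(batch_vars)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # 64 batches of 64

# Same data, two different weight states.
stats_a = recompute_bn_stats(weight=1.0, bias=0.0, data=data, batch_size=64)
stats_b = recompute_bn_stats(weight=3.0, bias=0.5, data=data, batch_size=64)
print("weights A:", stats_a)
print("weights B:", stats_b)
```

In this toy setting the statistics are a pure function of the weights and the data: changing the weights changes the mean and variance that need to be stored. A large-batch pass after training can only re-estimate statistics for whatever weights small-batch training produced; it does not change those weights.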

@haooooooqi
Contributor

A small BN batch size would lead to a non-optimal set of weights.

@BluebirdStory

Actually, I think you can never get a precise estimate of running_mean and running_var through precise BN unless you use all the data in a single forward computation. Do you understand what I mean?

@ppwwyyxx
Contributor

It is precise enough to give accurate final results, and computing the "perfectly precise" stats does not improve results further. This is discussed at the end of Sec 3.2 as well as Appendix A3 of https://arxiv.org/pdf/2105.07576.pdf.
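As a rough illustration of "precise enough", the sketch below combines per-batch statistics of equal-sized batches using the identity var = E[var_b + mean_b^2] - (E[mean_b])^2 and compares the result with a single pass over all the data. The helper name `aggregate_batch_stats` and the simulated activations are assumptions, not this codebase's API. The sketch also treats the activations as fixed, whereas in a real network in training mode the inputs to deeper BN layers depend on the batch statistics of earlier ones, so the match there would not be exact.

```python
import random
import statistics


def aggregate_batch_stats(activations, batch_size):
    """Combine per-batch mean/variance of equal-sized batches into overall
    mean/variance via var = E[var_b + mean_b^2] - (E[mean_b])^2.

    Hypothetical helper; assumes len(activations) is a multiple of batch_size.
    """
    means, second_moments = [], []
    for start in range(0, len(activations), batch_size):
        batch = activations[start:start + batch_size]
        m = statistics.fmean(batch)
        v = statistics.pvariance(batch)  # biased per-batch variance
        means.append(m)
        second_moments.append(v + m * m)
    mean = statistics.fmean(means)
    var = statistics.fmean(second_moments) - mean * mean
    return mean, var


rng = random.Random(0)
acts = [rng.gauss(2.0, 3.0) for _ in range(8192)]  # 256 batches of 32

# "All data in a single forward computation".
full_mean = statistics.fmean(acts)
full_var = statistics.pvariance(acts)

# Aggregated over all 256 batches, and over only the first 100 batches.
agg_mean, agg_var = aggregate_batch_stats(acts, batch_size=32)
sub_mean, sub_var = aggregate_batch_stats(acts[:3200], batch_size=32)

print("single pass :", full_mean, full_var)
print("256 batches :", agg_mean, agg_var)
print("100 batches :", sub_mean, sub_var)
```

For fixed activations, aggregating over every batch reproduces the single-pass statistics up to floating-point error, and aggregating over a subset of batches should already land close to them.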
