May I ask you a question about precise-BN? #7
Hi,

Thanks for releasing this helpful codebase. After reading the precise-BN part, a question came to me:

Do I need to enlarge the batch size when I do precise BN?

I noticed that BN's performance degrades when the batch size is small. People say this is due to noisy estimation of the BN layers' running mean/var: when the batch size is small, each batch's mean/var is noisier, which leads to a bad estimate of the running mean/var. So, to cope with the small-batch problem, do I need to enlarge the batch size when I use precise BN, or are there other explanations for the BN performance drop?

Comments

Very small batch sizes are in general not good for BatchNorm, regardless of whether precise BN is used or not.

Do I have a chance of achieving a good result if I use a small batch size to train the model and then use a large batch size to do precise BN?

That probably would not work. Precise BN computes the "precise" running mean and std given the current state of the weights, and a small BN batch size leads to a non-optimal set of weights in the first place.

Actually, I think you can never get a precise estimate of running_mean and running_var through precise BN unless you use all the data in a single forward pass. Do you see what I am talking about?

It is precise enough to give accurate final results, and computing the "perfectly precise" stats does not improve results further. This is discussed at the end of Sec. 3.2 as well as Appendix A3 of https://arxiv.org/pdf/2105.07576.pdf.
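For concreteness, here is a minimal sketch of the kind of statistics recomputation discussed above, similar in spirit to fvcore's `update_bn_stats`. It is not this repo's actual implementation; the function name `recompute_bn_stats`, the `num_batches` parameter, and the `(inputs, _)` loader interface are assumptions for illustration:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def recompute_bn_stats(model, data_loader, num_batches=200):
    """Re-estimate BN running stats by averaging per-batch statistics
    over `num_batches` forward passes (a precise-BN style sketch)."""
    bn_layers = [m for m in model.modules()
                 if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))]
    if not bn_layers:
        return
    # With momentum=1.0, each forward pass overwrites running_mean/running_var
    # with the current batch's statistics, which we then accumulate below.
    saved_momentum = [bn.momentum for bn in bn_layers]
    for bn in bn_layers:
        bn.momentum = 1.0
    was_training = model.training
    model.train()  # BN updates its running stats only in train mode

    mean_sum = [torch.zeros_like(bn.running_mean) for bn in bn_layers]
    var_sum = [torch.zeros_like(bn.running_var) for bn in bn_layers]
    n = 0
    for inputs, _ in data_loader:
        if n >= num_batches:
            break
        model(inputs)
        for i, bn in enumerate(bn_layers):
            mean_sum[i] += bn.running_mean
            var_sum[i] += bn.running_var
        n += 1

    # Replace the EMA estimates with the aggregated averages and restore state.
    for i, bn in enumerate(bn_layers):
        bn.running_mean.copy_(mean_sum[i] / n)
        bn.running_var.copy_(var_sum[i] / n)
        bn.momentum = saved_momentum[i]
    model.train(was_training)
```

Since this pass only runs forward computation, its batch size is independent of the training batch size; the point made in the replies is that a larger batch here cannot repair weights that were already trained sub-optimally with a small batch.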
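As a toy illustration of the noise argument in the original question (synthetic data, numbers purely illustrative):

```python
import torch

torch.manual_seed(0)
population = torch.randn(100_000)  # stand-in for one channel's activations

for batch_size in (2, 8, 32, 128):
    # Draw 1000 random batches and measure how much the per-batch mean varies.
    idx = torch.randint(len(population), (1000, batch_size))
    batch_means = population[idx].mean(dim=1)
    print(f"batch_size={batch_size:4d}  std of per-batch mean = {batch_means.std():.4f}")
```

The spread of the per-batch mean shrinks roughly as 1/sqrt(batch size), which is why small batches give noisy normalization during training and a poor estimate of the running statistics.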