Hi,

Thanks for the great work! I am trying to reproduce some results and have a question regarding the batch implementation of the IRM penalty. In Section 3.2 and Appendix D, you suggest the following batch implementation:
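(The code block did not survive here; below is a sketch along the lines of the paper's Appendix D batch-splitting penalty, where `losses` holds the per-example losses and `dummy_w` is the fixed scalar dummy classifier. Names and setup are reconstructed, not quoted.)

```python
import torch
from torch.autograd import grad

def compute_penalty(losses, dummy_w):
    # Split the minibatch into two disjoint halves (even/odd indices)
    # and take the product of the two half-batch gradients.
    g1 = grad(losses[0::2].mean(), dummy_w, create_graph=True)[0]
    g2 = grad(losses[1::2].mean(), dummy_w, create_graph=True)[0]
    return (g1 * g2).sum()
```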
I am wondering whether we can instead do the following:
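(Again, the original snippet is missing; presumably the naive single-batch version, squaring the full-batch gradient:)

```python
def compute_penalty_naive(losses, dummy_w):
    # Gradient of the full-batch mean loss, squared.
    g = grad(losses.mean(), dummy_w, create_graph=True)[0]
    return (g ** 2).sum()
```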
You mentioned that the former is an "unbiased estimate of the squared gradient norm", but I am not sure why that is the case. If you could provide some explanation, that would be great.

Thank you!
If X denotes a minibatch gradient, then E[X]^2 is the true squared gradient norm (i.e., what we're trying to estimate), and E[X^2] is the "naive" minibatch estimator (i.e., your suggested code). In general E[X^2] = E[X]^2 + Var[X] ≠ E[X]^2, so the naive estimator is biased upward by the gradient variance.
On the other hand, E[X1 * X2] = E[X1] * E[X2] when X1 and X2 are independent. Letting X1 and X2 denote the gradients of two disjoint halves of the minibatch, the product estimator has expectation E[X1] * E[X2] = E[X]^2, which is exactly our batch-splitting estimator from Section 3.2. Hope this helps!
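(A quick numerical sketch of the two estimators, mine rather than from the thread, using scalar stand-ins for the per-example gradients:)

```python
import torch

torch.manual_seed(0)

# Per-example "gradients" drawn i.i.d. with mean mu; the target
# quantity is the squared mean, mu**2.
mu, sigma, batch_size, trials = 0.5, 2.0, 32, 100_000

naive, split = [], []
for _ in range(trials):
    g = mu + sigma * torch.randn(batch_size)
    naive.append(g.mean() ** 2)                    # square of one batch mean
    split.append(g[0::2].mean() * g[1::2].mean())  # product of two half-batch means

print("target:", mu ** 2)                           # 0.25
print("naive :", torch.stack(naive).mean().item())  # ~0.375 = mu**2 + sigma**2 / batch_size
print("split :", torch.stack(split).mean().item())  # ~0.25
```

The naive estimator overshoots by roughly Var[X] = sigma^2 / batch_size, matching the bias argument above.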