This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

question about batch implementation of IRM loss #7

Closed
weiHelloWorld opened this issue Jun 1, 2021 · 1 comment


@weiHelloWorld

Hi,

Thanks for the great work! I am trying to reproduce some results and have a question about the batch implementation of the IRM loss. In Section 3.2 and Appendix D, you suggest using the following batch implementation:

from torch.autograd import grad

def compute_penalty(losses, dummy_w):
    # One gradient per half of the minibatch (even/odd indices).
    g1 = grad(losses[0::2].mean(), dummy_w, create_graph=True)[0]
    g2 = grad(losses[1::2].mean(), dummy_w, create_graph=True)[0]
    return (g1 * g2).sum()

I am wondering whether we can instead do the following:

def compute_penalty(losses, dummy_w):
    # Single gradient over the full minibatch.
    g = grad(losses.mean(), dummy_w, create_graph=True)[0]
    return (g ** 2).sum()

You mentioned that the former is an "unbiased estimate of the squared gradient norm", but I am not sure why that is the case. If you could provide some explanation, that would be great.

Thank you!

@igul222

igul222 commented Jun 22, 2021

If X denotes a minibatch gradient, then ||E[X]||^2 is the true squared gradient norm (i.e. what we're trying to estimate), while E[||X||^2] is the "naive" minibatch estimator (i.e. your suggested code). In general, E[||X||^2] = ||E[X]||^2 + tr(Cov(X)) ≠ ||E[X]||^2, so the naive estimator is biased upward by the minibatch variance of the gradient.

On the other hand, E[X1 · X2] = E[X1] · E[X2] = ||E[X]||^2 when X1 and X2 are independent (and identically distributed). Letting X1 and X2 denote the gradients of two different halves of the minibatch directly gives our batch-splitting estimator (Section 3.2). Hope this helps!
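A quick numerical sanity check makes the bias visible. The sketch below uses NumPy with a synthetic per-example gradient distribution (not autograd; the dimensions, batch size, and trial count are arbitrary choices for illustration): the naive estimator's mean sits above the true squared norm by the minibatch variance, while the batch-splitting estimator's mean matches it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: per-example "gradients" are d-dimensional vectors drawn
# i.i.d. as mu + unit Gaussian noise, so the true squared gradient norm
# (the quantity both estimators target) is ||mu||^2.
d, n_examples, n_trials = 4, 32, 20000
mu = rng.normal(size=d)
true_sq_norm = np.dot(mu, mu)

naive, split = [], []
for _ in range(n_trials):
    per_example = mu + rng.normal(size=(n_examples, d))  # noisy gradients
    g = per_example.mean(axis=0)           # full-minibatch gradient
    naive.append(np.dot(g, g))             # naive estimator: ||X||^2
    g1 = per_example[0::2].mean(axis=0)    # gradient on even-indexed half
    g2 = per_example[1::2].mean(axis=0)    # gradient on odd-indexed half
    split.append(np.dot(g1, g2))           # batch-splitting: X1 . X2

print(f"true ||E[X]||^2     : {true_sq_norm:.4f}")
print(f"naive estimator mean: {np.mean(naive):.4f}")  # overshoots by ~d/n
print(f"split estimator mean: {np.mean(split):.4f}")  # matches the truth
```

With unit per-example noise, the naive estimator's upward bias here is tr(Cov)/n = d/n_examples = 0.125, and shrinking the batch makes it worse, which is why the paper splits the batch instead of squaring one gradient.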
