How to calculate the loss in the EM setting? #2
Comments
Hello @coallaoh @junsukchoe @naver-ai, could you please explain this to me? It would be really helpful for my research. Thanks!
The final loss for CALM-EM is at Line 163 in 0ebb8a5
That is, it's in the form of

$$\mathcal{L}_{\text{EM}} = -\sum_{z} p'(z \mid x, y)\,\log p(y, z \mid x),$$

i.e. the pixel-wise NLL of the joint likelihood $p(y, z \mid x)$ under the pseudo-target $p'(z \mid x, y)$, where $y$ is the GT class label.
Now, let's take a look at how the pseudo-target is computed. Your link at Line 94 in 0ebb8a5 points to exactly that computation.
In maths notation, this quantity is

$$p'(z \mid x, y) = \operatorname{sg}\!\left[\frac{p(y, z \mid x)}{\sum_{z'} p(y, z' \mid x)}\right],$$

where $\operatorname{sg}[\cdot]$ denotes the stop-gradient operation (the tensor is detached from the computational graph) and $y$ is the GT class label. Combining everything together, we have the following expression:

$$\mathcal{L}_{\text{EM}} = -\sum_{z} \operatorname{sg}\!\left[\frac{p(y, z \mid x)}{\sum_{z'} p(y, z' \mid x)}\right] \log p(y, z \mid x).$$
Please note that this may be interpreted as self-supervising the pixel(z)-wise predictions p(y, z|x) with its own estimation of the cue location z for the true class y: p'(z|x, y). I understand that self-supervision has the connotation of not using any human-supplied annotation, while ours uses the GT class label. We used "self-supervision" here in the sense that the pixel-wise GT is not used and is replaced with a pseudo-pixel-wise-GT generated from a mere GT class label.
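The loss described above can be sketched in a few lines. This is an illustrative numpy sketch under assumed shapes, not the repository's actual code; `calm_em_loss` and its arguments are hypothetical names.

```python
import numpy as np

def calm_em_loss(joint, y, eps=1e-12):
    """Pixel-wise NLL between the pseudo-target p'(z|x,y) and p(y,z|x).

    `joint` is an assumed (num_classes, num_locations) array holding the
    joint likelihood p(y, z|x) for one image, normalised so joint.sum() == 1;
    `y` is the GT class label.
    """
    p_yz = joint[y]                      # p(y, z|x) at the GT class, shape (num_locations,)
    # Pseudo-target: p'(z|x, y) = p(y, z|x) / sum_z' p(y, z'|x).
    # It is treated as a constant (in PyTorch this would be .detach()-ed);
    # numpy has no autograd, so no detach is needed here.
    pseudo = p_yz / (p_yz.sum() + eps)
    return -(pseudo * np.log(p_yz + eps)).sum()

# Toy example: 3 classes, 4 locations, made-up numbers.
joint = np.array([[0.05, 0.10, 0.02, 0.03],
                  [0.20, 0.30, 0.05, 0.05],
                  [0.05, 0.05, 0.05, 0.05]])
loss = calm_em_loss(joint, y=1)
```

Note how only the GT class label enters the loss; the per-location pseudo-GT is generated from the model's own prediction, which is the sense of "self-supervision" described above.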
The maps of p(z,y^|x) and p(y^|x,z) are similar because they are identical up to linear scaling. Heatmaps are usually drawn with max-normalisation, which outputs the same heatmap for maps that are identical up to linear scaling.
Hi @coallaoh, thank you for your quick reply. From my understanding, you said that `feature_ic` means the input pixel (x) at location i (or at position i) together with the class label c. And in the loss function, `target_i` means the label of the pixel at location i. So my question is: is `target_i` here the same for all locations i, and equal to the class label c?
No, all the indices are mixed up.
`target_i` means the GT class label for sample i. i is the sample index; k is the location index (and so you can write z=k, etc.).
Class label c is a free index, not a designated index. One could say, for example, c=target_i, meaning that you set the class index c to the GT class label for sample i.
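The index convention above can be illustrated with a small numpy sketch; the shapes and names (`probs`, `target`, `per_location`) are assumptions for illustration, not the repository's code.

```python
import numpy as np

# Assumed convention: probs[i, c, k] = p(y=c, z=k | x_i)
# for sample index i, class index c, location index k (z = k).
rng = np.random.default_rng(0)
batch, num_classes, num_locs = 2, 3, 4

logits = rng.normal(size=(batch, num_classes, num_locs))
probs = np.exp(logits)
# Normalise jointly over (c, k) so each sample's probs sum to 1.
probs /= probs.reshape(batch, -1).sum(axis=1)[:, None, None]

# target[i]: GT class label for sample i (what the thread calls target_i).
target = np.array([1, 2])

# Setting the free class index c to target_i picks out, for each sample,
# the per-location slice p(y=target_i, z=k | x_i):
per_location = probs[np.arange(batch), target]   # shape (batch, num_locs)
```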
Yes, thanks @coallaoh!
Sounds great. Thanks for your interest in our work :)
Hi,
Thank you for your great work; it is really interesting, and complicated too.
I still do not understand how you calculate the loss in the EM setting. You said in the paper that the EM case is a self-supervised setting, and that it calculates the pixel-wise NLL between the pseudo-target and the joint-likelihood block.
But I found that in your code you still feed the image label into the NLL loss. Did you take the sum over the last two channels here, which outputs a vector that you then compare with the image label? If that is what you have done, it would contradict what you said in the paper, right?
The second question: the attribution maps of p(z,y^|x) and p(y^|x,z) are similar, which seems to violate the equation p(y, z|x) = p(y|x,z)p(z|x), right? Can you also clarify why these two maps are so similar?
Hope you can clarify these points; they are things I have been thinking about a lot.
Best,
Tin