Barlow Twins loss on identical vector #16
Comments
Same question! This really confuses me; could someone explain the reason?
@ChanLIM
Our loss function wants the features to be decorrelated, but in your example they are correlated: feature 3 can be completely determined by features 1 and 2. If you add a third sample so that there is no redundant dimension, I believe you can get zero loss. Let me know if you have further questions.
@sallymmx
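For context, here is a minimal sketch of the loss as defined in the paper (a PyTorch restatement, not the repo's exact code; `lambda_offdiag` stands in for the paper's λ weight):

```python
import torch

def barlow_twins_loss(z1, z2, lambda_offdiag=5e-3):
    # Normalize each feature dimension over the batch (mean 0, std 1),
    # as the batch-norm step in the paper does.
    z1 = (z1 - z1.mean(dim=0)) / z1.std(dim=0, unbiased=False)
    z2 = (z2 - z2.mean(dim=0)) / z2.std(dim=0, unbiased=False)

    n, d = z1.shape
    c = (z1.T @ z2) / n  # d x d cross-correlation matrix

    # Invariance term: push the diagonal towards 1.
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Redundancy-reduction term: push the off-diagonal towards 0.
    off_diag = (c - torch.diag_embed(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag
```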
@jingli9111
As you suggested, I tried re-calculating the loss after adding another sample (img3), but I still don't get the result I expected. Here's another example for you. It would help me understand the problem if you could provide an example of a batch where the features are not correlated, and explain what it means for them to be uncorrelated. Thanks in advance!
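(For concreteness, here is one hypothetical batch of the kind being asked about, using the `barlow_twins_loss` sketch above. "Not correlated" means every pair of feature columns has zero empirical correlation across the batch, so the cross-correlation matrix is exactly the identity and the loss is zero even with `z1 == z2`.)

```python
# 4 samples, 2 features; both columns are mean-zero with unit std,
# and they are orthogonal across the batch, so C is exactly identity.
z = torch.tensor([[ 1.,  1.],
                  [ 1., -1.],
                  [-1.,  1.],
                  [-1., -1.]])
print(barlow_twins_loss(z, z))  # tensor(0.)
```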
@ChanLIM Sorry, my previous statement […]. Let me know if you have more questions.
Then, I guess the objectives of the Barlow Twins loss are 1) making the two representation vectors similar to each other (the relationship between different views of the same image), and 2) decorrelating the different feature dimensions across the batch (the redundancy-reduction term). Thanks for the kind replies; they were of great help to me.
Hello, I really enjoyed reading the paper and have been thinking about the intent of the loss.
However, I was wondering whether setting the target matrix to the identity matrix is valid.
As far as I understand, each element of the cross-correlation matrix is a dot product between two (batch-normalized) feature dimensions, taken across the batch.
The Barlow Twins loss aims for a correlation of 1 on the diagonal and 0 (no correlation) on the off-diagonal elements.
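For reference, the loss as defined in the paper (Zbontar et al., 2021, Eq. 1; restated here) is

$$
\mathcal{L}_{\mathrm{BT}} = \sum_i \left(1 - \mathcal{C}_{ii}\right)^2 + \lambda \sum_i \sum_{j \neq i} \mathcal{C}_{ij}^2,
\qquad
\mathcal{C}_{ij} = \frac{\sum_b z^A_{b,i}\, z^B_{b,j}}{\sqrt{\sum_b \left(z^A_{b,i}\right)^2}\, \sqrt{\sum_b \left(z^B_{b,j}\right)^2}},
$$

where $b$ indexes batch samples, $i, j$ index feature dimensions, and $z^A, z^B$ are the batch-normalized embeddings of the two views.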
So, if two identical representation vectors were fed to the loss, I thought it should give a loss of zero, but it didn't.
For the sake of simplicity, let's say we have 2 pairs of representation vectors with identical values (that's 4 vectors).
However, when I take two identical 1-D vectors for the 2 samples, apply batch norm, and compute the Barlow Twins loss on them,
I get 1 on the diagonal but not 0 on the off-diagonal elements.
The same thing happens when the batch size is 1 (though batch norm then normalizes every value to zero).
I'm not sure how the loss can learn invariance and redundancy reduction with the identity matrix as its target, especially through the redundancy term.
Could you please elaborate on how the representation learns via the redundancy term?
Here's a simple example I tried (I followed the code implementation).
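(The original example isn't preserved in this text; below is a hedged reconstruction of a batch-size-2 case that reproduces the behavior described above. The values are arbitrary.)

```python
import torch

# Two identical views, batch size 2, 3 features.
z = torch.tensor([[1., 2., 3.],
                  [4., 0., 5.]])
z_norm = (z - z.mean(dim=0)) / z.std(dim=0, unbiased=False)
c = (z_norm.T @ z_norm) / z.shape[0]  # cross-correlation of the view with itself
print(c)
# tensor([[ 1., -1.,  1.],
#         [-1.,  1., -1.],
#         [ 1., -1.,  1.]])
# With batch size 2, every normalized feature column is (+1, -1) or (-1, +1),
# so every off-diagonal entry is +/-1 and the loss cannot reach zero.
```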
Thank you!