
Barlow Twins loss on identical vector #16

Closed

ChanLIM opened this issue Apr 20, 2021 · 6 comments

Comments

@ChanLIM

ChanLIM commented Apr 20, 2021

Hello, I really enjoyed reading the paper and have been thinking about the intention behind the loss.

However, I was wondering whether setting the target matrix to the identity matrix is appropriate.

As far as I understand, each element of the cross-correlation matrix is the product of two feature dimensions summed over the batch.
The Barlow Twins loss aims for a correlation of 1 on the diagonal and 0 (no correlation) on the off-diagonal elements.

So, if two identical representation vectors were fed to the loss, I thought it should give a loss of zero, but it didn't.

For the sake of simplicity, let's say we have 2 pairs of representation vectors with identical values (that's 4 vectors).
When I took two identical 1-D vectors for the 2 samples, applied batch norm, and computed the Barlow Twins loss on them,
I got 1 on the diagonal but not 0 on the off-diagonal elements.

The same thing happens when the batch size is 1 (although batch norm normalizes the values to zero in that case).

I'm not sure how the loss can learn invariance and redundancy reduction with the identity matrix as the target, especially through the redundancy term.
Could you please elaborate on how the representation vector is learned through the redundancy term?

Here's a simple example I tried (following the code implementation).

[screenshot: example calculation of the cross-correlation matrix and loss]
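
In code, the check looks roughly like this (a minimal sketch of how I read the loss computation, with illustrative values rather than the exact ones in the screenshot; `lambd` is just a placeholder weight):

```python
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    n = z_a.size(0)
    # batch-norm step: normalize each feature dimension over the batch (biased std, no eps)
    z_a = (z_a - z_a.mean(0)) / z_a.std(0, unbiased=False)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0, unbiased=False)
    c = (z_a.T @ z_b) / n                           # D x D cross-correlation matrix
    d = torch.diagonal(c)
    invariance = ((d - 1) ** 2).sum()               # wants the diagonal to be 1
    redundancy = (c ** 2).sum() - (d ** 2).sum()    # wants the off-diagonal to be 0
    return invariance + lambd * redundancy, c

# two pairs of representation vectors with identical values (batch size 2, dim 3)
z = torch.tensor([[0.2, 0.3, 0.5],
                  [0.1, 0.8, 0.1]])
loss, c = barlow_twins_loss(z, z.clone())
print(c)     # diagonal is 1, but the off-diagonal entries are +/-1, not 0
print(loss)  # so the loss is not zero even though the two views are identical
```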

Thank you!

@sallymmx

sallymmx commented Apr 20, 2021

Same question!!
When I train with different augmentations but validate with the same augmentation, the training loss gets lower while the validation loss gets larger and larger, much larger than the training loss.

This really confuses me. Could someone explain the reason?

@jingli9111
Contributor

@ChanLIM
Thanks for the example. This is very insightful!
This is why our method is fundamentally different from contrastive learning like SimCLR.

> So, if two identical representation vectors were fed to the loss, I thought it should give a loss of zero, but it didn't.

It shouldn't. Identical vectors are not enough.

Our loss function wants features to be decorrelated. But in your example, they are correlated. Feature 3 can be completely determined by feature 1 and feature 2.

If you add sample 3 so that there's no redundant dimension, I believe you can get zero loss.

Let me know if you have further questions.

@jingli9111
Contributor

@sallymmx
See the discussion above regarding the question of "same augmentations".
The validation loss going up may have several different causes. Can you specify your experiment setting?

@ChanLIM
Author

ChanLIM commented Apr 21, 2021

@jingli9111
Thanks for your kind explanation.
I think I got the gist of it, but I'm not sure I understood the idea completely.

> But in your example, they are correlated. Feature 3 can be completely determined by feature 1 and feature 2.

The part I'm most confused about is how we determine whether the features are decorrelated from each other.

As you suggested, I tried recalculating the loss after adding another sample (img3), but I still don't get the results I expected.
It seems to me that simply adding another pair of identical vectors to the batch does not solve the problem from the previous example.
I also tried changing the vector values so that the elements no longer sum to 1 (to ensure no correlation between img1, img2, and img3).

Here's another example for you.
[screenshot: second example calculation with three samples]
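
Roughly what I ran for this second attempt (again a sketch with illustrative values, not the exact ones in the screenshot):

```python
import torch

# three identical pairs this time, with rows that no longer sum to 1
z = torch.tensor([[0.2, 0.7, 0.5],
                  [0.9, 0.1, 0.3],
                  [0.4, 0.8, 0.6]])
z_norm = (z - z.mean(0)) / z.std(0, unbiased=False)  # batch-norm step
c = (z_norm.T @ z_norm) / z.size(0)                  # both views are identical
print(c)  # the diagonal is 1, but the off-diagonal correlations are still not 0
```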

It would help me understand the problem if you could provide an example of a batch where the features are not correlated, and explain what it means for them to be uncorrelated.

Thanks in advance!

@jingli9111
Contributor

@ChanLIM
One example:
[[ 0.3927,  0.0957,  0.9147],
 [-0.9112, -0.0944,  0.4011],
 [-0.1247,  0.9909, -0.0501]]
You can generate a random orthogonal matrix. It should give zero loss.
By definition, if z_a^T @ z_b = I and z_a = z_b, then z_a is an orthogonal matrix.

Sorry, my previous statement "If you add sample 3 so that there's no redundant dimension, I believe you can get zero loss." was wrong.
The samples cannot be completely random. "Determined" means correlation = 1, and merely not being determined is not enough.
Decorrelated means correlation = 0 (sum_i x_i y_i = 0).
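
A quick way to check it (a rough sketch; here the cross-correlation is taken directly as z_a^T @ z_b, i.e. without the batch-norm step, matching the definition above):

```python
import torch

q, _ = torch.linalg.qr(torch.randn(3, 3))    # random 3x3 orthogonal matrix
c = q.T @ q                                   # cross-correlation of two identical views
d = torch.diagonal(c)
invariance = ((d - 1) ** 2).sum()             # on-diagonal term
redundancy = (c ** 2).sum() - (d ** 2).sum()  # off-diagonal term
print(c)                                      # ~ identity matrix
print(invariance, redundancy)                 # both ~ 0, so the loss is ~ 0
```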

Let me know if you have more questions.

@ChanLIM
Author

ChanLIM commented Apr 22, 2021

Then, I guess the objectives of the Barlow Twins loss function are the following two (written out below):

1) making the two representation vectors similar to each other (the relationship between different views of the same image)
and
2) making the (batch size x representation dim) data matrix semi-orthogonal (the relationship between different data points within the batch) at the same time.
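
Written out the way I now read the loss from the paper (with C the cross-correlation matrix between the two batch-normalized views):

L = sum_i (1 - C_ii)^2 + lambda * sum_i sum_{j != i} C_ij^2

The first term pushes each feature to agree across the two views (invariance), and the second pushes different features to be decorrelated from each other (redundancy reduction).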

Thanks for the kind replies. They were a great help to me.
