
Question about DiffLoss #2

Closed

kmaninis opened this issue Sep 18, 2018 · 3 comments

Comments

@kmaninis

Hi, thanks for the nice PyTorch implementation.

I have some questions for the DiffLoss:

  • Shouldn't this line compute the correlation of each feature dimension of the private features with each feature dimension of the shared features? As it stands, it computes the correlation of one sample with another. Correct me if I'm wrong, but shouldn't it be
    torch.mean((input1_l2.t().mm(input2_l2).pow(2))) instead?

  • Also, you mention that there are some stability issues. Could that be because there is no mean value normalization, as is done in the authors' TF implementation? (A sketch combining both points follows this list.)
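
For concreteness, here is a minimal sketch combining both suggestions: mean subtraction followed by L2 normalization, then a penalty on the dimension-wise correlation. This mirrors the structure of the authors' TF difference loss as I read it; the class name, argument names, and shapes are illustrative, not the actual code from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffLossSketch(nn.Module):
    """Illustrative variant: penalize the squared correlation between
    feature *dimensions* of the private and shared representations."""

    def forward(self, private, shared):
        # private: (M, C1), shared: (M, C2), with M the batch size
        # Mean value normalization, as in the authors' TF implementation
        private = private - private.mean(dim=0, keepdim=True)
        shared = shared - shared.mean(dim=0, keepdim=True)

        # L2-normalize each sample's feature vector
        private_l2 = F.normalize(private, p=2, dim=1)
        shared_l2 = F.normalize(shared, p=2, dim=1)

        # (C1, C2) correlation between feature dimensions, not samples
        correlation = private_l2.t().mm(shared_l2)
        return correlation.pow(2).mean()
```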

Thanks a lot :)

@fungtion
Owner

@kmaninis Thank you for your feedback.

  • As for the first question, each row of input1 and input2 in DiffLoss represents the private and shared feature of the same sample, respectively. The DiffLoss is supposed to make the private and shared features orthogonal, so I think it should be torch.mean((input1_l2.mm(input2_l2.t()).pow(2))). (A shape comparison of the two formulations follows this list.)
  • For the second question, I don't think mean value normalization would improve the stability significantly, since the proposed DiffLoss itself already introduces so much instability.
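
To make the difference between the two formulations concrete, here is a quick shape check (the batch size and feature widths are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn.functional as F

M, C1, C2 = 4, 8, 8  # arbitrary batch size and feature widths
input1_l2 = F.normalize(torch.randn(M, C1), p=2, dim=1)  # private features
input2_l2 = F.normalize(torch.randn(M, C2), p=2, dim=1)  # shared features

# Sample-wise version: (M, M) similarities between samples
sample_corr = input1_l2.mm(input2_l2.t())

# Dimension-wise version: (C1, C2) correlations between feature dimensions
dim_corr = input1_l2.t().mm(input2_l2)

print(sample_corr.shape)  # torch.Size([4, 4])
print(dim_corr.shape)     # torch.Size([8, 8])
```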

@kmaninis
Author

@fungtion Thanks for your reply.

I think that the orthogonality constraint in your formulation doesn't hold under feature permutation: for example, [0, 1] and [1, 0] are orthogonal, yet you can obtain one from the other simply by permuting the features, so they carry exactly the same information.

In short, I think what it is meant to do is: for a private feature of size M x C1 and a shared representation of size M x C2 (where M is the batch size and C1, C2 are feature dimensions), build a C1 x C2 correlation matrix and minimize it. Isn't that what is happening in this part of the code?
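
A tiny demonstration of the permutation issue, using a batch of one sample with the values from my example above: the sample-wise loss vanishes even though the two vectors hold the same information, while the dimension-wise correlation does not.

```python
import torch

private = torch.tensor([[0., 1.]])  # one sample, two features
shared = torch.tensor([[1., 0.]])   # same features, just permuted

# Sample-wise loss sees perfect orthogonality and is zero
print(private.mm(shared.t()).pow(2).mean())  # tensor(0.)

# Dimension-wise (C1 x C2) correlation still flags the overlap
print(private.t().mm(shared).pow(2).mean())  # tensor(0.2500)
```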

@fungtion
Owner

@kmaninis Yes, I see what you mean; this was probably a misunderstanding of the paper on my part, which is what makes the loss unstable. I'll modify my implementation. Thanks!
