
Why is the performance of CCA and MCCA different when I use them to project two views? #162

Closed
Umaruchain opened this issue Feb 9, 2023 · 9 comments


@Umaruchain

Dear author,
I really appreciate your great work, but I am confused about why the results differ when I use CCA and MCCA on the same data.

@jameschapman19 (Owner)

Do you have a code snippet I can use to reproduce this? Otherwise I will explore it myself and get back to you.

@Umaruchain (Author)

from cca_zoo.models import CCA, MCCA
import numpy as np
import torch
import random
import os


def Initialize_Seed(seed=2):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(seed)


def Ori_CCA_fit_transform(multi_view, latent_dim=100, epochs=20):
    Initialize_Seed(2)
    linear_cca = CCA(latent_dims=latent_dim, random_state=2)
    linear_cca.fit(multi_view)
    res = linear_cca.transform(multi_view)
    return res


def CCA_fit_transform(multi_view, latent_dim=100, epochs=20):
    Initialize_Seed(2)
    linear_cca = MCCA(latent_dims=latent_dim, random_state=2)
    linear_cca.fit(multi_view)
    res = linear_cca.transform(multi_view)
    return res


Initialize_Seed(2)
x1 = np.random.normal(size=(20, 5))
x2 = np.random.normal(size=(20, 5))

x = [x1, x2]
res1 = CCA_fit_transform(x, latent_dim=5)
res2 = Ori_CCA_fit_transform(x, latent_dim=5)

print((res1[0] == res2[0]).all())

@Umaruchain (Author)

The print statement outputs False. Please try it.

@jameschapman19 (Owner)

OK, the weights of these models are the same, but they are normalised differently. E.g., if you check the correlation between the corresponding columns of res1[0] and res2[0], you will find they are +1/-1.

I'll have a look at whether I can tweak the MCCA eigenvalue problem so that they have the same solution in any case, but the results of your code snippet aren't too worrying, in the sense that the normalisation is arbitrary.
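
A minimal way to check this (a sketch, reusing res1 and res2 from the snippet above):

import numpy as np

# Per-component correlation between the MCCA and CCA projections of view 0:
# entries close to +1 or -1 mean the two transforms agree up to sign/scale.
corrs = [
    np.corrcoef(res1[0][:, k], res2[0][:, k])[0, 1]
    for k in range(res1[0].shape[1])
]
print(np.round(corrs, 4))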

@Umaruchain (Author)

Many thanks~
By the way, I am also a little confused about the loss of deep CCA. Could you please explain it to me?
In my opinion, the diagonal of C should hold the correlations between different views, whereas the diagonal of all_views.T @ all_views / (n - 1) holds each variable's correlation with itself. For example, in your code, C[0][0] is the correlation of views[0][:, 0] with views[0][:, 0], but what we actually need is the correlation of views[0][:, 0] with views[1][:, 0].

n = views[0].shape[0]
# Subtract the mean from each output
views = _demean(views)

# Concatenate all views and from this get the cross-covariance matrix
all_views = torch.cat(views, dim=1)
C = all_views.T @ all_views / (n - 1)

# Get the block covariance matrix placing X_i^T X_i on the diagonal
# (r is the regularisation parameter from the surrounding function)
D = torch.block_diag(
    *[
        (1 - r) * m.T @ m / (n - 1) + r * torch.eye(m.shape[1], device=m.device)
        for m in views
    ]
)

C = C - torch.block_diag(*[view.T @ view / (n - 1) for view in views]) + D

R = _mat_pow(D, -0.5, 1e-3)

# In MCCA our eigenvalue problem is Cv = lambda Dv
C_whitened = R @ C @ R.T

eigvals = torch.linalg.eigvalsh(C_whitened)

@jameschapman19
Copy link
Owner

Notice that the loss is based on C_whitened, not C, so while your observation that C[0][0] is the correlation of views[0][:, 0] with views[0][:, 0] is correct, it isn't relevant to the eigenvalues of C_whitened.

In fact, C_whitened[0][0] should always be 1 (subject to numerical imprecision), and indeed the whole diagonal should be 1.

The reason we use this form for MCCA is that, if you imagine the covariance matrix of all of the views, we care about all of the pairwise covariances.
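
A small standalone sketch (plain torch, not cca-zoo's API) of why the diagonal of C_whitened is 1: whitening with D^(-1/2) turns each diagonal entry into a variable's correlation with itself.

import torch

torch.manual_seed(0)
n = 200
views = [torch.randn(n, 3), torch.randn(n, 4)]
views = [v - v.mean(dim=0) for v in views]  # demean, as in the loss code

all_views = torch.cat(views, dim=1)
C = all_views.T @ all_views / (n - 1)
D = torch.block_diag(*[v.T @ v / (n - 1) for v in views])

# Unregularised whitening (r = 0): D^(-1/2) via an eigendecomposition of D
evals, evecs = torch.linalg.eigh(D)
R = evecs @ torch.diag(evals.pow(-0.5)) @ evecs.T

C_whitened = R @ C @ R.T
print(torch.diagonal(C_whitened))  # every entry ~1 up to numerical error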

@jameschapman19
Copy link
Owner

Equation 4 of https://arxiv.org/pdf/2005.11914.pdf is, I think, the best reference for MCCA.
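
For orientation, the comment in the code above ("In MCCA our eigenvalue problem is Cv = lambda Dv") can be written out as follows (my sketch of that formulation, not a quote from the paper):

% MCCA as a generalized eigenvalue problem: C is the joint covariance of the
% concatenated views, D its block diagonal of within-view covariances.
C v = \lambda D v
% Whitening with R = D^{-1/2} turns it into an ordinary symmetric
% eigenvalue problem on C_whitened = R C R^{\top}:
(R C R^{\top}) u = \lambda u, \qquad v = R u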

@Umaruchain (Author)

Thank you so much!!!

@jameschapman19 (Owner)

In the next update I have adjusted the eigenvalue problems so that both will produce the same result. In particular, the norm of the transformed training data should be sqrt(n), where n is the number of samples.
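
A quick way to check that invariant (a sketch, reusing res1 from the snippet above, and assuming it was produced with the updated release):

import numpy as np

z = res1[0]  # transformed training data for view 0
n = z.shape[0]
print(np.linalg.norm(z, axis=0))  # each column norm should be ~sqrt(n)
print(np.sqrt(n))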
