
Why is the performance of CCA and MCCA different when I use them to project two views? #162

Closed
Umaruchain opened this issue Feb 9, 2023 · 9 comments


@Umaruchain

Dear author,
I really appreciate your great work, but I am confused about why the results differ when I use CCA and MCCA on the same data.

@jameschapman19 (Owner)

Do you have a code snippet I can use to reproduce this? Otherwise I will explore it myself and get back to you.

@Umaruchain (Author)

from cca_zoo.models import CCA, MCCA
import numpy as np
import torch
import random
import os


def Initialize_Seed(seed=2):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(seed)


def Ori_CCA_fit_transform(multi_view, latent_dim=100, epochs=20):
    Initialize_Seed(2)
    linear_cca = CCA(latent_dims=latent_dim, random_state=2)
    linear_cca.fit(multi_view)
    res = linear_cca.transform(multi_view)
    return res


def CCA_fit_transform(multi_view, latent_dim=100, epochs=20):
    Initialize_Seed(2)
    linear_cca = MCCA(latent_dims=latent_dim, random_state=2)
    linear_cca.fit(multi_view)
    res = linear_cca.transform(multi_view)
    return res


Initialize_Seed(2)
x1 = np.random.normal(size=(20, 5))
x2 = np.random.normal(size=(20, 5))

x = [x1, x2]
res1 = CCA_fit_transform(x, latent_dim=5)
res2 = Ori_CCA_fit_transform(x, latent_dim=5)

print((res1[0] == res2[0]).all())

@Umaruchain (Author)

The print statement outputs False. Please try it.

@jameschapman19 (Owner)

OK, the weights of these models are the same, but they are normalised differently. E.g., if you check the correlation between the corresponding columns of res1[0] and res2[0], you will find they are +1/-1.

I'll have a look at whether I can tweak the MCCA eigenvalue problem so that they have the same solution in any case, but the results of your code snippet aren't too worrying, in the sense that the normalisation is arbitrary.
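
A minimal way to check this (a sketch, reusing res1 and res2 from the snippet above):

import numpy as np

# Per-component correlation between the MCCA and CCA projections of view 0:
# entries close to +1 or -1 mean the two transforms agree up to sign/scale.
corrs = [
    np.corrcoef(res1[0][:, k], res2[0][:, k])[0, 1]
    for k in range(res1[0].shape[1])
]
print(np.round(corrs, 4))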

@Umaruchain (Author)

Many thanks~
By the way, I am also a little confused about the loss of deep CCA. Could you please explain it to me?
In my opinion, the diagonal of C should hold the correlations between different views, whereas the diagonal of all_views.T @ all_views / (n - 1) holds each variable's correlation with itself. For example, in your code, C[0][0] is the correlation of views[0][:, 0] with views[0][:, 0], but what we actually need is the correlation of views[0][:, 0] with views[1][:, 0].

n = views[0].shape[0]
# Subtract the mean from each output
views = _demean(views)

# Concatenate all views and from this get the cross-covariance matrix
all_views = torch.cat(views, dim=1)
C = all_views.T @ all_views / (n - 1)

# Get the block covariance matrix placing X_i^T X_i on the diagonal
# (r is the regularisation parameter from the surrounding function)
D = torch.block_diag(
    *[
        (1 - r) * m.T @ m / (n - 1) + r * torch.eye(m.shape[1], device=m.device)
        for m in views
    ]
)

C = C - torch.block_diag(*[view.T @ view / (n - 1) for view in views]) + D

R = _mat_pow(D, -0.5, 1e-3)

# In MCCA our eigenvalue problem is Cv = lambda Dv
C_whitened = R @ C @ R.T

eigvals = torch.linalg.eigvalsh(C_whitened)

@jameschapman19
Copy link
Owner

Notice that the loss is based on C_whitened, not C, so while your observation that C[0][0] is the correlation of views[0][:, 0] with views[0][:, 0] is correct, it isn't relevant to the eigenvalues of C_whitened.

In fact, C_whitened[0][0] should always be 1 (subject to numerical imprecision), and indeed the whole diagonal should be 1.

The reason we use this form for MCCA is that, if you imagine the covariance matrix of all of the views, we care about all of the pairwise covariances.
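
A small standalone sketch (plain torch, not cca-zoo's API) of why the diagonal of C_whitened is 1: whitening with D^(-1/2) turns each diagonal entry into a variable's correlation with itself.

import torch

torch.manual_seed(0)
n = 200
views = [torch.randn(n, 3), torch.randn(n, 4)]
views = [v - v.mean(dim=0) for v in views]  # demean, as in the loss code

all_views = torch.cat(views, dim=1)
C = all_views.T @ all_views / (n - 1)
D = torch.block_diag(*[v.T @ v / (n - 1) for v in views])

# Unregularised whitening (r = 0): D^(-1/2) via an eigendecomposition of D
evals, evecs = torch.linalg.eigh(D)
R = evecs @ torch.diag(evals.pow(-0.5)) @ evecs.T

C_whitened = R @ C @ R.T
print(torch.diagonal(C_whitened))  # every entry ~1 up to numerical error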

@jameschapman19
Copy link
Owner

Equation 4 of https://arxiv.org/pdf/2005.11914.pdf is, I think, the best reference for MCCA.
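
For orientation, the comment in the code above ("In MCCA our eigenvalue problem is Cv = lambda Dv") can be written out as follows (my sketch of that formulation, not a quote from the paper):

% MCCA as a generalized eigenvalue problem: C is the joint covariance of the
% concatenated views, D its block diagonal of within-view covariances.
C v = \lambda D v
% Whitening with R = D^{-1/2} turns it into an ordinary symmetric
% eigenvalue problem on C_whitened = R C R^{\top}:
(R C R^{\top}) u = \lambda u, \qquad v = R u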

@Umaruchain (Author)

Thank you so much!!!

@jameschapman19 (Owner)

In the next update I have adjusted the eigenvalue problems so that both will produce the same result. In particular, the norm of the transformed training data should be sqrt(n), where n is the number of samples.
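
A quick way to check that invariant (a sketch, reusing res1 from the snippet above, and assuming it was produced with the updated release):

import numpy as np

z = res1[0]  # transformed training data for view 0
n = z.shape[0]
print(np.linalg.norm(z, axis=0))  # each column norm should be ~sqrt(n)
print(np.sqrt(n))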
