Tensor Canonical Correlation Analysis (TCCA) via penalised tensor decomposition

htpusa/PTD-CCA

PTD-CCA: Sparse Tensor Canonical Correlation Analysis (STCCA) via penalised tensor decomposition

PTD-CCA is an unsupervised dimensionality reduction method for two or more views (data modalities). The algorithm is a straightforward extension of the penalised matrix decomposition CCA (PMD-CCA) proposed by Witten et al. (2009) to more than two views: it maximises the "higher-order covariance" between the linear projections X_m*w_m, where each X_m is a data matrix and w_m is a vector of coefficients. With exactly two views it reduces to PMD-CCA.
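To make the "higher-order covariance" concrete, here is a minimal sketch of one way to compute it for given weight vectors. It is not taken from the PTDCCA source: the column centring, the 1/n normalisation, and the names P and hoc are assumptions made for illustration.

```matlab
% Sketch only: higher-order covariance of the projections X_m*w_m.
% ASSUMPTIONS: columns are centred and the sum is divided by n;
% PTDCCA itself may preprocess or normalise differently.
rng(0);
n = 50;
X = {randn(n,5); randn(n,4); randn(n,3)};   % three toy views
W = {randn(5,1); randn(4,1); randn(3,1)};   % arbitrary weight vectors
M = numel(X);

P = zeros(n, M);                            % one projection per column
for m = 1:M
    Xc = X{m} - mean(X{m}, 1);              % centre each variable
    P(:, m) = Xc * (W{m} / norm(W{m}));     % projection with unit-norm weights
end

% Mean over samples of the product of the M projections
hoc = @(P) sum(prod(P, 2)) / size(P, 1);
fprintf('higher-order covariance: %.4f\n', hoc(P));
```

With two views the same expression is the (biased) sample covariance of the two projections, which matches the statement that the method reduces to PMD-CCA in that case.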

EXAMPLE

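Set up some synthetic data:

% Ground-truth weight vectors: each has 40 non-zero entries out of 100
a = [ones(20,1); -ones(20,1); zeros(60,1)];
b = [zeros(60,1); -ones(20,1); ones(20,1)];
c = [ones(20,1); zeros(60,1); -ones(20,1)];
d = [-ones(10,1); ones(10,1); zeros(60,1); -ones(10,1); ones(10,1)];
% Latent variables for 100 samples, rows normalised to sum to 1
Z = rand(100,4); Z = Z./sum(Z,2);
% Each view is a noisy rank-one matrix (100 samples x 100 variables);
% normrnd requires the Statistics and Machine Learning Toolbox
X1 = normrnd(Z(:,1)*a',0.1);
X2 = normrnd(Z(:,2)*b',0.1);
X3 = normrnd(Z(:,3)*c',0.1);
X4 = normrnd(Z(:,4)*d',0.1);
X = {X1;X2;X3;X4};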

Run PTDCCA with an "intermediate" sparsity level (0.5) and compare the inferred weights with the ground truth:

W = PTDCCA(X,0.5);
wtrue = [a,b,c,d];
figure
for m=1:4
    subplot(2,4,m);bar(wtrue(:,m));title(sprintf('True w_%d',m))
    xlabel('variable');ylabel('coefficient')
    subplot(2,4,4+m);bar(W{m});title(sprintf('Inferred w_%d',m))
end
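The visual comparison can be complemented with a number. The sketch below scores an estimate by its absolute cosine similarity with the true weights; the absolute value is used because weight vectors of this kind are typically identified only up to sign. West is a hypothetical stand-in for the output of PTDCCA (assumed to be a cell array of column vectors, as the bar plots above suggest), so that the snippet runs on its own.

```matlab
% Sketch only: sign-invariant agreement between estimated and true weights.
% West is a made-up stand-in for W = PTDCCA(X,0.5); replace it with the real output.
rng(2);
a = [ones(20,1); -ones(20,1); zeros(60,1)];
wtrue = a;                                   % ground truth for one view
West = {-(a + 0.1*randn(100,1))};            % noisy, sign-flipped "estimate"

% Absolute cosine similarity: 1 means identical direction up to sign
cosSim = @(u, v) abs(u' * v) / (norm(u) * norm(v));

for m = 1:numel(West)
    fprintf('view %d: |cos| = %.3f\n', m, cosSim(West{m}, wtrue(:, m)));
end
```

With the real model, pass W{m} and the matching column of wtrue for each of the four views.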

Sparsity can also be set for each view separately:

% Named cs so that the ground-truth vector c defined above is not overwritten
cs = [0.05,0.25,0.75,1];
W = PTDCCA(X,cs);
figure
for m=1:4
    subplot(1,4,m);bar(W{m});title(sprintf('c = %.2f',cs(m)))
    xlabel('variable');ylabel('coefficient')
end
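For intuition about what a sparsity constraint does to a weight vector, here is a sketch of the L1-constrained update used in PMD (Witten et al., 2009): soft-threshold the unpenalised direction, rescale to unit length, and search for the threshold that meets the L1 bound. This is not the PTDCCA code. The variable l1bound is hypothetical, and how PTDCCA maps its sparsity input to such a bound is not stated here.

```matlab
% Sketch only: PMD-style update  w = S(v,delta) / ||S(v,delta)||_2
% with delta chosen by bisection so that ||w||_1 <= l1bound.
% ASSUMPTION: l1bound is an illustrative parameter, not a PTDCCA input.
rng(1);
v = randn(100,1);                  % stand-in for the unpenalised update direction
l1bound = 4;                       % should lie in [1, sqrt(numel(v))]

soft = @(x, d) sign(x) .* max(abs(x) - d, 0);   % soft-thresholding operator
unit = @(x) x / norm(x);                        % rescale to unit L2 norm

w = unit(v);                       % delta = 0 if the bound already holds
if norm(w, 1) > l1bound
    lo = 0; hi = max(abs(v));      % bracket for the threshold
    for it = 1:60
        delta = (lo + hi) / 2;
        if norm(unit(soft(v, delta)), 1) > l1bound
            lo = delta;            % still too dense: threshold harder
        else
            hi = delta;            % feasible: try a smaller threshold
        end
    end
    w = unit(soft(v, hi));         % hi always satisfies the bound
end
fprintf('non-zeros: %d of %d\n', nnz(w), numel(w));
```

A smaller bound forces a larger threshold and therefore more exact zeros, which is the qualitative behaviour seen across the four panels above.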

To calculate multiple canonical variable tuples, use the name-value argument 'D':

W = PTDCCA(X,0.5,'D',3);

If you have run the examples above, you may have noticed that the function takes some time to return. Most of the running time is spent calculating the cross-covariance tensor, which is used to initialise the algorithm. This can be avoided by using a random initialisation instead:

W = PTDCCA(X,0.5,'initType','random');

The covariance tensor also takes up a lot of memory and, if the data are high-dimensional enough, it may exceed the largest allowed array size. If this happens, PTDCCA falls back to the random initialisation.
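A rough size estimate shows why. The sketch below assumes the tensor is stored densely in double precision with one mode per view, so it has p_1 x ... x p_M entries; the internal storage used by PTDCCA is not documented here, and budgetGB is a hypothetical threshold. For the four 100-variable views in the example this gives 100^4 = 10^8 entries, or about 0.8 GB.

```matlab
% Sketch only: estimate the memory needed by a dense cross-covariance tensor.
% ASSUMPTIONS: dense double storage (8 bytes per entry); budgetGB is made up.
p = [100 100 100 100];            % variables per view, as in the example above
% With real data: p = cellfun(@(x) size(x, 2), X);

nElem = prod(p);                  % number of tensor entries
bytes = 8 * nElem;                % 8 bytes per double
fprintf('%d entries, %.2f GB\n', nElem, bytes / 1e9);

budgetGB = 4;                     % hypothetical memory budget
if bytes / 1e9 > budgetGB
    initArgs = {'initType', 'random'};   % skip the tensor-based initialisation
else
    initArgs = {};                       % keep the default initialisation
end
% W = PTDCCA(X, 0.5, initArgs{:});      % uncomment with PTDCCA on the path
```

Because the entry count is a product over views, adding a view or doubling the number of variables per view increases the requirement multiplicatively.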

References

Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis." Biostatistics 10.3 (2009): 515-534.
