author: Roberta De Vito, Ruggero Bellio
Fits the Multi-Study Factor Analysis model via ECM algorithm and via a Bayesian approach.
The following example illustrates how to fit a MSFA model via the ECM Algorithm, using a data set available in the Bioconductor repository (www.bioconductor.org).
Some pre-processing is required to get the data into a form suitable for the
analysis. This was already done, and the resulting data frame is saved into the
data_immune
object. The commands that were used to form it are included
in the help file for the data object.
library(MSFA)
data(data_immune)
help(data_immune)
Then we get suitable starting values for model parameters, selecting K=3 common factors and (3, 4) study-specific factors for the two studies, respectively.
start_value <- start_msfa(X_s = data_immune, k = 3, j_s = c(3, 4))
Now everything is in place for estimating the model parameters via the ECM algorithm
mle <- ecm_msfa(data_immune, start_value, trace = FALSE)
The estimated matrix of common loadings can be represented by a suitable heatmap:
library(gplots)
heatmap.2(mle$Phi,dendrogram='none', Rowv=FALSE, Colv=FALSE,trace='none', density.info="none", col=heat.colors(256))
The following example illustrates a Bayesian analysis of a MSFA model.
Although the methodology has been developed targeting the
Some pre-processing is required to get the data into a form suitable for the
analysis. This was already done, and the resulting data frame is saved into the
data_immune
object. The commands that were used to form it are included
in the help file for the data object.
library(MSFA)
data(data_immune)
help(data_immune)
We fist estimate a model with a somewhat large dimension of the various loading matrices, so we set a dimension 10 for both the common factor loadings and the study-specific loadings. In order to get reproducible results, we set the random seed.
set.seed(1971)
out10_1010 <- sp_msfa(data_immune, k = 10, j_s = c(10, 10), trace = FALSE)
We take as the estimated
p <- ncol(data_immune[[1]])
nrun <- dim(out10_1010$Phi)[3]
SigmaPhi <- SigmaLambda1 <- SigmaLambda2 <- array(0, dim=c(p, p, nrun))
for(j in 1:nrun)
{
SigmaPhi[,,j] <- tcrossprod(out10_1010$Phi[,,j])
SigmaLambda1[,,j] <- tcrossprod(out10_1010$Lambda[[1]][,,j])
SigmaLambda2[,,j] <- tcrossprod(out10_1010$Lambda[[2]][,,j])
}
SigmaPhi <- apply(SigmaPhi, c(1, 2), median)
SigmaLambda1 <- apply(SigmaLambda1, c(1, 2), median)
SigmaLambda2 <- apply(SigmaLambda2, c(1, 2), median)
Phi <- apply(out10_1010$Phi, c(1, 2), median)
Then we proceed to the choice of the number of common latent factors.
plot(sp_eigen(SigmaPhi), pch = 16)
abline(h = 0.05, col = 2)
We note that 5 factors are above the
par(mfrow=c(1, 2))
plot(sp_eigen(SigmaLambda1), pch=16)
abline(h = 0.05, col = 2)
plot(sp_eigen(SigmaLambda2), pch=16)
abline(h = 0.05, col = 2)
We end up with dimensions 3 and 4 for the study-specific factor loadings.
We post-process the estimated loading matrix by the OP procedure
Phi_OP10 <- sp_OP(out10_1010$Phi[,1:5,], itermax = 10)
For larger data size, we recommend to reduce the output level in the call to
sp_msfa
.