GitHub - Kaversoniano/GPCCA: Generalized Probabilistic Canonical Correlation Analysis

R package: GPCCA

The GPCCA package provides functions to implement Generalized Probabilistic Canonical Correlation Analysis (GPCCA) on multi-modal dataset. GPCCA is an unsupervised method designed to perform data integration and joint dimensionality reduction on a dataset consisting of multiple modalities.

Installation

The package can be installed from GitHub with:

devtools::install_github("Kaversoniano/GPCCA")

A tutorial of GPCCA comes as a built-in vignette with the installed package:

devtools::install_github("Kaversoniano/GPCCA", build_vignettes = TRUE)

GPCCA model

GPCCA is a probabilistic model built based on the Gaussian latent variable framework, serving as an extension of the probabilistic CCA (PCCA) with several key merits:

It generalizes PCCA from two to more than two modalities.
It learns joint low-dimensional embeddings probabilistically.
It imputes missing values inherently during its parameter estimation.

GPCCA is capable of handling incomplete data with various patterns of missing values. See below for an illustration of the GPCCA model applied to a three-modality dataset. White boxes are used to indicate missing data and the missing pattern of each data modality is described as follows:

Modality 1 contains random missing values
Modality 2 is fully observed and complete
Modality 3 involves modality-wise missingness

Usage

vignette("tutorial", package = "GPCCA")

Please see the package vignette for:

Generation of example multi-modal datasets
How to fit the GPCCA model on a multi-modal dataset
How to extract the *learned latent factors*
Example downstream analysis on the *joint latent embeddings*
Visualizations of the *low-dimensional embeddings*
Model selection on the target dimension, i.e., the number of latent factors

Key functions

`GPCCAmodel`

GPCCAmodel is the primary function for users. It implements GPCCA and fit it to a multi-modal dataset, which generates the joint low-dimensional embeddings. For high-dimensional data, feature selection is strongly recommended so that only the highly informative features are included in the model fit. This helps to reduce the computational resources required and boost the computational speed.

`GPCCAselect`

GPCCAselect provides a built-in technique of model selection on target dimension (hyper-parameter). Its output gives the optimal choice(s) of the target dimension based on a given set of candidate values.

`GPCCAmodel2`

GPCCAmodel2 is another primary function for users. It implements GPCCA extension models: GPCCA-Pois and GPCCA-Bern, where GPCCA-Pois is designed for multi-modal count data and GPCCA-Bern is designed for multi-modal binary data. By contrast, the original GPCCA model is developed for general continuous data, which is implemented in a separate framework by GPCCAmodel.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
R		R
data		data
man		man
src		src
vignettes		vignettes
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.md		README.md
cran-comments.md		cran-comments.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

R package: GPCCA

Installation

GPCCA model

Usage

Key functions

`GPCCAmodel`

`GPCCAselect`

`GPCCAmodel2`

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

R package: GPCCA

Installation

GPCCA model

Usage

Key functions

GPCCAmodel

GPCCAselect

GPCCAmodel2

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`GPCCAmodel`

`GPCCAselect`

`GPCCAmodel2`

Packages