GIFT: Guided and Interpretable Factorization for Tensors - Applications to Human Cancer Analytics

leesael/GIFT

GIFT

Overview

Motivation: Given multi-platform genome data with prior knowledge of functional gene sets, how can we extract interpretable latent relationships between patients and genes? More specifically, how can we devise a tensor factorization method which produces an interpretable gene factor matrix based on functional gene set information while maintaining the decomposition quality and speed?

Method: We propose GIFT, a Guided and Interpretable Factorization for Tensors. GIFT provides interpretable factor matrices by encoding prior knowledge as a regularization term in its objective function.
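To make the idea of a regularization term that encodes prior knowledge more concrete, the following minimal sketch computes a masked penalty on a gene factor matrix. It is an illustration under stated assumptions, not the GIFT implementation: the function name `masked_penalty`, the weight `lam`, the list-of-rows layout, and the choice of a squared penalty applied only to masked entries (genes outside a gene set) are all hypothetical.

```python
def masked_penalty(factor, unmasked, lam=1.0):
    """Sum of squared masked entries of `factor`, scaled by `lam`.

    factor   -- list of rows (genes x latent columns); hypothetical layout
    unmasked -- set of (gene, column) pairs that are exempt from the penalty
    lam      -- regularization weight (hypothetical name)
    """
    total = 0.0
    for i, row in enumerate(factor):
        for j, value in enumerate(row):
            # Assumption: only masked entries (gene not in gene set) are penalized.
            if (i, j) not in unmasked:
                total += value * value
    return lam * total


# Toy example: 3 genes, 2 latent columns (one column per gene set).
factor = [[0.9, 0.1],
          [0.2, 0.8],
          [0.0, 0.5]]
unmasked = {(0, 0), (1, 1), (2, 1)}  # gene 0 in set 0; genes 1 and 2 in set 1
penalty = masked_penalty(factor, unmasked, lam=0.5)
```

Under this assumption, adding such a penalty to a reconstruction loss would push each column of the gene factor matrix toward being supported on one gene set, which is what would make the column readable in terms of known gene functions.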

Results: We apply GIFT to the PanCan12 dataset (TCGA multi-platform genome data) and compare its performance with P-Tucker, our baseline method without a prior-knowledge constraint, and Silenced-TF, our naive interpretable method. Results show that GIFT produces interpretable factorizations with high scalability and accuracy. Furthermore, we demonstrate how the results of GIFT can be used to reveal significant relations among cancers, gene sets, and genes, and we validate the findings against evidence from the literature.

[Figure: overview of GIFT]

Paper

Please use the following citation for GIFT:

Jungwoo Lee*, Sejoon Oh*, and Lee Sael (2018). GIFT: Guided and Interpretable Factorization for Tensors with an application to large-scale multi-platform cancer analysis. Bioinformatics, bty490.
[Paper] [Supplementary Material]

(* These authors contributed equally to this work.)

Code

Refer to the code directory in our repository or download the following zip file. [GIFT-v1.0]

Dataset

| Name | Structure | Size | Number of Nonzeros | Download |
|---|---|---|---|---|
| PANCAN12 tensor | Patient - Gene - Experiment Type | 4,555 × 14,351 × 5 | 180M | DOWN |
| Mask matrix, M(2) | Gene - Gene set | 14,351 × 50 | 7K | DOWN |

The mask file lists the unmasked entries, that is, the (gene, gene set) pairs in which the gene belongs to the gene set.
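The sizes and nonzero counts in the table can be turned into densities, and a mask matrix can be rebuilt from a list of unmasked entries. The sketch below shows both on toy input. The helpers `build_mask` and `density` and the 0-based (gene, gene set) pair layout are assumptions for illustration, since the on-disk format of the downloads is not described here.

```python
def build_mask(unmasked_pairs, n_genes, n_gene_sets):
    """Return a dense 0/1 mask with 1 at each unmasked (gene, gene set) pair."""
    mask = [[0] * n_gene_sets for _ in range(n_genes)]
    for gene, gene_set in unmasked_pairs:
        mask[gene][gene_set] = 1
    return mask


def density(nonzeros, shape):
    """Fraction of entries that are nonzero for a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return nonzeros / total


# Toy mask: 4 genes, 2 gene sets (hypothetical indices).
toy_mask = build_mask([(0, 0), (1, 0), (3, 1)], n_genes=4, n_gene_sets=2)

# Approximate densities from the rounded counts in the table (180M and 7K).
tensor_density = density(180_000_000, (4555, 14351, 5))
mask_density = density(7_000, (14351, 50))
```

With these rounded counts, roughly 55% of the tensor entries and about 1% of the mask entries are nonzero.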

Tested Environment

We tested our proposed method, GIFT, on an Ubuntu 16.04.3 LTS Linux machine equipped with an Intel Xeon E5-2630 v4 2.2GHz CPU and 512GB of RAM.
