COPACAR

COPACAR is a generic collective pairwise classification model for multi-way data analysis. COPACAR combines the superiority of factorization models in large relational data domains with the classification capabilities based on using the pairwise ranking loss. In contrast to current collective relational approaches COPACAR aims to infer probabilities so that those for existing relationships are higher than for assumed-to-be-negative relationships.

Although COPACAR bears correspondence with the maximization of non-differentiable area under the ROC curve, it comes with a learning algorithm that scales well on multi-relational data.

COPACAR can make category-jumping predictions about, for example, diseases from genomic and clinical data generated far outside the molecular context.

This repository contains supplementary material for Collective pairwise classification for multi-way analysis of disease and drug data by Marinka Zitnik and Blaz Zupan.

About COPACAR

COPACAR factors a collection of relation data matrices (a tensor) X^(k), k = 1,2,...,m, such that each relation X^(k) is factored into

X^(k) = A * R^(k) * A.T,

where A is the matrix storing latent components of entities in the domain. A is shared across relations. Matrix R^(k) is the latent component interaction matrix. COPACAR optimizes for the latent model w.r.t. ranking based on pairwise classification

sum_{i,j,g,h} [ (X^(k)_ij - X^(k)_gh) * log(sigma(A_i^T * R^(k) * A_j - A_g^T * R^(k) * A_h)) + reg ].

The relations are N x N matrices. Usually, these matrices correspond to the adjacency matrices of the relational graph for a particular relation in a multi-relational data set.

Dependencies

The required dependencies to build the software are Numpy >= 1.8, SciPy >= 0.10, Joblib >= 0.8, scikit-learn >= 0.12.

Usage

Example script to classify kinships data using COPACAR:

import logging
from scipy.io.matlab import loadmat
from scipy.sparse import lil_matrix
from copacar import copacar_sgd

# Set logging to INFO to see COPACAR information
logging.basicConfig(level=logging.INFO)

# Load Matlab data and convert it to dense tensor format
T = loadmat('data/alyawarradata.mat')['Rs']
X = [lil_matrix(T[:, :, k]) for k in range(T.shape[2])]

# Decompose multi-way data using COPACAR-SGD
A, R, fit, itr, exectimes = copacar_sgd(X, 10, lambda_A=1e-1, lambda_R=1e-1, max_iter=100, n_jobs=-1)

For more examples on the usage of COPACAR, please see the examples directory.

Install

To install in your home directory, use

python setup.py install --user

To install for all users on Unix/Linux

python setup.py build
sudo python setup.py install

To install in development mode

python setup.py develop

Citing

@inproceedings{Zitnik2016,
  title     = {Collective pairwise classification for multi-way analysis of disease and drug data},
  author    = {{\v{Z}}itnik, Marinka and Zupan, Bla{\v{z}}},
  booktitle = {Pacific Symposium on Biocomputing},
  volume    = {21},
  pages     = {81-92},
  year      = {2016}
}

License

COPACAR is licensed under the GPLv2.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
copacar		copacar
data		data
examples		examples
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

copacar

copacar

data

data

examples

examples

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

setup.py

setup.py

Repository files navigation

COPACAR

About COPACAR

Dependencies

Usage

Install

Citing

License

About

Releases

Packages

Languages

License

mims-harvard/copacar

Folders and files

Latest commit

History

Repository files navigation

COPACAR

About COPACAR

Dependencies

Usage

Install

Citing

License

About

Resources

License

Stars

Watchers

Forks

Languages