Kathleen Sucipto edited this page Feb 28, 2020 · 5 revisions

fastGNMF is a Python module for graph-regularized non-negative matrix factorization (Cai, 2011). It uses the faiss library to speed up the similarity search (i.e., the p-nearest-neighbor search) when generating the weight matrix.
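To make the weight-matrix step concrete, here is a minimal brute-force sketch in NumPy of a binary, symmetric p-nearest-neighbor graph. It is not fastGNMF's implementation and does not use faiss; the function name `knn_weight_matrix`, the 0/1 weighting, and the assumption that each column of X is one data point are all illustrative choices.

```python
import numpy as np

def knn_weight_matrix(X, p=3):
    """Binary, symmetric p-nearest-neighbor weight matrix (brute force).

    Assumption: each column of X (shape (n, m)) is one data point,
    so the returned W has shape (m, m). Requires p < m.
    """
    points = X.T  # (m, n): one row per data point
    m = points.shape[0]
    # Pairwise squared Euclidean distances between all data points.
    sq_norms = np.sum(points ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    # Indices of the p closest points for every data point.
    neighbors = np.argsort(dists, axis=1)[:, :p]
    W = np.zeros((m, m))
    rows = np.repeat(np.arange(m), p)
    W[rows, neighbors.ravel()] = 1.0
    # Symmetrize: i and j are linked if either is among the other's neighbors.
    return np.maximum(W, W.T)

rng = np.random.default_rng(0)
X = rng.random((100, 50))
W = knn_weight_matrix(X, p=3)
```

The brute-force distance computation costs O(m²) memory and time, which is the part a faiss index is meant to speed up on larger inputs.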

Note

Currently, the Euclidean update approach works fine, but the divergence update approach still needs further improvement and testing.
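For orientation, the Euclidean variant described by Cai et al. minimizes ||X - U Vᵀ||² + λ Tr(Vᵀ L V), with L = D - W, using multiplicative updates. The sketch below is a plain NumPy rendering of those published update rules, not fastGNMF's actual code. It follows the paper's X ≈ U Vᵀ convention (V has one row per data point), and the function name, the `eps` guard against division by zero, and the toy ring-graph W are assumptions made for illustration.

```python
import numpy as np

def gnmf_euclidean_sketch(X, W, rank=5, lmbda=0.5, n_iter=100, seed=None, eps=1e-10):
    """Multiplicative updates for ||X - U V^T||_F^2 + lmbda * Tr(V^T L V).

    X: (n, m) non-negative matrix whose columns are data points.
    W: (m, m) symmetric non-negative weight matrix; L = D - W.
    Returns U (n, rank), V (m, rank) and the objective value per iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((m, rank))
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian
    obj_values = []
    for _ in range(n_iter):
        # U <- U * (X V) / (U V^T V)
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # V <- V * (X^T U + lmbda W V) / (V U^T U + lmbda D V)
        V *= (X.T @ U + lmbda * (W @ V)) / (V @ (U.T @ U) + lmbda * (D @ V) + eps)
        residual = X - U @ V.T
        obj_values.append(np.sum(residual ** 2) + lmbda * np.trace(V.T @ L @ V))
    return U, V, obj_values

# Toy data: random X and a ring graph linking each data point to the next one.
rng = np.random.default_rng(0)
X = rng.random((20, 12))
idx = np.arange(12)
W = np.zeros((12, 12))
W[idx, (idx + 1) % 12] = 1.0
W = np.maximum(W, W.T)

U, V, obj_values = gnmf_euclidean_sketch(X, W, rank=3, lmbda=0.2, n_iter=30, seed=1)
```

Because every factor in the updates is non-negative, U and V stay non-negative throughout, which is why the method needs no explicit projection step.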

Naming Convention

Matrix factorization

Instead of the common NMF notation V = W x H, this module writes the factorization as X = U x V.

Example

import fastGNMF
import numpy as np

# generate a random X given n = 100, m = 50
X = np.random.rand(100, 50)

# initialize gnmf with rank=5, p=3, lmbda=.2
#   by default, it uses faiss' IndexFlatL2 and runs the Euclidean update method
gnmf = fastGNMF.Gnmf(X=X, rank=5, p=3, lmbda=.2)
U, V = gnmf.factorize()

Parameters

Gnmf(): initializing the class instance

  • X: an instance of numpy array or matrix - the original matrix with dimension (n, m)
  • rank (int): the matrix factorization rank, by default = 10
  • p (int): the number of nearest neighbors when generating the weight matrix, by default = 3
  • lmbda (float): the regularizer parameter, by default = 0.5
  • W: the symmetric weight matrix, an instance of numpy array or matrix; p will be ignored if W is provided, by default = None
  • method (str): the update method "euclidean" or "divergence", by default = "euclidean"; check the Note above
  • knn_index_type: the faiss index type used to compute the p nearest neighbors when generating the weight matrix; only used if W is not provided, by default = IndexFlatL2
  • knn_index_args: the arguments when creating an instance of faiss index, by default = None

If W is not provided and knn_index_args is left as None, knn_index_args is automatically set to (n,), which only works for IndexFlatL2 and IndexFlatIP. For any other faiss index type, make sure to provide the appropriate arguments.

factorize()

  • n_iter (int): the number of iterations to be run, by default = 100
  • return_obj_values (bool): whether to return the objective function values as a list, by default = False
  • seed (int): the seed for U and V initializations, by default = None

Potential future enhancements

  • A clearer, better stopping criterion than terminating only after n_iter iterations have completed
  • More options in initializing U and V
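As one possible shape for the stopping criterion mentioned above, the sketch below checks the relative change between the last two objective values. It is not part of fastGNMF; the helper name `has_converged`, the tolerance, and the sample history are hypothetical.

```python
def has_converged(obj_values, tol=1e-4):
    """Return True once the last relative change in the objective is below tol."""
    if len(obj_values) < 2:
        return False
    prev, curr = obj_values[-2], obj_values[-1]
    # Guard the denominator so an objective near zero does not divide by zero.
    return abs(prev - curr) <= tol * max(abs(prev), 1e-12)

# Hypothetical objective history, scanned for the first iteration that converges.
history = [10.0, 6.0, 5.0, 4.9999, 4.99989]
stop_at = next(
    (i for i in range(1, len(history) + 1) if has_converged(history[:i], tol=1e-4)),
    None,
)
```

A check like this could be applied to the list returned when return_obj_values is True, although it only inspects the values after the fact and does not stop the iterations early.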