Home
Kathleen Sucipto edited this page Feb 28, 2020 · 5 revisions
fastGNMF is a Python module for graph-regularized non-negative matrix factorization (Cai, 2011). It uses the faiss library to speed up the similarity search (i.e., the p-nearest-neighbor search) when generating the weight matrix.
Note: the Euclidean update method currently works well, but the divergence update method still needs further improvement and testing.
Matrix factorization
Instead of V = W x H, we use X = U x V.
import fastGNMF
import numpy as np
# generate a random X given n = 100, m = 50
X = np.random.rand(100, 50)
# initialize gnmf with rank=5, p=3, lmbda=.2
# by default, it uses faiss' IndexFlatL2 and runs the Euclidean update method
gnmf = fastGNMF.Gnmf(X=X, rank=5, p=3, lmbda=.2)
U, V = gnmf.factorize()
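To illustrate what the Euclidean update method computes, here is a minimal NumPy sketch of the multiplicative update rules from Cai (2011), rewritten for the X = U x V notation used on this page. It is not fastGNMF's actual implementation. The function name `gnmf_euclidean_sketch`, the `eps` guard, and the shape convention (U is (n, rank), V is (rank, m), W is an (m, m) weight matrix over the columns of X) are assumptions made for this example.

```python
import numpy as np

def gnmf_euclidean_sketch(X, W, rank=5, lmbda=0.5, n_iter=100, seed=None, eps=1e-10):
    """Illustrative GNMF (Euclidean objective) with multiplicative updates.

    Assumed shapes: X is (n, m), W is a symmetric (m, m) weight matrix,
    U is (n, rank), V is (rank, m), so that X is approximated by U @ V.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((rank, m))
    D = np.diag(W.sum(axis=1))  # degree matrix of the graph
    L = D - W                   # graph Laplacian
    obj_values = []
    for _ in range(n_iter):
        # eps only guards against division by zero
        U *= (X @ V.T) / (U @ (V @ V.T) + eps)
        V *= (U.T @ X + lmbda * (V @ W)) / ((U.T @ U) @ V + lmbda * (V @ D) + eps)
        # objective: squared reconstruction error plus graph regularizer
        obj = np.linalg.norm(X - U @ V) ** 2 + lmbda * np.trace(V @ L @ V.T)
        obj_values.append(obj)
    return U, V, obj_values
```

Because the updates only multiply by non-negative ratios, U and V stay non-negative as long as X, W, and the random initial values are non-negative.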
- X: an instance of numpy array or matrix; the original matrix with dimension (n, m)
- rank (int): the matrix factorization rank, by default = 10
- p (int): the number of nearest neighbors when generating the weight matrix, by default = 3
- lmbda (float): the regularizer parameter, by default = 0.5
- W: the symmetric weight matrix, an instance of numpy array or matrix; p will be ignored if W is provided, by default = None
- method (str): the update method, "euclidean" or "divergence", by default = "euclidean"; check the Note above
- knn_index_type: an instance of faiss index for computing p-nearest neighbors in weight matrix generation if W is not provided, by default = IndexFlatL2
- knn_index_args: the arguments when creating an instance of faiss index, by default = None
  If W is not provided, knn_index_args will be automatically defined as (n,), which only works for IndexFlatL2 and IndexFlatIP. Hence, ensure the appropriate argument is provided.
- n_iter (int): the number of iterations to be run, by default = 100
- return_obj_values (bool): whether to return the objective function values as a list, by default = False
- seed (int): the seed for U and V initializations, by default = None
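For readers who want to supply their own W, the following sketch builds a symmetric p-nearest-neighbor weight matrix with plain NumPy. It is a brute-force stand-in for the faiss search, not the module's own code. The helper name `knn_weight_matrix`, the 0/1 weighting, and the choice to treat the m columns of X as the data points (consistent with the default index dimension of n) are assumptions.

```python
import numpy as np

def knn_weight_matrix(X, p=3):
    """Symmetric 0/1 p-nearest-neighbor weight matrix over the columns of X.

    Assumes X has shape (n, m) with p < m; returns an (m, m) matrix.
    """
    points = X.T  # one row per data point, shape (m, n)
    sq_norms = (points ** 2).sum(axis=1)
    # pairwise squared Euclidean distances, shape (m, m)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :p]
    m = points.shape[0]
    W = np.zeros((m, m))
    rows = np.repeat(np.arange(m), p)
    W[rows, neighbors.ravel()] = 1.0
    # keep an edge if either endpoint lists the other as a neighbor
    return np.maximum(W, W.T)
```

If these assumptions match how the module expects W to be laid out, the result could be passed as `fastGNMF.Gnmf(X=X, W=W, ...)`, in which case p is ignored as described above.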
- A clearer, better stopping criterion, rather than terminating only when n_iter iterations are completed
- More options for initializing U and V
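As one possible shape for such a stopping criterion, the sketch below checks the relative change between the last two objective values, such as those in the list returned when return_obj_values is True. The function `has_converged` and its `tol` parameter are hypothetical and are not part of fastGNMF.

```python
def has_converged(obj_values, tol=1e-4):
    """Return True once the relative change between the last two
    objective values is at most tol."""
    if len(obj_values) < 2:
        return False
    prev, curr = obj_values[-2], obj_values[-1]
    # the floor on the denominator avoids dividing by zero
    return abs(prev - curr) <= tol * max(abs(prev), 1e-12)
```

An update loop could call this after each iteration and stop early, with n_iter kept as an upper bound.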