Unsupervised Machine Learning: Nonnegative Matrix Factorization + k-means clustering

NMFk: Nonnegative Matrix Factorization using k-means clustering


NMFk is a novel unsupervised machine learning methodology that automatically identifies the optimal number of features (signals) present in the data when NMF (Nonnegative Matrix Factorization) analyses are performed. Classical NMF approaches require the number of features to be specified in advance and do not estimate it. NMFk estimates the number of features k through k-means clustering coupled with regularization constraints.
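
The factorization step that NMFk builds on can be illustrated with a minimal sketch, assuming the standard Lee-Seung multiplicative updates for the Frobenius norm. This is not NMFk's own solver: the function name nmf_sketch and its parameters are hypothetical, and the clustering and regularization steps are left out.

```julia
import LinearAlgebra
import Random

# Plain NMF sketch: find nonnegative W (n x k) and H (k x m) so that X ≈ W * H.
# The number of features k must be supplied by the caller here.
function nmf_sketch(X::AbstractMatrix, k::Integer; iterations::Integer=1000, tol::Real=eps())
    n, m = size(X)
    W = rand(n, k)  # random nonnegative initial guesses
    H = rand(k, m)
    for _ in 1:iterations
        # Multiplicative updates keep all entries nonnegative
        H .*= (W' * X) ./ (W' * W * H .+ tol)
        W .*= (X * H') ./ (W * H * H' .+ tol)
    end
    return W, H
end

Random.seed!(2019)
Xdemo = rand(15, 3) * rand(3, 5)  # synthetic data with 3 underlying signals
Wd, Hd = nmf_sketch(Xdemo, 3)
relerr = LinearAlgebra.norm(Xdemo - Wd * Hd) / LinearAlgebra.norm(Xdemo)
println("Relative reconstruction error: ", relerr)
```

Because k is an input to this sketch, it shows the limitation of classical NMF that NMFk is designed to address.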

In addition to feature extraction, NMFk also allows for data classifications and blind predictions.

NMFk provides high-performance computing capabilities to solve problems with Shared and Distributed Arrays in parallel. The parallelization allows for utilization of multi-core / multi-processor environments. GPU and TPU accelerations are also available through existing Julia packages.

NMFk methodology and applications are discussed in the papers and presentations listed below.

Installation

After starting Julia, execute:

import Pkg; Pkg.add("NMFk")

or

import Pkg; Pkg.develop("NMFk")

Docker

docker run --interactive --tty montyvesselinov/nmfk

Testing

import Pkg; Pkg.test("NMFk")

Examples

A simple problem demonstrating NMFk can be executed as follows. First, generate 3 random signals in a matrix W:

a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

Then, mix the signals to produce a data matrix X of 5 sensors observing the mixed signals as follows:

X = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

This is equivalent to generating a mixing matrix H and obtaining X by multiplying W and H:

H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H
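
The equivalence of the two constructions can be checked directly. This is a minimal sketch using only Base Julia; X_explicit is a name introduced here for illustration.

```julia
# Three random signals observed over 15 samples
a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

# Data matrix written out sensor by sensor
X_explicit = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

# The same data matrix obtained from the mixing matrix H
H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H

# Each column of H lists the weights of signals a, b, c at one sensor
println(size(X))          # 15 samples by 5 sensors
println(X ≈ X_explicit)   # approximate equality up to floating-point rounding
```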

After that, execute NMFk to estimate the number of unknown mixed signals based only on the information in X.

import NMFk
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, 2:5; save=false, method=:simple);

The execution will produce something like this:

[ Info: Results
Signals:  2 Fit:       15.489 Silhouette:    0.9980145 AIC:    -38.30184
Signals:  3 Fit: 3.452203e-07 Silhouette:    0.8540085 AIC:    -1319.743
Signals:  4 Fit: 8.503988e-07 Silhouette:   -0.5775127 AIC:    -1212.129
Signals:  5 Fit: 2.598571e-05 Silhouette:   -0.6757581 AIC:    -915.6589
[ Info: Optimal solution: 3 signals

The code returns the estimated optimal number of signals kopt, which in this case, as expected, is equal to 3.
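
One way to read the printed table is sketched below, assuming a selection rule that keeps the largest number of signals whose silhouette stays above a cutoff. The rule and the cutoff value of 0.25 are assumptions made for illustration; this README does not state how NMFk.execute chooses kopt. The numbers are copied from the output above.

```julia
# Values copied from the example output above
ks = 2:5
fit = [15.489, 3.452203e-07, 8.503988e-07, 2.598571e-05]
silhouette = [0.9980145, 0.8540085, -0.5775127, -0.6757581]

# Hypothetical cutoff: solutions with a lower silhouette are treated as unstable
cutoff = 0.25

# Indices of the solutions whose clusters are well separated
stable = findall(s -> s > cutoff, silhouette)

# Among the stable solutions, take the one with the most signals
kopt_sketch = ks[maximum(stable)]
println("Selected number of signals: ", kopt_sketch)
```

With 2 signals the silhouette is high but the fit is poor, while with 4 and 5 signals the fit is good but the silhouette is negative. Only 3 signals gives both a small fit error and well-separated clusters.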

The code also returns estimates of matrices W and H. It can be easily verified that We[kopt] and He[kopt] are scaled versions of the original W and H matrices.

Note that the order of columns ('signals') in W and We[kopt] is not expected to match. The order of rows in H and He[kopt], which also correspond to signals, is not expected to match either. The estimated order may differ every time the code is executed.
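
Because the estimated signals are both reordered and rescaled, a direct element-wise comparison with W does not work. A minimal sketch of one way to pair them up uses column correlations, which are insensitive to positive scaling. The matrix West below is a hypothetical stand-in for We[kopt], built by permuting and scaling W, so the sketch runs without NMFk.

```julia
import Statistics
import Random

Random.seed!(1)
W = rand(15, 3)

# Hypothetical stand-in for We[kopt]: columns of W reordered and rescaled
perm = [3, 1, 2]
scales = [2.0 0.5 4.0]
West = W[:, perm] .* scales

# C[i, j] is the correlation between column i of W and column j of West
C = Statistics.cor(W, West)

# For each original signal, pick the best-correlated estimated signal
matches = [argmax(C[i, :]) for i in 1:size(W, 2)]
println(matches)
```

The same pairing applied to the rows of H and the estimated mixing matrix would recover the matching order of the mixing weights.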

For example, the matrices can be visualized using:

import Pkg; Pkg.add("Mads")
import Mads
Mads.plotseries([a b c])
Mads.plotseries(We[kopt] ./ maximum(We[kopt]))
NMFk.plotmatrix(H)
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]))

More examples can be found in the test, demo, and examples directories of NMFk.

Applications:

  • Climate modeling
  • Material characterization using X rays
  • Reactive mixing
  • Molecular dynamics
  • Contaminant transport
  • Induced seismicity
  • Phase separation of co-polymers
  • Oil / Gas extraction from unconventional reservoirs

Publications:

  • Vesselinov, V.V., Mudunuru, M., Karra, S., O'Malley, D., Alexandrov, B.S., Unsupervised Machine Learning Based on Non-Negative Tensor Factorization for Analyzing Reactive-Mixing, 10.1016/j.jcp.2019.05.039, Journal of Computational Physics, 2019. PDF
  • Vesselinov, V.V., Alexandrov, B.S., O'Malley, D., Nonnegative Tensor Factorization for Contaminant Source Identification, Journal of Contaminant Hydrology, 10.1016/j.jconhyd.2018.11.010, 2018. PDF
  • O'Malley, D., Vesselinov, V.V., Alexandrov, B.S., Alexandrov, L.B., Nonnegative/binary matrix factorization with a D-Wave quantum annealer, PLoS ONE, 10.1371/journal.pone.0206653, 2018. PDF
  • Stanev, V., Vesselinov, V.V., Kusne, A.G., Antoszewski, G., Takeuchi, I., Alexandrov, B.A., Unsupervised Phase Mapping of X-ray Diffraction Data by Nonnegative Matrix Factorization Integrated with Custom Clustering, npj Computational Materials, 10.1038/s41524-018-0099-2, 2018. PDF
  • Iliev, F.L., Stanev, V.G., Vesselinov, V.V., Alexandrov, B.S., Nonnegative Matrix Factorization for identification of unknown number of sources emitting delayed signals, PLoS ONE, 10.1371/journal.pone.0193974, 2018. PDF
  • Stanev, V.G., Iliev, F.L., Hansen, S.K., Vesselinov, V.V., Alexandrov, B.S., Identification of the release sources in advection-diffusion system by machine learning combined with Green function inverse method, Applied Mathematical Modelling, 10.1016/j.apm.2018.03.006, 2018. PDF
  • Vesselinov, V.V., O'Malley, D., Alexandrov, B.S., Contaminant source identification using semi-supervised machine learning, Journal of Contaminant Hydrology, 10.1016/j.jconhyd.2017.11.002, 2017. PDF
  • Alexandrov, B., Vesselinov, V.V., Blind source separation for groundwater level analysis based on non-negative matrix factorization, Water Resources Research, 10.1002/2013WR015037, 2014. PDF

Research papers are also available at Google Scholar, ResearchGate and Academia.edu

Presentations:

  • Vesselinov, V.V., Physics-Informed Machine Learning Methods for Data Analytics and Model Diagnostics, M3 NASA DRIVE Workshop, Los Alamos, 2019. PDF
  • Vesselinov, V.V., Unsupervised Machine Learning Methods for Feature Extraction, New Mexico Big Data & Analytics Summit, Albuquerque, 2019. PDF
  • Vesselinov, V.V., Novel Unsupervised Machine Learning Methods for Data Analytics and Model Diagnostics, Machine Learning in Solid Earth Geoscience, Santa Fe, 2019. PDF
  • Vesselinov, V.V., Novel Machine Learning Methods for Extraction of Features Characterizing Datasets and Models, AGU Fall meeting, Washington D.C., 2018. PDF
  • Vesselinov, V.V., Novel Machine Learning Methods for Extraction of Features Characterizing Complex Datasets and Models, Recent Advances in Machine Learning and Computational Methods for Geoscience, Institute for Mathematics and its Applications, University of Minnesota, 10.13140/RG.2.2.16024.03848, 2018. PDF
  • Vesselinov, V.V., Mudunuru, M., Karra, S., O'Malley, D., Alexandrov, Unsupervised Machine Learning Based on Non-negative Tensor Factorization for Analysis of Field Data and Simulation Outputs, Computational Methods in Water Resources (CMWR), Saint-Malo, France, 10.13140/RG.2.2.27777.92005, 2018. PDF
  • O'Malley, D., Vesselinov, V.V., Alexandrov, B.S., Alexandrov, L.B., Nonnegative/binary matrix factorization with a D-Wave quantum annealer PDF
  • Vesselinov, V.V., Alexandrov, B.A, Model-free Source Identification, AGU Fall Meeting, San Francisco, CA, 2014. PDF

Presentations are also available at slideshare.net, ResearchGate and Academia.edu

Videos:

  • Progress of nonnegative matrix factorization process (animation: nmfk-example)

Videos are also available at YouTube

Patent:

Alexandrov, B.S., Vesselinov, V.V., Alexandrov, L.B., Stanev, V., Iliev, F.L., Source identification by non-negative matrix factorization combined with semi-supervised clustering, US20180060758A1

For more information, visit monty.gitlab.io


Installation behind a firewall

Julia uses git for package management. Add the following to the .gitconfig file in your home directory:

[url "https://"]
        insteadOf = git://

or execute:

git config --global url."https://".insteadOf git://

Julia uses git and curl to install packages. Set proxies:

export ftp_proxy=http://proxyout.<your_site>:8080
export rsync_proxy=http://proxyout.<your_site>:8080
export http_proxy=http://proxyout.<your_site>:8080
export https_proxy=http://proxyout.<your_site>:8080
export no_proxy=.<your_site>

For example, if you are doing this at LANL, you will need to execute the following lines in your bash command-line environment:

export ftp_proxy=http://proxyout.lanl.gov:8080
export rsync_proxy=http://proxyout.lanl.gov:8080
export http_proxy=http://proxyout.lanl.gov:8080
export https_proxy=http://proxyout.lanl.gov:8080
export no_proxy=.lanl.gov
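
The same proxy variables can also be set from inside a running Julia session before installing packages. This is a minimal sketch, assuming the package tools honor these environment variables in the same way as they do when exported from the shell. The host proxyout.example.org is a hypothetical placeholder for your site's proxy.

```julia
# Hypothetical proxy address; replace with the proxy of your site
proxy = "http://proxyout.example.org:8080"

# Set the same variables as the shell exports above
for name in ("ftp_proxy", "rsync_proxy", "http_proxy", "https_proxy")
    ENV[name] = proxy
end

# Hosts in this domain are reached directly, bypassing the proxy
ENV["no_proxy"] = ".example.org"

println(ENV["https_proxy"])
```

Values assigned through ENV apply only to the current Julia session and the processes it starts; they do not persist after Julia exits.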