##Introduction

This notebook demonstrates non-negative matrix factorization (NMF) on a small random dataset, with the aim of building intuition for the method.

First, the imports:

In [1]:
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF

In [2]:
orig = np.random.randint(low = 0, high = 10, size = 15).reshape(3, 5)
orig = pd.DataFrame(orig, index = ['A', 'B', 'C'], columns = ['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4', 'Feature 5'])
print(orig)

   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5
A          9          9          6          7          8
B          9          4          3          6          6
C          1          7          6          0          1


The example dataset contains 3 samples (A, B and C). Each sample has 5 features.
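
Because the data comes from an unseeded `np.random.randint` call, rerunning the cell gives different numbers. As a minimal sketch for following along with the values printed here, the same matrix can be typed in by hand. The values are copied from the output above; the list comprehension for the column names is just a convenience.

```python
import numpy as np
import pandas as pd

# Values copied from the printed output above; a fresh randint call
# would produce different numbers on each run.
orig = pd.DataFrame(
    np.array([[9, 9, 6, 7, 8],
              [9, 4, 3, 6, 6],
              [1, 7, 6, 0, 1]]),
    index=['A', 'B', 'C'],
    columns=['Feature {}'.format(i) for i in range(1, 6)],
)
print(orig.shape)
```

Even with identical input, the factors returned by NMF may differ slightly from those shown below, depending on the scikit-learn version and its initialization defaults.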

In [3]:
nmf = NMF(n_components = 2)
transformed = nmf.fit_transform(orig)
print('Transformed are:\n {}'.format(transformed))
components = nmf.components_
print('Components are:\n{}'.format(components))

Transformed are:
 [[3.31133175 1.99597943]
 [3.05818389 0.        ]
 [0.06796699 3.36271342]]
Components are:
[[2.77203532 1.38736509 0.87307418 2.04351797 2.10051156]
 [0.15509109 2.09368364 1.71217267 0.         0.32484478]]


###Components

The components describe the original features: each component assigns a non-negative weight to every feature. Let us annotate the components and look at them.

In [4]:
comp_df = pd.DataFrame(components, columns = orig.columns, index = ['Component 1', 'Component 2'])
print(comp_df)

             Feature 1  Feature 2  Feature 3  Feature 4  Feature 5
Component 1   2.772035   1.387365   0.873074   2.043518   2.100512
Component 2   0.155091   2.093684   1.712173   0.000000   0.324845


The first component loads mainly on Feature 1, Feature 4 and Feature 5. The second component highlights Feature 2 and Feature 3.
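
The reading of the components can be checked by ranking the features of each component by weight. The sketch below hard-codes the component matrix printed above so that it runs on its own; `feature_names` and `ranking` are names introduced only for this illustration.

```python
import numpy as np

feature_names = ['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4', 'Feature 5']

# Component matrix copied from the output above.
components = np.array([
    [2.77203532, 1.38736509, 0.87307418, 2.04351797, 2.10051156],
    [0.15509109, 2.09368364, 1.71217267, 0.0, 0.32484478],
])

# For each component, sort the feature indices from largest to smallest weight.
ranking = []
for row in components:
    order = np.argsort(row)[::-1]
    ranking.append([feature_names[i] for i in order])

for k, names in enumerate(ranking, start=1):
    print('Component {}: {}'.format(k, ', '.join(names)))
```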

###Transformed samples

Each row of the transformed matrix corresponds to one sample in the original data, and each column gives the amount by which that sample is influenced by the corresponding component. Let us annotate the matrix and look at it.

In [5]:
transf_df = pd.DataFrame(transformed, index = orig.index, columns = comp_df.index)
print(transf_df)

   Component 1  Component 2
A     3.311332     1.995979
B     3.058184     0.000000
C     0.067967     3.362713


The transformed data can then serve as lower-dimensional input features further down a machine learning pipeline.
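
One simple downstream use, shown here only as an illustrative sketch, is to label each sample with the component that influences it most, which gives a rough grouping of the samples. The transformed values are copied from the output above; `dominant` and `labels` are hypothetical names for this example.

```python
import numpy as np

sample_names = ['A', 'B', 'C']
component_names = ['Component 1', 'Component 2']

# Transformed matrix copied from the output above.
transformed = np.array([
    [3.31133175, 1.99597943],
    [3.05818389, 0.0],
    [0.06796699, 3.36271342],
])

# Index of the largest entry in each row, i.e. the dominant component per sample.
dominant = transformed.argmax(axis=1)
labels = {s: component_names[i] for s, i in zip(sample_names, dominant)}
print(labels)
```

Samples A and B end up with Component 1 and sample C with Component 2, which matches the original data: A and B have high values in Features 1, 4 and 5, while C is dominated by Features 2 and 3.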

###Relationship between transformed, components and the original

The idea is that Transformed × Components ≈ original data: the product approximates the original matrix rather than reproducing it exactly.

Let us check this. First we calculate the matrix product of the two factors.

In [6]:
prod = np.dot(transformed, components)
print('The dot product is: \n {}'.format(prod))
print('The shape of the product is: {}'.format(prod.shape))

The dot product is: 
 [[9.48868719 8.77297555 6.3084997  6.76676592 7.60387411]
 [8.47739376 4.24281756 2.6700214  6.24945372 6.42375062]
 [0.70993378 7.13475312 5.81688625 0.13889176 1.23512534]]
The shape of the product is: (3, 5)
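
To make the matrix product concrete, a single entry can be computed by hand: the value for sample A and Feature 1 is the first row of the transformed matrix multiplied element-wise with the first column of the components, then summed. The sketch below uses the values printed above and compares the result with the full product; `entry` is a name introduced for illustration.

```python
import numpy as np

# Factor matrices copied from the output above.
transformed = np.array([
    [3.31133175, 1.99597943],
    [3.05818389, 0.0],
    [0.06796699, 3.36271342],
])
components = np.array([
    [2.77203532, 1.38736509, 0.87307418, 2.04351797, 2.10051156],
    [0.15509109, 2.09368364, 1.71217267, 0.0, 0.32484478],
])

# Entry (A, Feature 1): sum over both components of weight * loading.
entry = sum(transformed[0, k] * components[k, 0] for k in range(2))

# The same entry taken from the full matrix product.
prod = transformed @ components
print(entry, prod[0, 0])
```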


The shape of the product matches that of the original data. Let us see whether the values are also close to the original by computing the difference.

In [7]:
print('The matrix of differences is:\n{}'.format(orig.values - prod))

The matrix of differences is:
[[-0.48868719  0.22702445 -0.3084997   0.23323408  0.39612589]
 [ 0.52260624 -0.24281756  0.3299786  -0.24945372 -0.42375062]
 [ 0.29006622 -0.13475312  0.18311375 -0.13889176 -0.23512534]]
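
A matrix of differences is hard to judge at a glance, so it can help to condense it into a few numbers. The following sketch, based on the values printed above, computes the Frobenius norm of the difference, the same norm relative to the original data, and the largest absolute deviation. The variable names are chosen for this example only.

```python
import numpy as np

# Original data and reconstructed product, copied from the outputs above.
orig = np.array([
    [9, 9, 6, 7, 8],
    [9, 4, 3, 6, 6],
    [1, 7, 6, 0, 1],
], dtype=float)
prod = np.array([
    [9.48868719, 8.77297555, 6.3084997, 6.76676592, 7.60387411],
    [8.47739376, 4.24281756, 2.6700214, 6.24945372, 6.42375062],
    [0.70993378, 7.13475312, 5.81688625, 0.13889176, 1.23512534],
])

diff = orig - prod
abs_err = np.linalg.norm(diff)            # Frobenius norm for a 2-D array
rel_err = abs_err / np.linalg.norm(orig)  # error relative to the size of the data
max_abs = np.abs(diff).max()              # worst single entry

print('Frobenius error: {:.4f}'.format(abs_err))
print('Relative error:  {:.4f}'.format(rel_err))
print('Largest absolute difference: {:.4f}'.format(max_abs))
```

With the default Frobenius loss, the fitted estimator's `nmf.reconstruction_err_` attribute should report a comparable absolute error, although that was not checked in this notebook.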


The largest absolute difference is about 0.52, so the two-component reconstruction is close to the original. NMF is also easy to interpret. I wonder how best to visualize it.

TO-DO:
- [ ] Proof-read
- [ ] Add visualization
- [ ] Publish somewhere