---
Nonnegative Matrix Factorization (NMF)
---- 

![](images/nmf.png)

By The End Of This Session You Should Be Able To:
----

- Explain how NMF can model topics
- Define each of the terms in NMF
- Explain how the Alternating Least Squares (ALS) algorithm works
- Find topics in New York Times articles


What is the goal of Nonnegative Matrix Factorization (NMF)?
----

Identify clusters of items that share the same latent features.

Example
---

Which Points of Interest (POIs) are tourists interested in?

![](http://4.bp.blogspot.com/-kIp5OkLAGYg/ToCjhj_o1SI/AAAAAAAAHQI/ciEoC3nInlM/s1600/NMF.png)

- Each of the K components is a latent feature or cluster,  
for example "military" or "shopping".

- NMF then returns two matrices, W and H, whose product approximates the original data.  
- W has one row per tourist that tells you how much each tourist belongs to each cluster. 
- H has one column per Point of Interest (POI) which tells you how much each POI belongs to each cluster.
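
To make the shapes concrete, here is a minimal sketch with made-up numbers. The tourists, POIs, weights, and cluster labels are hypothetical and not taken from the figure. With 3 tourists, 4 POIs, and K = 2, W is 3 x 2, H is 2 x 4, and their product is a 3 x 4 approximation of the tourist-by-POI data.

```python
import numpy as np

# Hypothetical example: 3 tourists, 4 POIs, K = 2 clusters ("military", "shopping").
W = np.array([
    [1.0, 0.0],   # tourist 0: only "military"
    [0.0, 1.0],   # tourist 1: only "shopping"
    [0.5, 0.5],   # tourist 2: a bit of both
])
H = np.array([
    [2.0, 1.0, 0.0, 0.0],   # "military" cluster: weights over the 4 POIs
    [0.0, 0.0, 3.0, 1.0],   # "shopping" cluster: weights over the 4 POIs
])

A_approx = W @ H   # shape (3 tourists, 4 POIs)
print(A_approx)
```

Each row of `A_approx` is a mix of the rows of H, weighted by that tourist's row of W.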

---
Check for understanding
---

Why are objective evaluation metrics difficult for topic modeling?

There is often no "ground truth" for what is and is not a topic, so it is hard to determine the "correct" number of topics.

Steps in Nonnegative Matrix Factorization (NMF)
---

![](images/setup.png)

1. Data preparation / input matrix
2. Pick K (number of topics)
3. Find W and H through optimization, typically alternating least squares (ALS)

Alternating Least Squares (ALS) Algorithm
----

Setup:

1. Pick k
2. Initialize W to random nonnegative values, e.g. `abs(randn(m,k))`
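
The setup step might look like this in NumPy. This is a sketch only: the sizes `m` and `k` and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np

m, k = 6, 2                      # hypothetical sizes: 6 rows in A, 2 latent features
rng = np.random.default_rng(0)   # fixed seed only so the sketch is repeatable
W = np.abs(rng.standard_normal((m, k)))   # NumPy analogue of abs(randn(m, k))
```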

Alternating Least Squares (ALS) Algorithm
----

for i in 1:maximum_iterations

- Solve $W^TWH = W^TA$ for H

- Set negative elements of H to 0

- Solve $HH^TW^T = HA^T$ for W

- Set negative elements of W to 0

[Source](http://meyer.math.ncsu.edu/Meyer/Talks/SAS_6_9_05_NmfWorkshop.pdf)
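
The loop above can be sketched in NumPy as follows. The function name `nmf_als`, its parameters, and the toy matrix are assumptions made for this example. It also calls `np.linalg.lstsq` instead of forming the normal equations explicitly. When W and H have full rank the two give the same solution, but that choice is made here for numerical robustness and is not prescribed by the source.

```python
import numpy as np

def nmf_als(A, k, max_iter=100, seed=0):
    """Basic ALS sketch for A ~ W @ H, clipping negative entries to zero."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = np.abs(rng.standard_normal((m, k)))
    for _ in range(max_iter):
        # Least-squares solve of W H = A for H (normal equations: W^T W H = W^T A).
        H = np.linalg.lstsq(W, A, rcond=None)[0]
        H = np.maximum(H, 0)
        # Least-squares solve of H^T W^T = A^T for W^T (normal equations: H H^T W^T = H A^T).
        W = np.linalg.lstsq(H.T, A.T, rcond=None)[0].T
        W = np.maximum(W, 0)
    return W, H

# Hypothetical nonnegative data matrix (4 x 3).
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [2.0, 0.0, 4.0],
              [0.0, 1.0, 0.5]])
W, H = nmf_als(A, k=2)
print(np.linalg.norm(A - W @ H))   # reconstruction error (Frobenius norm)
```

The sketch runs for a fixed number of iterations. A practical version would also stop early once the reconstruction error stops changing.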

Alternating Least Squares (ALS) Algorithm
----

__Pros__:

- Fast (computers are _very fast_ at linear algebra)
- Practical (works well with large-scale, real-world data)  
- Speedy convergence in practice (the error usually drops quickly over the first iterations, though this is not guaranteed)
- Simple (only need to pick k)
- Flexible (since all cells of W and H are free to vary, it can model complex data)

Alternating Least Squares (ALS) Algorithm
----

__Cons__:
- Ad hoc nonnegativity (negative elements are set to 0)
- Ad hoc sparsity (zeros appear only because negative elements are set to 0)
- No convergence theory (there is no guarantee that the iterates converge, let alone reach a global minimum)
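
A tiny worked example shows where the "ad hoc" label comes from. The matrices below are hypothetical and chosen so the arithmetic is easy to follow. The unconstrained least-squares solution for H has a negative entry, and both nonnegativity and the resulting zero come from the clipping step rather than from the objective being minimized.

```python
import numpy as np

W = np.array([[1.0, 0.0],
              [1.0, 1.0]])
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])

# Unconstrained solve of W^T W H = W^T A.
H_raw = np.linalg.solve(W.T @ W, W.T @ A)   # approximately [[1, 2], [2, -1]]
H = np.maximum(H_raw, 0)                    # the negative entry becomes 0

n_negative = int((H_raw < 0).sum())   # negative entries before clipping
n_zero = int((H == 0).sum())          # zero entries after clipping
print(n_negative, n_zero)
```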

How does Nonnegative Matrix Factorization (NMF) work for topic modeling?
----

Identify topics in documents based on words

![](images/tmn_nmf.png)

Input: a document-term matrix A, typically with tf-idf values in its cells, and a user-specified number of topics k.

Output: two nonnegative factors W and H, each with k as one of its dimensions, whose product approximates A.
    
[Source](http://www.slideshare.net/SebastianRuder/dynamic-topic-modeling-via-nonnegative-matrix-factorization-dr-derek-greene)
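
As an illustration of the input, the sketch below builds a small tf-idf document-term matrix in plain Python. The toy corpus is made up, and the weighting used (term frequency times `log(n_docs / df)`) is only one simple tf-idf variant. Libraries typically add smoothing and normalization.

```python
import math
from collections import Counter

docs = [
    "the senate passed the budget",
    "the team won the game",
    "the budget vote split the senate",
]  # hypothetical toy corpus

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
n_docs = len(docs)

# Document frequency: number of documents containing each word.
df = Counter(w for doc in tokenized for w in set(doc))

# A[i][j] = tf(word j in doc i) * idf(word j), with idf = log(n_docs / df).
A = []
for doc in tokenized:
    counts = Counter(doc)
    A.append([counts[w] / len(doc) * math.log(n_docs / df[w]) for w in vocab])
```

Here A has one row per document and one column per vocabulary word, so with k topics W would be `n_docs` x k and H would be k x `len(vocab)`. A word that occurs in every document, such as "the", gets weight 0 under this variant.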

![](images/ex1.png)

![](images/ex2.png)

---
Check for understanding
---

Why should we remove hapaxes (words that only appear once)?

The topic model would overfit to these words.

Each such word appears in only one document, so it would be tied to a single topic and perfectly correlated with it. That association might not generalize to the next occurrence of the word.
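
A minimal sketch of hapax removal, assuming the documents are already tokenized. The corpus below is hypothetical.

```python
from collections import Counter

docs = [
    ["budget", "senate", "vote"],
    ["budget", "game", "team"],
    ["team", "vote", "zeitgeist"],
]  # hypothetical tokenized corpus

# Count occurrences over the whole corpus, then drop words seen exactly once.
corpus_counts = Counter(w for doc in docs for w in doc)
hapaxes = {w for w, c in corpus_counts.items() if c == 1}
filtered = [[w for w in doc if w not in hapaxes] for doc in docs]
```

The filtering happens before the document-term matrix is built, so the hapaxes never become columns of A.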

Summary
-----

- NMF is a quick-and-dirty topic modeling algorithm
- You set up $A$, pick $k$, and find $W$ and $H$
- Don't forget linear algebra: it will help you "solve" NMF via ALS


<br>
<br>
----