# Module 1: Introduction to Scikit-Learn

## Section 4: Unsupervised Learning Algorithms

### Part 3: Non-Negative Matrix Factorization (NMF)

In this part, we will explore Non-Negative Matrix Factorization (NMF), a dimensionality reduction technique commonly used for feature extraction and topic modeling. NMF is particularly useful when dealing with non-negative data, such as text documents and images. Let's dive in!

### 3.1 Understanding Non-Negative Matrix Factorization (NMF)

Non-Negative Matrix Factorization (NMF) is a matrix factorization technique that decomposes a non-negative matrix into two non-negative matrices: W and H. For a given matrix V, NMF aims to find matrices W and H such that V ≈ WH, where every entry of W and H is non-negative.

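The key idea behind NMF is to represent each data point as an additive (non-negative) combination of non-negative basis vectors. Which factor holds the basis vectors depends on how the data matrix is laid out. When samples are stored as columns of V, as in much of the original NMF literature, the columns of W are the basis vectors and each column of H holds the coefficients for one sample. Scikit-Learn stores samples as rows, so the roles are swapped: the rows of H (exposed as `components_`) are the basis vectors, and each row of W holds the coefficients that combine them to reconstruct one sample.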
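To make the shapes concrete, here is a minimal sketch using Scikit-Learn's row-per-sample layout. The data is random placeholder values, and the parameter choices (`init="nndsvda"`, `random_state=0`, `max_iter=1000`) are assumptions made only to keep the run reproducible.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative data: 6 samples (rows) x 4 features (columns), placeholder values
rng = np.random.default_rng(0)
X = rng.random((6, 4))

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W = model.fit_transform(X)  # shape (6, 2): one row of coefficients per sample
H = model.components_       # shape (2, 4): one basis vector per row

# The product W @ H has the same shape as X and approximates it
X_approx = W @ H
print(W.shape, H.shape, X_approx.shape)
```

Both factors contain only non-negative entries, so each sample is rebuilt by adding basis vectors together, never by subtracting them.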

### 3.2 Training and Evaluation

To apply NMF, we need a non-negative dataset represented as a matrix. Starting from an initial guess, the algorithm iteratively updates W and H to reduce the reconstruction error between the original data and the approximation WH (in Scikit-Learn, the Frobenius norm by default). The number of components (k) sets the dimensionality of the reduced space and must be chosen before fitting.
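The iterative idea can be sketched in plain NumPy with the classic multiplicative update rules. This is an illustration only, not Scikit-Learn's implementation: Scikit-Learn's default solver is coordinate descent, with multiplicative updates available through `solver="mu"`. The function name `nmf_multiplicative` and its parameters are hypothetical.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate non-negative V as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random positive initialization keeps every update well defined
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))  # Frobenius reconstruction error
    return W, H, errors

V = np.random.default_rng(1).random((8, 5))  # placeholder data
W, H, errors = nmf_multiplicative(V, k=3)
print(f"error after first iteration: {errors[0]:.4f}")
print(f"error after last iteration:  {errors[-1]:.4f}")
```

Because the updates only multiply by non-negative ratios, non-negativity is preserved without any explicit projection step.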

Once trained, we can use the NMF model to transform new, unseen data points into the reduced dimensional space. The basis vectors learned during training stay fixed, and only the coefficients for the new points are computed. Each transformed data point has k values, one per component, instead of one per original feature.

Scikit-Learn provides the NMF class for performing NMF. Here's an example of how to use it:

```python
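import numpy as np
from sklearn.decomposition import NMF

# Example non-negative data: 100 samples x 10 features (random placeholder values)
rng = np.random.default_rng(0)
X = rng.random((100, 10))

# Create an instance of the NMF model
n_components = 2  # Number of components (dimensions) to keep
nmf = NMF(n_components=n_components, init="nndsvda", random_state=0, max_iter=1000)

# Fit the model and get the coefficients W, shape (n_samples, n_components)
coefficients = nmf.fit_transform(X)

# Access the basis vectors H, shape (n_components, n_features)
basis_vectors = nmf.components_

# Evaluate the fit: NMF has no accuracy-style metric, but the reconstruction
# error (Frobenius norm of X - W @ H) measures how well W @ H approximates X
print(f"Reconstruction error: {nmf.reconstruction_err_:.4f}")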
```
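Transforming unseen data follows the same pattern. The sketch below fits on one placeholder matrix and projects a second one. The names `X_train` and `X_new` are illustrative, and the only requirement is that both have the same number of features.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)
X_train = rng.random((20, 6))  # placeholder training data
X_new = rng.random((3, 6))     # placeholder unseen samples, same 6 features

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
nmf.fit(X_train)

# transform() keeps the learned basis vectors fixed and solves only for coefficients
W_new = nmf.transform(X_new)            # shape (3, 2)

# Map the coefficients back to the original feature space
X_new_approx = W_new @ nmf.components_  # shape (3, 6)
print(W_new.shape, X_new_approx.shape)
```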

### 3.3 Choosing the Number of Components

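Choosing the appropriate number of components in NMF is an important consideration. It depends on the trade-off between dimensionality reduction and the amount of information preserved. Reconstruction error generally keeps falling as components are added, so minimizing it alone is not a useful criterion. A common approach is to compare the reconstruction error across several values of k and pick the point where further components bring little improvement, or to use domain knowledge (for example, the number of topics expected in a corpus).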
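One way to compare candidate values is to fit a model per value of k and record `reconstruction_err_`. This is a rough sketch on synthetic data built from 3 underlying components plus a little noise. The data, the range of k, and the solver settings are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic non-negative data: 3 underlying components plus small non-negative noise
X = rng.random((50, 3)) @ rng.random((3, 10)) + 0.01 * rng.random((50, 10))

errors = {}
for k in range(1, 7):
    model = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=2000)
    model.fit(X)
    errors[k] = model.reconstruction_err_  # Frobenius norm of X - W @ H

for k, err in errors.items():
    print(f"k={k}: reconstruction error = {err:.4f}")
```

Plotting these values against k and looking for the point where the curve flattens is a heuristic, not a guarantee. On real data the bend is often less clear than on synthetic data.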

### 3.4 Handling Scaling

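NMF assumes non-negative data, so it is important to ensure that the input data is non-negative. Scikit-Learn's NMF raises a `ValueError` if the input contains negative values. If the data contains negative values, preprocess it first, for example by min-max scaling each feature to [0, 1], shifting each feature by its minimum, or clipping negative values to zero. Standardization to zero mean and unit variance is not suitable here because it produces negative values.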
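Two of these preprocessing options can be sketched as follows, using placeholder data drawn from a normal distribution so that it contains negatives. Which option is appropriate depends on what the negative values mean in your data, and clipping in particular discards information.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(30, 5))  # placeholder data containing negative values

# Option 1: rescale each feature to the range [0, 1]
X_scaled = MinMaxScaler().fit_transform(X_raw)

# Option 2: clip negatives to zero (loses the information they carried)
X_clipped = np.clip(X_raw, 0, None)

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W = nmf.fit_transform(X_scaled)
print(X_raw.min() < 0, X_scaled.min() >= 0, X_clipped.min() >= 0)
```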

### 3.5 Applications of NMF

NMF has various applications, including:

- Feature extraction: NMF can be used to extract meaningful features from high-dimensional datasets.
- Topic modeling: NMF can be used to discover latent topics in text documents.
- Image processing: NMF can be used for image compression and reconstruction.

### 3.6 Summary

Non-Negative Matrix Factorization (NMF) is a powerful technique for dimensionality reduction and feature extraction, particularly when dealing with non-negative data. It decomposes a matrix into non-negative basis vectors and coefficients. Scikit-Learn provides the necessary classes to implement NMF easily. Understanding the concepts, training, and evaluation techniques is crucial for effectively using NMF in practice.

In the next part, we will explore Independent Component Analysis (ICA), another popular dimensionality reduction technique.

Feel free to practice implementing NMF using Scikit-Learn. Experiment with different numbers of components, scaling techniques, and evaluation methods to gain a deeper understanding of the algorithm and its performance.