
# **Introduction to Unsupervised Machine Learning Algorithms**

Unsupervised Machine Learning is a type of machine learning where the model **learns patterns from unlabeled data**.
This means **no target/output variable** is provided; the algorithm tries to **discover structure** hidden inside the data.

---

### **Why Unsupervised Learning?**

Unsupervised learning is useful when:

* We want to **find groups or clusters** in data
* We want to **reduce the number of dimensions**, for example to visualize the data
* We want to **detect anomalies**
* We want to **find hidden features or patterns**
* Labeled data is expensive or impossible to collect

---

### **Common Tasks in Unsupervised Learning**

**1. Clustering**

Grouping similar data points together.

* K-Means
* Hierarchical Clustering (agglomerative or divisive)
* DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

**2. Dimensionality Reduction**

Reducing the number of features while keeping as much of the important information as possible.

* PCA (Principal Component Analysis)
* t-SNE (t-distributed Stochastic Neighbor Embedding)
* UMAP (Uniform Manifold Approximation and Projection)

**3. Association Rule Learning**

Finding relationships between items (common in market basket analysis).

* Apriori Algorithm
* Eclat Algorithm
* FP-Growth Algorithm

**4. Anomaly Detection**

Detecting unusual data points.

* Isolation Forest
* One-Class SVM
* Local Outlier Factor (LOF)

---

### **Popular Unsupervised Learning Algorithms Explained**

**1. K-Means Clustering**

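* Divides data into **K clusters**, where K is chosen in advance
* Works by minimizing the **within-cluster sum of squared distances** between each point and its cluster centroid
* Fast and simple
* Works best when clusters are roughly spherical, similar in size, and well-separated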

**Use cases:** Customer segmentation, image compression, grouping similar documents.
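
The assign-then-update loop behind K-Means can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the helper names (`k_means`, `farthest_first_init`, `inertia`) and the toy points are made up for this example, and the deterministic "farthest-first" initialization is a simplifying assumption (libraries typically use random or k-means++ initialization).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    pick the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_centroid = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(next_centroid)
    return centroids


def k_means(points, k, max_iters=100):
    """Alternate between assigning points and updating centroids."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (the quantity being minimized)."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data: two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels, centroids, inertia(points, labels, centroids))
```

On real data, a library implementation such as scikit-learn's `KMeans` is the usual choice; the sketch only shows the two alternating steps.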

---

**2. Hierarchical Clustering**

* Builds a hierarchy of clusters (tree-like structure)
* Can be:

  * **Agglomerative** (bottom-up)
  * **Divisive** (top-down)

**Use cases:** Gene clustering, taxonomy construction, and small datasets where the full cluster tree (dendrogram) is of interest.
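
The bottom-up (agglomerative) variant can be sketched as follows: start with every point in its own cluster and repeatedly merge the two closest clusters. The function names and toy data are hypothetical, and single linkage (distance between the closest pair of points) is just one assumed choice among several linkage criteria.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain.

    Returns the final clusters and the merge history (the tree, bottom-up).
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, n_clusters=3)
for left, right, distance in merges:
    print(f"merged {left} + {right} at distance {distance:.2f}")
```

This brute-force version recomputes every pairwise cluster distance on each merge, which is one reason hierarchical clustering is usually applied to smaller datasets.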

---

**3. DBSCAN**

* Groups points that are **densely packed**
* Can find clusters of **arbitrary shape**
* Detects **noise/outliers**

**Use cases:** Geospatial data, anomaly detection, non-convex cluster shapes.
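
A compact sketch of the density idea: a point with at least `min_samples` neighbors within radius `eps` is a core point, clusters grow outward from core points, and anything unreachable is labeled noise. The parameter names mirror common usage but, like the helper functions and toy data, are assumptions of this example rather than a reference implementation.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is also a core point
                queue.extend(j_neighbors)
    return labels


points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # dense group A
    (10, 10), (10, 11), (11, 10), (11, 11),  # dense group B
    (5, 5),                                  # isolated point
]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

Note that the number of clusters is not specified anywhere; it falls out of `eps` and `min_samples`.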

---

**4. PCA (Principal Component Analysis)**

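* Reduces dimensionality with a linear projection
* Creates new variables called **principal components**, which are uncorrelated linear combinations of the original features
* Chooses the components, in order, to capture as much of the data's variance as possible
* Useful for visualization and speeding up ML models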

**Use cases:** Image compression, feature extraction, noise reduction.
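
To make "direction of maximum variance" concrete, here is a small standard-library sketch that centers the data, builds the covariance matrix, and finds the first principal component by power iteration. Power iteration is a stand-in chosen to keep the example dependency-free (libraries normally use an eigen- or singular value decomposition), and the function names and sample data are invented for illustration.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iters=200):
    """Power iteration: repeatedly multiply by cov and renormalize.

    Assumes the starting vector is not orthogonal to the top eigenvector.
    """
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: two strongly correlated features.
data = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 6.0)]
centered = center(data)
cov = covariance_matrix(centered)
component = first_principal_component(cov)

# Project each 2-D point onto the component to get a 1-D representation.
projected = [sum(x * v for x, v in zip(row, component)) for row in centered]

# Share of the total variance kept by the single component.
projected_variance = sum(z * z for z in projected) / (len(projected) - 1)
explained_ratio = projected_variance / sum(cov[i][i] for i in range(len(cov)))
print(component, explained_ratio)
```

Because the two features move together, one component keeps almost all of the variance, which is the situation in which PCA compresses data well.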

---

**5. t-SNE**

* Non-linear dimensionality reduction
* Mainly used for **visualization of high-dimensional data**

**Use cases:** Visualizing embeddings, NLP vectors, image features.

---

**6. Apriori Algorithm**

* Finds **frequent itemsets** and **association rules**
* Works by building larger candidate itemsets from smaller frequent ones, since every subset of a frequent itemset must itself be frequent

**Use cases:** Market basket analysis (for example, product recommendations of the kind used by retailers such as Amazon and Walmart).
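
The level-by-level search can be sketched as below: keep the frequent single items, join them into candidate pairs, prune any candidate that has an infrequent subset, and repeat. The `apriori` function, the `min_support` threshold, and the shopping baskets are hypothetical choices for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
frequent = apriori(baskets, min_support=0.5)

# Confidence of the rule "bread -> milk": support(bread and milk) / support(bread).
confidence = frequent[frozenset({"bread", "milk"})] / frequent[frozenset({"bread"})]
print(frequent, confidence)
```

An association rule is then read off the frequent itemsets: here the rule "bread implies milk" is scored by how often milk appears in baskets that already contain bread.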

---

### **When to Use Unsupervised Learning?**

Use it when:

* Data is unlabeled
* We want to explore the data
* We want to understand structure
* We want to simplify the dataset

---

### **Summary (Easy to Remember)**

| Task                         | Algorithms                             | Purpose                      |
| ---------------------------- | -------------------------------------- | ---------------------------- |
| **Clustering**               | K-Means, DBSCAN, Hierarchical          | Group similar points         |
| **Dimensionality Reduction** | PCA, t-SNE, UMAP                       | Reduce the number of features |
| **Association Rules**        | Apriori, Eclat, FP-Growth              | Find item relationships      |
| **Anomaly Detection**        | Isolation Forest, LOF, One-Class SVM   | Detect unusual points        |