# **Unsupervised Learning**

Unsupervised learning is a type of **machine learning** where the model is trained on **unlabeled data**.

* Unlike **supervised learning**, there are no predefined **output labels (y)**.
* The goal is to **find hidden patterns, structures, or relationships** within the data.

It’s like giving an algorithm a box of puzzle pieces without the picture on the lid: it has to group or organize the pieces in a meaningful way on its own.

---

## Key Characteristics

1. **No labeled data** → input data $X$ only, no output $y$.
2. **Discover structure** → algorithm learns similarities, clusters, or lower-dimensional representations.
3. **Exploratory** → often used for understanding data before applying supervised methods.

---

## Types of Unsupervised Learning

### 1. **Clustering**

* Group similar data points together.
* Example algorithms:

  * **K-Means**
  * **Hierarchical Clustering**
  * **DBSCAN**
* Example use cases:

  * Customer segmentation in marketing.
  * Grouping similar documents/images.
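
To make the idea of grouping similar points concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the toy `points`, and the choice to seed the centroids with the first `k` points are assumptions made for illustration. Library implementations normally use better seeding and several restarts.

```python
import math


def kmeans(points, k, n_iters=100):
    """Cluster equal-length numeric tuples with Lloyd's algorithm."""
    # Assumption: seed with the first k points so the run is deterministic.
    # Production libraries use smarter seeding such as k-means++.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that no labels were supplied: the cluster ids `0` and `1` are arbitrary names that the algorithm assigns, and their meaning has to be interpreted afterwards.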

---

### 2. **Dimensionality Reduction**

* Reduce the number of features while retaining most of the information.
* Example algorithms:

  * **PCA (Principal Component Analysis)**
  * **t-SNE**
  * **UMAP**
* Example use cases:

  * Data visualization (project high-dimensional data into 2D/3D).
  * Speeding up training of downstream models by giving them fewer input features.
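
As a rough illustration of how PCA reduces features, the sketch below finds the first principal component (the direction of greatest variance) and projects 2D points onto it, turning two features into one. It uses power iteration on the covariance matrix, which is only one of several ways to compute the component. The function names and the toy data are assumptions, and the code assumes the data has non-zero variance.

```python
import math


def first_principal_component(rows, n_iters=200):
    """Return (mean, direction): the unit vector of maximum variance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue (assumes non-zero variance).
    v = [1.0] * d
    for _ in range(n_iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Map each row to a single coordinate along the given direction."""
    return [
        sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
        for r in rows
    ]


# Hypothetical data lying close to the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
mean, direction = first_principal_component(rows)
reduced = project(rows, mean, direction)  # one number per point instead of two
print(direction, reduced)
```

Because the points lie almost on a line, the single projected coordinate keeps nearly all of the variation in the original two features.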

---

### 3. **Association Rule Learning**

* Discover relationships or rules between variables in large datasets.
* Example algorithms:

  * **Apriori**
  * **FP-Growth**
* Example use cases:

  * Market basket analysis (“customers who buy bread also buy butter”).
  * Recommender systems.
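
The bread-and-butter rule can be scored with the three standard measures that algorithms such as Apriori and FP-Growth rely on: support, confidence, and lift. The sketch below only computes these measures for one rule. It is not an implementation of Apriori itself, and the `transactions` list is made-up data for illustration.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Values above 1 mean the items co-occur more often than chance predicts."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


rule_support = support({"bread", "butter"}, transactions)          # 3 of 5 baskets
rule_confidence = confidence({"bread"}, {"butter"}, transactions)  # about 0.75
rule_lift = lift({"bread"}, {"butter"}, transactions)              # about 1.25
print(rule_support, rule_confidence, rule_lift)
```

In this toy data, three out of four bread buyers also buy butter, and the lift above 1 suggests the pairing is more than coincidence.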

---

### 4. **Anomaly Detection**

* Identify rare data points that deviate from the norm.
* Example algorithms:

  * **Isolation Forest**
  * **One-Class SVM**
* Example use cases:

  * Fraud detection in banking.
  * Fault detection in machines.
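
Isolation Forest and One-Class SVM are too involved for a short snippet, so the sketch below swaps in a much simpler technique to show the core idea of flagging points that deviate from the norm: a modified z-score based on the median and the median absolute deviation (MAD). The function name, the commonly used cutoff of 3.5, and the transaction amounts are assumptions for illustration.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds `threshold`."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    flagged = []
    for i, v in enumerate(values):
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # when the data is roughly normal.
        score = 0.6745 * abs(v - median) / mad
        if score > threshold:
            flagged.append((i, v))
    return flagged


# Hypothetical card transaction amounts with one suspicious entry.
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 26.0, 980.0, 28.0]
outliers = mad_outliers(amounts)
print(outliers)  # [(6, 980.0)]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value would inflate the latter and could hide itself.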

---

## Advantages

* ✅ Works with **unlabeled data**, which is cheaper and more abundant than labeled data.
* ✅ Helps discover **hidden patterns**.
* ✅ Useful for **exploratory analysis** and **feature engineering**.

---

## Limitations

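* ❌ Harder to **evaluate**: there are no ground-truth labels to measure accuracy against.
* ❌ Results may be **less interpretable**: a cluster or component has no built-in meaning and has to be interpreted by a person.
* ❌ Sensitive to **parameter choices** (e.g., the number of clusters in K-Means).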
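
Without ground-truth labels, one common workaround is an internal score that checks only how compact and well separated the clusters are. The sketch below computes the mean silhouette coefficient for a given labeling. It is one such score among several, and the function name and toy data are assumptions. The code assumes at least two clusters and no cluster made up entirely of identical points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette: near 1 is well separated, near 0 or below is poor."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself contributes 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical points with two clear groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # matches the visible groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # mixes the two groups
print(good, bad)
```

Comparing such a score across several candidate values for the number of clusters is one way to make that parameter choice less arbitrary, although a high score still does not guarantee the clusters are meaningful.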

---

## Example Intuition

Imagine you own an **online store**:

* You don’t have labels like "high-value customer" vs. "low-value customer".
* Using **clustering (unsupervised learning)**, you group customers based on behavior:

  * Cluster 1: Customers who buy often but spend little.
  * Cluster 2: Customers who buy rarely but spend a lot.
  * Cluster 3: Customers who buy often and spend a lot.

This helps you design targeted marketing campaigns without needing labels.
