
### **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**

**Key Idea:**
DBSCAN forms clusters based on the **density of points** rather than on distance to centroids (as K-Means does). It groups dense regions of data into clusters and marks points in sparse regions as noise (outliers).

---

### **Important Parameters**

1. **Epsilon (ε):** Radius around a point to search for neighbors.
2. **MinPts:** Minimum number of points required within ε of a point for that neighborhood to count as a dense region (most implementations include the point itself in the count).
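
To make the two parameters concrete, here is a minimal sketch of the ε-neighborhood lookup in plain Python. The helper name `region_query`, the toy points, and the choice of Euclidean distance are assumptions made for illustration; they do not come from any particular library.

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Toy data (hypothetical): three nearby points and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.6), (5.0, 5.0)]

neighbors = region_query(points, 0, eps=1.0)
print(neighbors)  # [0, 1, 2]

# With MinPts = 3 (counting the point itself), the neighborhood of point 0 is dense.
min_pts = 3
is_dense = len(neighbors) >= min_pts
print(is_dense)  # True
```

A larger ε or a smaller MinPts makes more neighborhoods count as dense, so the two parameters are usually tuned together.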

---

### **Types of Points**

1. **Core Point:**

   * Has at least `MinPts` neighbors within ε.
2. **Border Point:**

   * Has fewer than `MinPts` neighbors but lies within ε of a **core point**.
3. **Outlier (Noise):**

   * Has fewer than `MinPts` neighbors within ε and is **not** within ε of any core point.
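
The three definitions above can be sketched as a small classifier. This is an illustrative brute-force version, assuming Euclidean distance and a neighbor count that includes the point itself; the function name `classify_points` and the sample coordinates are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Brute-force eps-neighborhoods (each one includes the point itself).
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighborhoods]

    kinds = []
    for i in range(n):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighborhoods[i]):
            kinds.append("border")  # not dense itself, but within eps of a core point
        else:
            kinds.append("noise")
    return kinds

# A tight square of four points, one point on its fringe, and one far away.
points = [(0.0, 0.0), (0.4, 0.0), (0.0, 0.4), (0.4, 0.4), (1.2, 0.4), (5.0, 5.0)]
kinds = classify_points(points, eps=1.0, min_pts=4)
print(kinds)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

The fringe point has only three points in its neighborhood, so it is not a core point, but it lies within ε of two core points and is therefore a border point.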

---

### **Clustering Process**

* Start with an unvisited point.
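* If it is a **core point**, start a new cluster and expand it: add every point within ε, then repeat from each newly added core point until no further points can be reached.
* Border points join the cluster of the core point that reaches them but do not expand it further. A border point within ε of core points from two clusters is assigned to whichever cluster reaches it first, not necessarily the nearest one.
* Points that are never reached from any core point are labeled as **noise**.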
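
The steps above can be sketched as a short from-scratch implementation. This is a teaching sketch under stated assumptions (brute-force neighbor search, Euclidean distance, noise labeled `-1`, MinPts counting the point itself), not how any particular library implements it; the names `dbscan` and `NOISE` are chosen here for illustration.

```python
import math
from collections import deque

NOISE = -1  # label used here for noise points

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    n = len(points)

    def region_query(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # tentative: may become a border point later
            continue
        cluster_id += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # previously "noise", now a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points keep expanding
    return labels

# Two small dense groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbor search makes this quadratic in the number of points; practical implementations typically use a spatial index for the neighborhood queries.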

---

### **Advantages**

* Handles **noise/outliers** naturally.
* Can detect **arbitrary (non-linear) shaped clusters** (unlike K-Means).
* Doesn’t require pre-specifying the number of clusters.

---

✅ **In short:**
DBSCAN clusters dense areas of data into groups (via core points), attaches border points, and labels outliers as noise, which makes it well suited to noisy datasets with non-linearly shaped clusters.



### **DBSCAN Output & Examples – Summary**

* **DBSCAN finds clusters that are not linearly separable**, which K-Means and Gaussian Mixture Models (fit with EM) typically fail to recover because they assume roughly convex, blob-shaped clusters.
* **Noise points** are identified and excluded from clusters. DBSCAN handles them naturally, unlike methods such as K-Means that assign every point to some cluster.
* **Core points + border points** form the main clusters, while **outliers remain unclustered**.
* Works especially well on datasets with **non-linear shapes** (e.g., concentric circles, spiral shapes).
* Example comparison:

  * **K-Means/Hierarchical clustering** → assign every point (including noise) to some cluster and tend to split or merge non-convex shapes incorrectly.
  * **DBSCAN** → correctly separates data into meaningful groups, while excluding noise.
* **Output:** Different well-separated clusters + some unassigned points (outliers).
* **Key parameters:**

  * `ε` (Epsilon / radius of neighborhood)
  * `MinPts` (minimum number of points to form a dense region)
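* In practice, DBSCAN is often run with **scikit-learn** (`sklearn.cluster.DBSCAN`), where ε and MinPts are exposed as `eps` and `min_samples`, noise points receive the label `-1`, and both parameters are tuned to get meaningful results.
* **Advantages/disadvantages** and a fuller **hands-on sklearn example** are covered later.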
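
As a small preview of the comparison described above, the following sketch runs K-Means and DBSCAN on concentric circles with scikit-learn. The dataset settings and the values `eps=0.15` and `min_samples=5` are assumptions picked for this toy data, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Two concentric rings: a classic case that is not linearly separable.
X, y_true = make_circles(n_samples=500, factor=0.4, noise=0.04, random_state=0)

# eps and min_samples are hand-picked for this toy dataset (an assumption).
db_labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# scikit-learn marks noise with the label -1, so exclude it when counting clusters.
n_clusters = len(set(db_labels.tolist()) - {-1})
n_noise = int(np.sum(db_labels == -1))

print("DBSCAN clusters:", n_clusters, "| noise points:", n_noise)
print("DBSCAN agreement with true rings (ARI):", adjusted_rand_score(y_true, db_labels))
print("K-Means agreement with true rings (ARI):", adjusted_rand_score(y_true, km_labels))
```

The adjusted Rand index is used here only as a convenient agreement score against the known ring labels; on real data without ground truth, the choice of `eps` and `min_samples` has to be judged by other means, such as inspecting the clusters visually.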

---

✅ **In short:**
DBSCAN creates clusters in dense regions, handles noise well, and excels at finding complex-shaped clusters that traditional clustering methods fail to detect.

