### **Unsupervised Learning**
- No labeled data; the model finds patterns on its own.

---


#### **Clustering**
- Grouping similar data points together.

   - **K-Means**:
      - Partitions data into `k` clusters.
      - Each data point is assigned to the nearest cluster center.
      - Centers are updated iteratively to minimize the squared distance between points and their cluster center.

   - **Hierarchical Clustering**:
      - Creates a tree-like structure (dendrogram) of clusters.
      - Two approaches:
        - **Agglomerative**: Start with each point as a cluster, merge iteratively.
        - **Divisive**: Start with one cluster, split iteratively.

   - **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**:
      - Groups data points based on density.
      - Defines clusters where points are closely packed together.
      - Handles clusters of arbitrary shape and is robust to noise.

---


### Hierarchical Clustering (Detailed Explanation)

Hierarchical clustering is a method of clustering that creates a nested sequence of clusters organized in a tree-like structure, called a **dendrogram**. This approach is well-suited for identifying hierarchical relationships in data, where clusters can be nested within each other.

![hierarch_1.gif](hierarch_1.gif) 


#### **Dendrogram**
- A dendrogram is a tree diagram that visually represents the clusters and their nested relationships.
- The leaves of the dendrogram represent individual data points, while branches indicate the merging (or splitting) of clusters.
- By "cutting" the dendrogram at a certain level, you can determine the number of clusters based on similarity.
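Cutting a dendrogram can be sketched with SciPy's `scipy.cluster.hierarchy` module. The toy data, the choice of average linkage, and the cut height of `1.0` are assumptions made up for this illustration, not values from the notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (made up for illustration): two tight groups on a line.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])

# Build the merge tree. Each row of Z records one merge:
# [cluster_a, cluster_b, merge_distance, size_of_new_cluster].
Z = linkage(X, method="average")

# "Cut" the tree at height 1.0: clusters joined above that distance stay separate.
labels_by_height = fcluster(Z, t=1.0, criterion="distance")

# Alternatively, ask for a fixed number of clusters.
labels_by_count = fcluster(Z, t=2, criterion="maxclust")

print(Z[:, 2])            # merge distances, one per merge
print(labels_by_height)   # cluster label per point
print(labels_by_count)
```

To draw the tree itself, `scipy.cluster.hierarchy.dendrogram(Z)` can be called with matplotlib installed.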


#### **Two Main Approaches to Hierarchical Clustering**
1. **Agglomerative (Bottom-Up) Clustering**
   - See `17-Clustering\Hierarchical Clustering\MilkDataset.ipynb` for a more detailed explanation.
   - Starts with each data point as its own cluster.
   - At each step, the two closest clusters are merged based on a chosen distance measure (e.g., Euclidean distance).
   - This process repeats until all points belong to a single cluster, forming the hierarchy from individual points to larger clusters.
   
   **Process:**
   - Calculate the distances between all pairs of clusters.
   - Find the two clusters with the smallest distance and merge them.
   - Update the distance matrix to reflect the new distances.
   - Repeat until only one cluster remains.

   **Linkage Criteria (Methods for Measuring Distance Between Clusters):**
   - **Single Linkage**: Distance between the closest points of two clusters. Also known as `Minimum Distance`.  
   - **Complete Linkage**: Distance between the farthest points in two clusters. Also known as `Maximum Distance`. 
   - **Average Linkage**: Average of the distances between every point in one cluster and every point in the other.
   - **Centroid Linkage**: Distance between the centroids (center points) of the two clusters.
   - **Ward Linkage (ANOVA)**: Merges the pair of clusters whose merge gives the smallest increase in total within-cluster variance.

2. **Divisive (Top-Down) Clustering**
   - Begins with all data points in a single cluster.
   - At each step, the algorithm splits clusters based on dissimilarity, forming sub-clusters.
   - The splitting continues until each data point becomes its own cluster or a predefined stopping criterion is met.

   **Process:**
   - Start with a single large cluster containing all data points.
   - Split the cluster into two sub-clusters by separating the points that are most dissimilar from each other.
   - Continue to split clusters iteratively based on a chosen dissimilarity measure.
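The agglomerative process described above can be sketched from scratch in NumPy. This is a minimal, deliberately naive version for illustration: the function name `agglomerative`, the toy points, and the stopping rule `n_clusters` are assumptions, and instead of updating a cluster-level distance matrix it recomputes linkage distances from the point-level matrix on every pass. Only single, complete, and average linkage are covered.

```python
import numpy as np

def agglomerative(X, n_clusters=1, linkage="single"):
    """Naive bottom-up clustering; returns (clusters, merge_history)."""
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Start with every point in its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(X))]
    history = []
    # How to turn point-to-point distances into a cluster-to-cluster distance.
    agg = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]

    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        # Find the pair of clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = agg(D[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        # Merge cluster b into cluster a, then drop b.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, history

# Toy points (made up): two nearby pairs and one far-away point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.5], [10.0, 0.0]])
clusters, history = agglomerative(X, n_clusters=2, linkage="single")
for left, right, dist in history:
    print(left, "+", right, "at distance", round(dist, 3))
print(clusters)
```

Passing `linkage="complete"` or `linkage="average"` swaps the merge criterion without changing the rest of the loop.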

![hierarch.gif](hierarch.gif)


#### **Advantages of Hierarchical Clustering**
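- **Flexibility**: Works with any distance measure and linkage criterion.
- **Hierarchical Relationships**: The dendrogram shows how clusters nest at every level of granularity.
- **No Need for Number of Clusters**: The number of clusters is chosen afterwards by cutting the dendrogram.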

#### **Disadvantages of Hierarchical Clustering**
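- **Computational Complexity**: Standard agglomerative clustering needs all pairwise distances, so memory and runtime grow at least quadratically with the number of points.
- **Sensitivity to Noise and Outliers**: A few outlying points can distort the order in which clusters are merged.
- **No "Re-Evaluation"**: Once two clusters are merged (or split), the decision is never revisited.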


![image.png](attachment:image.png)

---

### K-Means Clustering  

K-Means is a popular clustering algorithm in unsupervised learning. Its goal is to partition data into a predefined number of clusters (`k`) by minimizing the squared distance between data points and their assigned cluster center.

![kmeans.gif](kmeans.gif)

#### **Process of K-Means**
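1. **Choose `k` Clusters**: Decide how many clusters to look for.

2. **Initialize Cluster Centers**: Pick `k` starting centers, for example `k` randomly chosen data points.

3. **Assign Points to Clusters**: Assign each point to its nearest center.

4. **Update Cluster Centers**: Move each center to the mean of the points assigned to it.

5. **Repeat Until Convergence**: Repeat steps 3 and 4 until the centers stop moving or a maximum number of iterations is reached.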
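The five steps above can be sketched in plain NumPy. This is a minimal illustration, not a replacement for a library implementation: the function name `kmeans`, the random initialization from data points, the iteration cap, and the synthetic two-blob data are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means sketch; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 4: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 5: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Synthetic data (made up): two well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

centers, labels = kmeans(X, k=2)   # Step 1: k is chosen by the caller
print(centers)
print(np.bincount(labels))         # number of points per cluster
```

On data this well separated the two blobs should be recovered regardless of which points are drawn as initial centers; on harder data the result can depend on the seed.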

#### **Objective of K-Means**
- The algorithm aims to minimize the **within-cluster sum of squares (WCSS)**, also known as **inertia**. This is the sum of squared distances between each point and its cluster center, across all clusters.
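Written out, the objective looks as follows. The notation is assumed here for illustration: $C_j$ is the set of points assigned to cluster $j$, and $\mu_j$ is that cluster's center.

$$
\text{WCSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

Neither the assignment step nor the update step can increase this value, which is why K-Means converges, though possibly to a local minimum.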

![image.png](attachment:image.png)

#### **Advantages of K-Means**
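- **Simplicity and Efficiency**: Easy to implement, and each iteration is linear in the number of points.
- **Works Well with Convex Clusters**: Gives good results when clusters are compact, roughly round, and similar in size.

#### **Disadvantages of K-Means**
- **Requires `k` Value**: The number of clusters must be chosen in advance.
- **Sensitive to Initial Centers**: Different starting centers can lead to different final clusters, because the algorithm only finds a local minimum.
- **Assumes Spherical Clusters**: Struggles with elongated, non-convex, or very unevenly sized clusters.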
   

---

### **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**

DBSCAN is a density-based clustering algorithm that groups data points based on their proximity within dense regions. It’s especially useful for discovering clusters of varying shapes and is resistant to noise (outliers).

![image-2.png](dbscangif.gif)

#### **How DBSCAN Works**
1. **Define Two Parameters**:
   - **`eps` (Epsilon)**: The maximum distance between two points for them to count as neighbors.
   - **`minPts`**: The minimum number of points required within a radius of `eps` to consider a region dense.

2. **Types of Points**:
   - **Core Point**: A point with at least `minPts` neighbors within `eps` radius.
   - **Border Point**: A point within `eps` of a core point but with fewer than `minPts` neighbors.
   - **Noise Point**: A point that is neither a core point nor within `eps` of any core point.

3. **Clustering Process**:
   - For each unvisited point:
      - If it has at least `minPts` neighbors within `eps`, it’s a **core point** and starts a new cluster.
      - All points within `eps` of this core point are added to the cluster.
      - The process continues recursively, checking neighbors of each core point, expanding the cluster as long as new core points are found.
      - Points that are not core points and are not within `eps` of any core point are labeled as noise (outliers).
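The core, border, and noise definitions and the expansion process above can be sketched in NumPy. This is a minimal illustration under stated assumptions: the function name `dbscan`, the toy points, and the values `eps=0.75` and `min_pts=4` are made up, and a point counts as its own neighbor (the same convention scikit-learn uses for `min_samples`).

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch; returns (labels, is_core). Label -1 means noise."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Neighborhoods include the point itself (distance 0).
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(n, -1)          # everything starts out as noise
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue                 # already clustered, or not a core point
        # Grow a new cluster outward from this core point.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    # Only core points keep expanding the cluster;
                    # border points join it but stop there.
                    if is_core[q]:
                        frontier.append(q)
        cluster_id += 1
    return labels, is_core

# Toy points (made up for illustration).
X = np.array([
    [0.0, 0.0], [0.0, 0.5], [0.5, 0.0], [0.5, 0.5],   # dense square
    [1.0, 0.5],                                       # just off the square's edge
    [5.0, 5.0], [5.0, 5.5], [5.5, 5.0], [5.5, 5.5],   # second dense square
    [10.0, 0.0],                                      # isolated point
])
labels, is_core = dbscan(X, eps=0.75, min_pts=4)
is_border = (labels != -1) & ~is_core
print(labels)      # -1 marks noise
print(is_core)
print(is_border)
```

Here the point at `(1.0, 0.5)` ends up as a border point: it joins the first cluster through a neighboring core point but has too few neighbors to be a core point itself.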

![image.png](attachment:image.png)

#### **Advantages of DBSCAN**
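- **Handles Arbitrary Shapes**: Clusters are grown through dense regions, so they need not be convex.
- **Noise-Resistant**: Points in sparse regions are labeled as noise instead of being forced into a cluster.
- **No Need to Specify Number of Clusters (`k`)**: The number of clusters follows from the density parameters.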

#### **Disadvantages of DBSCAN**
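- **Sensitive to Parameters**: Results depend strongly on `eps` and `minPts`, and a single setting struggles when clusters have very different densities.
- **Limited for High-Dimensional Data**: Distances become less informative as dimensionality grows, which makes a good `eps` hard to choose.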
 

##### K-Means vs DBSCAN

![image.png](attachment:image.png)
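A small experiment can make the contrast concrete. The sketch below uses scikit-learn on the two-moons toy dataset. The dataset, the parameter values (`eps=0.2`, `min_samples=5`), and the adjusted Rand index as the agreement score (1.0 means a perfect match with the true grouping) are assumptions chosen for illustration.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: non-convex clusters.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# K-Means needs k up front and favors compact, roughly round clusters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN needs density parameters instead and follows the shape of dense regions.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

ari_kmeans = adjusted_rand_score(y_true, km_labels)
ari_dbscan = adjusted_rand_score(y_true, db_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN  ARI: {ari_dbscan:.2f}")
```

On data like this, DBSCAN is expected to follow the two curved shapes while K-Means cuts straight across them. On compact, well-separated blobs the two methods would typically agree.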

---