# K-Means Clustering Algorithm

In this tutorial, we will understand the **geometric intuition** behind the **K-Means clustering algorithm**, a popular **unsupervised machine learning algorithm**.

---

## Geometric Intuition

Suppose we have some data points in **2D space** (x and y axes).

By visually inspecting the data, we might see that there are **two groups of points**:



- Cluster 1: ● ● ●
- Cluster 2: ● ● ●


After applying K-Means clustering, the points will be grouped into clusters:



- Cluster 1: ● ● ● (Centroid: ⊙)
- Cluster 2: ● ● ● (Centroid: ⊙)


Similarly, if the data has **three groups**, K-Means will output three clusters with three centroids.

**Key Idea:** K-Means clusters **similar points together**.

---

## Steps in K-Means Clustering

### Step 1: Initialize Centroids

We start by selecting **k centroids**, where `k` is the number of clusters we want.  

- Centroids can be **randomly initialized**.
- For example, if `k=2`:



  - Centroid 1: ⊙
  - Centroid 2: ⊙


> How to choose `k` will be discussed later.
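
The tutorial only says that centroids are initialized randomly. One common way to do that, assumed here, is to pick `k` distinct data points as the starting centroids. The function name `init_centroids` and the sample points are hypothetical, chosen only for illustration.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to act as the initial centroids.

    Assumption: sampling from the data itself is just one possible way to
    "randomly initialize" centroids; the tutorial does not prescribe one.
    """
    rng = random.Random(seed)  # a seed makes the choice reproducible
    return rng.sample(points, k)

# Hypothetical 2D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids = init_centroids(points, k=2, seed=0)
print(centroids)
```

Sampling from the data guarantees that every starting centroid lies inside the region the points occupy, although two of them can still land in the same group.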

---

### Step 2: Assign Points to Nearest Centroid

For each data point, we calculate its **distance** to all centroids and assign it to the nearest one.

- Common **distance metrics** are:
  - **Euclidean distance**, which standard K-Means uses
  - **Manhattan distance**, which appears in variants such as k-medians

$$
\text{Euclidean Distance between } P(x_1, y_1) \text{ and } Q(x_2, y_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$

$$
\text{Manhattan Distance} = |x_2 - x_1| + |y_2 - y_1|
$$

Points assigned to centroids are marked with the **centroid's color**.
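
The assignment step can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and represents points as tuples; `assign_points` and the sample data are hypothetical names and values.

```python
import math

def assign_points(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))  # ties go to the lower index
    return labels

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
labels = assign_points(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

The label list plays the role of the colors in the description above: points that share a label belong to the same centroid.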

---

### Step 3: Move Centroids

Once points are assigned:

1. Compute the **average of all points** in each cluster.
2. Update the **centroid location** to this average (in the formula below, \(n\) is the number of points \(P_i\) in that cluster).

$$
\text{New Centroid} = \frac{\sum_{i=1}^{n} P_i}{n}
$$
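
As a small sketch of this update, the function below averages the points of each cluster coordinate by coordinate. Keeping the old centroid when a cluster has no points is an assumption made here for the example; the tutorial does not cover that case. `update_centroids` is a hypothetical name.

```python
def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        # zip(*members) groups the x values together, the y values together, etc.
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        new_centroids.append(mean)
    return new_centroids

points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 0.0), (5.0, 5.0)]
print(update_centroids(points, labels, centroids))  # [(1.5, 2.0), (9.0, 9.0)]
```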

Repeat **Step 2** and **Step 3** until:

- **Centroids stop moving** OR
- **Points no longer change clusters**

At convergence, we get the **final clusters**.

---

### Example of Iteration

- **Iteration 1:**
  - Centroids randomly initialized.
  - Points assigned to nearest centroid.

- **Iteration 2:**
  - Compute average of points → move centroids.
  - Reassign points to nearest centroid.

- **Iteration 3 (Convergence):**
  - Centroids stabilize.
  - Final clusters obtained.

---

## Summary of K-Means Steps

1. Initialize `k` centroids (randomly).
2. Assign each point to its nearest centroid.
3. Update centroids by computing the average of points in the cluster.
4. Repeat Steps 2 and 3 until convergence.

> Once points stop changing clusters, the algorithm **stops**.
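
Putting the four steps together gives a short loop. The sketch below is one possible plain-Python version, not a reference implementation: it assumes Euclidean distance, initial centroids sampled from the data, and a `max_iter` safety cap, none of which the tutorial fixes. The function name and the data are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # Step 1: random initialization
    labels = None
    for _ in range(max_iter):                  # Step 4: repeat
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:               # no point changed cluster: stop
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                        # assumption: skip empty clusters
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2, seed=0)
print(labels)
print(centroids)
```

Which group ends up as cluster 0 depends on the random initialization, so only the grouping itself is meaningful, not the label numbers.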

---

## Next Topic

**How to select the `k` value** effectively. We will discuss this next.

---

**Key Takeaways:**

- K-Means clusters similar points together.
- Centroids represent the **center of each cluster**.
- Distance metrics like **Euclidean** or **Manhattan** are used.
- Iterative process: Assign → Update → Repeat → Converge.

# Selecting the k Value in K-Means Clustering

In the previous discussion, we applied K-Means clustering with a known `k` value (e.g., `k=2`).  
In real-world scenarios, **overlapping clusters** make it hard to choose `k`. Here, we discuss how to select `k`.

---

## Within-Cluster Sum of Squares (WCSS)

We introduce a key metric: **Within-Cluster Sum of Squares (WCSS)**.

$$
\text{WCSS} = \sum_{i=1}^{n} \| x_i - \mu_{c(i)} \|^2
$$

Where:

- \(x_i\) = data point
- \(\mu_{c(i)}\) = centroid of the cluster to which \(x_i\) belongs
- \(n\) = total number of points

**Intuition:** WCSS measures **compactness** of clusters. Smaller WCSS → points are closer to centroids.
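
The formula translates directly into code. In this sketch, `labels[i]` plays the role of \(c(i)\) and `centroids[labels[i]]` the role of \(\mu_{c(i)}\); the function name and the small data set are hypothetical.

```python
def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its own centroid."""
    total = 0.0
    for p, label in zip(points, labels):
        c = centroids[label]
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total

points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
centroids = [(1.5, 2.0), (9.0, 9.0)]
print(wcss(points, labels, centroids))  # 6.5
```

Here the first cluster contributes 1.25 + 1.25 and the second 2 + 2, which gives 6.5 in total.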

---

## Step 1: Compute WCSS for Different k Values

1. Vary `k` from 1 up to some maximum (e.g., 20).
2. For each `k`:
   - Run K-Means clustering.
   - Compute WCSS.

**Example:**

- **k = 1** → WCSS is high (all points in one cluster).
- **k = 2** → WCSS decreases (points split into 2 clusters).
- **k = 3, 4, …** → WCSS keeps decreasing, but by smaller and smaller amounts.
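
The sweep over `k` might look like the sketch below. It bundles a compact K-Means fit with the WCSS computation so that the example is self-contained. Running several random restarts per `k` and keeping the lowest WCSS is an assumption added here to reduce the effect of unlucky initializations; the function names and the data are hypothetical.

```python
import math
import random

def kmeans_wcss(points, k, rng, max_iter=100):
    """Run one K-Means fit (Euclidean distance) and return its WCSS."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def wcss_curve(points, max_k, restarts=10, seed=0):
    """Map each k in 1..max_k to the best WCSS found over several restarts."""
    rng = random.Random(seed)
    return {
        k: min(kmeans_wcss(points, k, rng) for _ in range(restarts))
        for k in range(1, max_k + 1)
    }

# Hypothetical data with three visible groups.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
    (1.0, 9.0), (1.5, 8.0), (0.5, 8.5),
]
curve = wcss_curve(points, max_k=5)
for k, value in curve.items():
    print(k, round(value, 2))
```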

---

## Step 2: Use the Elbow Method

Plot **WCSS vs k**.  

- The plot usually looks like an **arm**.
- The point where the curve stops dropping sharply and starts to flatten is called the **elbow**.
- The corresponding `k` at the elbow is selected as the optimal number of clusters.

**Elbow Method Illustration:** a plot of WCSS (y-axis) against `k` (x-axis), with a sharp bend at the elbow.

> The "elbow" indicates diminishing returns in reducing WCSS. Select `k` at the elbow.
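
The elbow is usually read off the plot by eye. As a rough sketch of how that reading could be automated, the heuristic below picks the `k` where the curve bends most, measured by the second difference of WCSS. This heuristic is an assumption introduced for this example, not part of the tutorial, and the WCSS values are made-up illustrative numbers rather than measured results.

```python
def elbow_k(wcss_by_k):
    """Return the k with the largest second difference of WCSS.

    Assumes the keys are consecutive integers and there are at least three.
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # Large when the drop before k is much bigger than the drop after k.
        bend = wcss_by_k[prev_k] - 2 * wcss_by_k[k] + wcss_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Illustrative (invented) WCSS values for k = 1..6.
wcss_by_k = {1: 1000.0, 2: 600.0, 3: 200.0, 4: 170.0, 5: 150.0, 6: 140.0}
print(elbow_k(wcss_by_k))  # 3
```

On noisy curves this kind of rule can disagree with a visual reading, so it is best treated as a starting point rather than a final answer.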

---

## Distance Metrics in K-Means

### 1. Euclidean Distance

Used to compute the **straight-line distance** between two points.

For 2D points \(P_1=(x_1, y_1)\) and \(P_2=(x_2, y_2)\):

$$
d_{Euclidean} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$

For 3D points, extend to include the z-coordinate:

$$
d_{Euclidean} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}
$$
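
The same pattern extends to any number of dimensions: square each coordinate difference, sum, and take the square root. A minimal sketch, with `euclidean_distance` as a hypothetical helper name:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # 5.0
print(euclidean_distance((1, 2, 3), (3, 5, 9)))  # 7.0
```

In Python 3.8 and later, the standard library function `math.dist` computes the same quantity.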

---

### 2. Manhattan Distance

Used when movement is restricted to **grid-like paths** (e.g., city streets).

For 2D points:

$$
d_{Manhattan} = |x_2 - x_1| + |y_2 - y_1|
$$

**Intuition:** Travel along the axes rather than in a straight line.

**Example:**  
- Grid-like city → use Manhattan distance.  
- Open plane → use Euclidean distance.
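
A short sketch makes the difference concrete. For the same pair of points, the Manhattan distance is never smaller than the Euclidean distance, because it follows the grid instead of cutting across it. `manhattan_distance` is a hypothetical helper name.

```python
import math

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(b - a) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
print(manhattan_distance(p, q))  # 7   (3 blocks east + 4 blocks north)
print(math.dist(p, q))           # 5.0 (straight line)
```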

---

## Practical Insights

- Use **Elbow Method** to determine `k`.
- Use **Euclidean distance** when points can move freely.
- Use **Manhattan distance** when movement is along grid lines.
- Initialization matters: randomly placing centroids **too close together** may cause poor clustering. Techniques like **k-means++** help with better initialization (to be discussed in the next video).

---

**Key Takeaways:**

1. **WCSS** measures cluster compactness.
2. **Elbow Method** helps find the optimal `k`.
3. **Euclidean vs Manhattan distance** depends on geometry and constraints.
4. Proper initialization improves clustering quality.
