
### 🔹 How to Select the Value of **k** in K-Means

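* K-Means does not choose $k$ itself; the number of clusters must be set before the algorithm runs. The goal is to find the **optimal number of clusters**.
* To compare candidate values of $k$, we use **WCSS (Within-Cluster Sum of Squares)**: the sum of squared distances between each point and its cluster centroid.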

---

### 🔹 Elbow Method

1. Run K-Means for different values of $k$ (e.g., 1 to 20).
2. For each $k$, compute WCSS.
3. Plot **WCSS vs. k**.

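   * When $k = 1$, WCSS is at its highest, because every point is measured against a single centroid.
   * As $k$ increases, WCSS generally decreases, because each point tends to have a centroid closer to it.
4. At some point the decrease slows down sharply and the curve bends like an **elbow**.
5. Take the $k$ at the **elbow point** as the estimate of the optimal $k$. This is a heuristic: on some datasets the curve has no clear elbow.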
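
Reading the elbow off a plot is a judgment call. One rough way to automate it, shown here as a sketch rather than a standard rule, is to pick the $k$ where the curve bends most, i.e. where the drop in WCSS shrinks the most from one step to the next. The function name `elbow_k` and the WCSS values below are made up for illustration, and the sketch assumes the $k$ values are consecutive integers.

```python
def elbow_k(ks, wcss_values):
    """Return the k where the WCSS curve bends most (largest second difference)."""
    best_k, best_bend = None, float("-inf")
    # The first and last k have no neighbour on one side, so skip them.
    for i in range(1, len(ks) - 1):
        drop_before = wcss_values[i - 1] - wcss_values[i]
        drop_after = wcss_values[i] - wcss_values[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical WCSS values for k = 1..6 (not from a real dataset).
ks = [1, 2, 3, 4, 5, 6]
wcss_values = [1000.0, 600.0, 200.0, 170.0, 150.0, 140.0]

print(elbow_k(ks, wcss_values))  # 3
```

With these numbers the drop is 400 up to $k = 3$ and only 30 afterwards, so the bend is largest at $k = 3$. On real data it is still worth looking at the plot before trusting the number.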

---

### 🔹 Distance Metrics

1. **Euclidean Distance** (straight-line distance) between two points $(x_1, y_1)$ and $(x_2, y_2)$:

   $$
   d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
   $$

   * Used for continuous features where straight-line distance is meaningful. It is the distance the K-Means cost function is built on.

2. **Manhattan Distance** (grid/block distance) between two points $(x_1, y_1)$ and $(x_2, y_2)$:

   $$
   d = |x_2 - x_1| + |y_2 - y_1|
   $$

   * Used when movement is restricted to grid-like paths (e.g., city blocks).
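
Both formulas extend to any number of dimensions by summing over coordinates. A minimal sketch, with the function names `euclidean` and `manhattan` chosen here for illustration:

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """Grid distance: sum of the absolute differences per coordinate."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

The two points differ by 3 in $x$ and 4 in $y$, so the straight-line distance is $\sqrt{9 + 16} = 5$ while the grid distance is $3 + 4 = 7$.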

---

### 🔹 Key Takeaways

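* Use the **Elbow Method** to pick $k$.
* WCSS decreases with higher $k$, but after some point the benefit is negligible.
* Standard K-Means is defined with squared **Euclidean distance**. Clustering with **Manhattan distance** is possible, but the centroid update then becomes a per-coordinate median instead of a mean, a variant usually called K-Medians.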




## 🔹 Cost Function in K-Means

The **objective of K-Means** is to form clusters such that **points within the same cluster are as close as possible to their cluster centroid**.

Mathematically, this is measured by the **Within-Cluster Sum of Squares (WCSS)**, also called **distortion**.

$$
J = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2
$$

where:

* $k$ = number of clusters
* $C_i$ = set of points belonging to cluster $i$
* $x$ = a data point in cluster $i$
* $\mu_i$ = centroid (mean) of cluster $i$
* $\| x - \mu_i \|^2$ = squared Euclidean distance between a point and its centroid
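
The formula translates almost directly into code. The sketch below assumes points and centroids are stored as tuples and that `labels[j]` holds the cluster index of `points[j]`; the name `wcss` and the data are illustrative.

```python
def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Hypothetical data: two clusters of two points each.
points = [(1, 1), (1, 3), (5, 5), (7, 5)]
labels = [0, 0, 1, 1]
centroids = [(1, 2), (6, 5)]  # the mean of each cluster

print(wcss(points, labels, centroids))  # 4.0
```

Each point here lies exactly 1 unit from its centroid, so every squared distance is 1 and $J = 4$.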

---

## 🔹 Intuition

* The cost function $J$ measures the **compactness of clusters**.
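* For a fixed $k$, smaller $J$ → tighter clusters. $J$ only measures distances inside clusters; it does not directly measure how far apart the clusters are.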
* K-Means tries to **minimize** $J$ using an iterative process:

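  1. Assign each point to its nearest centroid.
  2. Update each centroid to the mean of its assigned points.
  3. Repeat until the assignments stop changing, so $J$ no longer decreases (convergence). The result is a local minimum of $J$ and depends on the initial centroids.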
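
The loop above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: it assumes the initial centroids are $k$ randomly sampled data points, keeps the old centroid when a cluster ends up empty, and records $J$ after every iteration. All names (`kmeans`, `sq_dist`, `mean`, `history`) and the data are chosen here for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k data points
    labels, history = None, []
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean(members)
        # Record the cost J for this iteration.
        history.append(sum(sq_dist(p, centroids[lab])
                           for p, lab in zip(points, new_labels)))
        # Step 3: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels, history


# Hypothetical data: two small, well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels, history = kmeans(points, k=2)
print(labels)
print(history)  # J per iteration; it should never increase
```

Printing `history` is a simple check that each pass leaves $J$ the same or lower. Running with different `seed` values shows how the starting centroids can affect the result.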

---

## 🔹 Example (2 clusters in 2D)

Suppose we have 2 clusters:

* Cluster 1 centroid = $\mu_1$, with points $x_1, x_2, x_3$.
* Cluster 2 centroid = $\mu_2$, with points $x_4, x_5$.

Then the cost is:

$$
J = \Big( \|x_1 - \mu_1\|^2 + \|x_2 - \mu_1\|^2 + \|x_3 - \mu_1\|^2 \Big) 
+ \Big( \|x_4 - \mu_2\|^2 + \|x_5 - \mu_2\|^2 \Big)
$$
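
To make this concrete, assume some hypothetical coordinates (chosen only for illustration): $x_1 = (0, 0)$, $x_2 = (3, 0)$, $x_3 = (0, 3)$, $x_4 = (8, 8)$, $x_5 = (10, 8)$. The centroids are the cluster means, $\mu_1 = (1, 1)$ and $\mu_2 = (9, 8)$, so:

$$
\begin{aligned}
\|x_1 - \mu_1\|^2 &= (0-1)^2 + (0-1)^2 = 2 \\
\|x_2 - \mu_1\|^2 &= (3-1)^2 + (0-1)^2 = 5 \\
\|x_3 - \mu_1\|^2 &= (0-1)^2 + (3-1)^2 = 5 \\
\|x_4 - \mu_2\|^2 &= (8-9)^2 + (8-8)^2 = 1 \\
\|x_5 - \mu_2\|^2 &= (10-9)^2 + (8-8)^2 = 1 \\
J &= (2 + 5 + 5) + (1 + 1) = 14
\end{aligned}
$$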

---

## 🔹 Relation to Elbow Method

* As $k$ increases, $J$ decreases (more centroids → smaller distances).
* But after some $k$, the decrease in $J$ becomes negligible → "elbow point" = a reasonable choice for the number of clusters.

---

✅ So in short:
The **cost function in K-Means is the sum of squared distances of each point to its assigned cluster centroid**, and K-Means iteratively lowers this cost until it reaches a (local) minimum.

