```{contents}
```

# **Assumptions**

## 1. **Clusters are Spherical (Convex & Isotropic)**

* K-Means assumes that clusters are **round/ball-shaped** in the feature space, with similar spread in every direction.
* It works well when clusters are roughly circular (2D) or spherical (higher dimensions).
* If clusters are elongated, irregular, or non-convex (like moons or spirals), K-Means performs poorly, because each point is simply assigned to its nearest centroid.

👉 Example:

* ✅ Works: Height vs. weight, when each group forms a compact, roughly circular blob.
* ❌ Fails: Data shaped like two half-moons.
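
To see why elongated clusters are a problem, the sketch below compares the K-Means objective (the within-cluster sum of squared distances) for two ways of partitioning a toy dataset. The data, the `sse` helper, and both partitions are made up for illustration; this is plain Python, not a library API.

```python
# Toy data (illustrative): two long horizontal stripes, at y=0 and y=1.
stripe_a = [(float(x), 0.0) for x in range(11)]
stripe_b = [(float(x), 1.0) for x in range(11)]
points = stripe_a + stripe_b

def sse(cluster):
    """Sum of squared Euclidean distances from each point to the cluster mean."""
    n = len(cluster)
    cx = sum(p[0] for p in cluster) / n
    cy = sum(p[1] for p in cluster) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in cluster)

# Partition 1: the intended grouping, one cluster per stripe.
true_cost = sse(stripe_a) + sse(stripe_b)

# Partition 2: cut both stripes in half (left vs. right).
left = [p for p in points if p[0] <= 5]
right = [p for p in points if p[0] > 5]
split_cost = sse(left) + sse(right)

print(f"one cluster per stripe: {true_cost:.1f}")   # 220.0
print(f"left/right split:       {split_cost:.1f}")  # 60.5
```

The left/right split has the lower cost, so minimizing this objective favors round halves over the actual stripes.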

---

## 2. **Clusters are of Similar Size (Balanced Clusters)**

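* K-Means tends to produce clusters of **similar size**, so it works best when the true clusters contain a similar number of points and cover a similar extent.
* If one cluster is very large and another is very small, K-Means tends to split the large cluster and merge the small one into a neighboring cluster.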

👉 Example: In customer segmentation, if 95% are young customers and 5% are elderly, K-Means may “ignore” the smaller group.
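
A rough way to see this effect is to compare the K-Means objective for two candidate partitions of an unbalanced 1D dataset. Everything below (the data, the `sse` helper, the cut point at 5.0) is a hypothetical setup chosen for illustration, and the effect here comes from the large group's wider extent as much as from its point count.

```python
# Toy 1D data (illustrative): a large, wide group and a small, tight group.
large = [i / 10 for i in range(101)]  # 101 points spread over 0.0 .. 10.0
small = [11.0] * 5                    # 5 points at 11.0

def sse(cluster):
    """Sum of squared distances from each value to the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)

# Partition 1: the small group gets its own cluster.
true_cost = sse(large) + sse(small)

# Partition 2: split the large group at 5.0 and merge the small group
# into the right-hand half.
left = [x for x in large if x < 5.0]
right = [x for x in large if x >= 5.0] + small
merged_cost = sse(left) + sse(right)

print(f"small group kept separate: {true_cost:.1f}")
print(f"small group absorbed:      {merged_cost:.1f}")
```

With K=2, the partition that absorbs the small group has the lower cost, which is why the minority segment can disappear from the result.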

---

## 3. **Equal Density**

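* The algorithm assumes all clusters have **roughly the same spread (variance)**, and therefore similar density when they hold similar numbers of points.
* If one cluster is dense and another is sparse, the boundary still falls halfway between the two centroids, so the outer points of the sparse cluster are assigned to the dense one.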
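
The assignment step can be sketched in a few lines to show the effect. The centroid positions and sample points are hypothetical values, and `nearest` is an illustrative helper rather than a library function.

```python
# Illustrative 1D setup: a tight cluster centered at 0 and a wide one centered at 10.
centroids = [0.0, 10.0]  # index 0 = dense cluster, index 1 = sparse cluster

def nearest(x, centroids):
    """Return the index of the centroid closest to x (squared distance)."""
    return min(range(len(centroids)), key=lambda i: (x - centroids[i]) ** 2)

# Hypothetical points that all belong to the wide cluster (up to 7 units from 10).
sparse_points = [3.0, 4.5, 8.0, 10.0, 12.0, 17.0]

labels = [nearest(x, centroids) for x in sparse_points]
print(labels)  # [0, 0, 1, 1, 1, 1]
```

The points at 3.0 and 4.5 lie on the dense cluster's side of the midpoint (5.0), so they are assigned to it even though they came from the wide cluster.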

---

## 4. **Clusters are Linearly Separable**

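* K-Means implicitly assumes that clusters can be separated using straight lines: because each point goes to its nearest centroid, the boundary between any two clusters is a line (a hyperplane in higher dimensions).
* It struggles with overlapping clusters and with clusters whose true boundaries are curved.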

👉 Example: It cannot separate spiral-shaped data.
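
The straight-line boundary follows directly from the nearest-centroid rule. As a small check, the sketch below uses two arbitrary example centroids and verifies on a grid that "closer to `c1` than to `c2`" is the same as a linear inequality `w . x < bias`. The names `w`, `bias`, and `sq_dist` are illustrative.

```python
# Two illustrative centroids in 2D.
c1 = (0.0, 0.0)
c2 = (4.0, 2.0)

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(p, q))

# ||x - c1||^2 < ||x - c2||^2 simplifies to the linear inequality w . x < bias,
# with w = 2 * (c2 - c1) and bias = ||c2||^2 - ||c1||^2.
w = tuple(2 * (v - u) for u, v in zip(c1, c2))
bias = sum(v * v for v in c2) - sum(u * u for u in c1)

# Compare both rules on a grid of test points.
grid = [(x * 0.5, y * 0.5) for x in range(-10, 11) for y in range(-10, 11)]
agree = all(
    (sq_dist(p, c1) < sq_dist(p, c2)) == (w[0] * p[0] + w[1] * p[1] < bias)
    for p in grid
)
print(w, bias, agree)  # (8.0, 4.0) 20.0 True
```

Since every pairwise boundary is a hyperplane, no choice of centroids can trace a curved boundary such as the gap between two spirals.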

---

## 5. **Euclidean Distance is Meaningful**

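* K-Means minimizes the sum of squared Euclidean distances between points and their centroids, so it assumes that:

  * Features are on comparable **scales** (standardization is important).
  * The geometry of the feature space is meaningful, i.e., points that are close in Euclidean distance are genuinely similar.
* If features are on very different scales (e.g., income in rupees vs. age in years), the feature with the larger numeric range dominates the distance and distorts the results.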
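
The effect of scale can be checked with a small sketch. The three customers below are hypothetical, and `standardize` and `age_share` are illustrative helpers written with the standard library only (in practice a library scaler would usually be used instead).

```python
import statistics

# Hypothetical customers: (annual income in rupees, age in years).
customers = [
    (300_000.0, 25.0),
    (310_000.0, 60.0),
    (900_000.0, 26.0),
]

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]

def age_share(p, q):
    """Fraction of the squared Euclidean distance between p and q due to age."""
    income_term = (p[0] - q[0]) ** 2
    age_term = (p[1] - q[1]) ** 2
    return age_term / (income_term + age_term)

# Customers 0 and 1 have similar incomes but a 35-year age gap.
raw_share = age_share(customers[0], customers[1])
scaled = standardize(customers)
scaled_share = age_share(scaled[0], scaled[1])

print(f"age share of squared distance, raw:    {raw_share:.6f}")
print(f"age share of squared distance, scaled: {scaled_share:.6f}")
```

On the raw values the 35-year age gap contributes almost nothing to the distance; after standardization it accounts for nearly all of it.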

---

## 6. **Number of Clusters (K) is Known**

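* You must choose **K** before running the algorithm.
* A wrong choice of K leads to poor clustering: too few clusters merge distinct groups, while too many split natural ones.
* Techniques like the **Elbow Method** or the **Silhouette Score** help estimate K.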
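
As a minimal sketch of the Elbow Method, the code below runs a tiny 1D version of Lloyd's algorithm for several values of K and records the inertia (the within-cluster sum of squared distances). The `kmeans_1d` function, its evenly spaced initialization, and the data are all assumptions made for a reproducible illustration; libraries typically use a randomized initialization such as k-means++.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm in 1D with a simple deterministic initialization."""
    data = sorted(values)
    # Initial centroids: k evenly spaced picks from the sorted data
    # (an illustrative choice, made so that the result is reproducible).
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            idx = min(range(k), key=lambda i: (x - centroids[i]) ** 2)
            clusters[idx].append(x)
        # Recompute each centroid as the mean; keep the old one if a cluster is empty.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, inertia

# Illustrative data: three tight groups around 0, 10 and 20.
values = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5, 19.5, 20.0, 20.5]

inertias = {k: kmeans_1d(values, k)[1] for k in range(1, 6)}
for k, inertia in inertias.items():
    print(f"K={k}: inertia={inertia:.3f}")
```

On this data the inertia drops sharply up to K=3 (601.5, then 151.5, then 1.5) and barely improves afterwards. That bend in the curve is the "elbow".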

---

## 7. **No Major Outliers**

* Each centroid is the mean of its points, so outliers can **drag centroids** away from the true cluster center.
* K-Means assumes the dataset does not contain many extreme outliers.
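
The drag is easy to reproduce, since a centroid is just a mean. The values below are illustrative; the median is shown only as a contrast, because robust variants such as K-Medians rely on it.

```python
import statistics

# Illustrative 1D cluster of five points around 10, plus one extreme outlier.
cluster = [9.0, 9.5, 10.0, 10.5, 11.0]
with_outlier = cluster + [100.0]

mean_before = statistics.mean(cluster)          # 10.0
mean_after = statistics.mean(with_outlier)      # 25.0
median_after = statistics.median(with_outlier)  # 10.25

print(mean_before, mean_after, median_after)
```

A single outlier moves the centroid from 10.0 to 25.0, far from every genuine point, while the median barely changes.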

---

**Summary Table of Assumptions**

| Assumption          | Meaning                                   | Limitation if Violated             |
| ------------------- | ----------------------------------------- | ---------------------------------- |
| Spherical Clusters  | Clusters are round/convex                 | Poor results with irregular shapes |
| Similar Size        | Each cluster has similar number of points | Small clusters ignored             |
| Equal Density       | Same spread/variance across clusters      | Sparse clusters misclassified      |
| Linear Separability | Can be divided by straight lines          | Complex shapes not captured        |
| Euclidean Distance  | Distance is meaningful, features scaled   | Different scales distort results   |
| Known K             | Number of clusters must be predefined     | Wrong K → poor clustering          |
| No Outliers         | Data is relatively clean                  | Outliers distort centroids         |

---

In short: K-Means works **best when clusters are spherical, similar in size and density, and well separated, and when features are properly scaled**.