
# 🟢 **K-Means Clustering Explained**

### 🔹 What is it?

* **K-Means** is an **unsupervised machine learning algorithm**.
* It groups data points into **k clusters** (where *k* is a number you choose).
* Each cluster has a **centroid** (the "center" of that cluster).
* The goal is to assign points to clusters such that **within-cluster similarity is high** and **between-cluster similarity is low**.

---

### 🔹 The Algorithm (Step-by-Step)

1. **Choose the number of clusters (k).**

   * Example: If you want to group customers into 3 categories, set `k=3`.

2. **Initialize centroids randomly.**

   * Pick *k* random points from the dataset as starting centroids.

3. **Assign each data point to the nearest centroid.**

   * Use a distance metric (commonly **Euclidean distance**) to find which centroid is closest.

4. **Update centroids.**

   * For each cluster, compute the **mean** of all points in that cluster.
   * Move the centroid to this mean position.

5. **Repeat steps 3–4 until convergence.**

   * Stop when cluster assignments no longer change, or when the centroids move less than a small tolerance.
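
The five steps above can be sketched in plain NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the parameters `max_iters`, `tol`, and `seed`, the toy data, and the choice to leave an empty cluster's centroid where it was are all assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    """Basic K-Means on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)

    # Step 2: pick k distinct data points as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(max_iters):
        # Step 3: squared Euclidean distance from every point to every
        # centroid, then assign each point to its nearest centroid.
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)

        # Step 4: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid (assumption).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Step 5: stop once the centroids have effectively stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    return centroids, labels


# Toy data: two well-separated 2D blobs.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])

centroids, labels = kmeans(X, k=2)
print(centroids)  # one centroid near (0, 0), the other near (5, 5)
```

The order of the two centroids depends on the random initialization, so cluster `0` is not guaranteed to be the blob around the origin.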

---

### 🔹 Example Intuition

Imagine you have 2D points that form two groups.

* First, you randomly drop two centroids.
* Each point joins whichever centroid is closer.
* Then you shift the centroids to the average of their points.
* Repeat until the centroids settle in the middle of each group.

---

### 🔹 Choosing **k** (Number of Clusters)

This is tricky! Some methods:

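* **Elbow Method:** Plot the sum of squared errors (SSE) against k and look for the "elbow," the point where adding more clusters stops reducing the error by much.
* **Silhouette Score:** Measures how close each point is to its own cluster compared with the nearest other cluster (ranges from -1 to 1; higher is better).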
* **Domain Knowledge:** Use real-world understanding.
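
As a rough sketch of the elbow method, the snippet below computes the SSE for several values of k on made-up data. A compact K-Means is included inside the block so it runs on its own; the helper name `kmeans_sse`, the three-blob toy data, and the use of 10 random restarts per k are assumptions for illustration only.

```python
import numpy as np


def kmeans_sse(X, k, n_iters=50, seed=0):
    """Run a basic K-Means and return the final sum of squared errors."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # SSE: squared distance from each point to its nearest final centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


# Toy data: three separated 2D blobs, so the elbow should appear near k = 3.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in centers])

# K-Means depends on initialization, so keep the lowest SSE over
# several random restarts for each k.
sse_by_k = {
    k: min(kmeans_sse(X, k, seed=s) for s in range(10))
    for k in range(1, 7)
}

for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.1f}")
```

On data like this, the SSE should fall steeply up to k = 3 and then flatten out; real datasets often show a much less obvious elbow.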

---

### 🔹 Advantages

* ✅ Simple and fast
* ✅ Works well with large datasets
* ✅ Easy to implement

### 🔹 Limitations

* ❌ Must predefine k
* ❌ Sensitive to initialization (different runs → different results)
* ❌ Struggles with non-spherical clusters or different densities
* ❌ Affected by outliers
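
The outlier problem follows directly from using the mean as the centroid. The small sketch below, with made-up points, shows how a single distant point drags the mean away from where most of the data sits.

```python
import numpy as np

# Four points forming a tight cluster, plus one far-away outlier.
cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
outlier = np.array([[50.0, 50.0]])

centroid_clean = cluster.mean(axis=0)
centroid_with_outlier = np.vstack([cluster, outlier]).mean(axis=0)

print(centroid_clean)         # [1.5 1.5]
print(centroid_with_outlier)  # [11.2 11.2]
```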

---

### 🔹 Mathematical Formulation

* Objective: Minimize the **within-cluster sum of squared distances**:

$$
J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2
$$

where:

* $C_i$ = the set of points assigned to cluster $i$
* $\mu_i$ = the centroid (mean) of cluster $i$
* $x$ = a data point in $C_i$
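
To make the objective concrete, here is a small sketch that evaluates $J$ for a hand-made example. The function name `within_cluster_ss` and the four sample points are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np


def within_cluster_ss(X, labels, centroids):
    """J = sum over clusters i, and points x in C_i, of ||x - mu_i||^2."""
    # centroids[labels] pairs each point with the centroid of its own cluster.
    diffs = X - centroids[labels]
    return float((diffs ** 2).sum())


# Four points in two clusters; each point is exactly 1 unit from its centroid.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

J = within_cluster_ss(X, labels, centroids)
print(J)  # 4.0, since each of the four squared distances is 1
```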

---

✅ **In short:**
K-Means = *Initialize centroids → Assign points → Update centroids → Repeat → Stable clusters*.


### **Dimensionality Reduction**

* **Why do it?**

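  1. Mitigate the *curse of dimensionality* (with many features, data becomes sparse and distances less meaningful, which hurts model performance).
  2. Reduce training time and, in many cases, improve accuracy by removing noisy or redundant features.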
  3. Enable visualization (humans can only see up to 3D).

---

### **Feature Selection**

* Goal: Select the most important features that strongly impact the target.
* Methods:

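  * Use **covariance** and **correlation** (e.g., Pearson correlation) to measure the relationship between each feature and the target.
  * Strong positive/negative correlation → the feature is likely important.
  * Near-zero correlation → the feature has little *linear* relationship with the target and is a candidate for dropping. (Pearson correlation does not capture non-linear relationships, so check before discarding.)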
* Example:

  * **House size** vs. **price** → strong correlation → keep.
  * **Fountain size** vs. **price** → weak correlation → drop.
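
A correlation-based filter along these lines might look like the sketch below. The data is synthetic and invented for the example (price is generated from house size, not from fountain size), and the cut-off `threshold = 0.3` is an arbitrary assumption, not a standard value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data: price depends on house size but not on fountain size.
house_size = rng.uniform(50, 250, size=n)
fountain_size = rng.uniform(0, 5, size=n)
price = 2000 * house_size + rng.normal(0, 20000, size=n)

features = {"house_size": house_size, "fountain_size": fountain_size}
threshold = 0.3  # arbitrary cut-off chosen for this illustration

selected = []
for name, values in features.items():
    # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is Pearson's r.
    r = np.corrcoef(values, price)[0, 1]
    print(f"{name}: r = {r:+.2f}")
    if abs(r) >= threshold:
        selected.append(name)

print("kept:", selected)
```

Here `house_size` should come out strongly correlated with `price` and be kept, while `fountain_size` should land near zero and be dropped.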

---

### **Feature Extraction**

* Goal: Create new, informative features from existing ones (instead of dropping).
* Process: Apply transformations to combine or derive features.
* Example:

  * From **room size** and **number of rooms**, derive a new feature, **house size** (e.g., room size × number of rooms), which can still predict house price effectively.
* Key point: Some information is lost, but the new feature captures the essence of the originals while reducing dimensions.

---

### **Key Distinction**

* **Feature Selection** → Choose from existing features (drop irrelevant ones).
* **Feature Extraction** → Transform existing features to create new ones.

---

👉 In practice: both are used in dimensionality reduction before applying models or visualization (e.g., PCA for feature extraction).
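
Since PCA is the feature extraction example named here, the following is a minimal sketch of it using the singular value decomposition. The function name `pca` and the three-feature toy data (where the second feature is nearly a multiple of the first) are assumptions for illustration; library implementations add more options and safeguards.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using the SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2) / (S ** 2).sum()
    return X_centered @ components.T, explained_ratio[:n_components]


# Made-up data: 3 features, the second nearly a multiple of the first.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
X = np.column_stack([
    a,
    2 * a + rng.normal(scale=0.1, size=300),
    rng.normal(size=300),
])

X_reduced, ratio = pca(X, n_components=2)
print(X_reduced.shape)  # (300, 2)
print(ratio.sum())      # share of total variance kept by the 2 components
```

Because two of the three original features are almost redundant, two extracted components should retain nearly all of the variance. Each new column is a weighted combination of all original features rather than a subset of them, which is the difference from feature selection.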

