  

---

# **📌 K-Means Clustering for Beginners**  
K-Means is a **clustering algorithm** used to group similar points together.  
It works in the following steps:  

1. **Pick K** (number of clusters)  
2. **Randomly place K centroids** (initial cluster centers)  
3. **Assign each data point** to the nearest centroid  
4. **Move the centroids** to the average position of assigned points  
5. **Repeat steps 3 & 4** until the centroids stop moving  
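
💡 The five steps above can be sketched in plain NumPy. This is only a rough illustration, not how scikit-learn implements K-Means: the function name `kmeans_sketch`, the tiny `points` array, and the choice to start from K randomly picked data points are all assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Plain K-Means following the five steps above (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 3: distance from every point to every centroid, then pick the nearest
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each centroid to the mean of its points
        # (a centroid with no points stays where it is)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Tiny hand-made dataset with two obvious groups (made-up values)
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = kmeans_sketch(points, k=2)
print(labels)     # cluster index of each point
print(centroids)  # final centroid positions
```

The first three points should end up in one cluster and the last three in the other. Which group gets label 0 depends on the random start.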

Let's start! 🚀  

---

### **🔹 Step 1: Import Libraries**
We need a few basic Python libraries for working with data and visualization.  
```python
import numpy as np  # 📊 For handling numbers
import matplotlib.pyplot as plt  # 🎨 For plotting graphs
from sklearn.cluster import KMeans  # 🔢 K-Means algorithm
from sklearn.datasets import make_blobs  # 🎯 To generate sample data
```
---

### **🔹 Step 2: Create a Small Dataset**
Instead of using real data, we will create **20 random points** with two natural groups.  

```python
# 🎯 Create a small dataset with 20 points and 2 clusters
X, _ = make_blobs(n_samples=20, centers=2, cluster_std=1.0, random_state=42)

# 📊 Plot the dataset
plt.scatter(X[:, 0], X[:, 1], s=100, color='blue', edgecolors='black')
plt.title("Dataset for K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
```
✅ **Output:** A graph with **20 blue dots**, showing our dataset.  

---

### **🔹 Step 3: Apply K-Means Clustering**
Now, let's apply **K-Means** to group our data into **2 clusters** (K=2).  

```python
# 🔹 Create K-Means model with K=2
# (n_init is set explicitly because its default differs between scikit-learn versions)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)

# 🔹 Train the model on our dataset
kmeans.fit(X)

# 🔹 Get cluster labels (which point belongs to which group)
clusters = kmeans.predict(X)

# 📌 Get centroid positions
centroids = kmeans.cluster_centers_

# 📊 Plot clustered data
plt.scatter(X[clusters == 0, 0], X[clusters == 0, 1], color='red', s=100, label="Cluster 1")
plt.scatter(X[clusters == 1, 0], X[clusters == 1, 1], color='green', s=100, label="Cluster 2")

# 🎯 Plot centroids
plt.scatter(centroids[:, 0], centroids[:, 1], color='yellow', s=300, marker='X', label="Centroids")

plt.title("K-Means Clustering (K=2)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()
```
✅ **Output:** A graph with **two groups** (red & green) and **yellow centroids**.  
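
💡 Once the model is trained, `predict` can also place points it has never seen. The sketch below rebuilds the same toy data, then checks that `predict` agrees with a manual "nearest centroid" lookup. The two `new_points` are made-up values chosen only for illustration, and `n_init=10` is set explicitly as an assumption to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same toy data as in Step 2
X, _ = make_blobs(n_samples=20, centers=2, cluster_std=1.0, random_state=42)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Two made-up points (hypothetical values)
new_points = np.array([[0.0, 0.0], [5.0, 5.0]])
predicted = kmeans.predict(new_points)

# Manual check: distance from each new point to each centroid, keep the nearest
distances = np.linalg.norm(
    new_points[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
manual = distances.argmin(axis=1)

print(predicted, manual)  # the two arrays should match
```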

---

### **🔹 Step 4: Finding the Best K (Elbow Method)**
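The **Elbow Method** helps us choose the number of clusters (**K**).  
It runs K-Means for several values of **K** and records the **WCSS** (Within-Cluster Sum of Squares) for each one: the total squared distance from every point to the centroid of its cluster.  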

```python
wcss = []  # 📌 Store WCSS (Within-Cluster Sum of Squares) for each K

# 🔹 Try different values of K (from 1 to 6)
for k in range(1, 7):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)  # Train K-Means
    wcss.append(kmeans.inertia_)  # Store WCSS value

# 📊 Plot WCSS vs. K
plt.plot(range(1, 7), wcss, marker='o', linestyle='-')
plt.xlabel("Number of Clusters (K)")
plt.ylabel("WCSS")
plt.title("Elbow Method for Optimal K")
plt.show()
```
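✅ **Output:** A graph showing how WCSS decreases as **K increases**.  
WCSS always goes down when clusters are added, so we don't look for the lowest value. We look for the "elbow": the point where the curve stops dropping sharply and starts to flatten. Since our dataset was built with two groups, the elbow should appear around **K=2**.  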
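
💡 To make WCSS less of a black box, here is a small sketch that computes it by hand and compares it with the `inertia_` value reported by scikit-learn. The variable names `own_centroid` and `wcss_manual` are just illustrative, and `n_init=10` is an explicit assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same toy data as in Step 2
X, _ = make_blobs(n_samples=20, centers=2, cluster_std=1.0, random_state=42)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Look up each point's own centroid through its cluster label
own_centroid = kmeans.cluster_centers_[kmeans.labels_]

# WCSS: sum of squared distances from each point to its own centroid
wcss_manual = np.sum((X - own_centroid) ** 2)

print(wcss_manual, kmeans.inertia_)  # the two numbers should agree
```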

---

### **🔹 Step 5: Fixing the Random Initialization Trap**
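**Problem:** If centroids are placed badly at the start, the algorithm may get stuck in a bad solution.  
**Solution:** Use **K-Means++**, which spreads the starting centroids apart instead of placing them purely at random.  
**Note:** scikit-learn's `KMeans` already uses `init='k-means++'` by default, so the models in Steps 3 and 4 were using it too. Here we simply set it explicitly.  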

```python
# 🔹 Set K-Means++ initialization explicitly (it is also scikit-learn's default)
kmeans_plus = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=42)
clusters_plus = kmeans_plus.fit_predict(X)

# 🎯 Plot clusters using K-Means++
plt.scatter(X[clusters_plus == 0, 0], X[clusters_plus == 0, 1], color='red', s=100, label="Cluster 1")
plt.scatter(X[clusters_plus == 1, 0], X[clusters_plus == 1, 1], color='green', s=100, label="Cluster 2")

# 🎯 Plot centroids
plt.scatter(kmeans_plus.cluster_centers_[:, 0], kmeans_plus.cluster_centers_[:, 1], color='yellow', s=300, marker='X', label="Centroids (K-Means++)")

plt.title("K-Means++ Initialization (Better Clustering)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()
```
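✅ **Output:** The same grouping as in Step 3, because K-Means++ was already the default there. On harder datasets with more clusters, K-Means++ usually gives better starting centroids than purely random placement.  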
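
💡 For the curious, the idea behind K-Means++ seeding can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not scikit-learn's actual code (which differs in details): the function name `kmeans_pp_seeds` and the `points` array are made up for this example.

```python
import numpy as np

def kmeans_pp_seeds(X, k, seed=0):
    """Pick k starting centroids with K-Means++-style seeding (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # First centroid: one data point chosen uniformly at random
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen centroid
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        # Far-away points are more likely to be picked;
        # already-chosen points have distance 0, so probability 0
        next_index = rng.choice(len(X), p=d2 / d2.sum())
        centroids.append(X[next_index])
    return np.array(centroids)

# Tiny hand-made dataset with two obvious groups (made-up values)
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
seeds = kmeans_pp_seeds(points, k=2)
print(seeds)  # two starting centroids, each one an actual data point
```

Because the second seed is drawn with probability proportional to squared distance, it is likely (though not guaranteed) to land in the other group, which gives the main loop a better starting position.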

---

### **✅ Summary**
🔹 **Step 1:** Imported the libraries  
🔹 **Step 2:** Created a small dataset  
🔹 **Step 3:** Applied **K-Means with K=2**  
🔹 **Step 4:** Used the **Elbow Method** to find the best **K**  
🔹 **Step 5:** Avoided the **random initialization trap** using **K-Means++**  

 🚀

![image.png](attachment:image.png)



![image.png](attachment:image.png)

k=2

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)