
# In-Class Activity (30 min): Color Quantization with Clustering

**Objective:** Recreate an image using only **k** colors by clustering pixels.

You will **write the code** for each step (only the image upload cell is provided).  
Work in pairs if helpful. Ask questions early!

**Try these methods on the SAME image:**
- `KMeans` (**k-means++** init)
- `KMeans` (**random** init)
- `MiniBatchKMeans`
- `SpectralClustering` (on a **downsized** image)



## 0) Upload an image (PNG/JPG)

Use the cell below to upload an image from your computer.  
Large images will be resized to keep the activity fast.


In [None]:

# Upload and load an image (no URL option)
from google.colab import files
from io import BytesIO
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

print("Choose an image file (PNG/JPG)...")
uploaded = files.upload()
fname = next(iter(uploaded))
img = Image.open(BytesIO(uploaded[fname])).convert("RGB")

# Resize very large images for speed
max_side = 700
w, h = img.size
scale = min(1.0, max_side / max(w, h))
if scale < 1.0:
    img = img.resize((int(w*scale), int(h*scale)), Image.LANCZOS)

img_np = np.array(img)
plt.imshow(img_np); plt.axis('off'); plt.title("Loaded Image")
plt.show()

print("Image shape:", img_np.shape)  # (H, W, 3)



## 1) Convert image → pixel matrix

**Do this**
- Create a 2D array `pixels` of shape **(N, 3)** where **N = H×W**.
- Use dtype `float32` or `float64`.
- Print the shape and show the first 5 rows.

**Hint**: `pixels = img_np.reshape(-1, 3).astype(np.float32)`


In [None]:
h, w, c = img_np.shape
pixels = img_np.reshape(-1,3).astype(np.float32)
pixels.shape

In [None]:
pixels[:5]


In [None]:
h, w, c
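
The reshape in this step is lossless, which is what allows the later tasks to turn quantized pixels back into an image. A minimal sketch on a tiny synthetic array (the name `toy_img` and its 4×5 size are assumptions for illustration, not the uploaded image):

```python
import numpy as np

# Hypothetical 4x5 RGB image standing in for img_np
rng = np.random.default_rng(0)
toy_img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

h, w, c = toy_img.shape
toy_pixels = toy_img.reshape(-1, 3).astype(np.float32)  # (N, 3) with N = h*w

# Row i of the pixel matrix is the pixel at (i // w, i % w) in the image
i = 7
row, col = divmod(i, w)
same_pixel = np.array_equal(toy_pixels[i], toy_img[row, col].astype(np.float32))

# Reshaping back recovers the original image exactly
restored = toy_pixels.reshape(h, w, c).astype(np.uint8)
round_trip_ok = np.array_equal(restored, toy_img)

print(toy_pixels.shape, same_pixel, round_trip_ok)
```

Because rows are laid out in row-major order, any per-pixel result (labels, quantized colors) can be reshaped to `(h, w, ...)` without reordering.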


## 2) KMeans (k-means++ init)

**Do this**
- Pick **k** (e.g., 8). Optionally sample ≤ 50k pixels for speed.
- Fit `KMeans(n_clusters=k, init="k-means++", random_state=0)` on your fit set.
- Predict labels for **all** pixels.
- Reconstruct a **quantized image** by replacing each pixel with its cluster center.
- Plot **Original vs Quantized (k-means++)** side-by-side.
- Visualize the **palette** (the `k` cluster centers).

**Imports you'll need**: `from sklearn.cluster import KMeans`


In [None]:
from sklearn.cluster import KMeans

k = 8

kmeans = KMeans(n_clusters=k, init="k-means++", random_state=0)
labels = kmeans.fit_predict(pixels)
centers = kmeans.cluster_centers_

Predicted labels for all pixels:

In [None]:
labels

K-means cluster centers:

In [None]:
centers

Replace every pixel with its cluster center:

In [None]:
quantized_pixels = centers[labels]
quantized_pixels

Reconstruct the quantized image:

In [None]:
quantized_image = quantized_pixels.reshape(h, w, c).astype(np.uint8)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(img_np); ax[0].set_title("Original"); ax[0].axis("off")
ax[1].imshow(quantized_image); ax[1].set_title(f"KMeans++ (k={k})"); ax[1].axis("off")
plt.show()

palette = centers.astype(np.uint8).reshape(1, k, 3)
plt.figure(figsize=(8, 2))
plt.imshow(palette)
plt.title("Palette: KMeans++ centers")
plt.axis("off")
plt.show()
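
The task description mentions fitting on a sampled "fit set" and then predicting labels for all pixels, while the cell above fits on every pixel. A minimal sketch of the sampled variant, assuming a small synthetic image (`demo_img`) in place of the upload and a hypothetical cap `max_fit`; the rounding and clipping before the `uint8` cast are also an addition, not part of the original cell:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the uploaded image: 60x80 RGB, 4 color groups plus noise
rng = np.random.default_rng(0)
base = np.array([[220, 40, 40], [40, 200, 60], [50, 60, 210], [240, 220, 80]],
                dtype=np.float32)
region = rng.integers(0, 4, size=(60, 80))
demo_img = np.clip(base[region] + rng.normal(0, 10, size=(60, 80, 3)), 0, 255).astype(np.uint8)

h, w, c = demo_img.shape
demo_pixels = demo_img.reshape(-1, 3).astype(np.float32)

# Fit on a random subset (the "fit set"). The activity suggests up to 50k pixels;
# the cap is smaller here only because the demo image is tiny.
max_fit = 2_000
n_fit = min(max_fit, demo_pixels.shape[0])
fit_idx = rng.choice(demo_pixels.shape[0], size=n_fit, replace=False)
fit_set = demo_pixels[fit_idx]

k = 4
km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(fit_set)

# Predict a label for every pixel, not only the sampled ones
all_labels = km.predict(demo_pixels)

# Replace each pixel by its center; round and clip before casting to uint8
quantized = np.clip(np.rint(km.cluster_centers_[all_labels]), 0, 255).astype(np.uint8)
quantized = quantized.reshape(h, w, c)

n_colors = len(np.unique(quantized.reshape(-1, 3), axis=0))
print(quantized.shape, n_colors)
```

The reconstructed image can contain at most `k` distinct colors, since every pixel is mapped to one of the `k` centers.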



## 3) KMeans (random init)

**Do this**
- Repeat Task 2 with `init="random"`.
- Compare visually to k-means++ and comment on differences (centers, artifacts, convergence).


In [None]:
kmeans = KMeans(n_clusters=k, init="random", random_state=0)
labels_random = kmeans.fit_predict(pixels)
centers_random = kmeans.cluster_centers_

# Replace each pixel with its random-init cluster center
quantized_pixels_random = centers_random[labels_random]
quantized_image_random = quantized_pixels_random.reshape(h, w, c).astype(np.uint8)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(img_np); ax[0].set_title("Original"); ax[0].axis("off")
ax[1].imshow(quantized_image_random); ax[1].set_title(f"Random (k={k})"); ax[1].axis("off")
plt.show()

palette = centers_random.astype(np.uint8).reshape(1, k, 3)
plt.figure(figsize=(8, 2))
plt.imshow(palette)
plt.title("Palette: Random centers")
plt.axis("off")
plt.show()


With random init, the colors in the reconstructed picture shift much more than they do with k-means++, but the color palette is still pretty good and captures most, if not all, of the colors that k-means++ finds.
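
To back the visual comparison with numbers, the two initializations can also be compared on final inertia and iteration count. A minimal sketch, assuming synthetic pixel data (`demo_pixels`) and `n_init=1` so that each row reflects a single initialization; the values it prints depend on the data and are not results from the activity image:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical pixel data: 6 color blobs in RGB space, 500 pixels each
blob_centers = rng.uniform(0, 255, size=(6, 3))
demo_pixels = np.clip(
    np.repeat(blob_centers, 500, axis=0) + rng.normal(0, 12, size=(3000, 3)), 0, 255
).astype(np.float32)

results = {}
for init in ("k-means++", "random"):
    # n_init=1: one initialization per method rather than the best of several
    km = KMeans(n_clusters=6, init=init, n_init=1, random_state=0).fit(demo_pixels)
    results[init] = (km.inertia_, km.n_iter_)
    print(f"{init:>10}: inertia={km.inertia_:.0f}, iterations={km.n_iter_}")
```

Repeating this over several `random_state` values gives a fairer picture than a single run, because a random initialization can occasionally land on a good start.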



## 4) MiniBatchKMeans (speed)

**Do this**
- Use `MiniBatchKMeans(n_clusters=k, batch_size=2048, random_state=0)`.
- Fit on your fit set, predict labels for all pixels, reconstruct the image.
- Compare output **quality vs speed** relative to full KMeans.

**Imports you'll need**: `from sklearn.cluster import MiniBatchKMeans`


In [None]:
from sklearn.cluster import MiniBatchKMeans

MBKMeans = MiniBatchKMeans(n_clusters=k, batch_size=2048, random_state=0)
labels_minibatch = MBKMeans.fit_predict(pixels)
centers_minibatch = MBKMeans.cluster_centers_

quantized_pixels_minibatch = centers_minibatch[labels_minibatch]
quantized_image_minibatch = quantized_pixels_minibatch.reshape(h, w, c).astype(np.uint8)

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(img_np); ax[0].set_title("Original"); ax[0].axis("off")
ax[1].imshow(quantized_image_minibatch); ax[1].set_title(f"MiniBatchKMeans (k={k})"); ax[1].axis("off")
plt.show()

palette = centers_minibatch.astype(np.uint8).reshape(1, k, 3)
plt.figure(figsize=(8, 2))
plt.imshow(palette)
plt.title("Palette: MiniBatch centers")
plt.axis("off")
plt.show()

MiniBatchKMeans ran much faster than both full KMeans runs (random and k-means++), and to my eye its output quality is the best of the three models.
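
The quality-versus-speed comparison can be measured rather than judged by eye. A minimal sketch, assuming a synthetic pixel matrix (`demo_pixels`) and a hypothetical helper `fit_and_score` that reports wall-clock time and mean squared reconstruction error; timings vary by machine and data, so no particular outcome is implied:

```python
import time
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

rng = np.random.default_rng(0)
# Hypothetical pixel matrix standing in for `pixels` (20k random RGB values)
demo_pixels = rng.integers(0, 256, size=(20_000, 3)).astype(np.float32)
k = 8

def fit_and_score(model, X):
    """Return (seconds to fit and predict, mean squared reconstruction error)."""
    start = time.perf_counter()
    labels = model.fit_predict(X)
    elapsed = time.perf_counter() - start
    recon = model.cluster_centers_[labels]
    mse = float(np.mean((X - recon) ** 2))
    return elapsed, mse

models = {
    "KMeans": KMeans(n_clusters=k, n_init=10, random_state=0),
    "MiniBatchKMeans": MiniBatchKMeans(n_clusters=k, batch_size=2048, n_init=3,
                                       random_state=0),
}
scores = {name: fit_and_score(m, demo_pixels) for name, m in models.items()}
for name, (secs, mse) in scores.items():
    print(f"{name:>16}: {secs:.3f} s, MSE={mse:.1f}")
```

A lower MSE means the quantized image is closer to the original, which gives a number to set beside the visual impression.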


## 6) Comparison grid (same k)

**Do this**
- Create a **2×2 grid** showing results from:
  - KMeans (k-means++)
  - KMeans (random)
  - MiniBatchKMeans
  - Spectral (small image)
- Use clear titles and hide axes.

**Hint**: `fig, axs = plt.subplots(2, 2, figsize=(12, 8))`


In [None]:
fig, axs = plt.subplots(2, 2, figsize=(12, 8))

axs[0,0].imshow(quantized_image); axs[0,0].set_title("KMeans++"); axs[0,0].axis("off")
axs[0,1].imshow(quantized_image_random); axs[0,1].set_title("KMeans random"); axs[0,1].axis("off")
axs[1,0].imshow(quantized_image_minibatch); axs[1,0].set_title("MiniBatchKMeans"); axs[1,0].axis("off")

# Task 5 (Spectral) was not on the page, so the original image is shown
# here as a placeholder for the spectral result.
axs[1,1].imshow(img_np); axs[1,1].set_title("Original (Spectral placeholder)"); axs[1,1].axis("off")

plt.tight_layout()
plt.show()


I plotted all three models that we created. MiniBatchKMeans and k-means++ look clearly better than random init, and MiniBatchKMeans might even look better than k-means++.
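
The Spectral task is not on this page, so the following is only a sketch of how the fourth panel could be produced on a downsized image. The synthetic 24×24 image, the `nearest_neighbors` affinity, `n_neighbors=10`, and `k = 4` are all assumptions chosen to keep the example fast; `SpectralClustering` has no `cluster_centers_`, so the palette is computed here as the mean color per label:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Hypothetical downsized image (24x24): a smooth color gradient plus mild noise
yy, xx = np.mgrid[0:24, 0:24]
small_img = np.stack([xx * 10, yy * 10, 255 - xx * 5 - yy * 5], axis=-1).astype(np.float32)
small_img = np.clip(small_img + rng.normal(0, 3, size=small_img.shape), 0, 255)

h_s, w_s, c_s = small_img.shape
small_pixels = (small_img.reshape(-1, 3) / 255.0).astype(np.float32)  # scale to [0, 1]

k = 4
spec = SpectralClustering(
    n_clusters=k, affinity="nearest_neighbors", n_neighbors=10,
    assign_labels="kmeans", random_state=0,
)
spec_labels = spec.fit_predict(small_pixels)

# No cluster_centers_ attribute exists, so average the colors within each label
spec_centers = np.zeros((k, 3), dtype=np.float32)
for j in np.unique(spec_labels):
    spec_centers[j] = small_pixels[spec_labels == j].mean(axis=0)

quantized_spectral = (spec_centers[spec_labels] * 255).round().astype(np.uint8)
quantized_spectral = quantized_spectral.reshape(h_s, w_s, c_s)
print(quantized_spectral.shape, len(np.unique(spec_labels)))
```

Spectral clustering builds an affinity graph over all N pixels, which is why it is only practical on a heavily downsized image.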


## 7) Elbow curve for choosing k

**Do this**
- On a random subset of pixels (e.g., 30–50k), compute **inertia (SSE)** for k in `[4, 6, 8, 12, 16]` using MiniBatchKMeans.
- Plot k (x-axis) vs inertia (y-axis) and look for an "elbow".


In [None]:
ks = [4, 6, 8, 12, 16]

# Sample up to 30k pixels (seeded so the curve is reproducible)
n = min(30000, pixels.shape[0])
rng = np.random.default_rng(0)
idx = rng.choice(pixels.shape[0], size=n, replace=False)
sample = pixels[idx]

inertias = []
for kvalues in ks:
    mbk = MiniBatchKMeans(n_clusters=kvalues, batch_size=2048, random_state=0)
    mbk.fit(sample)
    inertias.append(mbk.inertia_)

plt.figure(figsize=(6,4))
plt.plot(ks, inertias, marker="o")
plt.xlabel("k")
plt.ylabel("Inertia (SSE)")
plt.title("Elbow Curve (MiniBatchKMeans)")
plt.show()


I plotted k against inertia: as k increases, the inertia (SSE) steadily decreases, and you can clearly see the "elbow" forming where the curve starts to flatten.
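
The elbow can also be read numerically by looking at how much inertia falls between consecutive k values. A minimal sketch, assuming a synthetic sample of six well-separated color blobs (so the largest drop is expected between k = 4 and k = 6); the `drops` list is an illustrative addition, not part of the task:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Hypothetical pixel sample: 6 color blobs, 1000 pixels each
blob_centers = np.array([[230, 30, 30], [30, 220, 40], [40, 50, 230],
                         [240, 230, 60], [20, 20, 20], [235, 235, 235]], dtype=np.float32)
sample = np.clip(
    np.repeat(blob_centers, 1000, axis=0) + rng.normal(0, 8, size=(6000, 3)), 0, 255
).astype(np.float32)

ks = [4, 6, 8, 12, 16]
inertias = []
for k_val in ks:
    mbk = MiniBatchKMeans(n_clusters=k_val, batch_size=2048, n_init=3, random_state=0)
    inertias.append(mbk.fit(sample).inertia_)

# Fractional drop in inertia between consecutive k values
drops = [(a - b) / a for a, b in zip(inertias[:-1], inertias[1:])]
for (k_from, k_to), d in zip(zip(ks[:-1], ks[1:]), drops):
    print(f"k {k_from:>2} -> {k_to:>2}: inertia falls by {d:.1%}")
```

On a real image the drops usually shrink more gradually than on clean blobs, so the elbow is a judgment call rather than a sharp threshold.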