Images can be compressed in another way: by clustering their pixel colors. An RGB image can be stored as a three-dimensional array of shape (height, width, 3), in which each pixel is a 3-dimensional vector of red, green, and blue intensities.

By clustering these pixel vectors into $k$ clusters, each pixel can be assigned to its closest cluster centroid. Each pixel is then replaced by one of the $k$ cluster centroids, so the original image can be represented with only $k$ three-dimensional vectors plus, for each pixel, the index of its centroid.
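The assignment step can be illustrated on a toy example before running it on a real image. The sketch below assumes four made-up RGB pixels and two hand-picked centroids (all values are hypothetical, not taken from the image used later); it only shows the nearest-centroid replacement, not how k-means finds the centroids.

```python
import numpy as np

# Four toy RGB pixels in [0, 1] (hypothetical values).
pixels = np.array([
    [0.9, 0.1, 0.1],   # reddish
    [0.8, 0.2, 0.0],   # reddish
    [0.1, 0.1, 0.9],   # bluish
    [0.0, 0.2, 0.8],   # bluish
])

# Two hand-picked centroids (k = 2), also hypothetical.
centroids = np.array([
    [0.85, 0.15, 0.05],
    [0.05, 0.15, 0.85],
])

# Squared Euclidean distance from every pixel to every centroid: shape (4, 2).
dists = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

# Index of the nearest centroid for each pixel.
labels = dists.argmin(axis=1)

# Replace each pixel by its nearest centroid.
quantized = centroids[labels]
```

Here `labels` plays the role of the per-pixel index vector and `centroids` the role of the color palette.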

In [None]:
from __future__ import division

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

from sklearn.cluster import KMeans

In [None]:
image = np.array(Image.open("images/Castle_hill.jpg"))

In [None]:
image = image / 255
row, col, _ = image.shape
print("pixels in one channel: {} * {}".format(row, col))

In [None]:
fig = plt.figure(figsize=(15, 10))
ax = fig.add_subplot(1, 1, 1)
ax.imshow(image)
fig.axes[0].set_title("Castle hill, Budapest")
plt.show()

In [None]:
# Flatten the image into a (row * col, 3) array: one RGB vector per pixel.
X = image.reshape(row * col, 3)

In [None]:
# Cluster the pixel colors into k = 4 groups.
kmeans = KMeans(n_clusters=4, n_init=8, max_iter=30).fit(X)

In [None]:
cluster_centers = kmeans.cluster_centers_

In [None]:
indices = kmeans.predict(X)

To store the compressed image we only need the cluster centroids and an integer vector that assigns the closest cluster centroid to each pixel.
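A rough estimate of the saving can be sketched as follows. It assumes 8 bits per color channel, compares against the raw, uncompressed RGB array (not the JPEG file on disk), and ignores any file-format overhead. The helper names and the image size are hypothetical and chosen only for illustration.

```python
import math

def original_size_bits(n_pixels, bits_per_channel=8):
    """Raw RGB storage: three channels per pixel."""
    return n_pixels * 3 * bits_per_channel

def compressed_size_bits(n_pixels, k, bits_per_channel=8):
    """Palette of k RGB centroids plus one centroid index per pixel."""
    # Each index needs ceil(log2(k)) bits (at least 1 bit).
    bits_per_index = max(1, math.ceil(math.log2(k)))
    palette_bits = k * 3 * bits_per_channel
    index_bits = n_pixels * bits_per_index
    return palette_bits + index_bits

n_pixels = 1000 * 1500  # hypothetical image size
k = 4
ratio = original_size_bits(n_pixels) / compressed_size_bits(n_pixels, k)
print("approximate compression ratio: {:.2f}".format(ratio))
```

With $k = 4$ each index fits in 2 bits instead of 24 bits per pixel, so under these assumptions the palette itself is negligible and the ratio is close to 12.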

In [None]:
# Replace every pixel by its centroid and restore the original image shape.
X_compressed = cluster_centers[indices, :].reshape(row, col, 3)

In [None]:
X_compressed.shape

In [None]:
fig = plt.figure(figsize=(15, 10))
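ax = fig.add_subplot(1, 1, 1)
ax.imshow(X_compressed)
# Label the plot so it is distinguishable from the original image.
ax.set_title("Castle hill, Budapest, compressed to {} colors".format(len(cluster_centers)))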
plt.show()