# Step 1: Importing the necessary libraries

```python
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image
import pandas as pd
```

# Step 2: Getting the Dataset and Assigning Variables

In this step, we load the base image that we want to compress and use `Pillow`'s `Image` module together with `pandas` to build a dataset from it, with one row per pixel holding that pixel's RGB values. We also store the image's `width` and `height` so the compressed pixels can be reshaped back into an image later, and we load the per-pixel RGB data into a NumPy array that the KMeans model is fitted on in the next step.

```python
# Read the image and make sure it has exactly three channels (R, G, B)
colourImg = Image.open("base.jpg")
colourPixels = colourImg.convert("RGB")

# Flatten the image into one row per pixel: shape (height * width, 3)
colourArray = np.asarray(colourPixels).reshape(-1, 3)

# Wrap the pixel array in a DataFrame with named colour channels
df = pd.DataFrame(colourArray, columns=["red", "green", "blue"])

# Save the DataFrame to a text file to be used as a dataset
np.savetxt("datasetRGB.txt", df.values, fmt="%d")

# Load the RGB pixel data from the file back into a NumPy array
data = np.loadtxt("datasetRGB.txt")

# Pillow reports the size as (width, height)
print(colourImg.size)  # prints (256, 192) for the base image used here
width, height = colourImg.size
```
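To make the shape of this dataset concrete, here is a small NumPy-only sketch. It uses a randomly generated 192 by 256 RGB array as a hypothetical stand-in for `base.jpg`, so it runs without the file. `np.asarray` on an RGB Pillow image returns this same `(height, width, 3)` layout. The sketch shows that `reshape(-1, 3)` gives one row per pixel and that reshaping back restores the image exactly, which is what Step 4 relies on.

```python
import numpy as np

# Hypothetical stand-in for base.jpg: random 192x256 RGB image,
# laid out as (height, width, channels)
rng = np.random.default_rng(0)
height, width = 192, 256
image = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

# One row per pixel, one column per channel: the dataset KMeans is fitted on
pixels = image.reshape(-1, 3)
print(pixels.shape)  # (49152, 3)

# Pixels are stored row by row, so reshaping back restores the image exactly
restored = pixels.reshape(height, width, 3)
print(np.array_equal(restored, image))  # True
```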


# Step 3: Modelling

In this step, we use the `KMeans` class from scikit-learn to compress our image by `clustering the RGB values` from the dataset we created earlier. The number of clusters, ***`k`***, is also the `number of colors` in the compressed image. Put simply, suppose our base image (the image we want to compress) contains ***`n`*** distinct colors. By choosing a value of ***`k`*** smaller than ***`n`***, clustering groups similar colors together and represents each group by a single color, reducing the ***`n`*** colors to ***`k`***. A higher value of ***`k`*** preserves more of the original colors and gives a better-quality image, but it also leaves less room for compression, so the compressed image ends up closer in size to the original. The next section of this notebook contains a few examples demonstrating this trade-off.
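To put rough numbers on this trade-off, the sketch below computes an idealised storage cost. A palette-based image needs only ceil(log2 k) bits per pixel to say which of the ***`k`*** colors a pixel uses, plus a ***`k`***-entry RGB palette, instead of 24 bits per pixel. The function names are hypothetical, and the estimate ignores file headers and entropy coding. It also does not describe the JPEG saved in Step 4, since JPEG does not store a color palette.

```python
import math

def raw_rgb_bytes(width: int, height: int) -> int:
    """Uncompressed 24-bit RGB: 3 bytes per pixel."""
    return width * height * 3

def indexed_bytes(width: int, height: int, k: int) -> int:
    """Idealised palette storage: ceil(log2 k) bits per pixel plus a k-entry RGB palette."""
    bits_per_pixel = max(1, math.ceil(math.log2(k)))
    index_bytes = math.ceil(width * height * bits_per_pixel / 8)
    palette_bytes = k * 3
    return index_bytes + palette_bytes

w, h = 256, 192  # size of the base image in this notebook
print(raw_rgb_bytes(w, h))  # 147456
for k in (2, 16, 64, 256):
    print(k, indexed_bytes(w, h, k))
```

For the 256 by 192 base image, ***`k`*** = 16 comes to 24,624 bytes against 147,456 bytes of raw RGB, while ***`k`*** = 256 already needs 49,920 bytes.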

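```python
k = 16  # Number of clusters, i.e. colours kept in the compressed image

# n_init=10 matches the default of scikit-learn releases before 1.4 (newer ones default to "auto")
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
kmeans.fit(data)  # Run k-means clustering on the (height * width, 3) pixel array
```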
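For intuition about what `fit` does, here is a bare-bones version of Lloyd's algorithm, the classic k-means procedure, in plain NumPy. It is a simplified sketch, not scikit-learn's implementation: it seeds the centroids with randomly chosen data points instead of k-means++, runs a fixed number of iterations instead of checking for convergence, and performs a single run. `lloyd_kmeans` is a hypothetical helper name.

```python
import numpy as np

def lloyd_kmeans(data: np.ndarray, k: int, n_iter: int = 20, seed: int = 0):
    """Minimal Lloyd's k-means. Returns (centroids, labels)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Seed the centroids with k distinct data points chosen at random
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()

    def assign(cents: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from every point to every centroid: shape (N, k)
        dists = ((data[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)  # assignment step
        for j in range(k):  # update step: move each centroid to the mean of its points
            members = data[labels == j]
            if len(members) > 0:  # keep the previous centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment so the returned labels match the returned centroids
    return centroids, assign(centroids)

# Toy "pixels": two reds and two blues
toy = np.array([[250, 0, 0], [245, 5, 0], [0, 0, 250], [5, 0, 245]])
centroids, labels = lloyd_kmeans(toy, k=2)
print(labels)     # the two reds share one label, the two blues the other
print(centroids)
```

On the toy data the two reds end up in one cluster and the two blues in the other, whichever points are picked as seeds; the label numbers themselves are arbitrary.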



# Step 4: Output the Compressed Image

Now that we have formed our clusters, we get the `cluster label` of each pixel and compress the image by `replacing each pixel with the centroid of the cluster it was assigned to`. We then reshape the compressed data back to the dimensions of the base image, `(height, width, 3)`. Because the centroids are floating-point averages, the values must be converted to `8-bit unsigned integers` (`uint8`), which cover the 0 to 255 range used to store each RGB channel. Finally, we use `Pillow` once more to turn the array into an `Image` object, which can be saved in whatever format suits our needs.
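The indexing and type conversion in this step can be seen on a tiny example. The centroids and labels below are made-up values standing in for `kmeans.cluster_centers_` and `kmeans.labels_`. The sketch shows that `centers[labels]` builds the per-pixel array in one step, and that casting floats straight to `uint8` truncates, whereas rounding first keeps each channel at its nearest integer value.

```python
import numpy as np

# Hypothetical fitted values: 3 float centroids (as KMeans produces)
# and the labels of a 2x2 image flattened to 4 pixels
centers = np.array([[12.7, 200.2, 99.5],
                    [254.6, 0.4, 30.5],
                    [128.0, 128.0, 128.0]])
labels = np.array([0, 2, 2, 1])

compressed = centers[labels]          # row i is centers[labels[i]]: shape (4, 3)
image = compressed.reshape(2, 2, 3)   # back to (height, width, channels)

# astype truncates toward zero, so 12.7 becomes 12
truncated = image.astype(np.uint8)
# np.rint rounds to the nearest integer (exact .5 ties go to the even neighbour);
# clip is only a safeguard, since means of 0..255 values already stay in range
rounded = np.clip(np.rint(image), 0, 255).astype(np.uint8)

print(truncated[0, 0], rounded[0, 0])
print(rounded[1, 1])
```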

```python
labels = kmeans.labels_  # Cluster label of each pixel
compressed_data = kmeans.cluster_centers_[labels]  # Replace each pixel with its centroid

# Reshape the flat pixel list back to the original image dimensions
compressed_data = np.reshape(compressed_data, (height, width, 3))

# Round the float centroids before casting; astype alone would truncate (e.g. 12.7 -> 12)
compressed_array = np.clip(np.rint(compressed_data), 0, 255).astype(np.uint8)
compressed_image = Image.fromarray(compressed_array)  # Create a PIL Image object

# Save the image as a JPEG in the current directory
compressed_image.save("compressed_image.jpg")
```
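One way to check the result is to count the distinct colors and measure how far the compressed pixels moved from the originals. The helpers below are a hypothetical NumPy sketch, and mean squared error is only one simple choice of distortion measure. In this notebook they could be applied to the in-memory images, `np.asarray(colourPixels)` and `np.asarray(compressed_image)`. A JPEG read back from disk will usually contain more than ***`k`*** colors, because JPEG's own lossy encoding introduces intermediate values.

```python
import numpy as np

def count_colors(image: np.ndarray) -> int:
    """Number of distinct RGB triples in an (H, W, 3) array."""
    return len(np.unique(image.reshape(-1, 3), axis=0))

def mse(original: np.ndarray, compressed: np.ndarray) -> float:
    """Mean squared error over all channel values.

    Both arrays are cast to float first, so uint8 subtraction cannot wrap around."""
    diff = original.astype(np.float64) - compressed.astype(np.float64)
    return float(np.mean(diff ** 2))

# Toy 2x2 images: four grey levels collapsed to two
original = np.array([[[10, 10, 10], [20, 20, 20]],
                     [[200, 200, 200], [210, 210, 210]]], dtype=np.uint8)
compressed = np.array([[[15, 15, 15], [15, 15, 15]],
                       [[205, 205, 205], [205, 205, 205]]], dtype=np.uint8)

print(count_colors(original), count_colors(compressed))  # 4 2
print(mse(original, compressed))  # 25.0
```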