<img src = "https://imgur.com/s4wTnl7.jpg" align = "center">

# <center>K-Means Clustering for Image Segmentation</center>

## Introduction

There are many models for **clustering**. In this notebook, we present the one generally considered the simplest of them: **k-means**. Despite its simplicity, k-means is widely used for clustering in data science applications, and it is especially useful when you need to quickly discover structure in **unlabeled data**. In this notebook, you will learn how to use k-means for image segmentation.
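For reference, standard k-means (Lloyd's algorithm) looks for $K$ clusters $C_1,\dots,C_K$ with centroids $\boldsymbol{\mu}_k$ that minimize the within-cluster sum of squared Euclidean distances:

$$
J(C_1,\dots,C_K) = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\boldsymbol{\mu}_k = \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i
$$

It alternates two steps, neither of which increases $J$: assign every point to its nearest centroid, then move every centroid to the mean of the points assigned to it.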

Some real-world applications of k-means:
- Customer segmentation
- Understand what the visitors of a website are trying to accomplish
- Pattern recognition
- Machine learning
- Data compression


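In this notebook we apply k-means clustering to image segmentation:
- build a dataset with one row per pixel (r, g, b, x, y)
- cluster the pixels with a k-means loop written with NumPy
- recolor each pixel with its cluster's color and save the result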

### Import libraries
Let's first import the required libraries.
Also run <b> %matplotlib inline </b> since we will be plotting in this section.

In [0]:
import random
import sys

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from sklearn import preprocessing
from sklearn.cluster import KMeans
# sklearn.datasets.samples_generator was removed in newer scikit-learn releases;
# make_blobs is importable directly from sklearn.datasets
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import euclidean_distances
%matplotlib inline
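`KMeans` and `make_blobs` are imported above but not used by the image-segmentation code. As a quick, hedged sanity check that the environment works, the snippet below clusters synthetic 2-D blobs with scikit-learn; the blob parameters (300 samples, 4 centers, `cluster_std=0.6`) are arbitrary illustration choices, not values from the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 points around 4 well-separated centers; random_state fixes the data
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# n_init=10 runs k-means from 10 initializations and keeps the best result
km = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = km.fit_predict(X)

print(km.cluster_centers_.shape)  # (4, 2)
print(np.bincount(labels))        # number of points in each cluster
```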

### Check and import the dataset

Check the folders and files in the Google Colab working directory:

In [0]:
!ls

Cust_Segmentation.csv  drug200.csv  h1.jpg  h2.jpeg  sample_data


If the images h1.jpg and h2.jpeg are not in the Colab environment, upload them with this code:

<code> from google.colab import files
uploaded = files.upload() </code>


# k-Means for image segmentation
Let's create our own dataset for this lab from an input image.


Number of iterations:

In [0]:
iterations = 5


# Open the input image

In [0]:
#	Open input image
image = Image.open("h2.jpeg")
imageW = image.size[0]
imageH = image.size[1]

Inspect the image (mode and size):

In [0]:
print(image)

<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=612x344 at 0x7F44F6C3E5F8>


Create the dataset from the image (data vector and cluster-assignment array)

In [0]:
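#	Initialise the data vector: one row per pixel with attributes r, g, b, x, y
#	(np.ndarray allocates uninitialised memory; every row is filled in below)
dataVector = np.ndarray(shape=(imageW * imageH, 5), dtype=float)
#	Initialise the vector that holds which cluster each pixel is currently in
#	(also uninitialised until the first assignment step)
pixelClusterAppartenance = np.ndarray(shape=(imageW * imageH), dtype=int)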

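Check the arrays. Because they were allocated without initialization, the values printed below are arbitrary leftovers from memory, not pixel data: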

In [0]:
#print(dataVector)
print(pixelClusterAppartenance)

[4653201915638920135 4653204250549407422 4653206684847419401 ...
                   0                   0                   0]


# Populate the dataset
Populate the data vector with data from the input image.

dataVector has 5 fields: red, green, blue, x coordinate, y coordinate.

In [0]:
#	Populate the data vector with data from the input image
#	dataVector has 5 fields: red, green, blue, x coord, y coord
#	Pixel (x, y) is stored in row x + y * imageW (row-major order)
for y in range(imageH):
    for x in range(imageW):
        r, g, b = image.getpixel((x, y))[:3]
        row = x + y * imageW
        dataVector[row] = (r, g, b, x, y)
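The double loop calls `getpixel` once per pixel, which is slow for large images. A vectorized sketch that builds the same `[r, g, b, x, y]` rows in the same `x + y * W` order, assuming the image can be converted to RGB; `image_to_features` is a hypothetical helper name, and the tiny synthetic image stands in for h2.jpeg.

```python
import numpy as np
from PIL import Image

def image_to_features(image):
    """Return an (H*W, 5) float array of [r, g, b, x, y], one row per pixel,
    with pixel (x, y) in row x + y * W (same order as the loop above)."""
    rgb = np.asarray(image.convert("RGB"), dtype=float)  # shape (H, W, 3)
    h, w, _ = rgb.shape
    ys, xs = np.indices((h, w))  # ys[y, x] == y, xs[y, x] == x
    return np.column_stack([rgb.reshape(-1, 3), xs.ravel(), ys.ravel()])

# Usage on a small synthetic 4x3 image (stand-in for h2.jpeg)
demo = Image.new("RGB", (4, 3), (10, 20, 30))
demo.putpixel((3, 2), (200, 100, 50))
features = image_to_features(demo)
print(features.shape)            # (12, 5)
print(features[-1].tolist())     # [200.0, 100.0, 50.0, 3.0, 2.0]
```

In the notebook this would be `dataVector = image_to_features(image)`.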

Check dataVector:

In [0]:
print(dataVector)

[[234. 230. 227.   0.   0.]
 [234. 230. 227.   1.   0.]
 [233. 229. 226.   2.   0.]
 ...
 [240. 236. 235. 609. 343.]
 [240. 236. 235. 610. 343.]
 [240. 236. 235. 611. 343.]]


# Normalize each pixel's feature vector

In [0]:
#	Scale each row (pixel) to unit Euclidean (L2) norm.
#	Note: preprocessing.normalize works row-wise; it does not standardize the columns.
dataVector_scaled = preprocessing.normalize(dataVector)

Check the normalized dataVector:

In [0]:
print(dataVector_scaled)

[[0.58649564 0.57647007 0.5689509  0.         0.        ]
 [0.5864938  0.57646826 0.56894911 0.00250638 0.        ]
 [0.58652788 0.57645873 0.56890687 0.00503457 0.        ]
 ...
 [0.29608179 0.2911471  0.28991342 0.75130755 0.42315023]
 [0.29580752 0.29087739 0.28964486 0.75184411 0.42275825]
 [0.29553356 0.290608   0.28937661 0.75237919 0.42236671]]
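`preprocessing.normalize` rescales each row to unit L2 norm, which is different from standardizing each feature. The sketch below contrasts it with scikit-learn's `StandardScaler` (zero mean, unit variance per column) on two rows copied from the first and last pixels printed earlier; `StandardScaler` is a swapped-in alternative, not what the original notebook uses.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

# First and last pixel of the dataset above: [r, g, b, x, y]
X = np.array([[234.0, 230.0, 227.0, 0.0, 0.0],
              [240.0, 236.0, 235.0, 611.0, 343.0]])

# Row-wise L2 normalization (what the notebook uses): every row gets norm 1
X_rows = preprocessing.normalize(X)
print(np.linalg.norm(X_rows, axis=1))   # both norms are 1
print(X_rows[:, 0])                     # red channel: about 0.586 vs 0.296

# Column-wise standardization: every column gets mean 0 and std 1
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0))               # about 0 for every column
print(X_std.std(axis=0))                # about 1 for every column
```

Note the side effect visible in the printed output above: two almost identical near-white pixels end up with red values of about 0.59 and 0.30, because the row norm grows with the x and y coordinates. Color and position are therefore entangled after row normalization.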


# Set Random Centers

In [0]:
# Choose the number of clusters
K = 5

#	Set the initial centers: K random points drawn uniformly between the
#	smallest and largest value found anywhere in dataVector_scaled
minValue = np.amin(dataVector_scaled)
maxValue = np.amax(dataVector_scaled)
centers = np.random.uniform(minValue, maxValue, size=(K, 5))
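`K = 5` is a free choice. One common heuristic, not used in the original notebook, is the elbow method: fit k-means for several values of K and look for the point where the within-cluster sum of squares (scikit-learn's `inertia_`) stops dropping sharply. A sketch, assuming scikit-learn's `KMeans` is acceptable for this diagnostic and that a random subsample of 5000 rows is representative; `elbow_inertias` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_inertias(X, k_values, sample_size=5000, seed=0):
    """Fit KMeans for each K on a random subsample of the rows of X and
    return the inertia (within-cluster sum of squared distances) per K."""
    rng = np.random.default_rng(seed)
    if len(X) > sample_size:
        X = X[rng.choice(len(X), size=sample_size, replace=False)]
    inertias = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias.append(km.inertia_)
    return inertias

# Usage with random stand-in data; in the notebook, pass dataVector_scaled
X_demo = np.random.default_rng(1).random((2000, 5))
ks = range(1, 8)
for k, inertia in zip(ks, elbow_inertias(X_demo, ks)):
    print(k, round(inertia, 2))
```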

Check the initial centers:

In [0]:
print(centers)

[[0.9509351  0.51931384 0.28212965 0.23964142 0.64094786]
 [0.43385655 0.19387976 0.27810685 0.14752103 0.59768047]
 [0.70008379 0.53836617 0.59280541 0.6433986  0.34234705]
 [0.84891184 0.52301338 0.52133848 0.61600656 0.66989071]
 [0.83652712 0.69504871 0.14579655 0.85815107 0.31588473]]
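Drawing centers uniformly between the global minimum and maximum can place a center far from every pixel, which is why the loop below needs an empty-cluster check. A common alternative is Forgy initialization: use K rows of the data as the starting centers. This is a substitution, not what the notebook does (scikit-learn's `KMeans` defaults to the related k-means++ scheme). `init_centers_from_data` is a hypothetical helper; note that images with many identical pixels can still yield duplicate centers.

```python
import numpy as np

def init_centers_from_data(X, k, seed=None):
    """Forgy initialization: pick k rows of X at distinct indices as starting centers."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].copy()

# Usage with random stand-in data; in the notebook:
# centers = init_centers_from_data(dataVector_scaled, K)
X_demo = np.random.default_rng(0).random((100, 5))
centers_demo = init_centers_from_data(X_demo, 5, seed=42)
print(centers_demo.shape)  # (5, 5)
```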





## Running K-Means
Let's run our own k-means clustering, without using scikit-learn's KMeans.

In [0]:
for iteration in range(iterations):
    #	Assignment step: distances from every pixel to every center, shape (n_pixels, K);
    #	each pixel joins its nearest center
    distances = euclidean_distances(dataVector_scaled, centers)
    pixelClusterAppartenance[:] = np.argmin(distances, axis=1)

    ##################################################################################################
    #	Check whether any cluster is empty; if so, assign a random pixel to it
    clusterToCheck = np.arange(K)  # all cluster ids, e.g. for K=10: array([0, 1, ..., 9])
    clustersEmpty = np.isin(clusterToCheck, pixelClusterAppartenance)
    # ^ e.g. [True, True, False, True, ...]; False means that cluster is empty
    for index, item in enumerate(clustersEmpty):
        if not item:
            pixelClusterAppartenance[np.random.randint(len(pixelClusterAppartenance))] = index
    ##################################################################################################

    #	Update step: move each center to the centroid (mean) of its cluster
    for i in range(K):
        dataInCenter = dataVector_scaled[pixelClusterAppartenance == i]
        centers[i] = np.mean(dataInCenter, axis=0)

    # TODO: check for convergence (e.g. stop once the centers no longer move)
    print("Centers Iteration num", iteration, ": \n", centers)

Centers Iteration num 0 : 
 [[0.18167326 0.12493341 0.1033544  0.66291511 0.67495263]
 [0.38820087 0.35409417 0.33540413 0.40292055 0.57295882]
 [0.48886004 0.46267521 0.44676012 0.40519766 0.2745921 ]
 [0.34992854 0.32466055 0.30960395 0.79118631 0.17867978]
 [0.20817637 0.13012209 0.07818966 0.90441037 0.28057203]]
Centers Iteration num 1 : 
 [[0.20190193 0.14186992 0.11303355 0.69366729 0.63771643]
 [0.40475771 0.37748129 0.36377182 0.35141022 0.59672758]
 [0.5410714  0.52752316 0.51910328 0.21381406 0.26008721]
 [0.36257562 0.32931562 0.31023444 0.76667829 0.22571871]
 [0.20884473 0.1308062  0.07905302 0.90332471 0.286495  ]]
Centers Iteration num 2 : 
 [[0.20565923 0.1443892  0.11389157 0.69935393 0.62890211]
 [0.39635229 0.37452805 0.36520294 0.33458345 0.61960565]
 [0.54147814 0.53219084 0.5265757  0.18607054 0.26769665]
 [0.36883698 0.33301209 0.31225473 0.7558328  0.23842151]
 [0.21095435 0.13207574 0.0796017  0.9086405  0.26953864]]
Centers Iteration num 3 : 
 [[0.20427515 0.
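The loop above runs a fixed number of iterations and leaves convergence as a TODO. A vectorized sketch of the same assign/update loop that stops once no center moves more than a tolerance `tol`; `kmeans_lloyd`, `max_iter`, and `tol` are hypothetical names introduced here, and empty clusters are re-seeded with a random point, mirroring the notebook's fix.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def kmeans_lloyd(X, centers, max_iter=100, tol=1e-6, seed=0):
    """Lloyd's k-means. Returns (centers, labels, number_of_iterations_run)."""
    rng = np.random.default_rng(seed)
    centers = np.array(centers, dtype=float)
    k = len(centers)
    for it in range(1, max_iter + 1):
        # Assignment step: (n, k) distance matrix, nearest center per row
        labels = np.argmin(euclidean_distances(X, centers), axis=1)
        # Update step: mean of each cluster; an empty cluster gets a random point
        new_centers = np.empty_like(centers)
        for i in range(k):
            members = X[labels == i]
            if len(members) == 0:
                members = X[rng.integers(len(X))][None, :]
            new_centers[i] = members.mean(axis=0)
        # Convergence check: largest distance any center moved this iteration
        shift = np.max(np.linalg.norm(new_centers - centers, axis=1))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels, it

# Usage on two obvious groups; in the notebook:
# centers, pixelClusterAppartenance, n_iter = kmeans_lloyd(dataVector_scaled, centers)
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
c, lab, n = kmeans_lloyd(X_demo, centers=[[0.0, 0.0], [5.0, 5.0]])
print(lab.tolist())  # [0, 0, 1, 1]
print(n)             # 2
```

If `max_iter` is reached before convergence, the returned labels come from the last assignment step, one update behind the returned centers.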

Map the clusters back to the original image layout

In [0]:
#	Color every pixel with the mean original RGB color of its cluster.
#	(The centers live in the row-normalized feature space, so centers * 255
#	 would not give the cluster's actual color.)
for i in range(K):
    inCluster = pixelClusterAppartenance == i
    if inCluster.any():
        dataVector[inCluster, :3] = np.round(dataVector[inCluster, :3].mean(axis=0))

#	Build the output image: rows are in x + y * imageW order, so they reshape to (H, W, 3)
rgb = dataVector[:, :3].reshape(imageH, imageW, 3).astype(np.uint8)
image = Image.fromarray(rgb)


Save Image

In [0]:
image.save("output.jpg")
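matplotlib is imported at the top but never used. A short sketch for viewing the original and segmented images side by side; `show_side_by_side` is a hypothetical helper, and the two solid-color images are stand-ins for h2.jpeg and output.jpg.

```python
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

def show_side_by_side(original, segmented, titles=("Original", "Segmented")):
    """Plot two PIL images next to each other and return the figure."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, img, title in zip(axes, (original, segmented), titles):
        ax.imshow(np.asarray(img))
        ax.set_title(title)
        ax.axis("off")
    fig.tight_layout()
    return fig

# Usage with synthetic stand-ins; in the notebook:
# show_side_by_side(Image.open("h2.jpeg"), Image.open("output.jpg"))
fig = show_side_by_side(Image.new("RGB", (60, 40), (200, 30, 30)),
                        Image.new("RGB", (60, 40), (30, 30, 200)))
plt.show()
```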

Check that output.jpg now appears in the Colab working directory:

In [0]:
!ls

Cust_Segmentation.csv  drug200.csv  h1.jpg  h2.jpeg  output.jpg  sample_data


Original Code:

https://github.com/asselinpaul/ImageSeg-KMeans/blob/master/imageSegmentation.py