ClusterCraft

This project has implemented k-Means algorithm for clustering an image from scratch.

What is kMeans clustering algorithm and how does it work?

K-Means clustering is a popular unsupervised machine learning algorithm used for data segmentation and clustering. It is primarily used to group data points into clusters based on their similarity or proximity.

Example:

Before K-Means	After K-Means

Image segmentation with clustering is a computer vision technique that involves the use of clustering algorithms to partition an image into distinct regions or clusters based on similarities in pixel values. This process is particularly useful in tasks like object detection and scene analysis, where isolating and understanding different parts of an image is essential.

Higher k values lead to more detailed image, but at the expense of higher compute cost and runtime.

Example:

Original Image	2 Clusters	5 Clusters

Working:

First starting with k initial cluster centroids (representative points). (Initialization step)
Assign each data point to the nearest centroid, forming clusters. (Assignment step)
After which it recalculates the centroid position by taking the mean of data points in each cluster. (Updating step)
Repeat last 2 steps (Assignment and Updating) until centroids no longer change significantly. (Iteration step and stopping condition)
Finally, we get k clusters with data points grouped by similarity.

Algorithm Implementation

First we got the image and cluster number as input and then implement the algorithm by following process:

Initialization

For selecting the initial 'k' cluster centroids, we take 'k' random pixels form the image's pixel grid.

Code-snippet

    def initialize_centroids(self, X):
        random_indices = np.random.choice(X.shape[0], self.n_clusters, replace=False)
        centroids = X[random_indices]
        return centroids

Assignment

Then we assign each pixel to its nearest cluster (the distance between a cluster and a pixel found by their Euclidean Distance)

Euclidean Distance

$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

Where:

$(x_1$, $y_1)$ are coordinates of the first point.
$(x_2$, $y_2)$ are coordinates of the second point.
$d$ is the Euclidean distance between $(x_1$, $y_1)$ and $(x_2$, $y_2)$

Code-snippet

    def assign_clusters(self, X):
        distances = pairwise_distances(X, self.centroids)
        labels = np.argmin(distances, axis=1)
        return labels

Updating

We calculate the mean of distances all the pixels in a cluster to determine the new centroid position. This is done for all k clusters.

Mean Formula

$Mean = \frac{1}{n} \sum d_{i}$

Code-snippet

    def update_centroids(self, X):
        new_centroids = np.array([X[self.labels == k].mean(axis=0) for k in range(self.n_clusters)])
        return new_centroids

Iteration and Stopping Condition

The assignment and updation steps are repeated until the centroids positional change is close to the tolerance limit tol. To avoid infinite loops, a limit of 100 iterations is set.

Code-snippet

    for _ in range(self.max_iters):
        prev_centroids = self.centroids.copy()
        self.labels = self.assign_clusters(X)
        self.centroids = self.update_centroids(X)
        if np.linalg.norm(self.centroids - prev_centroids) < self.tol:
            break

Dependencies

Following are the project dependencies:

Python
Following python libraries (have pip installed)
1. numpy (pip install numpy)
2. matplotlib (pip install matplotlib)
3. scikit-learn (pip install scikit-learn)
4. flask (pip install Flask)

Download

Run git clone https://github.com/surtecha/ClusterCraft.git in your terminal and navigate the folder for app.py.

How to use

Run app.py in your preferred IDE or code editor.
Go to http://127.0.0.1:5000.
Choose an image from your directories and select a cluster count.
Click on Segment image button to get the segmented image.

Note: Higher cluster count requires significantly higer computational power and time.

Contributing

See Contributing for the contribution guidelines.

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
README resources		README resources
static		static
templates		templates
testing-files		testing-files
uploads		uploads
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
app.py		app.py
custom_kmeans.py		custom_kmeans.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ClusterCraft

Table of Contents

What is kMeans clustering algorithm and how does it work?

Example:

Example:

Working:

Algorithm Implementation

Initialization

Assignment

Updating

Iteration and Stopping Condition

Dependencies

Download

How to use

Contributing

About

Contributors 4

Languages

License

surtecha/ClusterCraft

Folders and files

Latest commit

History

Repository files navigation

ClusterCraft

Table of Contents

What is kMeans clustering algorithm and how does it work?

Example:

Example:

Working:

Algorithm Implementation

Initialization

Assignment

Updating

Iteration and Stopping Condition

Dependencies

Download

How to use

Contributing

About

Topics

Resources

License

Stars

Watchers

Forks

Contributors 4

Languages