Parallelized Hybrid PSO-KMeans Clustering Algorithm

Final Project - CS 432 GPU Accelerated Computing

Instructions to run

open the notebook file here. It includes link to the colab file that includes the demo of all files.

Members:

Rayyan-ul-Haq ru03588
Mudasir Hanif Shaikh ms03831
Syed Ammar Ahmed sa04050

Problem Definition and Applications

Clustering is a very important task in unsupervised Machine Learning. The aim of clustering is to group similar objects together based on a distance metric. It is used in a variety of problems and domains such as image processing, finance, market research etc. For example, clustering can be used by businesses to group their customers based on their likes/dislikes, purchase power, frequency of purchase etc. [1].

One of the most popular algorithms used for clustering is the K-means algorithm and therefore has been a focus of many researchers. But as with all algorithms, K-means is not perfect. Many fronts have been explored in recent years to improve both the accuracy and execution time of this algorithm. Several of these attempts involved using a well known Swarm Intelligence approach known as Particle Swarm Optimization (PSO). This project aims to utilize and parallelize a hybrid approach using both PSO and K-Means for data clustering.

Literature Review

PSO has been used in literature to solve the problem of initial cluster centers affecting the performance of K-Means algorithm. Zhao et. al proposed that K-Means algorithm could be improved by using PSO to generate the initial cluster centers and experimentally found out that the improved k-mean clustering algorithm has obvious advantages on execution time [2]. Merwe et. al also used the same idea and found out that PSO can be used to find the centroids of a user specified number of clusters and then extended to use K-means clustering to seed the initial swarm [3]. PSO can also be used to reform the clusters formed by K-Means. Both of these approaches have shown good potential [3].

Ahmadyfard et. al worked on the limitations on PSO and noted that PSO algorithm can successfully converge during the initial stages of a global search, but around global optimum, the search process becomes very slow [4]. K-Means however achieves faster convergence when it is near optimal solution. The accuracy of K-Means is also higher which is what makes it so popular. They proposed a hybrid algorithm called "PSO-KM" and experimentally argued that the hybrid approach has more potential than both K-Means and PSO [4].

Performance Analysis

We will analyze the performance and results of our algorithm on not one but several clustering benchmark datasets found here and present our analysis on the improvement (if any) of the parallelized GPU version of the hybrid approach over the CPU version.

Research and Novelty

From our research and exploration so far, we have realized that the problem of using a hybrid approach in itself has room for more research, however, our aim is not to improve the algorithm but rather present a parallelized version for it, which as we understand has not yet been properly explored. Therefore, we believe this project has a lot of research potential and can lead to a publication. As the task is not very simple, and would require a lot of research we believe this also justifies three members in our team.

Type of project

We believe that our projects falls in the category of "Implement a GPU version of some existing computationally-intensive CPU code." The justification for this is outlined in the previous sections on novelty, problem definition and analysis.

Major goals of the project

Implement the CPU version of the hybrid approach
Parallelize the hybrid approach and implement the GPU version on pyCUDA or Numba.
Performance analysis (outlined above)

References

[1] J. Oyelade et al., "Data Clustering: Algorithms and Its Applications," 2019 19th International Conference on Computational Science and Its Applications (ICCSA), Saint Petersburg, Russia, 2019, pp. 71-81, doi: 10.1109/ICCSA.2019.000-1.

[2] M. Zhao, H. Tang, J. Guo, and Y. Sun, “Data Clustering Using Particle Swarm Optimization,” Lecture Notes in Electrical Engineering Future Information Technology, pp. 607–612, 2014.

[3] D. V. D. Merwe and A. Engelbrecht, “Data clustering using particle swarm optimization,” The 2003 Congress on Evolutionary Computation, 2003. CEC '03. https://ieeexplore.ieee.org/document/1299577

[4] A. Ahmadyfard and H. Modares, "Combining PSO and k-means to enhance data clustering," 2008 International Symposium on Telecommunications, Tehran, 2008, pp. 688-691, doi: 10.1109/ISTEL.2008.4651388.

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
Code		Code
figures		figures
.gitignore		.gitignore
README.md		README.md
gpu_project_run.ipynb		gpu_project_run.ipynb
merge3cluster.jpg		merge3cluster.jpg
proposal.md		proposal.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code

Code

figures

figures

.gitignore

.gitignore

README.md

README.md

gpu_project_run.ipynb

gpu_project_run.ipynb

merge3cluster.jpg

merge3cluster.jpg

proposal.md

proposal.md

Repository files navigation

Parallelized Hybrid PSO-KMeans Clustering Algorithm

Final Project - CS 432 GPU Accelerated Computing

Instructions to run

open the notebook file here. It includes link to the colab file that includes the demo of all files.

Members:

Problem Definition and Applications

Literature Review

Performance Analysis

Research and Novelty

Type of project

Major goals of the project

References

About

Releases

Packages

Languages

ms03831/parallelized-PSO-clustering

Folders and files

Latest commit

History

Repository files navigation

Parallelized Hybrid PSO-KMeans Clustering Algorithm

Final Project - CS 432 GPU Accelerated Computing

Instructions to run

open the notebook file here. It includes link to the colab file that includes the demo of all files.

Members:

Problem Definition and Applications

Literature Review

Performance Analysis

Research and Novelty

Type of project

Major goals of the project

References

About

Resources

Stars

Watchers

Forks

Languages