kMeans

Algorithm

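K-Means is a clustering algorithm based on the idea of point assignment: each point is assigned to its nearest centroid, and each centroid is then recomputed as the mean of the points assigned to it, until the assignments stop changing.

[Image: obtained clustering]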
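As an illustration of the point-assignment idea described in this section, here is a minimal sequential C++ sketch. It is not taken from ./cpp/main.cpp: the names `Point`, `squaredDistance`, `nearestCentroid`, `kMeansStep` and `kMeans` are hypothetical, and the stopping rule (no assignment changes, or an iteration cap) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical representation: a point (or centroid) is a vector of coordinates.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Assignment step for one point: index of the closest centroid.
std::size_t nearestCentroid(const Point& p, const std::vector<Point>& centroids) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double dist = squaredDistance(p, centroids[c]);
        if (dist < bestDist) {
            bestDist = dist;
            best = c;
        }
    }
    return best;
}

// One iteration: assign every point, then move each centroid to the mean of
// its points. Returns true if at least one assignment changed.
bool kMeansStep(const std::vector<Point>& points,
                std::vector<Point>& centroids,
                std::vector<std::size_t>& assignment) {
    const std::size_t k = centroids.size();
    const std::size_t dims = centroids.empty() ? 0 : centroids[0].size();
    std::vector<Point> sums(k, Point(dims, 0.0));
    std::vector<std::size_t> counts(k, 0);
    bool changed = false;

    for (std::size_t i = 0; i < points.size(); ++i) {
        const std::size_t c = nearestCentroid(points[i], centroids);
        if (c != assignment[i]) {
            assignment[i] = c;
            changed = true;
        }
        for (std::size_t d = 0; d < dims; ++d) {
            sums[c][d] += points[i][d];
        }
        ++counts[c];
    }

    for (std::size_t c = 0; c < k; ++c) {
        if (counts[c] == 0) continue;  // an empty cluster keeps its old centroid
        for (std::size_t d = 0; d < dims; ++d) {
            centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
    }
    return changed;
}

// Repeats kMeansStep until assignments stop changing or maxIterations is reached.
std::vector<std::size_t> kMeans(const std::vector<Point>& points,
                                std::vector<Point>& centroids,
                                int maxIterations) {
    // Start from an invalid cluster index so the first step always counts as a change.
    std::vector<std::size_t> assignment(points.size(), centroids.size());
    for (int it = 0; it < maxIterations; ++it) {
        if (!kMeansStep(points, centroids, assignment)) break;
    }
    return assignment;
}
```

Calling `kMeans(points, centroids, 10)` with initial centroids chosen by the caller updates `centroids` in place and returns the cluster index of each point. The assignment loop is the part that the parallel versions would distribute across threads.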

Implementation

The code is implemented in three versions: sequential, parallelized with OpenMP, and in CUDA.
To run all of these implementations, use the bash script (run it from the ./kMeans/ base folder):

bash run.sh

This creates datasets of increasing size and runs each implementation 20 times.
The CUDA version is implemented with a reduction phase that decreases the number of atomic adds performed by the algorithm. To run a single implementation on your own, first generate the dataset with:

python3 generateDataset.py num_samples num_centroids num_dimensions

Then run the code you want (first check the CSV paths in each implementation's source file, since a wrong path can lead to errors):

Plain C++ code:

g++ ./cpp/main.cpp -o mainCpp
./mainCpp

OpenMP code:

g++ -o mainOmp -fopenmp ./omp/main.cpp
./mainOmp
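One common way to parallelize the centroid update with OpenMP is to let each thread accumulate into a private buffer and merge the buffers once per thread. The sketch below shows that pattern; it is not necessarily what ./omp/main.cpp does, and the function name `clusterSums` and the row-major data layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the coordinates of the points assigned to each cluster.
// points is row-major: point i occupies points[i*dims .. i*dims + dims - 1].
// Returns a k*dims row-major array of per-cluster coordinate sums.
// (Hypothetical helper, not part of the repository.)
std::vector<double> clusterSums(const std::vector<double>& points,
                                const std::vector<int>& assignment,
                                int k, int dims) {
    const long n = static_cast<long>(assignment.size());
    assert(points.size() == static_cast<std::size_t>(n) * dims);
    std::vector<double> sums(static_cast<std::size_t>(k) * dims, 0.0);

    #pragma omp parallel
    {
        // Each thread accumulates into its own buffer, so the hot loop
        // needs no synchronization.
        std::vector<double> local(sums.size(), 0.0);

        #pragma omp for nowait
        for (long i = 0; i < n; ++i) {
            const int c = assignment[i];
            for (int d = 0; d < dims; ++d) {
                local[c * dims + d] += points[i * dims + d];
            }
        }

        // Merge step: one critical section per thread instead of one per point.
        #pragma omp critical
        for (std::size_t j = 0; j < sums.size(); ++j) {
            sums[j] += local[j];
        }
    }
    return sums;
}
```

Compiled with -fopenmp the loop is split across threads; without the flag the pragmas are ignored and the function runs sequentially with the same result.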

CUDA code:

nvcc ./cuda/main.cu -o mainCuda
./mainCuda
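The README states that the CUDA version uses a reduction phase to decrease the number of atomic adds, but does not describe the kernel. The following is a plain C++ model of that idea, not CUDA code and not the code in ./cuda/main.cu: it assumes that each block first sums its points into a private buffer (standing in for shared memory) and only then adds the partial result to the global counters. The names `ReductionResult`, `countNaive` and `countBlocked` are hypothetical, and only per-cluster point counts are modeled to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ReductionResult {
    std::vector<long> counts;  // number of points per cluster
    long globalAdds;           // simulated atomic adds on global memory
};

// Naive scheme: every point issues one global add on its cluster counter.
ReductionResult countNaive(const std::vector<int>& assignment, int k) {
    ReductionResult r{std::vector<long>(k, 0), 0};
    for (int c : assignment) {
        r.counts[c] += 1;
        ++r.globalAdds;
    }
    return r;
}

// Blocked scheme: points are processed in chunks of blockSize (modeling a
// CUDA block). Each chunk reduces into a private buffer first, then issues
// at most k global adds, one per cluster that appears in the chunk.
ReductionResult countBlocked(const std::vector<int>& assignment, int k,
                             std::size_t blockSize) {
    assert(blockSize > 0);
    ReductionResult r{std::vector<long>(k, 0), 0};
    const std::size_t n = assignment.size();
    for (std::size_t start = 0; start < n; start += blockSize) {
        const std::size_t end = std::min(start + blockSize, n);
        std::vector<long> partial(k, 0);  // stands in for shared memory
        for (std::size_t i = start; i < end; ++i) {
            partial[assignment[i]] += 1;
        }
        for (int c = 0; c < k; ++c) {
            if (partial[c] != 0) {
                r.counts[c] += partial[c];
                ++r.globalAdds;
            }
        }
    }
    return r;
}
```

Both functions produce the same counts. In this model the naive scheme performs one global add per point, while the blocked scheme performs at most k adds per block, which is the kind of saving a reduction phase aims for.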

Results

The performance improvements are significant. In particular, the computation times are:

[Image: computation times]

Contributors

Contributions are made by Lorenzo Macchiarini and Andrea Leonardo for the Parallel Computing course of the Master's Degree in Software Engineering.
