
Mean Shift Algorithm Implementation Using CUDA

This repository contains an implementation of the Mean Shift algorithm using CUDA, developed for the course "Parallel and Distributed Systems." The project evaluates the performance improvements achieved by parallelizing the algorithm. Three versions have been developed: a serial implementation and two parallel versions, one using shared memory and one without.

Prerequisites

  • GCC for compiling C code
  • NVIDIA CUDA Toolkit for compiling CUDA code

Project Files Description

  • serial.c - Serial implementation of the Mean Shift algorithm, serving as the baseline for comparison.
  • nonshared.cu - Parallel implementation without the utilization of shared memory.
  • shared.cu - Parallel implementation that utilizes shared memory to increase speed.
  • input.txt - Input data for the algorithm.
  • output_reference.txt - Reference data used to verify the algorithm's output.
  • output.txt - Contains the results from the most recent run of any implementation.
  • graph.png - Graph depicting the results of a test run on a distributed system.

Compilation

Compile all programs with:

make all

This will compile the following binaries:

  • serial - For the serial implementation.
  • nonshared - For the CUDA implementation without shared memory.
  • shared - For the CUDA implementation with shared memory.

Usage

To run all the programs:

make run_all

Or run a specific version with:

make run_serial 

Specify the convergence criterion e for the Mean Shift algorithm by passing it as a parameter:

make run_all INPUT=0.05

Or for a specific version:

make run_shared INPUT=0.05

If not specified, the default value of 0.5 is used.

Implementation Details

Data Representation

  • Arrays: Instead of multi-dimensional arrays, the data is stored in one-dimensional arrays, which simplifies memory access and improves performance in CUDA. Each array holds rows * columns elements, so with two-dimensional points each data point occupies two consecutive elements (see the sketch below).
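
A minimal sketch of this flattened indexing, assuming two-dimensional points (the function name is illustrative, not taken from the repository):

    #define COLS 2  /* dimensions per data point (assumed to be 2-D) */

    /* Coordinate j of point i in the flattened 1-D array:
       point i occupies elements i * COLS through i * COLS + COLS - 1. */
    __host__ __device__ float getCoord(const float *arr, int i, int j) {
        return arr[i * COLS + j];
    }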

Memory Management

  • Dynamic Arrays: The arrayStatic, arrayYk, and arrayYkplus1 arrays are allocated dynamically and initialized with the data from input.txt. At the start of the Mean Shift algorithm, arrayYk is set to the original data points held in arrayStatic (a host-side sketch follows below).
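
A host-side sketch of this setup, assuming N two-dimensional points; readInput() and the d_-prefixed device pointers are hypothetical names, while cudaMalloc/cudaMemcpy are the standard CUDA runtime calls:

    #include <stdlib.h>
    #include <string.h>
    #include <cuda_runtime.h>

    #define COLS 2  /* dimensions per point (assumed) */

    /* Hypothetical reader: fills pts with n points from path. */
    void readInput(const char *path, float *pts, int n);

    void setup(int n, float **d_static, float **d_yk, float **d_ykplus1) {
        size_t bytes = (size_t)n * COLS * sizeof(float);

        float *arrayStatic = (float *)malloc(bytes);   /* original points   */
        float *arrayYk     = (float *)malloc(bytes);   /* current estimates */

        readInput("input.txt", arrayStatic, n);
        memcpy(arrayYk, arrayStatic, bytes);           /* y_0 = original data */

        cudaMalloc(d_static,  bytes);
        cudaMalloc(d_yk,      bytes);
        cudaMalloc(d_ykplus1, bytes);                  /* next estimates, filled on device */
        cudaMemcpy(*d_static, arrayStatic, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(*d_yk,     arrayYk,     bytes, cudaMemcpyHostToDevice);

        free(arrayStatic);
        free(arrayYk);
    }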

Parameter Handling

Convergence Criterion (e): The convergence parameter e is converted to a float using atof(). The execution time is recorded from the start to the finish of the algorithm, so the impact of different values of e on runtime can be assessed (a sketch follows below).
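
A minimal sketch of how the parameter and timing could be handled, assuming e arrives as the first command-line argument and CUDA events are used as the timer (the repository may measure time differently):

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        /* Convergence criterion from the command line; default 0.5. */
        float e = (argc > 1) ? (float)atof(argv[1]) : 0.5f;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        /* ... launch Mean Shift iterations until every point moves less than e ... */
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("e = %g, execution time: %.2f ms\n", e, ms);
        return 0;
    }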

Performance Analysis

Performance tests were conducted on a distributed system, comparing both shared and non-shared memory implementations under various settings of the initial parameter e.

The shared memory implementation benefits most when the convergence criterion e is small: a smaller e requires more iterations, so the relative speedup from shared memory grows. A graph of the results is shown below.

In addition, performance tests were conducted on a local PC with an NVIDIA GeForce GTX 1060 GPU. The results for e = 0.005 are shown in the table below:

Implementation    Execution Time
Serial            20.11 ms
Non-shared         2.04 ms
Shared             1.06 ms

Graphical Analysis

[Figure: execution times of the shared and non-shared implementations for different values of e (graph.png)]

The graph above visualizes the execution times, demonstrating the advantages of the shared memory implementation, particularly for smaller values of e.

Conclusions

  • Shared Memory Utilization - Using shared memory is particularly beneficial for smaller values of the convergence criterion e, as demonstrated by the faster execution times for e = 0.1, 0.05, and 0.01 (a sketch of the tiling idea follows this list).
  • Performance Enhancement - Parallelization with CUDA significantly reduces the execution time compared to the serial implementation, highlighting the effectiveness of CUDA in enhancing algorithmic performance in distributed systems.
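
As a rough illustration of the tiling idea behind the shared-memory version, here is a minimal sketch of one Mean Shift iteration in CUDA. It assumes 2-D points, a flat kernel of radius h, and a launch with blockDim.x == TILE; the repository's actual kernel function, bandwidth handling, and convergence check may well differ:

    #define COLS 2      /* dimensions per point (assumed) */
    #define TILE 256    /* points per shared-memory tile; assumes blockDim.x == TILE */

    __global__ void meanShiftStepShared(const float *staticPts, const float *yk,
                                        float *ykplus1, int n, float h) {
        __shared__ float tile[TILE * COLS];
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        /* Current estimate for this thread's point. */
        float yx = 0.0f, yy = 0.0f;
        if (i < n) { yx = yk[i * COLS]; yy = yk[i * COLS + 1]; }

        float sumX = 0.0f, sumY = 0.0f, wsum = 0.0f;

        /* Sweep the original points one shared-memory tile at a time. */
        for (int base = 0; base < n; base += TILE) {
            int j = base + threadIdx.x;
            if (j < n) {
                tile[threadIdx.x * COLS]     = staticPts[j * COLS];
                tile[threadIdx.x * COLS + 1] = staticPts[j * COLS + 1];
            }
            __syncthreads();

            int limit = min(TILE, n - base);
            if (i < n) {
                for (int t = 0; t < limit; ++t) {
                    float dx = yx - tile[t * COLS];
                    float dy = yy - tile[t * COLS + 1];
                    if (dx * dx + dy * dy <= h * h) {  /* flat kernel of radius h */
                        sumX += tile[t * COLS];
                        sumY += tile[t * COLS + 1];
                        wsum += 1.0f;
                    }
                }
            }
            __syncthreads();  /* all threads done with this tile before reuse */
        }

        /* Mean of the in-range points becomes the next estimate. */
        if (i < n) {
            ykplus1[i * COLS]     = (wsum > 0.0f) ? sumX / wsum : yx;
            ykplus1[i * COLS + 1] = (wsum > 0.0f) ? sumY / wsum : yy;
        }
    }

Each tile of the original points is loaded from global memory once per block and then reused by every thread in that block, which is where the savings over the non-shared version come from.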

Further Development

Further development could include adding tests at each stage of the algorithm and experimenting with CUDA optimizations for specific GPU architectures. Profiling could also provide insights into potential performance bottlenecks.
