GitHub - l3lackcurtains/fast-cuda-gpu-dbscan: :star2: CUDA-DClust+: Fast DBSCAN algorithm implemented on CUDA. Based on the research paper.

🌟 CUDA-DClust+: Fast DBSCAN algorithm implemented on CUDA

CUDA-DClust+ is a fast DBSCAN algorithm that leverages many of the algorithm designs in CUDA-DClust and parallels DBSCAN algorithms in the literature.

The algorithm takes as input the dataset D, ϵ, and minpts, and outputs a list of points and their corresponding cluster or whether it has been assigned a noise label. The algorithm constructs an index on the dataset (D) based on the number of partitions (r) at each level of the tree. Then, it performs the cluster expansion routine on the GPU and merges the clusters on the CPU.

Research Paper

Key contributions

CUDA-DClust constructs the index using the CPU. In contrast, CUDA-DClust+ performs the construction of the index on the GPU in parallel. Since index construction takes non-negligible time, computing it in parallel on the GPU improves performance.
CUDA-DClust uses a constant seed list size to keep track of points in the cluster expansion routine. The constant seed list size can inadvertently discard points from the cluster. To mitigate this issue, it refills the seed list to minimize discarding points. In contrast, CUDA-DClust+ uses a correction merge routine that merges the discarded points to their clusters. This improves performance and slightly improves the accuracy of the clustering results.
CUDA-DClust uses a large seed list to maintain accurate clustering results. In contrast, a large seed list size is not necessary for CUDA-DClust+, as discarded points are merged by our correction merge routine.
In CUDA-DClust+, the overhead of communication between CPU and GPU is minimized by managing the seed list within the DBSCAN expansion kernel until the seed lists are empty. Limiting CPU/GPU synchronization and other overheads improves performance.

Outcome

DBSCAN comparison paper by Mustafa et al. showed that GDBSCAN outperforms CUDA-DClust with speedup up to 18x. Using our optimizations, we show that CUDA-DClust+ achieves up to ~23x speedup compared with GDBSCAN, demonstrating that the original CUDA-DClust design works well when optimized for newer GPU architectures.

Future Works

Future work includes investigating the dense box algorithm that can be used to further reduce the number of distance calculations in high-density regions.

Organization

.
├── cpu-dbscan              # CPU DBSCAN algorithm
├── multicore-cpu           # Parallel Multi-threaded CPU DBSCAN algorithm
├── dbscan-cpu-indexing     # CUDA DBSCAN algorithm with CPU indexing
├── gdbscan                 # G-DBSCAN algorithm
├── cuda-dclust             # CUDA-DClust algorithm (clone)
├── comparision-plots       # Comparision plots with Jupyter Notebook code
|── test-results            # Scripts to test the cluster results
|── test-datasets           # 2D and 3D datasets for test
|── main.cu                 # Main CUDA-DClust+ algorithm
|── exp.cu                  # CUDA-DClust+ algorithm used in an experiment
|── Makefile                # Makefile for CUDA-DClust+ algorithm
└── paper.pdf               # Research paper

Datasets

Datasets used in the experiment are available online. https://rcdata.nau.edu/gowanlock_lab/datasets/CUDA_DCLUST_datasets/CUDA_DCLUST_HiPC2021.zip

Read 'readme.txt' inside archive file before using the datasets.

Usage

CUDA-DClust+

Change the DBSCAN configuration in 'common.h' file file and run

make && ./main.exe ./test-datasets/2d.txt

Experimental Results

--

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌟 CUDA-DClust+: Fast DBSCAN algorithm implemented on CUDA

Research Paper

Key contributions

Outcome

Future Works

Organization

Datasets

Usage

CUDA-DClust+

Experimental Results

2D experiment

3D experiment

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 124 Commits
comparision-plots		comparision-plots
cpu-dbscan		cpu-dbscan
cuda-dclust		cuda-dclust
dbscan-cpu-indexing		dbscan-cpu-indexing
gdbscan		gdbscan
multicore-cpu		multicore-cpu
test-datasets		test-datasets
test-results		test-results
.gitignore		.gitignore
Makefile		Makefile
common.h		common.h
dbscan.cu		dbscan.cu
dbscan.h		dbscan.h
exp.cu		exp.cu
indexing.cu		indexing.cu
indexing.h		indexing.h
main.cu		main.cu
paper.pdf		paper.pdf
readme.md		readme.md

l3lackcurtains/fast-cuda-gpu-dbscan

Folders and files

Latest commit

History

Repository files navigation

🌟 CUDA-DClust+: Fast DBSCAN algorithm implemented on CUDA

Research Paper

Key contributions

Outcome

Future Works

Organization

Datasets

Usage

CUDA-DClust+

Experimental Results

2D experiment

3D experiment

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages