# Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

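DBSCAN is a density-based clustering algorithm. It groups points that lie in dense regions and labels points in sparse regions as noise. It does not need the number of clusters in advance, and it works best on datasets whose clusters are regions of high density separated by regions of low density.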
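The density idea can be made concrete with a minimal brute-force sketch of the DBSCAN procedure in NumPy. This is an illustration only, not cuML's or scikit-learn's implementation. The function name `dbscan` is hypothetical, the sketch uses O(n²) memory, and it assumes the scikit-learn conventions that a neighborhood contains every point at distance ≤ `eps` (the point itself included) and that noise is labeled `-1`.

```python
import numpy as np

NOISE = -1


def dbscan(X, eps, min_samples):
    """Hypothetical brute-force DBSCAN sketch; O(n^2) memory, small inputs only."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    # Brute-force pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Each neighborhood includes the point itself.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, NOISE, dtype=np.int64)
    cluster = 0
    for i in range(n):
        # Only an unlabeled core point can start a new cluster.
        if labels[i] != NOISE or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == NOISE:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


# Two tight groups and one isolated point.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]]
print(dbscan(X, eps=1.5, min_samples=2).tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

Points that never fall inside a core point's neighborhood keep the noise label, which is why DBSCAN can leave outliers unassigned instead of forcing them into a cluster.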

The model accepts array-like inputs either on the host (NumPy arrays) or on the device (Numba device arrays or any object that implements `__cuda_array_interface__`), as well as cuDF DataFrames.

For information about the cuDF format, refer to the [cuDF documentation](https://rapidsai.github.io/projects/cudf/en/latest/).

For information about cuML's DBSCAN implementation, refer to the [cuML API documentation](https://rapidsai.github.io/projects/cuml/en/latest/api.html#dbscan).

In [3]:
import numpy as np
import cupy as cp

import pandas as pd
import cudf as gd

from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

from sklearn.cluster import DBSCAN as skDBSCAN
from cuml.cluster import DBSCAN as cumlDBSCAN

# Use the GPU with index 2 (the third GPU in the system).
# Change the index to match your machine, e.g. cp.cuda.Device(0).use() on a single-GPU system.
cp.cuda.Device(2).use()

## Define Parameters

In [4]:
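n_samples = 100000   # number of points to generate
n_features = 128     # dimensionality of each point

eps = 3              # maximum distance between two samples for them to count as neighbors
min_samples = 2      # samples in a neighborhood (including the point itself) needed for a core point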
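Because `eps` is an absolute distance, it has to be read against the scale of the data, and Euclidean distances grow with the number of features. The quick NumPy check below compares `eps` with the typical distance between two points of the same blob. It assumes `make_blobs`' default `cluster_std=1.0` (the notebook does not set it) and uses a single synthetic blob rather than the notebook's data.

```python
import numpy as np

n_features = 128
eps = 3
cluster_std = 1.0  # assumption: make_blobs' default, since the notebook does not set it

# For x, y drawn from the same isotropic Gaussian blob, x - y ~ N(0, 2 * cluster_std**2 * I),
# so the expected squared distance is 2 * n_features * cluster_std**2.
expected_sq_dist = 2 * n_features * cluster_std**2
print(expected_sq_dist, np.sqrt(expected_sq_dist))  # 256.0 16.0

# Empirical check: distances from one point to 999 others in one synthetic blob.
rng = np.random.default_rng(0)
center = rng.uniform(-10.0, 10.0, size=n_features)
blob = center + cluster_std * rng.standard_normal((1000, n_features))
dists = np.linalg.norm(blob[1:] - blob[0], axis=1)
print(f"min {dists.min():.1f}, mean {dists.mean():.1f}")
print("neighbors within eps:", int((dists <= eps).sum()))
```

Comparing `eps` with this typical within-cluster distance is a cheap sanity check before running either model on high-dimensional data.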

## Generate Data

### Host

In [5]:
host_data, host_labels = make_blobs(
   n_samples=n_samples, n_features=n_features, centers=5, random_state=7)

host_data = pd.DataFrame(host_data)
host_labels = pd.Series(host_labels)

### Device

In [6]:
device_data = gd.DataFrame.from_pandas(host_data)
device_labels = gd.Series(host_labels)

## Scikit-learn Model

In [7]:
%%time
clustering_sk = skDBSCAN(eps=eps,
                         min_samples=min_samples,
                         algorithm="brute",
                         n_jobs=-1)
clustering_sk.fit(host_data)

The scikit-learn cell above was interrupted manually (`KeyboardInterrupt`) before `fit` returned, so no timing was recorded for it and `clustering_sk.labels_` was never set.
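With `algorithm="brute"`, every sample is compared against every other sample. The back-of-the-envelope count below (plain Python, no libraries) gives a sense of the work for the parameters defined above. The figure of roughly 2 floating-point operations per feature per pair assumes distances are computed with the dot-product expansion ‖x‖² + ‖y‖² − 2x·y, and the memory figure is for materializing the whole distance matrix at once, which an implementation may avoid by working in chunks.

```python
n_samples = 100_000
n_features = 128

# Brute-force neighbor search evaluates a distance for every (query, candidate) pair.
pairs = n_samples * n_samples
print(f"{pairs:.1e} pairwise distances")  # 1.0e+10 pairwise distances

# Roughly 2 * n_features flops per pair when the dot product dominates.
flops = 2 * n_features * pairs
print(f"~{flops:.2e} flops")  # ~2.56e+12 flops

# Memory for the full float64 distance matrix, if held all at once.
bytes_needed = pairs * 8
print(f"{bytes_needed / 1e9:.0f} GB")  # 80 GB
```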

## cuML Model

In [6]:
%%time
clustering_cuml = cumlDBSCAN(eps=eps,
                             min_samples=min_samples)
clustering_cuml.fit(device_data)

CPU times: user 11.9 s, sys: 6.37 s, total: 18.3 s
Wall time: 18.5 s


## Evaluate Results

In [None]:
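%%time
# cuML's labels_ mirrors the input type (a cuDF Series here); copy it to the host for scikit-learn.
cuml_score = adjusted_rand_score(host_labels, clustering_cuml.labels_.to_numpy())
# Requires the scikit-learn cell above to have finished fitting.
sk_score = adjusted_rand_score(host_labels, clustering_sk.labels_)
print(f"cuML ARI: {cuml_score:.4f}, scikit-learn ARI: {sk_score:.4f}")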
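The adjusted Rand index compares two labelings through the pairs of samples they place together, corrected for chance: 1.0 means identical partitions (up to relabeling) and values near 0.0 mean chance-level agreement. The NumPy sketch below computes it from the contingency table. `adjusted_rand_index` is a hypothetical helper for illustration that is intended to match `sklearn.metrics.adjusted_rand_score`; like that function, it treats DBSCAN's noise label `-1` as just another cluster.

```python
import numpy as np


def comb2(x):
    """Element-wise number of unordered pairs, C(x, 2)."""
    x = np.asarray(x, dtype=np.float64)
    return x * (x - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    """Hypothetical ARI helper (needs at least 2 samples)."""
    # Map arbitrary labels (including -1 for noise) to 0..k-1.
    _, t = np.unique(np.asarray(labels_true), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred), return_inverse=True)
    # Contingency table: number of samples in each (true, predicted) cluster pair.
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)

    index = comb2(table).sum()                   # pairs grouped together by both labelings
    sum_rows = comb2(table.sum(axis=1)).sum()    # pairs together in labels_true
    sum_cols = comb2(table.sum(axis=0)).sum()    # pairs together in labels_pred
    expected = sum_rows * sum_cols / comb2(len(t))
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return float((index - expected) / (maximum - expected))


print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # about 0.571, i.e. 4/7
```

Because the score depends only on which samples share a cluster, relabeling clusters does not change it, so cuML and scikit-learn can be compared against the same ground truth even if they number their clusters differently.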