# DBSCAN

## References
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

## Step-1: Generate Some Data

```python
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

num_centers = 5

# 1000 two-dimensional points drawn around 5 random centers; y holds each point's true blob index.
X, y = make_blobs(n_samples=1000, n_features=2, centers=num_centers)

print('X.shape:', X.shape)
print('y.shape:', y.shape)

# note the color coding of clusters (colors come from the true labels y)
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y,
            s=25, edgecolor='k')
plt.show()
```

## Step-2: Run DBSCAN Clustering

```python
from sklearn.cluster import DBSCAN

# eps: neighborhood radius; min_samples: points (including the point itself) needed to form a core point.
dbscan = DBSCAN(eps=5, min_samples=2).fit(X)

# One cluster label per sample; noise points are labelled -1.
labels = dbscan.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)

print("labels:\n", labels)
print("Estimated number of clusters: %d" % n_clusters_)
print("Estimated number of noise points: %d" % n_noise_)
```
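To make the `eps` and `min_samples` parameters concrete, here is a minimal from-scratch sketch of the DBSCAN idea, not scikit-learn's implementation. A point is a core point when at least `min_samples` points (itself included) lie within distance `eps`; clusters grow outward from core points, border points join the first cluster that reaches them, and everything unreached is noise (`-1`). The function name `dbscan_sketch` and the `UNVISITED` sentinel are hypothetical, and the naive O(n²) distance matrix is an assumption that only suits small inputs.

```python
import numpy as np

NOISE = -1
UNVISITED = -2  # hypothetical sentinel for points not yet assigned


def dbscan_sketch(X, eps, min_samples):
    """Naive DBSCAN sketch: returns one label per row of X, -1 for noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Full pairwise Euclidean distance matrix (O(n^2) memory).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each eps-neighborhood contains the point itself.
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, UNVISITED)
    cluster_id = 0
    for i in range(n):
        # Only an unassigned core point can start a new cluster.
        if labels[i] != UNVISITED or not is_core[i]:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()  # p is always a core point
            for q in neighbors[p]:
                if labels[q] == UNVISITED:
                    labels[q] = cluster_id
                    # Border points join the cluster but do not extend it.
                    if is_core[q]:
                        stack.append(q)
        cluster_id += 1

    # Points never reached from any core point are noise.
    labels[labels == UNVISITED] = NOISE
    return labels


# Two tight groups plus one isolated point.
X_demo = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 10]]
# expected: first three points in cluster 0, next three in cluster 1, last one is noise (-1)
print(dbscan_sketch(X_demo, eps=0.5, min_samples=2))
```

With `eps=5` on blobs of unit spread, neighborhoods in this sketch would span several blobs at once, which is one way to reason about why the Step-2 settings may merge clusters.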

## Step-3: Visualize Clusters

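```python
## plot predicted results (dbscan was already fitted in Step-2, so reuse its labels)
y_pred = dbscan.labels_

## now observe the color coding of clusters; noise points have label -1.
## Cluster IDs are arbitrary, so compare the groupings rather than the exact colors: do they match?
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y_pred,
            s=25, edgecolor='k')
plt.show()
```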
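Because DBSCAN numbers its clusters arbitrarily, the colors in the two plots can differ even when the groupings are identical. One way to compare groupings numerically is scikit-learn's `adjusted_rand_score`, which ignores how the labels are named. The toy labelings below are made up for illustration.

```python
from sklearn.metrics import adjusted_rand_score

# Same partition of six points, but with the cluster IDs permuted.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [2, 2, 0, 0, 1, 1]

# A score of 1.0 means the two labelings describe the same grouping.
print(adjusted_rand_score(true_labels, pred_labels))
```

In this notebook, `adjusted_rand_score(y, dbscan.labels_)` would compare the true blobs with the DBSCAN result; note that the metric simply treats the noise label `-1` as one more cluster.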

## Experiment

```python
DBSCAN(eps=3, min_samples=2)
```

Step-2 uses `eps=5, min_samples=2`. Change these parameters and re-run the algorithm. Do the predicted clusters change?
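One way to run this experiment systematically is to sweep `eps` while holding `min_samples` fixed and record the number of clusters and noise points for each value. This is a sketch: the list of `eps` values, the `random_state=0` seed, and the helper name `summarize` are assumptions added for repeatability, not part of the original notebook.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Same kind of data as Step-1; random_state is an added assumption so runs are repeatable.
X, _ = make_blobs(n_samples=1000, n_features=2, centers=5, random_state=0)


def summarize(labels):
    """Return (number of clusters excluding noise, number of noise points)."""
    labels = list(labels)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return n_clusters, labels.count(-1)


results = {}
for eps in [0.3, 0.5, 1.0, 3.0, 5.0]:
    labels = DBSCAN(eps=eps, min_samples=2).fit(X).labels_
    results[eps] = summarize(labels)
    print(f"eps={eps}: clusters={results[eps][0]}, noise={results[eps][1]}")
```

With `min_samples` fixed, a larger `eps` can only enlarge every neighborhood, so the noise count should never increase as `eps` grows; the number of clusters, by contrast, can move in either direction as nearby blobs merge.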