In [None]:
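# import make_moons from sklearn
from sklearn.datasets import make_moons

# import DBSCAN
from sklearn.cluster import DBSCAN

# import pyplot, used below for the scatter plot of the moon-shaped data
import matplotlib.pyplot as plt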

In [None]:
# Next, we create an instance of the DBSCAN class from scikit-learn:
db = DBSCAN(eps=0.5,
            min_samples=5,
            metric='euclidean')

In [None]:
"""
We created the DBSCAN instance with a few parameters we haven't used before:

eps: The maximum distance between two samples for one to be considered as being in the neighborhood of the other. 
This is not a maximum bound on the distances of points within a cluster. It is the most important DBSCAN parameter to 
choose appropriately for our dataset and distance function.

min_samples: The number of samples in a neighborhood for a point to be considered as a core point. This includes the 
point itself.

metric: The metric used to compute the distance between samples. 'euclidean' is also the default.

Now it's time to fit the data:
"""
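
One common heuristic for choosing eps, sketched here under assumptions, is to sort the distance from each point to its min_samples-th nearest neighbor (counting the point itself) and pick a value near the "elbow" of that curve. The names X_demo and k_dist are illustrative and not part of the original walkthrough, and the moon-shaped sample data simply stands in for whatever dataset is being clustered.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

# Stand-in data (assumption): any 2-D dataset could be used here
X_demo, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

min_samples = 5

# Querying the training points themselves returns each point as its own
# first neighbor (distance 0), which matches DBSCAN counting the point itself
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_demo)
distances, _ = nn.kneighbors(X_demo)

# Distance to the min_samples-th neighbor, sorted in ascending order
k_dist = np.sort(distances[:, -1])

# A few quantiles give a rough feel for the curve without plotting it
for q in (0.5, 0.9, 0.99):
    print(f"{int(q * 100)}th percentile: {np.quantile(k_dist, q):.3f}")
```

Plotting k_dist and looking for the point where the curve bends sharply upward gives a starting value for eps. It is only a starting point and usually still needs checking against the resulting clusters.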

In [None]:
# fit and predict
y_db = db.fit_predict(X)

# Plot DBSCAN clusters
plot_clusters(X,y_db)

#Unlike the algorithms used in the previous sections, DBSCAN does not force every point into a cluster. 
#Points that do not belong to any dense region are labeled as noise (label -1). In the plot, these outliers are 
#displayed as red squares.
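
To see how the outliers are represented, the sketch below inspects the label array directly. scikit-learn's DBSCAN marks noise points with the label -1 instead of a regular cluster index. The toy data (two tight blobs plus one far-away point) and the names X_toy, toy_labels, n_clusters and n_noise are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Toy data (assumption): two tight blobs and one point far from both
X_blobs, _ = make_blobs(n_samples=100,
                        centers=[[0, 0], [5, 5]],
                        cluster_std=0.3,
                        random_state=0)
X_toy = np.vstack([X_blobs, [[100.0, 100.0]]])

toy_labels = DBSCAN(eps=0.5, min_samples=5, metric='euclidean').fit_predict(X_toy)

# Noise points carry the label -1, so they are excluded from the cluster count
n_clusters = len(set(toy_labels)) - (1 if -1 in toy_labels else 0)
n_noise = int(np.sum(toy_labels == -1))

print("clusters found:", n_clusters)
print("noise points:", n_noise)
print("label of the far-away point:", toy_labels[-1])
```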

In [None]:
#But what happens when we try "moon-shaped" data? Let's find out:
# generate moon-shape data
X, y = make_moons(n_samples=200,
                  noise=0.05,
                  random_state=0)

# plot data
plt.scatter(X[:,0], X[:,1])
plt.show()

In [None]:
#But how will k-means perform on data like this?

# import k-means
from sklearn.cluster import KMeans

# Fit K-means
km = KMeans(n_clusters=2, # how many clusters we expect
            n_init=10,
            random_state=0)

y_km = km.fit_predict(X)

# plot K-means clusters
plot_clusters(X,y_km,plt_cluster_centers=True)

In [None]:
# From the plot above, we can see that the result is not what we expected. K-means splits the points with a straight 
# boundary instead of following the two moons.

#Can we get a better result from Agglomerative clustering? Let's find out:

# import Agglomerative clustering
from sklearn.cluster import AgglomerativeClustering

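# fit Agglomerative clustering
# (Ward linkage always uses Euclidean distance; the old `affinity` argument
# was deprecated and later removed from scikit-learn, so it is omitted here)
ac = AgglomerativeClustering(linkage='ward',
                             n_clusters=2)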
y_hc = ac.fit_predict(X)

# plot HC clusters
plot_clusters(X,y_hc)

In [None]:
#The result is just as bad as with the k-means algorithm. In both cases, it is not our fault: 
#we set the parameters sensibly and did our best to get good results. Both k-means and Ward-linkage agglomerative 
#clustering favor compact, roughly spherical clusters, so neither can follow the curved shape of the moons.

#Lastly, let's try DBSCAN:
# fit DBSCAN
db = DBSCAN(eps=0.2,
            min_samples=5,
            metric='euclidean')

y_db = db.fit_predict(X)

# plot DBSCAN clusters
plot_clusters(X,y_db)
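
Because make_moons also returns the true moon membership, the visual impression can be checked numerically. The sketch below is an addition to the walkthrough, not part of it: it scores each algorithm with the adjusted Rand index, where 1.0 means the labels match the true grouping exactly and values near 0 mean the assignment looks random. It regenerates the data so that it runs on its own, and names such as X_m, y_true and scores are illustrative.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Same data settings as in the walkthrough; y_true holds the real moon labels
X_m, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# Same model settings as in the cells above
models = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "agglomerative (ward)": AgglomerativeClustering(n_clusters=2, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5, metric="euclidean"),
}

# Adjusted Rand index between the true labels and each model's labels
scores = {name: adjusted_rand_score(y_true, model.fit_predict(X_m))
          for name, model in models.items()}

for name, score in scores.items():
    print(f"{name}: ARI = {score:.2f}")
```

The adjusted Rand index ignores how the clusters are numbered, so it does not matter which moon an algorithm calls cluster 0. Any points DBSCAN marks as noise are simply treated as one more group in the comparison.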

Conclusion
In this walkthrough, we have seen where it makes sense to use a density-based model such as DBSCAN instead of the more traditional centroid-based (k-means) and hierarchical (agglomerative) models: when the clusters have non-spherical shapes, as with the moon-shaped data. However, DBSCAN is not the best choice when a cluster is much sparser than the chosen eps and min_samples allow, because its points will be treated as outliers. We need to be careful about this when using density models in the future.
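
The caveat about sparse clusters can be illustrated with a small synthetic example. This is a minimal sketch under assumptions: the data, the parameter values and the names dense, sparse, noise_dense and noise_sparse are invented for illustration. One tight group and one widely spread group are clustered with a single eps that suits the tight group.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic data (assumption): a tight group and a much sparser group
dense = rng.normal(loc=0.0, scale=0.2, size=(100, 2))
sparse = rng.normal(loc=10.0, scale=2.0, size=(100, 2))
X_mixed = np.vstack([dense, sparse])

# eps is chosen to suit the dense group only
mixed_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_mixed)

# Fraction of each group that DBSCAN labels as noise (-1)
noise_dense = np.mean(mixed_labels[:100] == -1)
noise_sparse = np.mean(mixed_labels[100:] == -1)

print(f"noise fraction in dense group:  {noise_dense:.2f}")
print(f"noise fraction in sparse group: {noise_sparse:.2f}")
```

A larger eps would recover more of the sparse group, but a single global eps cannot fit both densities equally well, which is the trade-off to keep in mind.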