# Tasks:

- Create moons with the sklearn.datasets.make_moons() function.
- Remember to scale the data.
- Cluster the data with DBSCAN.
- Set min_samples=30.
- Use the NearestNeighbors class to determine a good value for 'eps'.
- Create a DBSCAN model and fit it to the data.
- Plot the resulting clusters.
- Cluster the data using Agglomerative Clustering.
- Create a dendrogram and find where the average length of the vertical lines is the longest.
- Create and fit an Agglomerative Clustering model on the data.
- Plot the resulting clusters.


# Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.neighbors import NearestNeighbors

import scipy.cluster.hierarchy as sch

from sklearn.datasets import make_moons

# Create Data

### Code sample given below:

In [None]:
moons = make_moons(n_samples = 1000, noise=.1, random_state=42)[0]
plt.scatter(moons[:,0], moons[:,1])

# Scale Data

In [None]:
db_moons = StandardScaler().fit_transform(moons)

# DBSCAN

## Find a good value for epsilon 

In [None]:
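min_samples = 30
# Distance from each point to its min_samples nearest neighbors
# (the point itself counts as the first neighbor, at distance 0).
# Note: this is fit on the unscaled `moons`, not the scaled `db_moons`.
n_neighbors = NearestNeighbors(n_neighbors=min_samples)
n_neighbors.fit(moons)
distances, indices = n_neighbors.kneighbors(moons)
distances[:5]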

In [None]:
sorted_distances = np.sort(distances[:, min_samples-1])
sorted_distances[:5]

In [None]:
plt.plot(sorted_distances)
plt.grid();

For min_samples = 30, 0.21 looks like a reasonable epsilon, since the elbow of the sorted neighbor-distance curve seems to begin there.
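Reading the elbow off the plot by eye is subjective, so a rough programmatic cross-check can help. The sketch below is one possible heuristic, not the method used in this notebook: it picks the point of the sorted curve that lies farthest below the straight line joining the curve's endpoints. The helper name `estimate_eps` is hypothetical, and the value it returns may differ from the 0.21 chosen by eye.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors


def estimate_eps(X, min_samples):
    """Hypothetical helper: suggest eps from the sorted k-distance curve."""
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    distances, _ = nn.kneighbors(X)
    # Distance to the farthest of the min_samples neighbors, sorted ascending.
    k_dist = np.sort(distances[:, -1])

    # Rescale both axes to [0, 1] so the endpoints sit at (0, 0) and (1, 1).
    x = np.linspace(0.0, 1.0, len(k_dist))
    y = (k_dist - k_dist[0]) / (k_dist[-1] - k_dist[0])

    # The chord between the endpoints is now y = x; the elbow is taken to be
    # the point where the curve is farthest below that chord.
    elbow_idx = int(np.argmax(x - y))
    return float(k_dist[elbow_idx])


X = make_moons(n_samples=1000, noise=0.1, random_state=42)[0]
eps_guess = estimate_eps(X, min_samples=30)
print(f"suggested eps: {eps_guess:.3f}")
```

Treat the result as a starting point and compare it against the plot before passing it to DBSCAN.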


In [None]:
dbs = DBSCAN(min_samples=min_samples, eps=0.21)
dbs.fit(moons)
plt.scatter(moons[:,0], moons[:,1], c=dbs.labels_)

The clustering is not perfect: a few samples are labeled as noise, which DBSCAN marks with the label -1.
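To put a number on "a few samples", the labels can be counted directly. This is a minimal sketch that refits DBSCAN with the same parameters as above; the variable names `n_noise` and `n_clusters` are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X = make_moons(n_samples=1000, noise=0.1, random_state=42)[0]
labels = DBSCAN(min_samples=30, eps=0.21).fit(X).labels_

# DBSCAN assigns the label -1 to samples it considers noise.
noise_mask = labels == -1
n_noise = int(noise_mask.sum())

# Every other distinct label is a cluster.
n_clusters = len(set(labels)) - (1 if n_noise else 0)

print(f"clusters found: {n_clusters}")
print(f"noise samples:  {n_noise} of {len(X)}")
```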

# Visualize Dendrogram


In [None]:
plt.figure(figsize = (15, 5))
sch.dendrogram(sch.linkage(moons, method = 'ward'))
plt.xlabel('Data Points');

It looks like the best number of clusters would be 2, because that's where the vertical lines are longest on average.
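The same reading can be approximated from the linkage matrix itself. The sketch below uses a related but simpler heuristic than averaging the vertical line lengths: it looks for the largest gap between consecutive merge heights and cuts the tree there. This is an assumption about how to automate the visual check, not the notebook's own procedure.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from sklearn.datasets import make_moons

X = make_moons(n_samples=1000, noise=0.1, random_state=42)[0]
Z = sch.linkage(X, method="ward")

# Column 2 of the linkage matrix holds the height of each merge.
heights = Z[:, 2]

# A large jump between consecutive merges corresponds to long vertical
# lines in the dendrogram at that level.
gaps = np.diff(heights)
i = int(np.argmax(gaps))

# After merge i (0-indexed) there are len(X) - 1 - i clusters left, which is
# the count obtained by cutting inside the largest gap.
n_clusters = len(X) - 1 - i
print(f"suggested number of clusters: {n_clusters}")
```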

In [None]:
agg = AgglomerativeClustering(n_clusters=2)
agg.fit(moons)

plt.scatter(moons[:,0], moons[:,1], c=agg.labels_);

Agglomerative clustering seems to have assigned half of the moon on the left to the cluster on the right.
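The mismatch can also be quantified, because `make_moons` returns the true moon membership as its second value (the notebook discards it with `[0]`). The sketch below compares those labels with the agglomerative result using the adjusted Rand index, where 1.0 means perfect agreement and values near 0 mean chance-level agreement. Using this metric here is an addition for illustration, not part of the original tasks.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Keep the ground-truth labels this time instead of discarding them.
X, y_true = make_moons(n_samples=1000, noise=0.1, random_state=42)

agg_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# The adjusted Rand index ignores how the clusters are numbered.
ari = adjusted_rand_score(y_true, agg_labels)
print(f"adjusted Rand index: {ari:.3f}")
```

The same comparison can be run on the DBSCAN labels to see which method recovers the moons more closely.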

