# Determining the Optimal K for K-Means

In this notebook we will explore:

1. Elbow method.
2. Calinski-Harabasz score.
3. Silhouette score.

__Source__: https://www.scikit-yb.org/en/latest/api/cluster/elbow.html

<hr>

# 1. Elbow

In [None]:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer



In [None]:
# Generate synthetic dataset with 8 random clusters
X, y = make_blobs(n_samples=1000, n_features=12, centers=8, random_state=42)

In [None]:
# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(4,12))

In [33]:
visualizer.fit(X)        # Fit the data to the visualizer
visualizer.show()        # Finalize and render the figure
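
The curve drawn by the visualizer can be approximated by hand with scikit-learn alone. The sketch below is an illustration, not Yellowbrick's implementation: it assumes that `k=(4,12)` means k = 4 through 11 and that K-Means inertia (the sum of squared distances to the nearest centroid) is a reasonable stand-in for the default distortion score. The names `k_values` and `inertias`, and the choice of `n_init=10`, are ours.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as above: 1000 samples, 12 features, 8 centers
X, y = make_blobs(n_samples=1000, n_features=12, centers=8, random_state=42)

# Fit K-Means once per candidate k and record the inertia
# (sum of squared distances from each sample to its nearest centroid)
k_values = list(range(4, 12))
inertias = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Print the values; the "elbow" is where the drop in inertia flattens out
for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:.1f}")
```

Plotting `inertias` against `k_values` should give a curve of the same general shape as the one rendered by `KElbowVisualizer`.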

<hr>

# 2. Calinski-Harabasz

In [34]:
# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(4,12), metric='calinski_harabasz', timings=False)

visualizer.fit(X)        # Fit the data to the visualizer
visualizer.show()        # Finalize and render the figure
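
For a numeric cross-check, the same score is available directly in scikit-learn as `sklearn.metrics.calinski_harabasz_score`. The following is a minimal sketch under the same assumptions as the cell above (k = 4 through 11, the 8-center blob data); the dictionary `scores` and the variable `best_k` are names introduced here for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Same synthetic data as in the Elbow section
X, y = make_blobs(n_samples=1000, n_features=12, centers=8, random_state=42)

# Score the clustering produced for each candidate k
scores = {}
for k in range(4, 12):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)

# Higher is better: dense, well-separated clusters give a larger score
best_k = max(scores, key=scores.get)
print("k with the highest Calinski-Harabasz score:", best_k)
```

Unlike inertia, this score does not decrease automatically as k grows, so the candidate with the highest value can be read off directly.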

<hr>

# 3. Silhouette

In [35]:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Generate synthetic dataset with 5 random clusters
X, y = make_blobs(n_samples=1000, n_features=12, centers=5, random_state=42)

# Instantiate the clustering model and visualizer
# (the model uses k=8 although the data above has 5 centers)
model = KMeans(n_clusters=8, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')

visualizer.fit(X)        # Fit the data to the visualizer
visualizer.show()        # Finalize and render the figure
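
`SilhouetteVisualizer` shows the silhouette of a single fitted k. To compare several values of k, one option is to compute the mean silhouette coefficient with `sklearn.metrics.silhouette_score`. This is a sketch rather than part of the original notebook: the range of k (2 through 9) and the names `scores` and `best_k` are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same synthetic data as in the cell above: 5 centers
X, y = make_blobs(n_samples=1000, n_features=12, centers=5, random_state=42)

# Mean silhouette coefficient for each candidate k
scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Values lie in [-1, 1]; values closer to 1 indicate better-separated clusters
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}  mean silhouette={score:.3f}")
print("k with the highest mean silhouette:", best_k)
```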

<hr>

# Let's work!

1. Try a different dataset with random clusters.
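
One possible starting point for this exercise is sketched below. Everything in it is an assumption chosen for illustration: the seed, the range for the number of centers (3 through 9), the larger `cluster_std`, and the names `rng`, `true_k`, `X_new`, and `y_new`.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Draw a random number of clusters between 3 and 9 (seed chosen arbitrarily)
rng = np.random.default_rng(0)
true_k = int(rng.integers(3, 10))

# Wider clusters (cluster_std=2.0) make the problem a little harder
X_new, y_new = make_blobs(
    n_samples=1000,
    n_features=12,
    centers=true_k,
    cluster_std=2.0,
    random_state=0,
)

print("Dataset shape:", X_new.shape)
print("True number of clusters:", true_k)
```

Pass `X_new` to `visualizer.fit` in any of the sections above and check whether the suggested k matches `true_k`.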
 