With `KMeans`, we still need to choose the number of clusters, `n_clusters`.

How do we pick this?

The basic idea is to pick `n_clusters` so that increasing it further
wouldn't give a substantively **better** model of the data. To make this
precise, we need to clarify what **better** means.



## Some real data



Again, let's load the iris data.



```python
from sklearn.datasets import load_iris

# X: 150 samples x 4 features; y: species label (0, 1 or 2) for each sample
X, y = load_iris(return_X_y=True)
```

Here we **know** the labels `y`, and we know that there are supposed to be
three species of iris. Is this supported by the data?



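```python
from sklearn.cluster import KMeans

# 10 random restarts (n_init); fixed random_state so the result is reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
kmeans.inertia_
```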

What is `inertia_`? It is the "within-cluster sum of squares": the sum of
the squared distances from each sample to the center of the cluster it is
assigned to. This is the quantity `KMeans` tries to minimize when it
partitions the data into clusters.
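
Written out, if sample $x_i$ is assigned to cluster $c(i)$ with center $\mu_{c(i)}$, the inertia for $n$ samples is

$$
\text{inertia} = \sum_{i=1}^{n} \left\lVert x_i - \mu_{c(i)} \right\rVert^2 .
$$

As a quick sanity check, here is a sketch that recomputes this sum by hand from the fitted `cluster_centers_` and `labels_` and prints it next to `inertia_`; the two numbers should agree up to floating-point rounding. The `n_init=10` and `random_state=0` settings are our own choices for reproducibility, not something the analysis depends on.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Center of the cluster each sample was assigned to (one row per sample)
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
# Sum of squared Euclidean distances to those centers: the within-cluster sum of squares
manual_inertia = np.sum((X - assigned_centers) ** 2)

print(manual_inertia, kmeans.inertia_)
```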

Let's plot the within-cluster sum of squares of the clusters found by `KMeans` for several choices of `n_clusters`.



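```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ns = range(1, 10)
inertias = [KMeans(n_clusters=n, n_init=10, random_state=0).fit(X).inertia_ for n in ns]
plt.plot(ns, inertias, marker="o")  # x-axis shows n_clusters, not the list index 0..8
plt.xlabel("n_clusters")
plt.show()
```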

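Inertia generally keeps shrinking as `n_clusters` grows (it reaches zero
when every sample is its own cluster), so we can't simply pick the value
that minimizes it. Instead, in the graph we look for an "elbow": a value
of `n_clusters` beyond which an additional cluster barely reduces the
inertia. (Such elbows are a common theme in data science: we want models
complex enough to capture the regularities in the data, but not so
complex that they overfit.)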
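
To make "an additional cluster wouldn't help much" a little more concrete, one option is to print how much the inertia drops each time we add a cluster; the elbow is roughly where these drops become small relative to the earlier ones. This is only a sketch: `n_init=10` and `random_state=0` are our own choices, and deciding what counts as "small" is still a judgment call.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
ns = list(range(1, 10))
inertias = [KMeans(n_clusters=n, n_init=10, random_state=0).fit(X).inertia_ for n in ns]

# Reduction in inertia gained by going from n to n + 1 clusters
for n, before, after in zip(ns, inertias, inertias[1:]):
    drop = before - after
    print(f"{n} -> {n + 1} clusters: inertia falls by {drop:.1f} ({drop / before:.0%} of its value at {n})")
```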



## Your homework



Do this "elbow analysis" on the MNIST data.

If more than 10 clusters are warranted, can you describe what the
additional clusters represent?

