# K-means
* In this demo, we will test the k-means algorithm using sklearn.

### 1. Generate and visualize the data
* We use the function **make_blobs** from sklearn to generate the clusters and their labels.
* The API for this function is at: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

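n_samples = 1500
# 1500 two-dimensional points drawn around 3 centers; y holds the true blob labels (k-means does not use them)
# random_state fixes the seed so the same data are generated on every run
X, y = make_blobs(n_samples=n_samples, n_features=2, centers=3, random_state=170)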
plt.scatter(X[:, 0], X[:, 1])
plt.show()
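* Before clustering, it can help to check what **make_blobs** returned. The sketch below (an extra sanity check, not part of the original demo) prints the shape of `X` and how many points carry each true label, then colors the points by `y`. Since `n_samples=1500` is split evenly across `centers=3`, we expect 500 points per blob. Remember that k-means never sees `y`; it is plotted here only as a reference.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Same call as in the cell above
X, y = make_blobs(n_samples=1500, n_features=2, centers=3, random_state=170)

print(X.shape)              # one row per point, one column per feature
print(np.bincount(y))       # number of points generated around each center

# Color by the ground-truth labels, for comparison with the k-means result later
plt.scatter(X[:, 0], X[:, 1], c=y, s=5)
plt.title("Ground-truth blob labels")
plt.show()
```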

### 2. Run the k-means algorithm
* As before, we define the k-means model and call fit() to train it.
* The API for k-means is at: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

In [None]:
from sklearn.cluster import KMeans

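# Define the k-means model with 3 clusters; n_init and random_state are fixed so results are reproducible
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(X)  # Fit the model to the training data, i.e. compute the centroids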
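* To make "computing the centroids" concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the classic k-means iteration: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. This is a simplified illustration, not scikit-learn's implementation: the function name `lloyd_kmeans` is hypothetical, and it initializes with randomly chosen data points, whereas `KMeans` uses k-means++ initialization by default and can run several restarts (`n_init`). Its centroids may therefore differ from `model.cluster_centers_`.

```python
import numpy as np
from sklearn.datasets import make_blobs


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's k-means on an (n, d) array X."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct points picked at random from X
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centroids no longer move: converged
        centers = new_centers
    # Final assignment with respect to the returned centroids
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Inertia: sum of squared distances of points to their assigned centroid
    inertia = d2[np.arange(len(X)), labels].sum()
    return centers, labels, inertia


X, y = make_blobs(n_samples=1500, n_features=2, centers=3, random_state=170)
centers, labels, inertia = lloyd_kmeans(X, k=3)
print(centers)
print(inertia)
```

Because the initialization is random, a different `seed` can land in a different local minimum; running several restarts and keeping the lowest inertia is what `n_init` does in scikit-learn.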

### 3. Make and visualize the predictions on the data
* We use the trained model to make predictions on the (unlabeled) training data.
* Then we plot the data with different colors based on their predicted labels.
* We also plot the centroids of the trained model. The centroids can be accessed using: **model.cluster_centers_**

In [None]:
# Make prediction on the training data
y_pred = model.predict(X)

# Plot the training data, colored by predicted cluster
plt.scatter(X[:, 0], X[:, 1], c=y_pred)

# Plot the centroids in red color
plt.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1], s=50, c='red')

plt.show()
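* A few numerical checks can complement the plot. The sketch below (an addition, assuming the same data and model settings as above) verifies that `predict(X)` on the training data agrees with the `labels_` stored by `fit()`, that each centroid sits at the mean of the points assigned to it, and how well the predicted clusters match the true labels `y`. Cluster IDs are arbitrary (cluster 0 need not correspond to blob 0), so plain accuracy would be misleading; `adjusted_rand_score` is permutation-invariant and equals 1.0 for a perfect match.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=1500, n_features=2, centers=3, random_state=170)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
y_pred = model.predict(X)

# On the training data, predict(X) should agree with the labels computed during fit()
agreement = np.mean(y_pred == model.labels_)

# Each centroid should be (up to the convergence tolerance) the mean of its assigned points
means = np.array([X[y_pred == j].mean(axis=0) for j in range(3)])
max_gap = np.abs(means - model.cluster_centers_).max()

# Compare with the ground truth using a score that ignores how clusters are numbered
ari = adjusted_rand_score(y, y_pred)

print(f"predict vs. labels_ agreement: {agreement:.3f}")
print(f"max |cluster mean - centroid|: {max_gap:.2e}")
print(f"adjusted Rand index vs. true labels: {ari:.3f}")
```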

### 4. Testing with a wrong number of clusters
* Let's see what happens if we use the wrong number of clusters: here 2, while the data were generated from 3 blobs.

In [None]:
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # Deliberately use 2 clusters instead of 3
y_pred = model.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1], s=50, c='red')
plt.show()
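* In this demo we know the data were generated from 3 blobs, but in practice the number of clusters is usually unknown. One common heuristic (an addition, not part of the original demo) is the "elbow" plot: fit `KMeans` for a range of `k` and plot `inertia_`, the sum of squared distances from each point to its closest centroid. Inertia tends to keep decreasing as `k` grows, so rather than its minimum we look for the `k` after which the decrease flattens out. For well-separated blobs like these one would expect the bend around `k = 3`, although the heuristic is informal and can be ambiguous on real data.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1500, n_features=2, centers=3, random_state=170)

ks = list(range(1, 9))
inertias = []
for k in ks:
    # inertia_ = sum of squared distances of samples to their closest centroid
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia (within-cluster sum of squares)")
plt.show()
```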