# Implementing K-means Clustering in Python

K-means is a popular clustering algorithm that partitions data into K clusters by assigning each point to the cluster whose centroid (the mean of its members) is nearest. It is an unsupervised learning algorithm, meaning it doesn't require labeled data for training. Common applications include customer segmentation, image compression, and anomaly detection.
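
To make the partitioning idea concrete, here is a minimal sketch of the assign-then-update loop usually called Lloyd's algorithm. It is only an illustration: the function name `lloyd_kmeans`, its parameters, and the toy points are assumptions made for this example, and scikit-learn's `KMeans` uses a more refined initialization and implementation.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iters=100, seed=0):
    """Illustrative K-means loop; not scikit-learn's implementation."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated pairs of points.
toy_points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
toy_labels, toy_centroids = lloyd_kmeans(toy_points, k=2)
print(toy_labels)
print(toy_centroids)
```

The two steps repeat until the centroids stop moving. Which cluster receives label 0 or 1 depends on the random starting points, so only the grouping is meaningful, not the label values.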

In this notebook, we'll walk through the steps of implementing K-means in Python using the scikit-learn library.

## Step 1: Import Libraries

Let's start by importing the required libraries: NumPy for numerical computations, scikit-learn's KMeans for the clustering itself, and Matplotlib for plotting the results.

In [1]:
# Importing libraries
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

## Step 2: Load Data

In this example, we'll create a sample dataset to demonstrate the K-means algorithm. You can replace this with your own dataset later.

In [2]:
# Creating a sample dataset
data = np.array([[2, 3],
                 [5, 8],
                 [1, 2],
                 [8, 6],
                 [6, 4],
                 [1, 3]])
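
If your own data lives in a CSV file, one possible way to load it into the same kind of array is `np.loadtxt`. The sketch below is an assumption about how such a file might look: the column names and values are made up, and an in-memory string stands in for a real file so the example is self-contained.

```python
import io
import numpy as np

# Hypothetical CSV contents; in practice this would be a file on disk.
csv_text = """feature_1,feature_2
2,3
5,8
1,2
"""

# skiprows=1 skips the header line; delimiter="," splits the columns.
loaded = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(loaded.shape)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file's path. The result is a two-dimensional array with one row per sample, which is the layout `KMeans` expects.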

## Step 3: Feature Scaling (Optional)

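K-means measures similarity with Euclidean distance, so a feature with a much larger numeric range can dominate the clustering. In those cases, it's essential to scale the features to a similar range first. For this example, we'll skip this step because both features already span a similar range (roughly 1 to 8).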
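
For datasets where scaling does matter, a common option is scikit-learn's `StandardScaler`, which rescales each feature to zero mean and unit variance. The sketch below uses made-up income and age values, chosen only to show two features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: annual income and age, on very different scales.
raw = np.array([[30_000, 25],
                [85_000, 47],
                [42_000, 31],
                [120_000, 52]], dtype=float)

# Fit the scaler on the data and transform it in one call.
scaler = StandardScaler()
scaled = scaler.fit_transform(raw)

print(scaled.mean(axis=0))  # each column's mean is approximately 0
print(scaled.std(axis=0))   # each column's standard deviation is approximately 1
```

The scaled array would then be passed to K-means in place of the raw data. New points need to go through the same fitted scaler (`scaler.transform`) before prediction.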

## Step 4: Initialize K-means

Now, let's create a K-means object with the desired number of clusters 'K'. In this example, we'll set 'K' to 2, but you can change it according to your dataset and requirements.

In [3]:
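# Initialize K-means
k = 2
# Fix random_state so the cluster assignments are reproducible across runs;
# set n_init explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)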
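
When the right K is not known in advance, one common heuristic is the elbow method: fit the model for several values of K and compare the within-cluster sum of squared distances, which scikit-learn exposes as `inertia_`. The sketch below applies it to the sample dataset; the range of candidate values and the names `points`, `inertias`, and `candidate_k` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same values as the sample dataset in this notebook.
points = np.array([[2, 3], [5, 8], [1, 2], [8, 6], [6, 4], [1, 3]])

# Fit one model per candidate K and record its inertia.
inertias = {}
for candidate_k in range(1, 5):
    model = KMeans(n_clusters=candidate_k, n_init=10, random_state=0).fit(points)
    inertias[candidate_k] = model.inertia_

for candidate_k, inertia in inertias.items():
    print(f"K={candidate_k}: inertia={inertia:.2f}")
```

Inertia generally keeps shrinking as K grows, so the goal is not to minimize it but to look for the point where adding another cluster stops giving a large improvement.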

## Step 5: Fit and Predict

Next, we'll fit the K-means model to the data and predict the cluster labels for each data point.

In [4]:
# Fit and Predict
kmeans.fit(data)
cluster_labels = kmeans.predict(data)
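
As a complement to the cell above, the following sketch shows `fit_predict`, which combines fitting and labeling the training data in one call, and then uses the fitted model to assign previously unseen points. The new points are hypothetical values picked for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same values as the sample dataset in this notebook.
points = np.array([[2, 3], [5, 8], [1, 2], [8, 6], [6, 4], [1, 3]])

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)  # fit and label the training data in one call

# The fitted model stores the training labels and the centroids.
print(labels)
print(model.labels_)           # same assignments as `labels`
print(model.cluster_centers_)  # one row per cluster

# Hypothetical unseen points: each is assigned to its nearest centroid.
new_points = np.array([[0, 0], [7, 7]])
new_labels = model.predict(new_points)
print(new_labels)
```

The numeric label given to each cluster is arbitrary and can change with `random_state`, so compare points by whether they share a label rather than by the label value itself.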

## Step 6: Visualize Clusters

Finally, let's use Matplotlib to visualize the clustered data points and the centroids of the clusters.

In [5]:
# Visualize Clusters
plt.scatter(data[:, 0], data[:, 1], c=cluster_labels, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x', s=200, c='black')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('K-means Clustering')
plt.show()