# Unsupervised Learning | Clustering (K-Means) 

## Customer Segmentation

#### **Introduction:**
#### **In this project, I use the unsupervised K-means clustering algorithm to segment customers and prepare the data for a final supervised model.**

![](https://www.converted.in/blog/wp-content/uploads/2020/05/Customer-segmentation.png)

### Importing Libraries

In [56]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()

### Loading & Inspecting Data

In [57]:
# Load the data & check what's inside
customers = pd.read_csv ('../input/customerssegmentation/customers-segmentation.csv')
customers.head()

In [58]:
customers.info()

In [59]:
customers.describe().round(1)

### Exploring Data

Explore the data by creating a scatter plot of the two variables using matplotlib.

In [60]:
plt.scatter(customers['Satisfaction'], customers['Loyalty'])
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')

### Data Preprocessing

In [73]:
from sklearn import preprocessing

# Copy the data so that X is defined before it is scaled
X = customers.copy()

# Standardize the inputs: 'preprocessing.scale()' centers each column of X on its own mean and scales it to unit variance
x_scaled = preprocessing.scale(X)
x_scaled
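
To make the scaling step concrete, here is a minimal sketch of what `preprocessing.scale()` computes. The array `toy` is made-up illustration data, not the customers dataset. It checks that the result matches subtracting each column's mean and dividing by its population standard deviation.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical toy data: two columns on very different scales
toy = np.array([[1.0, 100.0],
                [2.0, 300.0],
                [3.0, 500.0]])

# Manual standardization: subtract the column mean and divide by the
# population standard deviation (ddof=0, NumPy's default)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

# The same operation through scikit-learn
scaled = preprocessing.scale(toy)

print(np.allclose(manual, scaled))  # True
print(scaled.mean(axis=0))          # approximately 0 for each column
print(scaled.std(axis=0))           # approximately 1 for each column
```

After scaling, both columns contribute on a comparable scale to the Euclidean distances that K-means relies on.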

### Model Training & Prediction

In [70]:
from sklearn.cluster import KMeans

In [71]:
X = customers.copy()

In [74]:
# Select the number of clusters; in this case I chose 2
kmeans = KMeans(2)

In [75]:
# Fit the model on the original (unscaled) data
kmeans.fit(X)

In [76]:
# Create a copy of the input data
clusters = X.copy()

In [78]:
# Store the predicted cluster for each observation
clusters['cluster_pred'] = kmeans.fit_predict(X)
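
The relationship between `fit`, `fit_predict`, and `predict` can be sketched on synthetic data. The blobs below are hypothetical, and `n_init` and `random_state` are set explicitly only to keep the sketch reproducible. `fit_predict` fits the model and returns the training labels in one call, so a separate `fit` beforehand is not required.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical data: two well-separated blobs
data = np.vstack([rng.normal(0, 0.3, size=(15, 2)),
                  rng.normal(5, 0.3, size=(15, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=42)

# fit_predict fits the model and returns the label of each training point
labels = km.fit_predict(data)

# The same labels are stored on the fitted model ...
same_as_attribute = np.array_equal(labels, km.labels_)
# ... and predicting on the training data reproduces them
same_as_predict = np.array_equal(labels, km.predict(data))

print(same_as_attribute, same_as_predict)
```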

In [81]:
# Plot Satisfaction against Loyalty, colored by predicted cluster
plt.scatter(clusters['Satisfaction'], clusters['Loyalty'], c=clusters['cluster_pred'], cmap='coolwarm')
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')

### Finding the Optimal Number of Clusters (Elbow / Knee Method) 

**Using a loop over candidate numbers of clusters**

In [82]:
# Create an empty list to hold the within-cluster sum of squares (WCSS) of each solution
wcss = []

# Fit a cluster solution for every candidate number of clusters
# I have chosen to get solutions from 1 to 10 clusters
for i in range(1,11):
    # Cluster solution with i clusters
    kmeans = KMeans(i)
    # Fit the standardized data
    kmeans.fit(x_scaled)
    # Append the WCSS for the iteration
    wcss.append(kmeans.inertia_)

# Check the result
wcss
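
As a sanity check on what `inertia_` measures, the sketch below recomputes the WCSS by hand and compares it with the fitted model's value. The array `data` is a synthetic stand-in for `x_scaled`, and the `n_init` and `random_state` values are assumptions chosen only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in for x_scaled: two synthetic blobs
data = np.vstack([rng.normal(-2, 0.5, size=(20, 2)),
                  rng.normal(2, 0.5, size=(20, 2))])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

# Look up, for every point, the center of the cluster it was assigned to
assigned_centers = model.cluster_centers_[model.labels_]

# WCSS: sum of squared distances from each point to its own cluster center
manual_wcss = ((data - assigned_centers) ** 2).sum()

print(np.isclose(manual_wcss, model.inertia_))
```

WCSS can only decrease as clusters are added, which is why the elbow method looks for the point where the decrease levels off rather than for the minimum.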

In [83]:
# Plot the number of clusters vs. WCSS
plt.plot(range(1,11),wcss)
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')

**The figure suggests that the optimal number of clusters is 5.**

In [85]:
# Fiddle with K (the number of clusters)
kmeans_new = KMeans(5)

# Fit the standardized data
kmeans_new.fit(x_scaled)

# Create a new data frame with the predicted clusters
clusters_new = X.copy()
clusters_new['cluster_pred'] = kmeans_new.fit_predict(x_scaled)

In [86]:
# Check that everything went well
clusters_new.head()

In [92]:
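# Plot the final clusters
plt.scatter(clusters_new['Satisfaction'], clusters_new['Loyalty'], c=clusters_new['cluster_pred'], cmap='rainbow')
# cluster_centers_ is in standardized units, so mark each cluster's mean in the original units instead
centers = clusters_new.groupby('cluster_pred')[['Satisfaction', 'Loyalty']].mean()
plt.scatter(centers['Satisfaction'], centers['Loyalty'], c='black', marker='x')
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')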
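
Because the final model is fitted on standardized data, its `cluster_centers_` are expressed in standardized units rather than in the units of the plot axes. One possible way to map them back is sketched below. It swaps `preprocessing.scale()` for `StandardScaler`, which applies the same standardization but remembers the means and standard deviations so the transform can be inverted. The DataFrame `demo` is synthetic and only borrows the column names from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for the customers data
demo = pd.DataFrame({
    'Satisfaction': np.concatenate([rng.normal(3, 0.5, 30), rng.normal(8, 0.5, 30)]),
    'Loyalty': np.concatenate([rng.normal(-1, 0.3, 30), rng.normal(1, 0.3, 30)]),
})

# StandardScaler stores the per-column mean and standard deviation
scaler = StandardScaler()
demo_scaled = scaler.fit_transform(demo)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(demo_scaled)

# Convert the centers from standardized units back to the original units
centers_original = pd.DataFrame(
    scaler.inverse_transform(km.cluster_centers_),
    columns=demo.columns,
)
print(centers_original)
```

The resulting `centers_original` can be drawn on the same axes as the raw Satisfaction and Loyalty values.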

==========

# GOOD LUCK!