# Cluster Analysis on Market Segmentation

This example applies cluster analysis to a market segmentation data set. The data contains 30 customer observations described by two variables: 1) satisfaction and 2) loyalty. Loyalty is measured by a standardized index ranging from -2.5 to +2.5. Satisfaction is measured on a 10-point Likert scale.

## Import the relevant libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.cluster import KMeans

## Load the data

In [None]:
# Local: read the file directly (use the full path if the notebook and file are in different folders)
# data = pd.read_csv('market-segment-data.csv')

# Cloud: fetch the file from the project's object storage
# (assumes a `project` object has already been created in an earlier cell)
my_file = project.get_file('market-segment-data.csv')

# Cloud: read the CSV data from the file object into a pandas DataFrame
my_file.seek(0)
data = pd.read_csv(my_file)

In [None]:
data

In [None]:
data.describe()

## Plot the data

A preliminary scatter plot of the data, with one feature on each axis.

In [None]:
plt.scatter(data['Satisfaction'],data['Loyalty'])
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')

## Select the features

In [None]:
x = data.copy()

## Define the Model for Clustering

In [None]:
kmeans = KMeans(2)
kmeans.fit(x)

## Predict Clustering Results

In [None]:
clusters = x.copy()
# Reuse the labels from the model fitted above instead of fitting it a second time
clusters['cluster_pred'] = kmeans.labels_
clusters

In [None]:
plt.scatter(clusters['Satisfaction'],clusters['Loyalty'],c=clusters['cluster_pred'],cmap='rainbow')
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')

## Standardize the variables

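Satisfaction spans a wider numeric range than Loyalty, so it carries more weight in the distance calculation that k-means relies on. Let's standardize both variables so they are on a comparable scale, then check the new result.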

In [None]:
from sklearn import preprocessing
x_scaled = preprocessing.scale(x)
x_scaled
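To make the standardization step concrete, the sketch below reproduces `preprocessing.scale` by hand: each column has its mean subtracted and is divided by its population standard deviation (`ddof=0`). The `toy` DataFrame is hypothetical illustration data, not the notebook's data set; only the column names are borrowed from it.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data (NOT the notebook's data set), reusing its column names
toy = pd.DataFrame({
    'Satisfaction': [4.0, 6.0, 5.0, 7.0, 8.0],
    'Loyalty': [-1.5, -0.5, 0.0, 0.5, 1.5],
})

# z-score by hand: subtract the column mean, divide by the population std (ddof=0)
manual = (toy - toy.mean()) / toy.std(ddof=0)

# The same transformation done by scikit-learn
scaled = np.asarray(preprocessing.scale(toy))

# Both approaches should agree up to floating-point error
print(np.allclose(manual.to_numpy(), scaled))

# After scaling, every column has mean ~0 and standard deviation ~1
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After this transformation both features contribute on the same scale to the Euclidean distances used by k-means.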

## Apply the Elbow method

In [None]:
wcss = []

# Fit k-means for k = 1..9 and record the within-cluster sum of squares (WCSS)
for i in range(1, 10):
    kmeans = KMeans(i)
    kmeans.fit(x_scaled)
    wcss.append(kmeans.inertia_)

wcss

In [None]:
plt.plot(range(1,10),wcss)
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
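The WCSS values plotted above come from `inertia_`, which is the sum of squared distances from each point to the centroid of its assigned cluster. As a minimal sketch, the check below recomputes that quantity by hand on a few hypothetical points (the `points` array and the `n_init` and `random_state` settings are assumptions made for illustration, not part of the notebook).

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy points forming two well-separated groups
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
])

# random_state and n_init are fixed here only to make the sketch reproducible
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Look up the centroid assigned to each point, then sum the squared distances
assigned_centroids = model.cluster_centers_[model.labels_]
manual_wcss = float(((points - assigned_centroids) ** 2).sum())

# The hand-computed value should match the model's inertia_
print(manual_wcss, model.inertia_)
```

Because WCSS can only decrease as clusters are added, the elbow plot is read by looking for the point where the decrease levels off rather than for a minimum.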

## Explore clustering solutions and select the number of clusters

In [None]:
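# Try a candidate number of clusters (here 5); change the value to explore other solutions
kmeans_new = KMeans(5)
clusters_new = x.copy()
# fit_predict fits the model on the standardized data and returns the labels in one step
clusters_new['cluster_pred'] = kmeans_new.fit_predict(x_scaled)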

In [None]:
clusters_new

In [None]:
plt.scatter(clusters_new['Satisfaction'],clusters_new['Loyalty'],c=clusters_new['cluster_pred'],cmap='rainbow')
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')
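One way to compare candidate solutions is to summarize each cluster in the original units. The sketch below does this with a `groupby` on a hypothetical labelled frame shaped like `clusters_new`; the values and labels are invented for illustration and are not results from the notebook.

```python
import pandas as pd

# Hypothetical labelled result with the same columns as clusters_new
labelled = pd.DataFrame({
    'Satisfaction': [2, 3, 9, 8, 9, 10],
    'Loyalty': [-1.2, -0.8, -0.9, -1.1, 1.4, 1.6],
    'cluster_pred': [0, 0, 1, 1, 2, 2],
})

# Size and average feature values per cluster, in the original units
profile = labelled.groupby('cluster_pred').agg(
    size=('Satisfaction', 'size'),
    mean_satisfaction=('Satisfaction', 'mean'),
    mean_loyalty=('Loyalty', 'mean'),
)
print(profile)
```

Applied to `clusters_new`, a table like this makes it easier to judge whether each cluster corresponds to a distinct customer segment.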