# Customer Segmentation
Customer segmentation is a popular application of unsupervised learning. Using
clustering, companies identify segments of customers so they can target their potential user base. They divide
customers into groups according to common characteristics such as gender, age, interests,
and spending habits so they can market to each group effectively.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
df=pd.read_csv("../input/customer-segmentation-tutorial-in-python/Mall_Customers.csv")


In [None]:
df

In [None]:
data1=df[["Gender", "Annual Income (k$)"]].groupby(['Gender'], as_index=False).mean().sort_values(by='Annual Income (k$)', ascending=False)
sns.barplot(x='Gender', y='Annual Income (k$)', data=data1)

In [None]:
data2=df[["Gender", "Spending Score (1-100)"]].groupby(['Gender'], as_index=False).mean().sort_values(by='Spending Score (1-100)', ascending=False)
sns.barplot(x="Gender",y="Spending Score (1-100)",data=data2)

In [None]:
data3=df[["Age", "Spending Score (1-100)"]].groupby(['Age'], as_index=False).sum().\
                sort_values(by='Spending Score (1-100)', ascending=False)
plt.figure(figsize=[16,6])
sns.barplot(x="Age",y="Spending Score (1-100)",data=data3,orient="v")

In [None]:
data4=df[["Age", "Annual Income (k$)"]].groupby(['Age'], as_index=False).sum().\
                sort_values(by='Annual Income (k$)', ascending=False)
plt.figure(figsize=[16,6])
sns.barplot(x="Age",y="Annual Income (k$)",data=data4)

In [None]:
sns.scatterplot(x="Age",y="Annual Income (k$)",data=df)

In [None]:
# Mean spending score per age, split by gender (ages with no customers of a gender get 0)
table1 = pd.pivot_table(df, values='Spending Score (1-100)', index=['Age'],
                    columns=['Gender'], aggfunc='mean', fill_value=0)
table1.columns=["Female","Male"]
table1 = table1.reset_index()

In [None]:
table1.describe()

In [None]:
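# distplot is deprecated in recent seaborn releases; histplot with a KDE overlay is the replacement
sns.histplot(table1["Female"], kde=True)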

In [None]:
plt.figure(figsize=[15,8])
sns.pointplot(x="Age",y="Female",data=table1,color='red')
sns.pointplot(x="Age",y="Male",data=table1)
plt.xlabel('Age')
plt.ylabel('Score')

In [None]:
X = df.iloc[:, [3, 4]].values

In [None]:
fig = plt.figure(figsize = (10,5))
plt.scatter(X[:,0],X[:,1],s=100,c='magenta',label='All customers')
plt.title('Clients before clustering')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()

In [None]:
from sklearn.cluster import KMeans

history_inertia = list()
columns_cluster = [ [ "Spending Score (1-100)","Annual Income (k$)"]]
for i in range(2, 15):
    model = KMeans(n_clusters=i, random_state=10)
    model.fit(df[columns_cluster[0]])
    history_inertia.append(model.inertia_)

plt.figure(figsize=(10,7))
plt.plot(np.arange(2,15), history_inertia)
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()
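The inertia curve is usually read by eye for an "elbow", which can be ambiguous. One common complement is the mean silhouette score, which is highest when clusters are compact and well separated. The sketch below is only an illustration: `X_demo` and its five blob centers are invented stand-ins for the income and spending columns, not the mall data. On the real data, the same loop could be run on `df[columns_cluster[0]]`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for the (income, spending score) columns:
# five well-separated blobs, invented for illustration only.
rng = np.random.default_rng(0)
centers = np.array([[25, 20], [25, 80], [55, 50], [90, 20], [90, 80]])
X_demo = np.vstack([c + rng.normal(scale=4.0, size=(40, 2)) for c in centers])

# Fit KMeans for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 9):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels_k)

# Higher is better: pick the k whose clusters are most compact and best separated.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The silhouette score is a heuristic, like the elbow; when the two disagree, the cluster plot itself is worth inspecting before settling on a value.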

In [None]:
plt.figure(figsize=(15,10))
model = KMeans(n_clusters=5, random_state=10)
y_hat = model.fit_predict(df[columns_cluster[0]])

labels = ["Careful", "Standard", "Wise",
          "Sensible", "Target"]

plt.scatter(df[y_hat == 0][columns_cluster[0][0]], df[y_hat == 0][columns_cluster[0][1]],c='red',label=labels[0])
plt.scatter(df[y_hat == 1][columns_cluster[0][0]], df[y_hat == 1][columns_cluster[0][1]],c='blue',label=labels[3])
plt.scatter(df[y_hat == 2][columns_cluster[0][0]], df[y_hat == 2][columns_cluster[0][1]],c='cyan',label=labels[2])
plt.scatter(df[y_hat == 3][columns_cluster[0][0]], df[y_hat == 3][columns_cluster[0][1]],c='green',label=labels[4])
plt.scatter(df[y_hat == 4][columns_cluster[0][0]], df[y_hat == 4][columns_cluster[0][1]],c='magenta',label=labels[1])
plt.legend()
plt.grid()
plt.title('Cluster of Clients')
# The scatter calls above put columns_cluster[0][0] on x and columns_cluster[0][1] on y
plt.xlabel(columns_cluster[0][0])
plt.ylabel(columns_cluster[0][1])

The mall customers can be broadly grouped into five clusters based on annual income and spending score.
In the cyan cluster, customers have low annual income and low spending scores.
This is reasonable, since people with lower salaries tend to buy less; in fact,
these are the wise customers who know how to spend and save money.
The shops and the mall will be least interested in customers in this cluster.

In the magenta cluster, customers have low income but higher spending scores.
For some reason, these customers like to buy products often even though their income is low,
perhaps because they are more than satisfied with the mall’s services.
The shops and the mall might not target these customers as effectively, but are unlikely to lose them.

In the red cluster, customers have average income and average spending scores.
These customers are also not the prime targets of the shops or the mall,
but they will still be considered, and other data analysis techniques may be used to increase their spending score.

In the blue cluster, customers have high income and high spending scores.
This is the ideal case for the mall and its shops, as these customers are the prime sources of profit.
They might be regular customers of the mall who are convinced by its facilities.

In the green cluster, customers have high income but low spending scores,
which is interesting. These customers may be unsatisfied or unhappy with the mall’s services.
They can be the prime targets of the mall, as they have the potential to spend money.
The mall authorities may therefore try to add new facilities to attract these customers and meet their needs.
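
The descriptions above (low, average, or high income and spending) can be checked numerically by averaging both columns within each cluster. The sketch below uses a small invented table, `demo`, whose rows and cluster labels are hypothetical. On the real data, the cluster column would come from the fitted model, for example `df.assign(cluster=y_hat)`, and the label numbers depend on the fit.

```python
import pandas as pd

# Hypothetical rows standing in for df with a cluster label column added.
# Values and labels are invented for illustration only.
demo = pd.DataFrame({
    "Annual Income (k$)":     [20, 25, 22, 28, 55, 60, 85, 90, 88, 95],
    "Spending Score (1-100)": [15, 20, 80, 75, 50, 48, 90, 85, 12, 18],
    "cluster":                [2, 2, 4, 4, 0, 0, 1, 1, 3, 3],
})

# Mean income and spending score per cluster, plus the number of customers in each.
summary = demo.groupby("cluster").agg(
    income_mean=("Annual Income (k$)", "mean"),
    spending_mean=("Spending Score (1-100)", "mean"),
    size=("Annual Income (k$)", "count"),
)
print(summary)
```

A cluster with a high `income_mean` and a low `spending_mean` corresponds to the high-income, low-spending group described above.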