# Mall Customer Segmentation

In this kernel we segment the customers who visit a supermall and analyse their shopping behaviour, based on the following parameters:

1. Age: *Age of the customer*
2. Gender: *Gender of the customer*
3. Annual Income: *Annual Income of the customer*
4. Spending Score: *Score assigned to the customer by the supermall based on its defined parameters*

In [None]:
# importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# getting the dataset
df = pd.read_csv('../input/Mall_Customers.csv')

In [None]:
df.head() # first look at the dataset

In [None]:
df.shape # shape of dataset

In [None]:
df.isna().sum() # finding the null elements in the dataset


Now that we have loaded the dataset, we will do some basic data visualisations to get more insight into the data.

In [None]:
sns.countplot(data=df, x="Gender")

In [None]:
_ = plt.hist(data=df, x='Age', bins=[10, 20, 30, 40, 50, 60, 70, 80], color=['green'])
_ = plt.xlabel("Age")

In [None]:
_ = plt.hist(data=df, x='Annual Income (k$)', bins=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140], color=['black'])
_ = plt.xlabel("Annual Income (k$)")

Now that we have seen some basic visualisations of the data, we can start building our model to segment the customers according to the parameters in the dataset.

In [None]:
# keep only Age and Spending Score (1-100) as clustering features
X = df.drop(columns=['CustomerID', 'Gender', 'Annual Income (k$)'])

In [None]:
from sklearn.cluster import KMeans

In [None]:
# fit k-means with 4 clusters on Age and Spending Score
kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True).fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
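
To make the fit above less of a black box, here is a minimal pure-Python sketch of the assign-then-update loop that k-means is built on. The function name `lloyd_kmeans`, the helper `sq_dist`, the toy points and the naive "first k points" initialisation are all assumptions made for illustration. This is not scikit-learn's implementation, which adds k-means++ initialisation and several restarts (`n_init`) on top of the same idea.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, max_iter=100):
    """Toy k-means: returns (labels, centroids, inertia)."""
    # Naive initialisation (assumption): the first k points become centroids.
    centroids = [tuple(float(c) for c in p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    # Inertia: sum of squared distances to the assigned centroid.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical (age, spending score) pairs, not taken from the mall dataset.
toy_points = [(20, 80), (22, 85), (60, 20), (62, 25)]
labels_toy, centroids_toy, inertia_toy = lloyd_kmeans(toy_points, k=2)
print(labels_toy)     # [1, 1, 0, 0]
print(centroids_toy)  # [(61.0, 22.5), (21.0, 82.5)]
print(inertia_toy)    # 29.0
```

The returned values play the same roles as `kmeans.labels_`, `kmeans.cluster_centers_` and `kmeans.inertia_` in the cell above.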

In [None]:
x = df["Age"]
y = df["Spending Score (1-100)"]

plt.scatter(x, y, c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], color='red')
_ = plt.xlabel('Age')
_ = plt.ylabel('Spending Score (1-100)')

In [None]:
# keep only Annual Income (k$) and Spending Score (1-100) as clustering features
X_2 = df.drop(columns=['CustomerID', 'Gender', 'Age'])

In [None]:
# fit k-means with 4 clusters on Annual Income and Spending Score
kmeans_2 = KMeans(n_clusters=4, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True).fit(X_2)
labels_2 = kmeans_2.labels_
centroids_2 = kmeans_2.cluster_centers_
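
Both fits use `n_clusters=4` without showing how that value was chosen. One common heuristic is the elbow method: fit k-means for several values of k and look for the point where the inertia stops dropping sharply. The sketch below illustrates this on synthetic data. `X_demo`, `blob_centres` and the seed are assumptions made so the block runs on its own, and the numbers it prints say nothing about the mall dataset. To apply the idea here, `X_demo` would be swapped for `X` or `X_2`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in (assumption): three blobs in a 2-D feature space,
# NOT the mall data.
rng = np.random.default_rng(0)
blob_centres = np.array([[25.0, 20.0], [60.0, 50.0], [90.0, 80.0]])
X_demo = np.vstack(
    [centre + rng.normal(scale=3.0, size=(50, 2)) for centre in blob_centres]
)

# Fit k-means for k = 1..7 and record the inertia of each fit.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_demo)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting the recorded values, for example with `plt.plot(list(inertias), list(inertias.values()), marker='o')`, gives the usual elbow curve. For this synthetic data the bend should appear at k=3, since that is how many blobs were generated.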

In [None]:
x = df['Annual Income (k$)']
y = df['Spending Score (1-100)']

plt.scatter(x, y, c=labels_2)
plt.scatter(centroids_2[:, 0], centroids_2[:, 1], color='red')
_ = plt.xlabel('Annual Income (k$)')
_ = plt.ylabel('Spending Score (1-100)')