### **Task -**

> The Mall Customers dataset contains information about people visiting a mall: customer id, gender, age, annual income, and spending score. The task is to draw insights from the data and group customers by their behavior, segmenting them on age, gender, and interest. Customer segmentation is the practice of dividing a customer base into groups of similar individuals, and it is useful for customized marketing.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AffinityPropagation, MeanShift, estimate_bandwidth
from mpl_toolkits.mplot3d import Axes3D

K = 15
col_names = ["customer_id","gender","age","annual_income","spending_score"]
data = pd.read_csv("../input/mall-customers/Mall_Customers.csv",names=col_names,header=0)
data = data.sample(frac=1)
data.head()

In [None]:
data.shape

#### _Apart from the customer id, the dataset has four features, of which only **three** (`spending_score`, `age` and `gender`) are relevant to us._

----------------------------------------------

> **We will perform the following:**

1. Visualization of data
2. Draw the elbow plot to find the optimal number of clusters for `k-means clustering`
3. Perform `Affinity Propagation` on the data
4. Perform `Mean-shift clustering` on the data

#### 1. With **`gender`** on the z-axis

In [None]:
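# Keep only gender, age and spending_score (columns 1, 2 and 4)
data = data.iloc[:,[1,2,4]]
# Encode gender numerically: Female = 1, Male = 0
data = data.replace(to_replace="Female",value=1)
data = data.replace(to_replace="Male",value=0)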
data.head()
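
An alternative to chained `replace` calls is an explicit mapping, which also makes unexpected labels easy to detect. The sketch below uses a small made-up frame (`toy`) and a hypothetical `gender_codes` dictionary rather than the real CSV; it only illustrates the encoding step.

```python
import pandas as pd

# Toy frame standing in for the real CSV (values are made up for illustration).
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "age": [19, 21, 35, 64],
    "spending_score": [39, 81, 6, 77],
})

# Same convention as the notebook: Female = 1, Male = 0.
gender_codes = {"Female": 1, "Male": 0}
toy["gender"] = toy["gender"].map(gender_codes)

# Labels missing from the mapping become NaN, so check before clustering.
assert toy["gender"].notna().all(), "unexpected gender label in the data"

print(toy["gender"].tolist())  # [0, 1, 1, 0]
```

With `map`, a misspelled or missing label shows up as `NaN` instead of silently staying a string.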




##### **Visualization** of data

In [None]:
fig = plt.figure(figsize=(14,8))
ax = plt.axes(projection="3d")

x = data["age"].values
z = data["gender"].values
y = data["spending_score"].values

img = ax.scatter(x, y, z, s=50)
ax.set_zticks([0, 1])
ax.set_xlabel('Age')
ax.set_ylabel('Spending Score')
ax.set_zlabel('Gender (F = 1, M = 0)')
plt.show()


##### **Elbow Plot** and **K-means**

1. Elbow Plot

In [None]:
inertia = []

# Fit K-means for k = 1 .. K-1 and record the inertia of each fit
for k in range(1,K):
  kmeans = KMeans(n_clusters=k,init = 'k-means++')
  kmeans.fit(data.values)
  inertia.append(kmeans.inertia_)

plt.plot(range(1,K), inertia)
plt.title('Elbow Plot')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia (sum of squared distances to nearest centroid)')
plt.show()
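
The quantity recorded for the elbow plot is the K-means inertia: the sum of squared distances from each point to the centroid of its assigned cluster. As a rough illustration on synthetic data (the array `X` and the choice of 3 clusters are assumptions, not the mall data), the value can be recomputed by hand and compared with `inertia_`:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic 3-column data standing in for (gender, age, spending_score).
X = rng.normal(size=(60, 3))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Look up the centroid assigned to each point, then sum the squared distances.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()

print(manual_inertia, km.inertia_)
print(np.isclose(manual_inertia, km.inertia_))
```

Because inertia can only decrease as `k` grows, the plot is read for the point where the decrease flattens, not for a minimum.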

> *From the elbow in the graph above, `k = 4` is a reasonable choice for the number of clusters.*
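
Reading an elbow by eye is subjective, so a second criterion can serve as a sanity check. One common option is the silhouette score. The sketch below applies it to synthetic blobs generated with `make_blobs` (not the mall data), so the `k` it prints says nothing about this dataset; to try it on the notebook's data, `X` would be replaced by `data.values`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: 200 points in 3 dimensions (an assumption for the demo).
X, _ = make_blobs(n_samples=200, centers=4, n_features=3,
                  cluster_std=1.0, random_state=42)

scores = {}
# The silhouette score needs at least 2 clusters, so start at k = 2.
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher is better; the score lies between -1 and 1.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

If the elbow and the silhouette score disagree, it is worth inspecting the clusters for both candidate values of `k`.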

In [None]:
print("From the elbow plot, the optimal number of clusters is 4")

2. K-Means Clustering

In [None]:
vals = data.values
k = 4 #from graph above

In [None]:
kmeans = KMeans(n_clusters = k, init = 'k-means++').fit(vals)

fig = plt.figure(figsize=(14,8))
ax = plt.axes(projection="3d")

x = data["age"].values
z = data["gender"].values
y = data["spending_score"].values

print(f"K-means: num of clusters - {k}")

ax.scatter3D(x, y, z, s=100, c=kmeans.labels_, edgecolors='b')
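# Centroid columns follow the data order (gender, age, spending_score); plot them as (x=age, y=spending_score, z=gender)
centers = kmeans.cluster_centers_
ax.scatter3D(centers[:, 1], centers[:, 2], centers[:, 0], s=300, color='black', marker="P", edgecolors='red')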

# fig.colorbar(img1)
ax.set_zticks([0, 1])
ax.set_xlabel('Age')
ax.set_ylabel('Spending Score')
ax.set_zlabel('Gender (F = 1, M = 0)')
plt.show()
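
Once cluster labels are available, a per-cluster summary table is often easier to interpret than a 3D scatter. The sketch below is a minimal example on randomly generated values (the frame `toy` and the summary column names are hypothetical); with the notebook's data, `toy` would be `data` and `km` would be the fitted `kmeans`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Random stand-in for the three columns used in the notebook.
toy = pd.DataFrame({
    "gender": rng.integers(0, 2, size=100),          # 1 = Female, 0 = Male
    "age": rng.integers(18, 71, size=100),
    "spending_score": rng.integers(1, 101, size=100),
})

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(toy.values)

# Attach the labels and aggregate each feature per cluster.
summary = (
    toy.assign(cluster=km.labels_)
       .groupby("cluster")
       .agg(n_customers=("age", "size"),
            mean_age=("age", "mean"),
            mean_score=("spending_score", "mean"),
            share_female=("gender", "mean"))
)
print(summary.round(2))
```

Since gender is coded 0/1, its mean per cluster is the share of female customers in that cluster.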

##### **Affinity Propagation**

In [None]:
afprop = AffinityPropagation().fit(vals)
# cluster_centers_indices_ holds the row indices of the exemplar points, one per cluster
cluster_centers = afprop.cluster_centers_indices_
num_clusters = len(cluster_centers)
print(f"Affinity Propagation: num of clusters - {num_clusters}")

fig = plt.figure(figsize=(14,8))
ax = plt.axes(projection="3d")

x = data["age"].values
z = data["gender"].values
y = data["spending_score"].values

ax.scatter3D(x, y, z, s=100, c=afprop.labels_, edgecolors='b')
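# Exemplar columns follow the data order (gender, age, spending_score); plot them as (x=age, y=spending_score, z=gender)
centers = afprop.cluster_centers_
ax.scatter3D(centers[:, 1], centers[:, 2], centers[:, 0], s=300, color='black', marker="P", edgecolors='red')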

# fig.colorbar(img1)
ax.set_zticks([0, 1])
ax.set_xlabel('Age')
ax.set_ylabel('Spending Score')
ax.set_zlabel('Gender (F = 1, M = 0)')
plt.show()
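
Affinity Propagation picks the number of clusters itself, but the result depends on the `preference` parameter: lower values generally lead to fewer exemplars. The sketch below illustrates this on synthetic blobs; the data and the specific preference values are assumptions chosen for the demo, not tuned for the mall data.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic stand-in data with three columns, like the notebook's feature matrix.
X, _ = make_blobs(n_samples=90, centers=3, n_features=3,
                  cluster_std=0.8, random_state=0)

counts = {}
# preference=None uses the library default (the median of the input similarities).
for preference in (None, -50, -500):
    ap = AffinityPropagation(preference=preference, max_iter=500,
                             random_state=0).fit(X)
    # An empty index array means the algorithm did not converge.
    counts[preference] = len(ap.cluster_centers_indices_)

print(counts)
```

If the default setting yields more clusters than are useful for marketing segments, lowering `preference` is one knob to try.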

##### **Mean-shift Algorithm**

In [None]:
# Estimate the kernel bandwidth from a random subsample of 20 points
bandwidth = estimate_bandwidth(vals, quantile=0.2, n_samples=20)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(vals)
cluster_centers = ms.cluster_centers_
num_clusters = len(cluster_centers)

print(f"Mean-shift: num of clusters - {num_clusters}")

fig = plt.figure(figsize=(14,8))
ax = plt.axes(projection="3d")

x = data["age"].values
z = data["gender"].values
y = data["spending_score"].values

ax.scatter3D(x, y, z, s=100, c=ms.labels_, edgecolors='b')
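# Center columns follow the data order (gender, age, spending_score); plot them as (x=age, y=spending_score, z=gender)
ax.scatter3D(cluster_centers[:, 1], cluster_centers[:, 2], cluster_centers[:, 0], s=300, color='black', marker="P", edgecolors='red')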

# fig.colorbar(img1)
ax.set_zticks([0, 1])
ax.set_xlabel('Age')
ax.set_ylabel('Spending Score')
ax.set_zlabel('Gender (F = 1, M = 0)')
plt.show()
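
The number of clusters Mean-shift finds is driven by the bandwidth, which in turn depends on the `quantile` passed to `estimate_bandwidth`. The sketch below shows that dependence on synthetic blobs; the data and the quantile values are assumptions for illustration only.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic stand-in data with three columns.
X, _ = make_blobs(n_samples=150, centers=3, n_features=3,
                  cluster_std=1.0, random_state=0)

results = {}
for quantile in (0.1, 0.2, 0.5):
    # A larger quantile looks at more distant neighbours, giving a wider kernel.
    bandwidth = estimate_bandwidth(X, quantile=quantile, random_state=0)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
    results[quantile] = (bandwidth, len(ms.cluster_centers_))

for quantile, (bandwidth, n_clusters) in results.items():
    print(f"quantile={quantile}: bandwidth={bandwidth:.2f}, clusters={n_clusters}")
```

A wider kernel tends to merge nearby modes, so comparing a few quantiles gives a feel for how stable the segmentation is.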