rcgc/MallCustomerSegmentationAndClassification

Mall Customer Segmentation And Classification

Abstract

Stores regularly launch offers, but how do they know which offers we might need?

Fortunately, they are not spying on us when we talk or think about our needs (or maybe they are). However, one of the many ways to uncover deeper relationships between customer needs and store sales is to find patterns in data, and that is a task Machine Learning is well suited to.

Unsupervised learning

This Machine Learning subarea refers to the use of Artificial Intelligence algorithms to detect patterns in data sets containing data points that are neither classified nor labeled.

Solution

To find groups of customers with common characteristics, such as a similar salary range or spending score, I will use a clustering algorithm known as k-means. This partitional algorithm is good at dividing mixed-up points into groups, which is the key problem here: separating data points into groups.
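As a rough illustration of what k-means does, here is a minimal from-scratch sketch of its two alternating steps (assign each point to the nearest centroid, then move each centroid to the mean of its points). It is not the code in kmeans.py; the function name `kmeans`, its parameters and the six sample customers are made up for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # No point changed cluster, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Made-up (annual income k$, spending score) pairs, not taken from the dataset.
customers = [(15, 10), (18, 14), (20, 12), (80, 85), (85, 90), (78, 88)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

The first three customers end up sharing one label and the last three the other; which group gets label 0 depends on the random starting centroids.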

After that, I'll use a decision tree that implements the CART algorithm to find specific boundaries for classifying customers according to the labels generated by the k-means algorithm.
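To give an idea of how a CART-style tree finds such boundaries, the sketch below searches a single feature for the threshold that minimizes the weighted Gini impurity of the two resulting groups. It is a simplified illustration, not the code in dt2.py or decision_tree.py; the helper names and the sample incomes and labels are hypothetical. Candidate thresholds are taken as midpoints between neighbouring sorted values, which is an assumption of this sketch.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # Baseline: impurity without any split.
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # No threshold can separate two equal values.
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up annual incomes (k$) with made-up cluster labels.
incomes = [20, 30, 38, 39, 60, 70]
cluster_labels = [0, 0, 0, 2, 2, 2]
threshold, impurity = best_threshold(incomes, cluster_labels)
print(threshold, impurity)
```

A full tree repeats this search over every feature and then recurses into each side of the chosen split.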

K-means

Image 1. Original dataset scatter graph[1]



Image 2. Elbow method graph
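The idea behind the elbow method can be sketched as follows: compute the within-cluster sum of squares (inertia) for several values of k, then pick the k where the curve bends. This is only an illustration under simplifying assumptions: it uses made-up 1-D scores, a brute-force search over contiguous groupings stands in for k-means so the result is deterministic, and the "farthest below the chord" rule is one common heuristic for locating the bend, not necessarily how the graph above was read.

```python
from itertools import combinations


def wcss(group):
    """Within-cluster sum of squares of a 1-D group around its mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def best_inertia(values, k):
    """Lowest total WCSS over all ways to cut the sorted values into k groups."""
    values = sorted(values)
    n = len(values)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(wcss(values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best


def elbow(inertias):
    """Return the k whose inertia lies farthest below the line joining the end points."""
    ks = sorted(inertias)
    k_first, k_last = ks[0], ks[-1]
    y_first, y_last = inertias[k_first], inertias[k_last]

    def gap(k):
        chord = y_first + (y_last - y_first) * (k - k_first) / (k_last - k_first)
        return chord - inertias[k]

    return max(ks, key=gap)


# Made-up spending scores forming three tight groups.
scores = [10, 11, 12, 50, 51, 52, 90, 91, 92]
inertias = {k: best_inertia(scores, k) for k in range(1, 7)}
best_k = elbow(inertias)
print(best_k)
```

For these three well-separated groups the inertia drops sharply up to k = 3 and barely changes afterwards, so the heuristic selects 3.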



Image 3. Clusters graph


Interpretation of clusters by customer group

label | Annual Income (k$) | Spending score (1-100)
--- | --- | ---
0🟢 | low | low
1🟡 | high | high
2🔴 | medium | medium
3🟣 | low | high
4🔵 | high | low

Decision tree

Image 4. Decision tree diagram


In each node, value is an array with one entry per label, from 0 to 4. Following the colors in image 3, it can be understood in the following way:

value = [ label0🟢, label1🟡, label2🔴, label3🟣, label4🔵 ]
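As a small illustration of reading such an array, the sketch below assumes that each entry measures how many samples of that label reached the node (as scikit-learn tree plots usually show), so the node's majority label is the position of the largest entry. The numbers in `value` are hypothetical and are not read from the tree in image 4.

```python
# Hypothetical node contents, one entry per label 0-4.
value = [0, 2, 35, 0, 1]
colors = ["green", "yellow", "red", "purple", "blue"]

# The majority label is the index of the largest entry.
majority_label = max(range(len(value)), key=lambda i: value[i])
print(f"label {majority_label} ({colors[majority_label]})")
```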

How much is low, medium or high?

label | Annual Income (k$) | Spending score (1-100)
--- | --- | ---
0🟢 | <= 38.5 | <= 50.0
1🟡 | > 68.5 | > 51.5
2🔴 | 38.5 - 68.5 | -
3🟣 | <= 38.5 | > 50.0
4🔵 | > 68.5 | <= 51.5
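The thresholds above can be turned into a small rule-based classifier. The following is a minimal sketch: the function name `classify_customer` is made up, and it assumes the medium income band means "greater than 38.5 and at most 68.5", which the table does not state explicitly.

```python
def classify_customer(annual_income, spending_score):
    """Map (annual income in k$, spending score 1-100) to a label 0-4."""
    if annual_income <= 38.5:
        # Low income: label 0 for low spenders, label 3 for high spenders.
        return 0 if spending_score <= 50.0 else 3
    if annual_income <= 68.5:
        # Medium income (assumed band): spending score is not used.
        return 2
    # High income: label 4 for low spenders, label 1 for high spenders.
    return 4 if spending_score <= 51.5 else 1


print(classify_customer(20, 80))
print(classify_customer(90, 30))
```

A low-income, high-spending customer such as (20, 80) falls in group 3, while a high-income, low-spending one such as (90, 30) falls in group 4.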

Segmentation

Image 5. Final segmentation

Requirements

How to use it

Run the .py files in the following order:

  • python kmeans.py : plots the original data, the elbow diagram and the k-means clusters, and generates customer_segmentation.csv
  • python dt2.py : plots the decision tree built from customer_segmentation.csv (using scikit-learn)
  • python decision_tree.py : prints to the console the decision tree built from customer_segmentation.csv (coded from scratch)

References

[1] "Mall Customer Segmentation Data", Kaggle.com, 2021. [Online]. Available: https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python?select=Mall_Customers.csv. [Accessed: 04-Nov-2021].
[2] "1.10. Decision Trees", scikit-learn, 2021. [Online]. Available: https://scikit-learn.org/stable/modules/tree.html. [Accessed: 04-Nov-2021].
[3] J. Gordon, "Let's Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8", Youtube.com, 2021. [Online]. Available: https://www.youtube.com/watch?v=LDRbO9a6XPU&list=PLOU2XLYxmsIIuiBfYad6rFYQU_jL2ryal&index=9. [Accessed: 04-Nov-2021].