This folder documents learnings from unsupervised learning using k-means.

moyinajayi/Retail_customer_Clustering

Retail Customer Behaviour Analysis

This project details the analysis conducted to identify customer behaviour in an e-commerce dataset. It showcases the use of recency, frequency, and spend to understand customer behaviour in the dataset.

Customer segmentation was also performed using the KMeans clustering algorithm.

Customer behaviour analysis and segmentation using KMeans.

Project description: This is a data analysis project that involves reading, cleaning, exploring, and performing advanced analyses on a retail dataset. The project uses the KMeans clustering algorithm for identifying customer segments.

This project covers data cleaning, exploratory data analysis (EDA), data visualization, and customer segmentation using machine learning techniques.

1. Importing libraries and Loading the data

I used a dataset named OnlineRetail.csv, which contains transactional data from an online retail store. The primary goal was to explore the dataset, clean it, visualize important patterns, and perform customer segmentation.

image

image
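As a rough sketch of the loading and cleaning step, the snippet below reads a CSV with pandas, parses the invoice dates, and drops rows with missing values. The helper name `load_retail_data`, the column names, and the made-up sample rows are assumptions for illustration; the actual notebook may do this differently.

```python
import io

import pandas as pd

# Made-up rows standing in for OnlineRetail.csv (column names are assumed).
sample_csv = io.StringIO(
    "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country\n"
    "536365,85123A,MUG,6,2010-12-01 08:26:00,2.55,17850,United Kingdom\n"
    "536366,22633,LAMP,6,2010-12-01 08:28:00,1.85,,United Kingdom\n"
    "536367,84879,CLOCK,32,2010-12-01 08:34:00,1.69,13047,United Kingdom\n"
)


def load_retail_data(source):
    """Hypothetical helper: read the CSV, parse dates, drop rows with nulls."""
    frame = pd.read_csv(source, parse_dates=["InvoiceDate"])
    return frame.dropna().reset_index(drop=True)


# In the project this would be load_retail_data("OnlineRetail.csv").
df = load_retail_data(sample_csv)
print(df.shape)
```

The second sample row has no CustomerID, so it is dropped during cleaning.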

2. Data Visualization

After removing the null values from the dataset, I created two additional columns, 'Month' and 'Day of Week', based on the invoice date column. The goal was to see whether the day of the week, or other information in the date, offered any further insights. Below are some visualizations showing trends and distributions.
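A minimal sketch of deriving the two date columns, assuming the invoice date column is named `InvoiceDate` and is already a pandas datetime column (the sample dates are made up):

```python
import pandas as pd

# Made-up invoice dates for illustration.
df = pd.DataFrame(
    {
        "InvoiceDate": pd.to_datetime(
            ["2011-01-03 10:00", "2011-11-06 15:30", "2011-11-09 09:15"]
        )
    }
)

# Month as a number (1-12) and the weekday as a name.
df["Month"] = df["InvoiceDate"].dt.month
df["Day of Week"] = df["InvoiceDate"].dt.day_name()
print(df)
```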

image

The bar plots show that there were no transactions on Saturday, and although the number of transactions was highest in the 11th month (November), the total amount spent was highest in the first month (January).

image

In addition, these visuals show the top 5 countries and the top stock items purchased.

image
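The rankings behind such charts could be computed roughly as follows. This is only a sketch: the column names `Country`, `Description`, and `Quantity` are assumed, countries are ranked by number of transaction rows, and items by total quantity, which may not match the notebook exactly.

```python
import pandas as pd

# Made-up transactions for illustration.
df = pd.DataFrame(
    {
        "Country": ["United Kingdom", "France", "United Kingdom",
                    "Germany", "France", "United Kingdom"],
        "Description": ["MUG", "LAMP", "MUG", "CLOCK", "MUG", "LAMP"],
        "Quantity": [2, 1, 3, 1, 4, 2],
    }
)

# Top 5 countries by number of transaction rows.
top_countries = df["Country"].value_counts().head(5)

# Top 5 stock items by total quantity purchased.
top_items = (
    df.groupby("Description")["Quantity"].sum().sort_values(ascending=False).head(5)
)
print(top_countries)
print(top_items)
```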

3. Analysis using Recency, Frequency and Spend (RFS)

The Recency, Frequency, Monetary (RFM) model is a behaviour-based analysis technique used to segment customers by examining their transaction history. In this project, Spend takes the place of the Monetary component, hence RFS. Recency is calculated as the number of days since the last purchase, frequency is the number of transactions per customer, and spend is the total amount spent per customer.

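# Most recent purchase date per customer (InvoiceDate must already be a datetime column)
last_transaction_date = df.groupby('CustomerID')['InvoiceDate'].max()
# Use the latest invoice date in the dataset as the reference point
reference_date = df['InvoiceDate'].max()
# Recency = days between the reference date and each customer's last purchase
days_difference = (reference_date - last_transaction_date).dt.days
days_difference = days_difference.reset_index().rename(columns={'InvoiceDate': 'recency'})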

The recency, frequency, and spend metrics are then merged into a single RFS dataframe.
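The frequency and spend calculations are not shown here, so the following is a sketch of how all three metrics might be computed and merged. The column names `InvoiceNo`, `Quantity`, and `UnitPrice`, the derived `Amount` column, and the sample rows are assumptions.

```python
import pandas as pd

# Made-up transactions for two customers.
df = pd.DataFrame(
    {
        "CustomerID": [1, 1, 1, 2, 2],
        "InvoiceNo": ["A1", "A1", "A2", "B1", "B1"],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-01", "2011-01-01", "2011-01-10", "2011-01-05", "2011-01-05"]
        ),
        "Quantity": [2, 1, 3, 5, 1],
        "UnitPrice": [1.5, 4.0, 2.0, 1.0, 10.0],
    }
)

# Recency: days between the latest invoice overall and each customer's last one.
last_purchase = df.groupby("CustomerID")["InvoiceDate"].max()
recency = (df["InvoiceDate"].max() - last_purchase).dt.days
recency = recency.rename("recency").reset_index()

# Frequency: number of distinct invoices per customer.
frequency = df.groupby("CustomerID")["InvoiceNo"].nunique()
frequency = frequency.rename("frequency").reset_index()

# Spend: total of quantity * unit price per customer.
df["Amount"] = df["Quantity"] * df["UnitPrice"]
spend = df.groupby("CustomerID")["Amount"].sum().rename("spend").reset_index()

# Merge the three metrics on CustomerID.
rfs = recency.merge(frequency, on="CustomerID").merge(spend, on="CustomerID")
print(rfs)
```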

image

The boxplots below show the outliers in the RFS data distributions.

image

The outliers were mostly in the Frequency and Spend columns and were removed before applying the KMeans algorithm.
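The README does not say which rule was used to remove the outliers. One common choice that matches what boxplots display is the 1.5 x IQR rule, sketched below. The helper `remove_iqr_outliers`, the multiplier `k`, and the sample values are assumptions, not the project's confirmed method.

```python
import pandas as pd

# Made-up RFS table; customer 8 is extreme in both frequency and spend.
rfs = pd.DataFrame(
    {
        "CustomerID": range(1, 9),
        "recency": [5, 20, 40, 10, 60, 30, 15, 25],
        "frequency": [1, 2, 3, 2, 1, 2, 3, 50],
        "spend": [100.0, 150.0, 200.0, 120.0, 90.0, 180.0, 160.0, 5000.0],
    }
)


def remove_iqr_outliers(frame, columns, k=1.5):
    """Hypothetical helper: keep rows inside [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    keep = pd.Series(True, index=frame.index)
    for col in columns:
        q1 = frame[col].quantile(0.25)
        q3 = frame[col].quantile(0.75)
        iqr = q3 - q1
        keep &= frame[col].between(q1 - k * iqr, q3 + k * iqr)
    return frame[keep]


rfs_clean = remove_iqr_outliers(rfs, ["frequency", "spend"])
print(len(rfs), "->", len(rfs_clean))
```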

4. Feature scaling

With the outliers removed, the features in X were then scaled using StandardScaler.

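from sklearn.preprocessing import StandardScaler

# Select the feature columns, skipping the first column (assumed to be CustomerID)
X = rfs.iloc[:, 1:]
# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X = scaler.fit_transform(X)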

5. Clustering customers using KMeans

I used the Yellowbrick cluster module to visualize the KMeans distortion score elbow. As shown below, the elbow occurs at k = 4, indicating 4 clusters.

image
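To illustrate what the elbow plot measures, the sketch below computes the distortion score (the sum of squared distances from each point to its assigned centre, exposed by scikit-learn as `inertia_`) for a range of k values. It uses plain scikit-learn in place of Yellowbrick, on synthetic data with four well-separated groups, so it is a stand-in for the project's plot and does not reproduce it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: four well-separated blobs (not the retail data).
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Distortion (inertia) for each candidate number of clusters.
distortions = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    distortions[k] = model.inertia_

for k, score in distortions.items():
    print(k, round(score, 1))
```

The elbow is the value of k after which the score stops dropping sharply; on this synthetic data that happens at k = 4.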

I then fitted the KMeans model and updated the RFS dataframe with the clusters identified.

from sklearn.cluster import KMeans
# Fit KMeans with the 4 clusters suggested by the elbow plot
kmeans = KMeans(n_clusters=4, n_init='auto', random_state=42)
kmeans.fit(X)
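The step of writing the cluster labels back to the RFS dataframe is not shown. A minimal sketch of one way to do it follows, assuming a new column named `cluster` and using a made-up eight-customer table. `n_init=10` is used here so the sketch also runs on older scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up RFS table with four obvious groups of two customers each.
rfs = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3, 4, 5, 6, 7, 8],
        "recency": [300, 280, 5, 8, 10, 6, 3, 4],
        "frequency": [1, 1, 20, 22, 2, 1, 25, 24],
        "spend": [50.0, 60.0, 500.0, 550.0, 80.0, 70.0, 2000.0, 2100.0],
    }
)

# Scale the features (all columns except CustomerID), then fit KMeans.
X = StandardScaler().fit_transform(rfs.iloc[:, 1:])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# labels_ is aligned row by row with X, so it can be assigned directly.
rfs["cluster"] = kmeans.labels_
print(rfs)
```

The cluster numbers themselves are arbitrary, so the same group may receive a different number under a different `random_state`.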

6. Results and findings: Recency and Spend

image

image

These are the four clusters identified from the results:

Cluster 0: High Recency, Low Frequency, Low Spend
Cluster 1: Low Recency, High Frequency, Moderate Spend
Cluster 2: Low Recency, Low Frequency, Low Spend
Cluster 3: Low Recency, High Frequency, High Spend
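Descriptions like these can be read off a per-cluster summary of the RFS metrics. The sketch below shows one way to build such a summary. The values and the `cluster` column are made up to mirror the four profiles above and are not the project's actual results.

```python
import pandas as pd

# Made-up labelled RFS table (illustrative values only).
rfs = pd.DataFrame(
    {
        "recency": [300, 280, 5, 8, 10, 6, 3, 4],
        "frequency": [1, 1, 20, 22, 2, 1, 25, 24],
        "spend": [50.0, 60.0, 500.0, 550.0, 80.0, 70.0, 2000.0, 2100.0],
        "cluster": [0, 0, 1, 1, 2, 2, 3, 3],
    }
)

# Mean of each metric per cluster, plus the number of customers in it.
profile = rfs.groupby("cluster")[["recency", "frequency", "spend"]].mean()
profile["customers"] = rfs.groupby("cluster").size()
print(profile)
```

Comparing each cluster's means against the others is what supports labels such as "High Recency" or "Low Spend".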

In addition, these are the top items by Amount spent.

image

For more details, see the repository.
