salahhesham01/Performance-Analysis

Performance Analysis & Churn Prediction

A complete end-to-end data science project analysing marketplace performance from 2018 to 2021, covering sales trends, product performance, RFM-based customer segmentation, and a churn prediction model — built on real transactional data.


Project Overview

This project analyses marketplace data across three dimensions: executive sales performance, product & operations, and customer behaviour. RFM (Recency, Frequency, Monetary) analysis was used to segment customers into 10 behavioural groups, and a Logistic Regression model was built to predict customer churn.

The project has 3 layers:

  • Power BI Dashboard — 3-page interactive business intelligence report
  • Business Report — written insights and recommendations for stakeholders
  • ML Model — Logistic Regression churn prediction model (89.9% accuracy)

Dashboard Preview

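Executive Overview

[Screenshot: Executive Overview dashboard page]

Customer Insights

[Screenshot: Customer Insights dashboard page]

Product & Operations

[Screenshot: Product & Operations dashboard page]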

Key Business Metrics

| Metric | Value |
| --- | --- |
| Total Revenue | $2,297,201 |
| Total Profit | $286K |
| Total Orders | 5K |
| Total Quantity | 38K |
| Active Customers | 436 |
| Average Order Value (AOV) | $459 |
| Customer LTV | $3,000 |
| Retention Rate (2020→2021) | 87% |
| Churn Rate | 13% |
| New Customers | 11 |

Key Insights

Sales & Regional Performance

  • West region leads sales at $725K, while South lags at $391K — a significant regional imbalance
  • Sales dipped slightly from $484K (2018) to $471K (2019), then grew strongly to $609K (2020) and $733K (2021)
  • Standard Class is the dominant shipping mode at 59.77%; Same Day delivery is underutilised at 5.27%

Product Performance

  • Technology leads all categories at $836K, followed by Furniture ($742K) and Office Supplies ($719K)
  • Phones ($330K) and Chairs ($328K) are the top sub-categories
  • Canon imageCLASS 2200 Advanced Copier is the top-selling product at $62K

Customer Segmentation (RFM Analysis)

  • Hibernating is the largest segment (171 customers) — high churn risk
  • Can't Lose customers (39) have the highest LTV ($4.3K) but are becoming inactive — critical to recover
  • At Risk (106 customers) are historical high-spenders who have stopped purchasing
  • About to Sleep segment has the highest AOV ($622) — their churn is the most expensive

RFM Customer Segments

| Segment | Count | Avg LTV | AOV | Priority |
| --- | --- | --- | --- | --- |
| Hibernating | 171 | Low | $453 | Medium |
| Loyal Customers | 149 | $3.9K | $447 | Retain |
| Potential Loyalists | 112 | $2.6K | $471 | Nurture |
| At Risk | 106 | $3.1K | $472 | 🔴 High |
| Champions | 88 | $3.9K | $434 | Reward |
| About to Sleep | 57 | $2.5K | $622 | 🔴 High |
| Can't Lose | 39 | $4.3K | - | 🔴 Critical |
| Need Attention | 32 | $2.9K | $455 | Medium |
| New Customers | 22 | - | - | Onboard |
| Promising | 17 | - | $564 | Develop |
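As a rough illustration of how RFM segments like these can be derived, the sketch below scores each customer from 1 to 5 on recency, frequency, and monetary value using quintiles, then maps the scores to a segment name. The column names, the toy data, and the segment rules are assumptions made for this example; the project's actual cut-offs are not documented here, and only four of the ten segments are shown.

```python
import pandas as pd


def rfm_scores(rfm: pd.DataFrame, bins: int = 5) -> pd.DataFrame:
    """Add 1-5 quintile scores R, F, M (5 is best) to a frame with
    'recency', 'frequency' and 'monetary' columns (assumed names)."""
    out = rfm.copy()
    labels = list(range(1, bins + 1))
    # rank(method="first") breaks ties so qcut always gets unique bin edges.
    # Low recency (a recent purchase) is good, so its labels are reversed.
    out["R"] = pd.qcut(out["recency"].rank(method="first"), bins, labels=labels[::-1]).astype(int)
    out["F"] = pd.qcut(out["frequency"].rank(method="first"), bins, labels=labels).astype(int)
    out["M"] = pd.qcut(out["monetary"].rank(method="first"), bins, labels=labels).astype(int)
    return out


def segment(r: int, f: int) -> str:
    """Hypothetical, simplified rules covering only a few of the segments."""
    if r >= 4 and f >= 4:
        return "Champions"
    if r <= 2 and f == 5:
        return "Can't Lose"
    if r <= 2 and f >= 3:
        return "At Risk"
    if r <= 2:
        return "Hibernating"
    return "Other (remaining segments omitted)"


# Toy data, not taken from the project dataset.
rfm = pd.DataFrame(
    {
        "recency": [5, 300, 40, 200, 10, 150, 90, 365, 20, 60],
        "frequency": [12, 1, 6, 9, 10, 2, 4, 11, 8, 3],
        "monetary": [5000, 100, 2000, 3500, 4200, 300, 900, 4800, 2500, 600],
    },
    index=[f"c{i}" for i in range(10)],
)
scored = rfm_scores(rfm)
scored["segment"] = [segment(r, f) for r, f in zip(scored["R"], scored["F"])]
print(scored[["R", "F", "M", "segment"]])
```

Scoring by rank keeps each quintile the same size, which is convenient on a small customer base where many customers share the same order count.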

Machine Learning Model — Churn Prediction

Goal

Predict whether a customer will churn (not return in the current year after purchasing in the previous year).
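The churn definition above can be expressed as a simple set comparison between two years of buyers. The following is a minimal sketch with made-up customer IDs; the function names are hypothetical and not taken from the project notebook.

```python
def label_churn(prev_year_buyers, curr_year_buyers):
    """Return {customer_id: 1 if churned else 0} for every customer who
    purchased in the previous year. Customers who first appear in the
    current year are not part of the base and get no label."""
    current = set(curr_year_buyers)
    return {cid: int(cid not in current) for cid in set(prev_year_buyers)}


def churn_rate(labels):
    """Share of last year's buyers who did not return."""
    return sum(labels.values()) / len(labels) if labels else 0.0


# Toy IDs for illustration only.
buyers_2020 = ["C1", "C2", "C3", "C4"]
buyers_2021 = ["C1", "C3", "C4", "C9"]  # C9 is new in 2021

labels = label_churn(buyers_2020, buyers_2021)
rate = churn_rate(labels)
print(labels, f"churn rate = {rate:.0%}", f"retention = {1 - rate:.0%}")
```

Under this definition the retention rate and churn rate always sum to 100%, which matches the 87% and 13% reported in the metrics table.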

Features Used

RFM-based behavioural features were selected as proven predictors of customer loyalty:

  • recency — days since last purchase
  • frequency — number of orders
  • monetary — total spend
  • is_active — purchased in last 90 days
  • is_repeat — made more than one purchase
  • is_new — first purchase was recent
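The six features listed above could be built from an order-level table roughly as follows. This is a sketch under stated assumptions: the column names (`customer_id`, `order_id`, `order_date`, `sales`), the snapshot date, and the 90-day threshold for `is_new` are illustrative choices, since the README only fixes the 90-day window for `is_active`.

```python
import pandas as pd


def build_features(orders: pd.DataFrame, snapshot: pd.Timestamp,
                   active_days: int = 90, new_days: int = 90) -> pd.DataFrame:
    """Aggregate one row per customer with RFM values and activity flags."""
    grouped = orders.groupby("customer_id")
    feats = pd.DataFrame({
        "recency": (snapshot - grouped["order_date"].max()).dt.days,
        "frequency": grouped["order_id"].nunique(),
        "monetary": grouped["sales"].sum(),
    })
    days_since_first = (snapshot - grouped["order_date"].min()).dt.days
    feats["is_active"] = (feats["recency"] <= active_days).astype(int)
    feats["is_repeat"] = (feats["frequency"] > 1).astype(int)
    # Assumption: "recent" first purchase means within new_days of the snapshot.
    feats["is_new"] = (days_since_first <= new_days).astype(int)
    return feats


# Toy orders for illustration only.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C"],
    "order_id": ["o1", "o2", "o3", "o4"],
    "order_date": pd.to_datetime(["2021-01-10", "2021-12-01", "2021-03-15", "2021-11-20"]),
    "sales": [100, 250, 80, 40],
})
feats = build_features(orders, snapshot=pd.Timestamp("2021-12-31"))
print(feats)
```

When the churn label is derived from the following year's purchases, the snapshot date should fall before that year begins so that the features do not leak the outcome.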

Model Results

| Metric | Score |
| --- | --- |
| Model | Logistic Regression |
| Accuracy | 89.9% |
| Evaluation Focus | Recall, F1-Score, ROC-AUC |
| Problem Type | Binary Classification |
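With a churn rate of around 13%, a model that never predicts churn would already reach roughly 87% accuracy on data with the same class balance, which is one reason recall, F1, and ROC-AUC are the more informative metrics here. The sketch below shows one way such an evaluation could look with scikit-learn. The data is synthetic and the settings (scaling, `class_weight="balanced"`, a 25% test split) are assumptions for illustration, not the project's actual configuration, so the scores it prints are unrelated to the 89.9% reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic customers: churn is made more likely by high recency
# and low frequency. None of this comes from the real dataset.
rng = np.random.default_rng(0)
n = 800
recency = rng.integers(1, 366, n)
frequency = rng.integers(1, 20, n)
monetary = frequency * rng.uniform(100, 600, n)
logit = 0.015 * (recency - 250) - 0.15 * (frequency - 5)
churn = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([recency, frequency, monetary])
X_train, X_test, y_train, y_test = train_test_split(
    X, churn, test_size=0.25, stratify=churn, random_state=0
)

# Scaling helps the solver converge; balanced class weights stop the
# minority (churn) class from being ignored.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # churn probability per customer
pred = model.predict(X_test)
metrics = {
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
}
print(metrics)
```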

Why Logistic Regression?

  • Highly interpretable — easy to explain to business stakeholders
  • Stable on small-to-medium customer datasets
  • Outputs churn probability per customer, not just a binary flag — enabling priority scoring
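One possible use of per-customer churn probabilities for priority scoring is to weight each probability by the customer's value, so that outreach goes first to the customers with the most revenue at risk. The sketch below assumes this "probability times LTV" rule and uses made-up customers; the README does not specify how the project ranks them.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # e.g. taken from the model's probability output
    ltv: float                # lifetime value in dollars


def rank_by_value_at_risk(customers):
    """Sort customers by expected revenue at risk, highest first."""
    return sorted(customers, key=lambda c: c.churn_probability * c.ltv, reverse=True)


# Hypothetical customers for illustration only.
customers = [
    Customer("C1", 0.80, 500.0),   # very likely to churn, low value
    Customer("C2", 0.35, 4300.0),  # moderate risk, high value
    Customer("C3", 0.10, 3900.0),  # low risk, high value
]
ranked = rank_by_value_at_risk(customers)
for c in ranked:
    print(c.customer_id, round(c.churn_probability * c.ltv, 2))
```

In this toy case the moderate-risk, high-value customer ranks above the customer most likely to churn, which a binary churn flag alone could not express.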

Business Recommendations

Churn Reduction & Retention:

  1. Target "Can't Lose" & "At Risk" segments — 145 customers with historically high LTV. Offer personalised loyalty rewards or early access to Technology products rather than generic discounts
  2. "About to Sleep" prevention — highest AOV segment ($622). Launch automated "Price Drop" or "New Arrival" alerts for Technology products before they move to Hibernating
  3. Automated win-back — for Hibernating customers, trigger automated email sequences at the 120-day mark post-purchase
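The 120-day win-back trigger from item 3 could be selected with a small date check like the one below. It is a sketch only: the function name, the `already_contacted` set, and the example dates are hypothetical, and a real campaign would read these from the CRM or email platform.

```python
from datetime import date

WIN_BACK_DAYS = 120  # threshold taken from the recommendation above


def due_for_win_back(last_purchase_by_customer, today, already_contacted=frozenset()):
    """Return IDs of customers whose last purchase was at least
    WIN_BACK_DAYS ago and who have not been contacted yet."""
    return sorted(
        cid
        for cid, last_purchase in last_purchase_by_customer.items()
        if (today - last_purchase).days >= WIN_BACK_DAYS
        and cid not in already_contacted
    )


# Toy data for illustration only.
last_purchases = {
    "C1": date(2021, 9, 2),    # exactly 120 days before the run date
    "C2": date(2021, 11, 15),  # still recent
    "C3": date(2021, 6, 1),    # overdue, but already contacted
}
due = due_for_win_back(last_purchases, today=date(2021, 12, 31), already_contacted={"C3"})
print(due)
```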

Revenue & Profitability Growth:

  4. Lead with Technology — allocate 60% of marketing budget to Technology (specifically Phones) as it has the highest volume
  5. Bundle for AOV — Chairs and Tables combined ($535K) are natural candidates for "Complete Office" bundles
  6. Cross-sell accessories — use high-traffic Technology category to cross-sell higher-margin Office Supplies at checkout

Operational & Regional Improvement:

  7. South region expansion — investigate underperformance ($391K vs $725K in West). Consider localised marketing or reduced shipping rates
  8. Promote premium shipping — increase visibility of Same Day and First Class at checkout for Champions willing to pay for speed


Tools & Technologies

| Category | Tools |
| --- | --- |
| Dashboard & BI | Power BI, DAX |
| Programming | Python |
| ML Libraries | scikit-learn |
| Data Processing | pandas, numpy |
| Customer Analytics | RFM Analysis |
| ML Model | Logistic Regression |
| Environment | Google Colab |
| Version Control | Git, GitHub |


📈 Dataset

  • Source: Marketplace transactions dataset (Kaggle)
  • Period: 2018 – 2021
  • Sales data: Orders, products, regions, shipping modes, categories
  • Customer data: RFM scores, segments, churn labels, activity flags

👤 Author

Salah Hesham

LinkedIn GitHub

