A complete end-to-end data science project analysing marketplace performance from 2018 to 2021, covering sales trends, product performance, RFM-based customer segmentation, and a churn prediction model — built on real transactional data.
This project analyses marketplace data across three dimensions: executive sales performance, product & operations, and customer behaviour. RFM (Recency, Frequency, Monetary) analysis was used to segment customers into 10 behavioural groups, and a Logistic Regression model was built to predict customer churn.
The project has 3 layers:
- Power BI Dashboard — 3-page interactive business intelligence report
- Business Report — written insights and recommendations for stakeholders
- ML Model — Logistic Regression churn prediction model (89.9% accuracy)
| Metric | Value |
|---|---|
| Total Revenue | $2,297,201 |
| Total Profit | $286K |
| Total Orders | 5K |
| Total Quantity | 38K |
| Active Customers | 436 |
| Average Order Value (AOV) | $459 |
| Customer LTV | $3,000 |
| Retention Rate (2020→2021) | 87% |
| Churn Rate | 13% |
| New Customers | 11 |
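
The ratio metrics in the table follow from simple counts. The sketch below is a minimal illustration, assuming AOV is total revenue divided by the number of orders and that retention is the share of the previous year's customers who purchased again. The function names and toy customer IDs are hypothetical and are not taken from the project code.

```python
def average_order_value(total_revenue: float, total_orders: int) -> float:
    """Revenue per order."""
    if total_orders <= 0:
        raise ValueError("total_orders must be positive")
    return total_revenue / total_orders


def retention_and_churn(prev_year: set, curr_year: set) -> tuple:
    """Return (retention_rate, churn_rate) for customers active in prev_year."""
    if not prev_year:
        raise ValueError("prev_year must contain at least one customer")
    retained = len(prev_year & curr_year)  # bought in both years
    retention = retained / len(prev_year)
    return retention, 1.0 - retention


# Toy data (hypothetical customer IDs, not the project's customers)
customers_2020 = {"C1", "C2", "C3", "C4"}
customers_2021 = {"C1", "C2", "C3", "C9"}

retention, churn = retention_and_churn(customers_2020, customers_2021)
print(f"retention={retention:.0%}, churn={churn:.0%}")  # retention=75%, churn=25%
```

Under this definition the two rates always sum to 100%, which matches the 87% / 13% split reported above.
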
- West region leads sales at $725K, while South lags at $391K — a significant regional imbalance
- Sales dipped slightly from $484K (2018) to $471K (2019), then grew strongly to $609K (2020) and $733K (2021)
- Standard Class is the dominant shipping mode at 59.77%; Same Day delivery is underutilised at 5.27%
- Technology leads all categories at $836K, followed by Furniture ($742K) and Office Supplies ($719K)
- Phones ($330K) and Chairs ($328K) are the top sub-categories
- Canon imageCLASS 2200 Advanced Copier is the top-selling product at $62K
- Hibernating is the largest segment (171 customers) — high churn risk
- Can't Lose customers (39) have the highest LTV ($4.3K) but are becoming inactive — critical to recover
- At Risk (106 customers) are historical high-spenders who have stopped purchasing
- About to Sleep segment has the highest AOV ($622) — their churn is the most expensive
| Segment | Count | Avg LTV | AOV | Priority |
|---|---|---|---|---|
| Hibernating | 171 | Low | $453 | Medium |
| Loyal Customers | 149 | $3.9K | $447 | Retain |
| Potential Loyalists | 112 | $2.6K | $471 | Nurture |
| At Risk | 106 | $3.1K | $472 | 🔴 High |
| Champions | 88 | $3.9K | $434 | Reward |
| About to Sleep | 57 | $2.5K | $622 | 🔴 High |
| Can't Lose | 39 | $4.3K | — | 🔴 Critical |
| Need Attention | 32 | $2.9K | $455 | Medium |
| New Customers | 22 | — | — | Onboard |
| Promising | 17 | — | $564 | Develop |
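
For readers who want to see how segments like these can be derived, here is a minimal RFM scoring sketch in pandas. It assumes an orders table with hypothetical columns `customer_id`, `order_id`, `order_date` and `sales`, scores each dimension into quintiles, and applies a few simplified labelling rules. The cut-offs in `label_segment` are illustrative assumptions only, since the project's actual rules for its 10 segments are not stated here.

```python
import pandas as pd


def rfm_table(orders: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """One row per customer with recency, frequency, monetary and 1-5 scores."""
    rfm = orders.groupby("customer_id").agg(
        last_order=("order_date", "max"),
        frequency=("order_id", "nunique"),
        monetary=("sales", "sum"),
    )
    rfm["recency"] = (snapshot - rfm["last_order"]).dt.days

    def quintile(values: pd.Series, labels: list) -> pd.Series:
        # Rank first so qcut always gets unique bin edges, even with ties.
        return pd.qcut(values.rank(method="first"), 5, labels=labels).astype(int)

    rfm["r_score"] = quintile(rfm["recency"], [5, 4, 3, 2, 1])  # recent = high
    rfm["f_score"] = quintile(rfm["frequency"], [1, 2, 3, 4, 5])
    rfm["m_score"] = quintile(rfm["monetary"], [1, 2, 3, 4, 5])
    return rfm.drop(columns="last_order")


def label_segment(row: pd.Series) -> str:
    # Simplified, hypothetical rules covering only a few segments.
    if row["r_score"] >= 4 and row["f_score"] >= 4:
        return "Champions"
    if row["r_score"] <= 2 and row["f_score"] >= 4:
        return "Can't Lose"
    if row["r_score"] <= 2:
        return "Hibernating"
    return "Other"


# Toy orders (hypothetical data)
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C", "D", "E"],
    "order_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "order_date": pd.to_datetime([
        "2021-12-20", "2021-11-01", "2021-06-15", "2021-12-28",
        "2021-10-10", "2021-09-09", "2020-03-01", "2021-01-05",
    ]),
    "sales": [100, 200, 50, 400, 300, 250, 80, 120],
})

rfm = rfm_table(orders, snapshot=pd.Timestamp("2021-12-31"))
rfm["segment"] = rfm.apply(label_segment, axis=1)
print(rfm)
```

Quintile scoring needs at least five customers; on a real customer base the rank-based binning keeps the five groups roughly equal in size.
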
Predict whether a customer will churn (not return in the current year after purchasing in the previous year).
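
The churn definition above can be expressed as a small labelling function. This is a sketch under the stated definition only; the `(customer_id, year)` input format and the function name are assumptions made for illustration.

```python
def churn_labels(orders: list, prev_year: int, curr_year: int) -> dict:
    """Label each customer who bought in prev_year: 1 = churned, 0 = returned."""
    prev = {cust for cust, year in orders if year == prev_year}
    curr = {cust for cust, year in orders if year == curr_year}
    # Customers with no purchase in prev_year are out of scope for the label.
    return {cust: int(cust not in curr) for cust in sorted(prev)}


# Toy (customer_id, order_year) pairs, hypothetical
orders = [("C1", 2020), ("C1", 2021), ("C2", 2020), ("C3", 2019), ("C3", 2021)]
labels = churn_labels(orders, prev_year=2020, curr_year=2021)
print(labels)  # {'C1': 0, 'C2': 1}
```
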
RFM-based behavioural features were selected as proven predictors of customer loyalty:
- `recency`: days since last purchase
- `frequency`: number of orders
- `monetary`: total spend
- `is_active`: purchased in last 90 days
- `is_repeat`: made more than one purchase
- `is_new`: first purchase was recent
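
A minimal pandas sketch of how the features listed above could be built from an orders table. The column names (`customer_id`, `order_id`, `order_date`, `sales`) are hypothetical, and the 90-day window for `is_new` is an assumption, since the source only says the first purchase was "recent".

```python
import pandas as pd


def build_features(
    orders: pd.DataFrame,
    snapshot: pd.Timestamp,
    active_days: int = 90,  # from the feature description
    new_days: int = 90,     # assumed threshold for "recent"
) -> pd.DataFrame:
    """One row per customer with RFM values and three binary flags."""
    grouped = orders.groupby("customer_id")
    feats = pd.DataFrame({
        "recency": (snapshot - grouped["order_date"].max()).dt.days,
        "frequency": grouped["order_id"].nunique(),
        "monetary": grouped["sales"].sum(),
    })
    days_since_first = (snapshot - grouped["order_date"].min()).dt.days
    feats["is_active"] = (feats["recency"] <= active_days).astype(int)
    feats["is_repeat"] = (feats["frequency"] > 1).astype(int)
    feats["is_new"] = (days_since_first <= new_days).astype(int)
    return feats


# Toy orders (hypothetical data)
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C"],
    "order_id": [1, 2, 3, 4],
    "order_date": pd.to_datetime(
        ["2021-12-20", "2021-03-01", "2021-12-01", "2021-02-01"]
    ),
    "sales": [100.0, 200.0, 50.0, 400.0],
})

features = build_features(orders, snapshot=pd.Timestamp("2021-12-31"))
print(features)
```

When the churn label is defined on the following year, taking the snapshot at the end of the prior year keeps the outcome from leaking into the inputs.
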
| Item | Value |
|---|---|
| Model | Logistic Regression |
| Accuracy | 89.9% |
| Evaluation Focus | Recall, F1-Score, ROC-AUC |
| Problem Type | Binary Classification |
- Highly interpretable — easy to explain to business stakeholders
- Stable on small-to-medium customer datasets
- Outputs churn probability per customer, not just a binary flag — enabling priority scoring
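
The sketch below shows one way to train and evaluate such a model with scikit-learn and to use the predicted probabilities for priority scoring. It runs on synthetic data generated inside the block, so its scores say nothing about the project's results. The scaling step, `class_weight="balanced"` and the 75/25 split are assumptions, not the project's documented configuration. If the modelling set mirrors the 13% churn rate, a model that always predicts "no churn" would already be about 87% accurate, which is why recall, F1 and ROC-AUC are the more informative metrics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic customers (illustrative only, not the project's data)
rng = np.random.default_rng(42)
n = 400
recency = rng.integers(1, 365, size=n)
frequency = rng.integers(1, 15, size=n)
monetary = frequency * rng.uniform(100, 600, size=n)

# Synthetic label: long recency and few orders raise the churn probability.
logit = 0.015 * (recency - 180) - 0.25 * (frequency - 7)
churned = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([recency, frequency, monetary])
X_train, X_test, y_train, y_test = train_test_split(
    X, churned, test_size=0.25, stratify=churned, random_state=42
)

# Scale features, then fit a class-weighted logistic regression.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

churn_proba = model.predict_proba(X_test)[:, 1]  # P(churn) per customer
y_pred = model.predict(X_test)

print(f"recall:  {recall_score(y_test, y_pred):.3f}")
print(f"f1:      {f1_score(y_test, y_pred):.3f}")
print(f"roc_auc: {roc_auc_score(y_test, churn_proba):.3f}")

# Priority scoring: indices of the 10 test customers most likely to churn.
top_risk = np.argsort(-churn_proba)[:10]
print(top_risk)
```

Ranking by probability rather than by the binary prediction lets a retention team work down the list until the campaign budget runs out.
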
Churn Reduction & Retention:
- Target "Can't Lose" & "At Risk" segments — 145 customers with historically high LTV. Offer personalised loyalty rewards or early access to Technology products rather than generic discounts
- "About to Sleep" prevention — highest AOV segment ($622). Launch automated "Price Drop" or "New Arrival" alerts for Technology products before they move to Hibernating
- Automated win-back — for Hibernating customers, trigger automated email sequences at the 120-day mark post-purchase
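
As an illustration of the 120-day win-back trigger, the following sketch selects customers whose last purchase is at least 120 days old and who have not yet been contacted. The function name, the input mapping and the `already_contacted` set are hypothetical; the email system that would consume this list is outside the scope of the sketch.

```python
from datetime import date

WIN_BACK_DAYS = 120  # threshold from the recommendation above


def due_for_win_back(
    last_purchase: dict,
    today: date,
    already_contacted: frozenset = frozenset(),
    threshold: int = WIN_BACK_DAYS,
) -> list:
    """Return customer IDs to enrol in the win-back email sequence."""
    return sorted(
        cust
        for cust, last in last_purchase.items()
        if (today - last).days >= threshold and cust not in already_contacted
    )


# Toy data (hypothetical customers and dates)
last_purchase = {
    "C1": date(2021, 9, 2),   # exactly 120 days before today
    "C2": date(2021, 11, 1),  # 60 days ago, too recent
    "C3": date(2021, 5, 1),   # long inactive but already contacted
}
due = due_for_win_back(
    last_purchase, today=date(2021, 12, 31), already_contacted=frozenset({"C3"})
)
print(due)  # ['C1']
```
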
Revenue & Profitability Growth:
- Lead with Technology: allocate 60% of marketing budget to Technology (specifically Phones), as it has the highest sales of any category ($836K)
- Bundle for AOV: Chairs and Tables combined ($535K) are natural candidates for "Complete Office" bundles
- Cross-sell accessories: use the high-traffic Technology category to cross-sell higher-margin Office Supplies at checkout
Operational & Regional Improvement:
- South region expansion: investigate underperformance ($391K vs $725K in West). Consider localised marketing or reduced shipping rates
- Promote premium shipping: increase visibility of Same Day and First Class at checkout for Champions willing to pay for speed
| Category | Tools |
|---|---|
| Dashboard & BI | Power BI, DAX |
| Programming | Python |
| ML Libraries | scikit-learn |
| Data Processing | pandas, numpy |
| Customer Analytics | RFM Analysis |
| ML Model | Logistic Regression |
| Environment | Google Colab |
| Version Control | Git, GitHub |
- Source: Marketplace transactions dataset (Kaggle)
- Period: 2018 – 2021
- Sales data: Orders, products, regions, shipping modes, categories
- Customer data: RFM scores, segments, churn labels, activity flags
Salah Hesham