🚀 Retail Sales Intelligence System

Transform Transaction Data into Profitable Business Decisions

🎯 Live Demo | 📊 View Code | 📝 Documentation

📋 Table of Contents

Overview
Live Demo
Key Features
Tech Stack
Business Impact
Model Performance
Quick Start
Project Structure
Deployment
Business Insights
Screenshots
Contributing
License

🎯 Overview

A production-ready machine learning system that predicts transaction profitability in real time for retail businesses. It combines an ML pipeline, interactive dashboards, and cloud deployment.

Problem Statement

Retail businesses lose millions due to unprofitable transactions caused by:

❌ Excessive discounting strategies
❌ Poor product category selection
❌ Inefficient shipping methods
❌ Regional performance gaps

Solution

An AI-powered decision support system that:

✅ Predicts profitability before transaction completion
✅ Provides real-time recommendations to maximize margins
✅ Analyzes 10,000+ transactions to identify profit drivers
✅ Delivers 85.2% accuracy with a CatBoost classifier

🌐 Live Demo

🎯 Interactive Dashboard

Try it now: https://retailsalesintelligencesystem.streamlit.app/

Features:

📊 KPI Metrics - Model accuracy, prediction speed, loss prevention rate
🎯 What-If Simulator - Test different scenarios with interactive sliders
📦 Product Insights - Category-wise profitability visualizations
⚡ Quick Predictor - Fast testing with pre-filled defaults

✨ Key Features

🤖 Machine Learning Pipeline

Classification Model: CatBoost with 85.2% accuracy
Feature Engineering: 12 engineered features (Ship_Duration, Log_Sales, temporal features)
Preprocessing: Automated pipeline with StandardScaler & OneHotEncoder
Experiment Tracking: MLflow integration with DagsHub
Model Versioning: Automated model selection and storage
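
The engineered features named above can be illustrated with a minimal sketch. It assumes that Ship_Duration is the number of days between order and shipment and that Log_Sales is log(1 + sales). The function name and the `Order_*` keys are hypothetical stand-ins for the temporal features, and the project's actual transformation code may differ.

```python
import math
from datetime import date


def engineer_features(order_date: date, ship_date: date, sales: float) -> dict:
    """Derive a few illustrative features from one transaction.

    Assumptions: Ship_Duration is measured in days and Log_Sales is
    log(1 + sales). The Order_* keys are hypothetical temporal features.
    """
    return {
        "Ship_Duration": (ship_date - order_date).days,
        "Log_Sales": math.log1p(sales),
        "Order_Month": order_date.month,
        "Order_Weekday": order_date.weekday(),  # Monday = 0, Sunday = 6
        "Order_Quarter": (order_date.month - 1) // 3 + 1,
    }


# Made-up transaction used only to exercise the function.
features = engineer_features(date(2017, 11, 5), date(2017, 11, 9), 250.0)
print(features)
```

Using log(1 + sales) rather than a plain logarithm keeps the transform defined for zero-value sales.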

📊 Interactive Dashboards

Streamlit Dashboard (Production)

Real-time profitability predictions
Interactive parameter sliders
Plotly visualizations
Professional dark theme
Mobile-responsive design

Flask Web App (Alternative)

Clean form-based interface
Bootstrap 5 styling
REST API endpoints
Lightweight deployment

🔍 Business Analytics

Regional performance analysis (West region: 21.9% margin)
Category profitability breakdown (Technology: 78% profit rate)
Discount impact correlation (-0.86 with profit margin)
Seasonal trend identification (Q4: 40% of annual sales)

🛠 Tech Stack

Machine Learning & Data Science

Python 3.9+ - Core programming language
CatBoost - Gradient boosting classifier (best model)
XGBoost - Alternative boosting model
scikit-learn - Preprocessing, model evaluation
Pandas & NumPy - Data manipulation
Matplotlib & Seaborn - EDA visualizations

Web Frameworks & Deployment

Streamlit - Interactive dashboard framework
Flask - Web application backend
Plotly - Interactive charts
Bootstrap 5 - Responsive UI components

MLOps & Monitoring

MLflow - Experiment tracking
DagsHub - Model registry
DVC - Data version control
Evidently - Model monitoring

Cloud & DevOps

Streamlit Cloud - Production deployment
GitHub Actions - CI/CD pipeline
Docker - Containerization

💼 Business Impact

Quantified Results

Metric	Value	Impact
Model Accuracy	85.2%	High-confidence predictions
Prediction Speed	< 0.5s	Real-time decision support
Loss Prevention	~23%	Estimated profit protection
Transactions Analyzed	10,000+	Robust training dataset

Key Insights Delivered

1️⃣ Regional Performance

Top Performer: West region (₹108K profit, 21.9% margin)
Weak Zone: Central region (-10.4% margin)
Action: Focus marketing in West, revise Central pricing

2️⃣ Category Profitability

Technology: ₹146K profit (Core profit engine)
Office Supplies: ₹123K profit (Stable performer)
Furniture: ₹18K profit (Low margins, needs optimization)

3️⃣ Discount Strategy

Correlation: -0.86 between discount and profit margin
Recommendation: Cap discounts at 15-20%
Impact: Prevents margin erosion
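
One way the discount cap could be enforced at prediction or checkout time is sketched below. The function name and the 20% default are assumptions for illustration; nothing here is taken from the project's code.

```python
def capped_discount(requested: float, cap: float = 0.20) -> float:
    """Clamp a requested discount (a fraction between 0 and 1) to the cap."""
    if not 0.0 <= requested <= 1.0:
        raise ValueError("discount must be a fraction between 0 and 1")
    return min(requested, cap)


print(capped_discount(0.35))            # 0.2
print(capped_discount(0.10))            # 0.1
print(capped_discount(0.18, cap=0.15))  # 0.15
```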

4️⃣ Seasonal Trends

Peak Period: Nov-Dec (40% of annual sales)
Best Days: Sunday & Monday
Strategy: Scale Q4 campaigns and run midweek flash sales

📈 Model Performance

Training Results

Best Model: CatBoost Classifier
Accuracy: 85.2%
Precision: 84.7% (weighted)
Recall: 85.2% (weighted)
F1-Score: 84.9% (weighted)

Confusion Matrix:
[[1523   287]  # True Loss, False Profit
 [ 208  1482]] # False Loss, True Profit
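
The weighted metrics can be recomputed from the confusion matrix with a short sketch. It assumes rows are actual classes and columns are predicted classes, in the order (Loss, Profit), as the inline comments suggest. On the matrix as printed, accuracy works out to 3005/3500, about 85.9%, slightly above the reported 85.2%, so the matrix and the headline figures may come from different runs.

```python
# Confusion matrix as printed above. Assumed layout: rows are actual
# classes, columns are predicted classes, in the order (Loss, Profit).
matrix = [[1523, 287],
          [208, 1482]]


def weighted_metrics(cm):
    """Return accuracy and support-weighted precision, recall, and F1."""
    total = sum(sum(row) for row in cm)
    n = len(cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for i in range(n):
        support = sum(cm[i])                         # actual count of class i
        predicted = sum(cm[r][i] for r in range(n))  # predicted count of class i
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = weighted_metrics(matrix)
print({name: round(value, 4) for name, value in metrics.items()})
```

Support-weighted recall always equals accuracy, which matches the identical Accuracy and Recall values reported above.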

Feature Importance (Top 5)

Discount - 32.4% (Primary profit driver)
Sales - 18.7% (Transaction size impact)
Category - 15.3% (Product type profitability)
Ship_Duration - 12.1% (Logistics efficiency)
Region - 9.8% (Geographic performance)

Model Comparison

Model	Accuracy	F1-Score	Training Time
CatBoost	85.2%	84.9%	45s
XGBoost	83.7%	83.4%	38s
Random Forest	81.2%	80.8%	52s
Gradient Boosting	80.5%	80.1%	67s
Logistic Regression	76.3%	75.9%	12s

🚀 Quick Start

Prerequisites

Python 3.9 or higher
pip package manager
Git

Installation

# 1. Clone repository
git clone https://github.com/rkpcode/Retail_Sales_Intelligence_System.git
cd Retail_Sales_Intelligence_System

# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run Streamlit dashboard
streamlit run app.py
# Dashboard opens at http://localhost:8501

# OR run Flask app
python flask_app.py
# Web app opens at http://localhost:5000

Training Pipeline (Optional)

# Run complete ML pipeline
python training_pipeline.py

# This will:
# 1. Ingest data from source
# 2. Transform and engineer features
# 3. Train multiple models with hyperparameter tuning
# 4. Select best model based on accuracy
# 5. Save model to artifacts/
# 6. Log metrics to MLflow/DagsHub
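
Step 4 above, selecting the best model by accuracy, can be sketched as follows. The scores are copied from the Model Comparison table; the dictionary layout and the function name are assumptions for illustration, not the pipeline's actual code.

```python
# Accuracy and F1 values copied from the Model Comparison table.
results = {
    "CatBoost": {"accuracy": 0.852, "f1": 0.849},
    "XGBoost": {"accuracy": 0.837, "f1": 0.834},
    "Random Forest": {"accuracy": 0.812, "f1": 0.808},
    "Gradient Boosting": {"accuracy": 0.805, "f1": 0.801},
    "Logistic Regression": {"accuracy": 0.763, "f1": 0.759},
}


def select_best_model(scores, metric="accuracy"):
    """Return the name of the model with the highest value for `metric`."""
    return max(scores, key=lambda name: scores[name][metric])


best = select_best_model(results)
print(best)  # CatBoost
```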

📁 Project Structure

Retail_Sales_Intelligence_System/
│
├── app.py                          # Streamlit dashboard (Production)
├── flask_app.py                    # Flask web application
├── training_pipeline.py            # End-to-end ML pipeline
├── requirements.txt                # Python dependencies
├── packages.txt                    # System dependencies
├── README.md                       # Project documentation
├── DEPLOYMENT.md                   # Deployment guide
│
├── .streamlit/
│   └── config.toml                 # Streamlit theme configuration
│
├── artifacts/                      # Model artifacts (gitignored)
│   ├── best_model.pkl              # Trained CatBoost model (12.5 MB)
│   ├── preprocessor.pkl            # Feature preprocessing pipeline
│   ├── raw_data.csv                # Original dataset
│   ├── train.csv                   # Training split
│   └── test.csv                    # Testing split
│
├── src/
│   └── Retail_Sale_Intelligent_System/
│       ├── components/
│       │   ├── data_ingestion.py   # Data loading & splitting
│       │   ├── data_transformation.py  # Feature engineering
│       │   └── model_trainer.py    # Model training & evaluation
│       ├── pipelines/
│       │   ├── training_pipeline.py    # Training orchestration
│       │   └── prediction_pipeline.py  # Inference pipeline
│       ├── exception.py            # Custom exception handling
│       ├── logger.py               # Logging configuration
│       └── utils.py                # Helper functions
│
├── notebook/
│   ├── EDA.ipynb                   # Exploratory Data Analysis
│   └── Raw Data/
│       └── Sample - Superstore.csv # Source dataset
│
├── templates/
│   └── index.html                  # Flask HTML template
│
├── logs/                           # Application logs
├── catboost_info/                  # CatBoost training logs
└── .github/
    └── workflows/                  # CI/CD pipelines

🌐 Deployment

Option 1: Streamlit Cloud (Current Production)

Live URL: https://retailsalesintelligencesystem.streamlit.app/

# Deploy to Streamlit Cloud
1. Push code to GitHub
2. Go to share.streamlit.io
3. Connect repository
4. Select app.py as main file
5. Deploy (auto-builds in 2-3 minutes)

Option 2: Docker

# Build Docker image
docker build -t retail-intelligence .

# Run container
docker run -p 8501:8501 retail-intelligence

# Access at http://localhost:8501

Option 3: Hugging Face Spaces

# Deploy to Hugging Face
1. Create Space (Streamlit SDK)
2. Upload app.py, requirements.txt, artifacts/
3. Auto-deploys on push

Detailed deployment guide: DEPLOYMENT.md

📊 Business Insights

Strategic Recommendations

✅ Focus on West Region - Highest profit margin (21.9%)
✅ Scale Technology Category - 78% profitability rate
✅ Cap Discounts at 20% - Prevents margin erosion
✅ Boost Q4 Inventory - 40% of annual sales
✅ Optimize Furniture Pricing - Currently lowest margins
✅ Midweek Promotions - Improve Wednesday-Thursday sales

Data-Driven Findings

Correlation Analysis

Sales ↔ Profit: +0.48 (Moderate positive relationship)
Discount ↔ Profit Margin: -0.86 (Strong negative impact)
Quantity ↔ Sales: +0.20 (Weak positive relationship)
Ship Duration ↔ Profit: -0.01 (Minimal impact)
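
For reference, figures like these are Pearson correlation coefficients. The sketch below shows the calculation on made-up values that are not taken from the dataset; the project itself most likely computed them with pandas, which is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not taken from the dataset.
discounts = [0.0, 0.1, 0.2, 0.3, 0.5, 0.7]
margins = [0.30, 0.22, 0.15, 0.02, -0.35, -0.80]
r = pearson(discounts, margins)
print(round(r, 2))
```

A value near -1 means margin falls almost linearly as discount rises, while a value near 0, as with ship duration, means no linear relationship.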

Temporal Patterns

Best Year: 2017 (₹7.3L revenue, ₹93K profit)
Peak Months: November-December (Festive season)
Lowest Months: February-April (Post-holiday slump)
Top Weekdays: Sunday & Monday
Lowest Weekday: Wednesday
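
A weekday ranking like the one above can be produced by summing sales per weekday. The sketch below uses made-up orders, not the dataset, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def sales_by_weekday(orders):
    """Sum sales per weekday name from (order_date, sales) pairs."""
    totals = defaultdict(float)
    for order_date, sales in orders:
        totals[WEEKDAYS[order_date.weekday()]] += sales
    return dict(totals)


# Made-up illustrative orders, not taken from the dataset.
orders = [
    (date(2017, 11, 5), 400.0),   # Sunday
    (date(2017, 11, 6), 350.0),   # Monday
    (date(2017, 11, 8), 90.0),    # Wednesday
    (date(2017, 11, 12), 210.0),  # Sunday
]
totals = sales_by_weekday(orders)
ranked = sorted(totals, key=totals.get, reverse=True)
print(ranked)  # ['Sunday', 'Monday', 'Wednesday']
```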

📸 Screenshots

Streamlit Dashboard

Professional dark theme with KPI metrics and interactive controls

What-If Simulator

Real-time profitability prediction with adjustable parameters

Product Insights

Category-wise profitability visualizations with Plotly charts

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit changes (git commit -m 'Add AmazingFeature')
Push to branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

RKP Code

GitHub: @rkpcode
LinkedIn: Connect on LinkedIn
Portfolio: View Projects

🙏 Acknowledgments

Dataset: Kaggle Superstore Dataset
ML Framework: CatBoost, XGBoost, scikit-learn
Deployment: Streamlit Cloud
Experiment Tracking: MLflow & DagsHub

📞 Contact & Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: contactrkp21@gmail.com

⭐ Star this repo if you find it useful!

Made with ❤️ by RKP Code | Powered by Machine Learning

🚀 Try Live Demo | 📊 View on GitHub
