Transform Transaction Data into Profitable Business Decisions
- Overview
- Live Demo
- Key Features
- Tech Stack
- Business Impact
- Model Performance
- Quick Start
- Project Structure
- Deployment
- Business Insights
- Screenshots
- Contributing
- License
A production-ready machine learning system that predicts transaction profitability in real time for retail businesses. It combines an end-to-end ML pipeline, interactive dashboards, and cloud deployment on Streamlit Cloud.
Retail businesses lose margin to unprofitable transactions, typically caused by:
- ❌ Excessive discounting strategies
- ❌ Poor product category selection
- ❌ Inefficient shipping methods
- ❌ Regional performance gaps
An AI-powered decision support system that:
- ✅ Predicts profitability before transaction completion
- ✅ Provides real-time recommendations to maximize margins
- ✅ Analyzes 10,000+ transactions to identify profit drivers
- ✅ Delivers 85%+ accuracy with a CatBoost classifier
Try it now: https://retailsalesintelligencesystem.streamlit.app/
Features:
- 📊 KPI Metrics - Model accuracy, prediction speed, loss prevention rate
- 🎯 What-If Simulator - Test different scenarios with interactive sliders
- 📦 Product Insights - Category-wise profitability visualizations
- ⚡ Quick Predictor - Fast testing with pre-filled defaults
- Classification Model: CatBoost with 85%+ accuracy
- Feature Engineering: 12 engineered features (Ship_Duration, Log_Sales, temporal features)
- Preprocessing: Automated pipeline with StandardScaler & OneHotEncoder
- Experiment Tracking: MLflow integration with DagsHub
- Model Versioning: Automated model selection and storage
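As a rough illustration of the feature engineering step, the sketch below derives Ship_Duration, Log_Sales, and a few temporal features from a single transaction using only the standard library. The helper name `engineer_features`, the temporal feature names, and the choice of `log1p` for Log_Sales are assumptions made for this example; the project's own `data_transformation.py` may differ.

```python
import math
from datetime import date


def engineer_features(order_date: date, ship_date: date, sales: float) -> dict:
    """Derive illustrative features for one transaction (hypothetical helper)."""
    return {
        # Days between placing the order and shipping it
        "Ship_Duration": (ship_date - order_date).days,
        # log1p is an assumption; the project may use a different log transform
        "Log_Sales": math.log1p(sales),
        # Temporal features (names are assumptions)
        "Order_Month": order_date.month,
        "Order_Quarter": (order_date.month - 1) // 3 + 1,
        "Order_Weekday": order_date.weekday(),  # Monday = 0
    }


features = engineer_features(date(2017, 11, 8), date(2017, 11, 11), 261.96)
print(features)
```

Numeric outputs such as these would then go through StandardScaler and categorical columns through OneHotEncoder, as listed above.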
- Real-time profitability predictions
- Interactive parameter sliders
- Plotly visualizations
- Professional dark theme
- Mobile-responsive design
- Clean form-based interface
- Bootstrap 5 styling
- REST API endpoints
- Lightweight deployment
- Regional performance analysis (West region: 21.9% margin)
- Category profitability breakdown (Technology: 78% profit rate)
- Discount impact correlation (-0.86 with profit margin)
- Seasonal trend identification (Q4: 40% of annual sales)
- Python 3.9+ - Core programming language
- CatBoost - Gradient boosting classifier (best model)
- XGBoost - Alternative boosting model
- scikit-learn - Preprocessing, model evaluation
- Pandas & NumPy - Data manipulation
- Matplotlib & Seaborn - EDA visualizations
- Streamlit - Interactive dashboard framework
- Flask - Web application backend
- Plotly - Interactive charts
- Bootstrap 5 - Responsive UI components
- MLflow - Experiment tracking
- DagsHub - Model registry
- DVC - Data version control
- Evidently - Model monitoring
- Streamlit Cloud - Production deployment
- GitHub Actions - CI/CD pipeline
- Docker - Containerization
| Metric | Value | Impact |
|---|---|---|
| Model Accuracy | 85.2% | High-confidence predictions |
| Prediction Speed | < 0.5s | Real-time decision support |
| Loss Prevention | ~23% | Estimated profit protection |
| Transactions Analyzed | 10,000+ | Robust training dataset |
- Top Performer: West region (₹108K profit, 21.9% margin)
- Weak Zone: Central region (-10.4% margin)
- Action: Focus marketing in West, revise Central pricing
- Technology: ₹146K profit (Core profit engine)
- Office Supplies: ₹123K profit (Stable performer)
- Furniture: ₹18K profit (Low margins, needs optimization)
- Correlation: -0.86 between discount and profit margin
- Recommendation: Cap discounts at 15-20%
- Impact: Prevents margin erosion
- Peak Period: Nov-Dec (40% of annual sales)
- Best Days: Sunday & Monday
- Strategy: Scale Q4 campaigns, midweek flash sales
Best Model: CatBoost Classifier
Accuracy: 85.2%
Precision: 84.7% (weighted)
Recall: 85.2% (weighted)
F1-Score: 84.9% (weighted)
Confusion Matrix:
[[1523 287] # True Loss, False Profit
[ 208 1482]] # False Loss, True Profit
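The headline metrics can be recomputed from a confusion matrix with a few lines of plain Python. The sketch below assumes rows are actual classes and columns are predicted classes, in the order [Loss, Profit], which matches the inline comments. On the matrix as printed it gives an accuracy of about 85.9%, slightly above the reported 85.2%, so the reported figures may come from a different run or split.

```python
# Confusion matrix as printed above.
# Assumed layout: rows = actual class, columns = predicted class,
# class order = [Loss, Profit].
cm = [[1523, 287],
      [208, 1482]]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total


def per_class(cm, k):
    """Return (precision, recall, support) for class index k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum
    actual_k = sum(cm[k])                    # row sum
    return tp / predicted_k, tp / actual_k, actual_k


stats = [per_class(cm, k) for k in range(len(cm))]
# Weighted averages use each class's support as the weight
weighted_precision = sum(p * n for p, _, n in stats) / total
weighted_recall = sum(r * n for _, r, n in stats) / total

print(f"accuracy={accuracy:.4f}")
print(f"weighted precision={weighted_precision:.4f}")
print(f"weighted recall={weighted_recall:.4f}")
```

Weighted recall always equals accuracy, which is why the two reported values are identical.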
- Discount - 32.4% (Primary profit driver)
- Sales - 18.7% (Transaction size impact)
- Category - 15.3% (Product type profitability)
- Ship_Duration - 12.1% (Logistics efficiency)
- Region - 9.8% (Geographic performance)
| Model | Accuracy | F1-Score | Training Time |
|---|---|---|---|
| CatBoost | 85.2% | 84.9% | 45s |
| XGBoost | 83.7% | 83.4% | 38s |
| Random Forest | 81.2% | 80.8% | 52s |
| Gradient Boosting | 80.5% | 80.1% | 67s |
| Logistic Regression | 76.3% | 75.9% | 12s |
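Selecting the best model by accuracy, as the training pipeline is described as doing, can be sketched as follows. The dictionary layout and variable names are assumptions for illustration; the numbers are copied from the comparison table.

```python
# Results copied from the comparison table (accuracy and F1 as fractions)
results = {
    "CatBoost": {"accuracy": 0.852, "f1": 0.849, "train_seconds": 45},
    "XGBoost": {"accuracy": 0.837, "f1": 0.834, "train_seconds": 38},
    "Random Forest": {"accuracy": 0.812, "f1": 0.808, "train_seconds": 52},
    "Gradient Boosting": {"accuracy": 0.805, "f1": 0.801, "train_seconds": 67},
    "Logistic Regression": {"accuracy": 0.763, "f1": 0.759, "train_seconds": 12},
}

# Pick the model with the highest accuracy
best_name = max(results, key=lambda name: results[name]["accuracy"])
print(best_name, results[best_name])
```

Ranking by F1 instead would give the same winner here, since the two metrics order the models identically in this table.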
- Python 3.9 or higher
- pip package manager
- Git
# 1. Clone repository
git clone https://github.com/rkpcode/Retail_Sales_Intelligence_System.git
cd Retail_Sales_Intelligence_System
# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run Streamlit dashboard
streamlit run app.py
# Dashboard opens at http://localhost:8501
# OR run Flask app
python flask_app.py
# Web app opens at http://localhost:5000
# Run complete ML pipeline
python training_pipeline.py
# This will:
# 1. Ingest data from source
# 2. Transform and engineer features
# 3. Train multiple models with hyperparameter tuning
# 4. Select best model based on accuracy
# 5. Save model to artifacts/
# 6. Log metrics to MLflow/DagsHub
Retail_Sales_Intelligence_System/
│
├── app.py                          # Streamlit dashboard (Production)
├── flask_app.py                    # Flask web application
├── training_pipeline.py            # End-to-end ML pipeline
├── requirements.txt                # Python dependencies
├── packages.txt                    # System dependencies
├── README.md                       # Project documentation
├── DEPLOYMENT.md                   # Deployment guide
│
├── .streamlit/
│   └── config.toml                 # Streamlit theme configuration
│
├── artifacts/                      # Model artifacts (gitignored)
│   ├── best_model.pkl              # Trained CatBoost model (12.5 MB)
│   ├── preprocessor.pkl            # Feature preprocessing pipeline
│   ├── raw_data.csv                # Original dataset
│   ├── train.csv                   # Training split
│   └── test.csv                    # Testing split
│
├── src/
│   └── Retail_Sale_Intelligent_System/
│       ├── components/
│       │   ├── data_ingestion.py       # Data loading & splitting
│       │   ├── data_transformation.py  # Feature engineering
│       │   └── model_trainer.py        # Model training & evaluation
│       ├── pipelines/
│       │   ├── training_pipeline.py    # Training orchestration
│       │   └── prediction_pipeline.py  # Inference pipeline
│       ├── exception.py            # Custom exception handling
│       ├── logger.py               # Logging configuration
│       └── utils.py                # Helper functions
│
├── notebook/
│   ├── EDA.ipynb                   # Exploratory Data Analysis
│   └── Raw Data/
│       └── Sample - Superstore.csv # Source dataset
│
├── templates/
│   └── index.html                  # Flask HTML template
│
├── logs/                           # Application logs
├── catboost_info/                  # CatBoost training logs
└── .github/
    └── workflows/                  # CI/CD pipelines
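To show how the two files in artifacts/ might fit together at inference time, here is a minimal sketch. It assumes both artifacts were saved with `pickle`, that the preprocessor exposes a scikit-learn style `transform` method, and that the model exposes `predict`. The function names are hypothetical and are not taken from `prediction_pipeline.py`.

```python
import pickle
from pathlib import Path


def load_artifacts(artifact_dir="artifacts"):
    """Load the preprocessor and model (assumes both were saved with pickle)."""
    artifact_dir = Path(artifact_dir)
    with open(artifact_dir / "preprocessor.pkl", "rb") as f:
        preprocessor = pickle.load(f)
    with open(artifact_dir / "best_model.pkl", "rb") as f:
        model = pickle.load(f)
    return preprocessor, model


def predict_profitability(preprocessor, model, rows):
    """Transform raw rows, then return the model's predictions for them."""
    features = preprocessor.transform(rows)
    return model.predict(features)
```

In use, `rows` would typically be a pandas DataFrame with the same columns the preprocessor was fitted on, and the call would look like `predict_profitability(*load_artifacts(), rows)`.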
Live URL: https://retailsalesintelligencesystem.streamlit.app/
# Deploy to Streamlit Cloud
1. Push code to GitHub
2. Go to share.streamlit.io
3. Connect repository
4. Select app.py as main file
5. Deploy (auto-builds in 2-3 minutes)
# Build Docker image
docker build -t retail-intelligence .
# Run container
docker run -p 8501:8501 retail-intelligence
# Access at http://localhost:8501
# Deploy to Hugging Face
1. Create Space (Streamlit SDK)
2. Upload app.py, requirements.txt, artifacts/
3. Auto-deploys on push
Detailed deployment guide: DEPLOYMENT.md
✅ Focus on West Region - Highest profit margin (21.9%)
✅ Scale Technology Category - 78% profitability rate
✅ Cap Discounts at 20% - Prevents margin erosion
✅ Boost Q4 Inventory - 40% of annual sales
✅ Optimize Furniture Pricing - Currently lowest margins
✅ Midweek Promotions - Improve Wednesday-Thursday sales
- Sales ↔ Profit: +0.48 (Healthy relationship)
- Discount ↔ Profit Margin: -0.86 (Strong negative impact)
- Quantity ↔ Sales: +0.20 (Bulk orders increase revenue)
- Ship Duration ↔ Profit: -0.01 (Minimal impact)
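The figures above read as Pearson correlation coefficients. A minimal pure-Python sketch of that calculation is shown below; the function name and the toy discount and margin values are invented for illustration and are not the project's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values only: margin falls as discount rises
discounts = [0.0, 0.1, 0.2, 0.3, 0.4]
margins = [0.30, 0.22, 0.15, 0.02, -0.10]

r = pearson(discounts, margins)
print(f"discount vs. margin: r = {r:.2f}")
```

A value near -1, like the -0.86 reported for discount and profit margin, means margin drops almost linearly as the discount grows.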
- Best Year: 2017 (₹7.3L revenue, ₹93K profit)
- Peak Months: November-December (Festive season)
- Lowest Months: February-April (Post-holiday slump)
- Top Weekdays: Sunday & Monday
- Lowest Weekday: Wednesday
Professional dark theme with KPI metrics and interactive controls
Real-time profitability prediction with adjustable parameters
Category-wise profitability visualizations with Plotly charts
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit changes (git commit -m 'Add AmazingFeature')
- Push to branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
RKP Code
- GitHub: @rkpcode
- LinkedIn: Connect on LinkedIn
- Portfolio: View Projects
- Dataset: Kaggle Superstore Dataset
- ML Framework: CatBoost, XGBoost, scikit-learn
- Deployment: Streamlit Cloud
- Experiment Tracking: MLflow & DagsHub
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: contactrkp21@gmail.com
⭐ Star this repo if you find it useful!
Made with ❤️ by RKP Code | Powered by Machine Learning