Transform data into decisions through intelligent forecasting and customer segmentation
DFSE is a production-ready analytics engine that combines time-series forecasting with customer segmentation to drive business decisions. Built with real-world applications in mind, it demonstrates end-to-end data science from raw data to actionable insights.
- 🚀 Decision-First: Built to answer real business questions, not just create models
- 🔧 Production-Ready: Clean code, automated workflows, reproducible results
- 📈 Business-Focused: Demand forecasting + RFM segmentation = immediate value
- 🎓 Educational: Clear structure, well-documented, perfect for learning
Step 1: Download the project
git clone https://github.com/neilsable/dfse.git
cd dfse
Step 2: Run it!
make run
Done! ✅ Your reports and data will appear in the reports/ and data/processed/ folders.
1. 🔧 Sets up Python environment automatically
2. 📦 Installs all needed libraries
3. 🎲 Creates sample data (no downloads needed!)
4. 🤖 Builds forecasts and customer segments
5. 📊 Generates reports and charts
6. ✅ Saves everything to your folders
Time needed: ~2-3 minutes
Don't have make? No problem! Use this instead:
# Step 1: Download project (same as above)
git clone https://github.com/neilsable/dfse.git
cd dfse
# Step 2: Run this simple script
./run.sh
Or run it manually:
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
.\.venv\Scripts\Activate.ps1 # Windows PowerShell
pip install -r requirements.txt
python3 -m src.pipeline
You only need:
- ✅ Python 3.8 or newer (download from python.org)
- ✅ Git (download from git-scm.com)
- ✅ 5 minutes of your time
That's it! No databases, no API keys, no complicated setup.
After running the project, here's where everything is saved:
reports/
├── 📄 executive_summary.md ← Read this first! Plain English summary
└── 📈 forecast_plot.png ← Visual chart showing predictions
What to do with these:
- Open executive_summary.md in any text editor
- View forecast_plot.png to see your forecast chart
data/processed/
├── 📊 forecast_metrics.csv ← How accurate is the model?
├── 📈 forecast_60d.csv ← Next 60 days of predictions
├── 👥 rfm_segments.csv ← Each customer's segment
└── 📋 segment_summary.csv ← Summary of customer groups
What to do with these:
- Open any .csv file in Excel, Google Sheets, or Python
- Use them for presentations, dashboards, or further analysis
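
For the Python route, a minimal sketch using only the standard library might look like the following. It makes no assumption about column names, since this README does not list them; `load_rows` is a hypothetical helper for illustration and is not part of DFSE.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV file into a list of dicts keyed by the header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Path taken from the output listing above; run the pipeline first.
    target = Path("data/processed/forecast_60d.csv")
    if target.exists():
        rows = load_rows(str(target))
        columns = list(rows[0].keys()) if rows else []
        print(f"{len(rows)} rows; columns: {columns}")
    else:
        print(f"{target} not found; run the pipeline to generate it.")
```

All values come back as strings, so convert numeric columns with `float(...)` before doing arithmetic on them.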
In executive_summary.md:
📊 Forecast Accuracy: 94.2%
👥 Customer Segments Found: 4 groups
💰 High-Value Customers: 127 people
📈 Recommended Action: Focus on "Champions" segment
Problem: make: command not found
Solution: Use ./run.sh instead, or follow the manual steps above
Problem: python3: command not found
Solution: Try python instead of python3, or install Python from python.org
Problem: Permission denied when running ./run.sh
Solution: Run chmod +x run.sh first, then try again
Problem: Libraries won't install
Solution: Make sure you activated the virtual environment (the source .venv/bin/activate step)
Still stuck? Open an issue on GitHub and I'll help!
dfse/
│
├── src/ # Source code
│ ├── pipeline.py # Main forecasting pipeline
│ ├── evaluation.py # Model evaluation
│ └── utils/ # Helper functions
│
├── data/
│ ├── raw/ # Generated sample data
│ └── processed/ # Analysis outputs
│
├── reports/ # Generated reports
├── assets/ # Images and resources
│
├── requirements.txt # Python dependencies
├── Makefile # Automation commands
└── run.sh # Simple run script
DFSE is perfect for:
- 📚 Portfolio Projects: Showcase end-to-end data science skills
- 🏢 Business Analytics: Demand planning and customer insights
- 🎯 Learning: Understand forecasting and segmentation in practice
- 🔧 Template: Starting point for real-world analytics projects
| Category | Technologies |
|---|---|
| Language | Python 3.8+ |
| Data Processing | pandas, NumPy |
| Modeling | statsmodels, scikit-learn |
| Visualization | matplotlib, seaborn |
| Automation | Make, bash scripting |
Most data science projects are either:
- Too theoretical (just Jupyter notebooks with no real workflow)
- Too complex (enterprise-level code that's hard to understand)
I wanted something in between — a project that shows real production skills but stays simple enough for anyone to learn from.
# I chose ARIMA/Exponential Smoothing over LSTM/Prophet because:
✅ More interpretable (you can explain WHY it predicts what it does)
✅ Works well with limited data
✅ Faster to train and run
✅ Industry standard for demand forecasting
For business decisions, explainability matters more than an extra 2% of accuracy. Stakeholders need to trust your model.
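
To illustrate the interpretability point, here is a minimal hand-written sketch of simple exponential smoothing, the most basic member of that model family. It is not DFSE's implementation (the project lists statsmodels for modeling); the function name, the `alpha` value, and the sample numbers are assumptions made up for this example.

```python
from typing import List, Sequence


def simple_exp_smoothing(
    history: Sequence[float], alpha: float, periods: int
) -> List[float]:
    """Flat forecast from simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    Every future period is forecast as the final level.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    level = float(history[0])  # initialise the level with the first observation
    for value in history[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return [level] * periods


demand = [100.0, 110.0, 105.0, 120.0]  # made-up daily demand
forecast = simple_exp_smoothing(demand, alpha=0.5, periods=3)
print(forecast)  # [112.5, 112.5, 112.5]
```

Each forecast is a weighted average of past observations, with weights shrinking by a factor of `1 - alpha` per step back in time. That single sentence is the whole explanation a stakeholder needs, which is the kind of transparency the list above is about.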
# RFM = Recency, Frequency, Monetary Value
✅ Simple enough to explain to non-technical people
✅ Actionable (you can target segments immediately)
✅ Proven technique used by real companies
✅ No complex clustering algorithms to debug
I wanted to show I understand business value, not just fancy algorithms.
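
As a rough illustration of how little machinery RFM needs, the sketch below computes recency, frequency, and monetary value per customer and applies a rule-based label. It uses only the standard library and is not DFSE's code. The thresholds, the transaction data, and the "At Risk" and "Regular" labels are assumptions for this example; only "Champions" appears in this README.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass
class Rfm:
    recency_days: int  # days since the most recent purchase
    frequency: int     # number of purchases
    monetary: float    # total spend


def compute_rfm(
    transactions: Iterable[Tuple[str, date, float]], as_of: date
) -> Dict[str, Rfm]:
    """Aggregate (customer_id, purchase_date, amount) rows per customer."""
    last_purchase: Dict[str, date] = {}
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for customer, day, amount in transactions:
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        counts[customer] += 1
        totals[customer] += amount
    return {
        c: Rfm((as_of - last_purchase[c]).days, counts[c], totals[c])
        for c in last_purchase
    }


def label(rfm: Rfm) -> str:
    """Rule-based segment label; thresholds are illustrative assumptions."""
    if rfm.recency_days <= 30 and rfm.frequency >= 3 and rfm.monetary >= 200:
        return "Champions"
    if rfm.recency_days > 90:
        return "At Risk"
    return "Regular"


transactions = [  # made-up sample data
    ("alice", date(2024, 1, 5), 120.0),
    ("alice", date(2024, 2, 10), 80.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 10, 15), 40.0),
    ("carol", date(2024, 2, 20), 30.0),
]
table = compute_rfm(transactions, as_of=date(2024, 3, 15))
for customer, rfm in sorted(table.items()):
    print(customer, rfm, label(rfm))
```

Because each label follows from three plainly named numbers, a marketing team can check any customer's segment by hand.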
src/
├── pipeline.py # Main logic (what happens)
├── evaluation.py # Quality checks (is it good?)
└── utils/ # Helpers (how we do it)
Why this structure?
- ✅ Separates WHAT from HOW
- ✅ Easy to test individual pieces
- ✅ Can swap out methods without breaking everything
- ✅ Follows production best practices
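
The "swap out methods" point can be sketched as a list of single-purpose stage functions applied in order. This is a toy illustration under stated assumptions, not the contents of pipeline.py: every function name here is hypothetical, the "models" are deliberately trivial, and the error is measured against the history itself.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]


def validate(state: State) -> State:
    """Reject an empty series before any modeling happens."""
    if not state["raw"]:
        raise ValueError("no input data")
    return state


def transform(state: State) -> State:
    """Convert raw string values to floats."""
    return {**state, "series": [float(v) for v in state["raw"]]}


def model_mean(state: State) -> State:
    """Placeholder model: forecast the mean of the series."""
    series = state["series"]
    return {**state, "forecast": sum(series) / len(series)}


def model_last(state: State) -> State:
    """Alternative placeholder model: forecast the last observed value."""
    return {**state, "forecast": state["series"][-1]}


def evaluate(state: State) -> State:
    """Score the forecast with mean absolute error against the history."""
    errors = [abs(v - state["forecast"]) for v in state["series"]]
    return {**state, "mae": sum(errors) / len(errors)}


def report(state: State) -> State:
    """Produce a one-line summary."""
    summary = f"Forecast: {state['forecast']:.1f} (MAE {state['mae']:.1f})"
    return {**state, "summary": summary}


def run_pipeline(state: State, stages: List[Stage]) -> State:
    """Apply each stage in order, passing the state along."""
    for stage in stages:
        state = stage(state)
    return state


raw = {"raw": ["10", "20", "30"]}
first = run_pipeline(raw, [validate, transform, model_mean, evaluate, report])
second = run_pipeline(raw, [validate, transform, model_last, evaluate, report])
print(first["summary"])   # Forecast: 20.0 (MAE 6.7)
print(second["summary"])  # Forecast: 30.0 (MAE 10.0)
```

Swapping `model_mean` for `model_last` changes one list entry and nothing else, and each stage can be unit-tested on its own.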
Instead of writing "Step 1: Do this, Step 2: Do that..." I built:
make run # Just works™
Why?
- Users don't read long instructions
- Automation forces you to think about reproducibility
- Shows DevOps thinking, not just data science
| Challenge | What I Did | Why It Matters |
|---|---|---|
| Data Generation | Built synthetic data generator instead of using real data | Shows I can create realistic test scenarios |
| Error Handling | Added validation at each pipeline step | Production code needs to fail gracefully |
| Output Design | Created both technical (CSV) and business (MD) outputs | Data scientists serve multiple audiences |
| Environment Setup | Made it work on Mac, Linux, AND Windows | Real tools need to work everywhere |
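
As one possible reading of the "Error Handling" row, the sketch below validates input rows and reports a specific message instead of a raw traceback. The schema in `REQUIRED_COLUMNS`, the `ValidationError` class, and both function names are hypothetical; DFSE's actual checks may differ.

```python
from typing import Dict, List

REQUIRED_COLUMNS = ("date", "demand")  # hypothetical schema for illustration


class ValidationError(ValueError):
    """Raised when input rows do not meet a stage's expectations."""


def validate_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows unchanged if valid; otherwise raise a specific error."""
    if not rows:
        raise ValidationError("input is empty")
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValidationError(f"row {index}: missing columns {missing}")
        try:
            value = float(row["demand"])
        except ValueError:
            raise ValidationError(
                f"row {index}: demand {row['demand']!r} is not numeric"
            ) from None
        if value < 0:
            raise ValidationError(f"row {index}: demand is negative ({value})")
    return rows


def run_stage(rows: List[Dict[str, str]]) -> bool:
    """Validate and report failure with a readable message."""
    try:
        validate_rows(rows)
    except ValidationError as error:
        print(f"Validation failed: {error}")
        return False
    return True


print(run_stage([{"date": "2024-01-01", "demand": "12"}]))   # True
print(run_stage([{"date": "2024-01-02", "demand": "n/a"}]))  # False
```

Naming the row and the reason in the message is what lets a user fix the input without reading the source.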
Here's what makes this code different:
# Instead of messy notebooks, I built a clear pipeline:
raw_data → validation → transformation → modeling → evaluation → reporting

# Each function does ONE thing well
def calculate_rfm_score(df):
    """Takes customer data, returns RFM segments"""
    # Not: calculate_rfm_and_make_plots_and_send_email()

# Type hints and docstrings on public functions
from typing import Any, Dict

import pandas as pd

def forecast_demand(data: pd.DataFrame, periods: int = 60) -> Dict[str, Any]:
    """
    Generate demand forecast with confidence intervals.

    Args:
        data: Historical demand data with datetime index
        periods: Number of future periods to forecast

    Returns:
        Dictionary containing forecast, metrics, and plots
    """

This isn't just "code that works"; it's code others can maintain.
As someone looking to break into data science/analytics roles, I wanted to show:
- I understand business needs (decision-first approach)
- I write production code (not just experiments)
- I communicate clearly (reports for stakeholders, code for developers)
- I think about the full lifecycle (from data to decisions)
This project represents how I actually work, not just what I know.
Contributions, issues, and feature requests are welcome!
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Neil Sable
- GitHub: @neilsable
- LinkedIn: Neil Sable
- Email: neilsable7@gmail.com
Give a ⭐️ if this project helped you!
Built with ❤️ for practical data science
