An end-to-end customer segmentation system that uses machine learning (K-Means clustering), integrated with the CodeIgniter 4 web framework.
- Machine Learning Pipeline: Complete ML pipeline with preprocessing, feature engineering, and K-Means clustering
- Model Persistence: Save and load trained models for predictions
- Web Dashboard: Interactive dashboard showing cluster statistics and distributions
- Real-time Predictions: Predict customer segments for new data
- Visualization: 2D PCA visualization of customer clusters
- Batch Processing: Upload CSV files for batch predictions
- RESTful API: JSON API endpoints for integration
MachineLearningwithCodeIgniter/
├── ml_pipeline/                    # Machine Learning Pipeline
│   ├── customer_segmentation.py    # Main training pipeline
│   ├── predict_api.py              # Prediction API
│   ├── train_model.sh              # Training script
│   └── requirements.txt            # Python dependencies
├── models/                         # Trained models & data
│   ├── segmentation_model.pkl      # Trained model
│   ├── cluster_profiles.json       # Cluster statistics
│   └── segmented_customers.csv     # Segmented data
├── app/
│   ├── Controllers/
│   │   └── Segmentation.php        # Main controller
│   ├── Models/
│   │   └── CustomerModel.php       # Data model
│   ├── Views/
│   │   └── segmentation/           # Web interface
│   └── Config/
│       └── Routes.php              # Routes configuration
├── public/                         # Web root
└── bank.csv                        # Source data
- PHP 8.1 or higher
- CodeIgniter 4.x
- Composer
- Python 3.8+
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
composer install

cd ml_pipeline
pip3 install -r requirements.txt

cp .env.example .env

Edit .env and configure your settings.
cd ml_pipeline
bash train_model.sh
This will:
- Load the bank.csv data
- Preprocess features
- Train K-Means clustering model
- Generate cluster profiles
- Save model and segmented data
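The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of customer_segmentation.py: the tiny in-memory dataset, the function name `train_segmentation`, the reduced feature set, and the layout of the saved bundle are all assumptions. Only the use of label encoding, StandardScaler, K-Means, and a pickled model comes from this README.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for a few bank.csv rows: (age, job, balance).
ROWS = [
    (25, "student", 300), (27, "student", 450), (31, "services", 900),
    (45, "management", 5200), (48, "management", 6100), (52, "retired", 4800),
    (38, "technician", 1500), (36, "technician", 1700),
]


def train_segmentation(rows, n_clusters=2, seed=42):
    """Encode, scale, cluster, and profile the rows (illustrative only)."""
    ages = np.array([r[0] for r in rows], dtype=float)
    balances = np.array([r[2] for r in rows], dtype=float)

    # Label-encode the categorical column.
    job_encoder = LabelEncoder()
    jobs = job_encoder.fit_transform([r[1] for r in rows]).astype(float)

    # Scale all features so no single column dominates the distances.
    features = np.column_stack([ages, jobs, balances])
    scaler = StandardScaler()
    scaled = scaler.fit_transform(features)

    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)

    # Per-cluster size and means of the unscaled numeric features.
    profiles = {}
    for cluster_id in range(n_clusters):
        mask = labels == cluster_id
        profiles[cluster_id] = {
            "size": int(mask.sum()),
            "mean_age": float(ages[mask].mean()),
            "mean_balance": float(balances[mask].mean()),
        }

    bundle = {"model": model, "scaler": scaler, "job_encoder": job_encoder}
    return bundle, labels, profiles


bundle, labels, profiles = train_segmentation(ROWS)

# Persist and restore the fitted objects (to bytes here, rather than a .pkl file).
restored = pickle.loads(pickle.dumps(bundle))
```

Saving the encoder and scaler together with the model matters because a new customer has to pass through the same transformations before `predict` is called.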
php spark serve
Visit: http://localhost:8080
- Dashboard (/segmentation)
  - View cluster statistics
  - See cluster distribution
  - Train/retrain model
- Predict (/segmentation/predict)
  - Input customer data
  - Get real-time segment prediction
  - View cluster profile
- Visualize (/segmentation/visualize)
  - 2D scatter plot of clusters
  - PCA visualization
  - Interactive charts
- Customers (/segmentation/customers)
  - Browse segmented customers
  - Export data
GET /segmentation/getClusterStats
Response:
{
  "success": true,
  "profiles": {
    "0": {
      "cluster_id": 0,
      "size": 2500,
      "percentage": 25.0,
      "features": {...}
    }
  },
  "distribution": [...]
}
POST /segmentation/predictSegment
Content-Type: application/json
{
  "age": 35,
  "job": "management",
  "marital": "married",
  "education": "tertiary",
  "balance": 5000,
  ...
}
Response:
{
  "success": true,
  "prediction": {
    "cluster": 2,
    "cluster_profile": {...},
    "confidence": 0.85
  }
}
GET /segmentation/getVisualizationData
POST /segmentation/trainModel

cd ml_pipeline
# Predict single customer
python3 predict_api.py '{"age": 35, "job": "management", "balance": 5000, ...}'
# Batch prediction
python3 predict_api.py '[{...}, {...}, ...]'

- Demographics: age, job, marital, education
- Financial: balance, housing, loan, default
- Campaign: contact, day, month, duration, campaign, pdays, previous, poutcome
- Target: deposit
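A prediction request carries the non-target fields listed above. The sketch below checks a customer record for missing fields and builds (without sending) a POST request for /segmentation/predictSegment. The helper names `missing_fields` and `build_predict_request` are hypothetical, the sample values beyond the first five fields are illustrative, and it is an assumption that the endpoint expects all sixteen fields.

```python
import json
import urllib.request

# Non-target input fields, taken from the feature groups listed above.
REQUIRED_FIELDS = (
    "age", "job", "marital", "education",        # demographics
    "balance", "housing", "loan", "default",     # financial
    "contact", "day", "month", "duration",       # campaign
    "campaign", "pdays", "previous", "poutcome",
)


def missing_fields(customer):
    """Return the required field names absent from a customer record."""
    return [name for name in REQUIRED_FIELDS if name not in customer]


def build_predict_request(customer, base_url="http://localhost:8080"):
    """Build, but do not send, a JSON POST request for predictSegment."""
    missing = missing_fields(customer)
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return urllib.request.Request(
        url=f"{base_url}/segmentation/predictSegment",
        data=json.dumps(customer).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative record; only the first five values appear in this README.
customer = {
    "age": 35, "job": "management", "marital": "married", "education": "tertiary",
    "balance": 5000, "housing": "yes", "loan": "no", "default": "no",
    "contact": "cellular", "day": 5, "month": "may", "duration": 180,
    "campaign": 1, "pdays": -1, "previous": 0, "poutcome": "unknown",
}
request = build_predict_request(customer)
```

With the development server running, `urllib.request.urlopen(request)` would send it. Checking fields on the client side gives a clearer error than a failed prediction.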
- Label encoding for categorical variables
- Feature scaling using StandardScaler
- PCA for dimensionality reduction (visualization)
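The three preprocessing steps above could look roughly like this with scikit-learn. The sample values and variable names are made up for illustration, and the project's own preprocess_data() method may differ in column handling.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical sample: two categorical columns and two numeric columns.
jobs = ["management", "technician", "services", "management", "retired", "technician"]
marital = ["married", "single", "married", "divorced", "married", "single"]
ages = [35, 29, 41, 50, 63, 33]
balances = [5000, 1200, 800, 7300, 2600, 150]

# Label encoding: each distinct category becomes an integer code.
job_codes = LabelEncoder().fit_transform(jobs)
marital_codes = LabelEncoder().fit_transform(marital)

features = np.column_stack([ages, job_codes, marital_codes, balances]).astype(float)

# Standardize every column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(features)

# Project to two principal components, used only for plotting.
coords_2d = PCA(n_components=2).fit_transform(scaled)
```

Scaling before clustering matters here because balance spans thousands of euros while the encoded categories span only a few integers.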
- Algorithm: K-Means
- Number of clusters: 4 (configurable)
- Distance metric: Euclidean
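To make the model settings concrete, here is a standard-library sketch of the two operations K-Means alternates between: assigning a point to the nearest centroid by Euclidean distance, and recomputing centroids as cluster means. The centroid values and function names are hypothetical. The project itself relies on scikit-learn's KMeans rather than code like this.

```python
import math

# Hypothetical centroids in a 2-feature scaled space, one per cluster (4 clusters).
CENTROIDS = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]


def assign_cluster(point, centroids=CENTROIDS):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def update_centroids(points, labels, k):
    """Recompute each centroid as the mean of the points assigned to it."""
    centroids = []
    for cluster_id in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster_id]
        # Keep the list aligned even if a cluster received no points.
        if not members:
            centroids.append(None)
            continue
        centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centroids
```

Predicting the segment of a new customer corresponds to the assignment step alone, applied to the scaled feature vector.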
- Cluster assignments: 0, 1, 2, 3
- Cluster profiles: Statistical summaries for each segment
- PCA coordinates: For visualization
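A cluster profile of the kind described above can be computed with a few lines of standard-library Python. The rows are invented, and the keys inside `features` (`age_mean`, `balance_mean`) are assumptions, since this README elides the real contents of that object.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (cluster, age, balance) rows standing in for segmented_customers.csv.
SEGMENTED = [
    (0, 25, 300), (0, 27, 500), (1, 45, 5200),
    (1, 55, 6000), (1, 50, 4400), (2, 38, 1500),
]


def build_profiles(rows):
    """Summarize each cluster: size, share of all rows, and feature means."""
    by_cluster = defaultdict(list)
    for cluster_id, age, balance in rows:
        by_cluster[cluster_id].append((age, balance))

    profiles = {}
    for cluster_id, members in sorted(by_cluster.items()):
        profiles[str(cluster_id)] = {
            "cluster_id": cluster_id,
            "size": len(members),
            "percentage": round(100 * len(members) / len(rows), 1),
            "features": {
                "age_mean": mean(m[0] for m in members),
                "balance_mean": mean(m[1] for m in members),
            },
        }
    return profiles


profiles = build_profiles(SEGMENTED)
```

Using string keys for the cluster ids keeps the structure directly serializable with `json.dumps`.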
- Dataset: Bank Marketing Dataset (bank.csv)
- Records: 11,163 customers
- Features: 17 attributes
- age: Customer age
- job: Type of job
- marital: Marital status
- education: Education level
- default: Has credit in default?
- balance: Average yearly balance (euros)
- housing: Has housing loan?
- loan: Has personal loan?
- contact: Contact communication type
- day: Last contact day of month
- month: Last contact month
- duration: Last contact duration (seconds)
- campaign: Number of contacts during campaign
- pdays: Days since last contact
- previous: Number of contacts before campaign
- poutcome: Outcome of previous campaign
- deposit: Has term deposit?
To change the number of clusters, edit ml_pipeline/customer_segmentation.py:
pipeline = CustomerSegmentationPipeline(n_clusters=5) # Change from 4 to 5
To change the feature set:
- Update feature selection in the preprocess_data() method
- Retrain model
- Update prediction form in views
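Before settling on a new n_clusters value, it can help to compare a few candidates. The sketch below uses the silhouette score, one common heuristic that this project is not stated to use. The data is synthetic (not bank.csv) and the function name `silhouette_by_k` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (not bank.csv).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])


def silhouette_by_k(X, candidates=range(2, 7), seed=42):
    """Fit K-Means for each candidate k and return {k: silhouette score}."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


scores = silhouette_by_k(data)
best_k = max(scores, key=scores.get)
```

On real customer data the scores are usually much closer together, so the result is better treated as one input alongside how interpretable the resulting segments are.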
Replace K-Means with other algorithms:
from sklearn.cluster import DBSCAN, AgglomerativeClustering
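# Instead of KMeans
# Note: DBSCAN and AgglomerativeClustering have no predict() method, and DBSCAN
# takes no n_clusters argument, so the prediction step needs adapting as well.
self.model = DBSCAN(eps=0.5, min_samples=5)
- Set CI_ENVIRONMENT = production in .env
- Configure production database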
- Set proper file permissions
- Configure web server (Apache/Nginx)
- Install Python dependencies on server
- Train model with production data
- Enable HTTPS
- Configure CORS if needed
- Set up monitoring and logging
Apache (.htaccess):
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
Nginx:
location / {
try_files $uri $uri/ /index.php?$query_string;
}
- Train the model first: bash ml_pipeline/train_model.sh
- Check models/ directory exists and has permissions

- Check Python installation: python3 --version
- Update path in controller: $this->pythonPath = 'python3'
chmod +x ml_pipeline/train_model.sh
chmod -R 755 models/
- Ensure all required fields are provided
- Check input data format matches training data
- Verify model is loaded correctly
- Training time: ~30 seconds (11K records)
- Prediction time: <100ms per customer
- Model size: ~500KB
- Memory usage: ~50MB
- Add more clustering algorithms
- Implement cluster comparison
- Add customer lifetime value prediction
- Create automated retraining schedule
- Add A/B testing framework
- Implement real-time streaming predictions
- Add model versioning
- Create admin panel for model management
MIT License
Developed using:
- CodeIgniter 4
- Scikit-learn
- Bootstrap 5
- Chart.js
Note: This is a demonstration project for educational purposes. For production use, add proper authentication, authorization, input validation, and error handling.