# Vendor Payment Delay Prediction - Model Training

## Overview
This notebook trains and compares four machine learning models for predicting vendor payment delays and evaluates them on the metrics listed below.

### Models Implemented:
1. **Logistic Regression** - Baseline interpretable model
2. **Random Forest** - Handles non-linear relationships and feature interactions  
3. **XGBoost** - High-performance gradient boosting
4. **LightGBM** - Fast, histogram-based gradient boosting that scales well to larger datasets

### Evaluation Metrics:
- Accuracy, Precision, Recall, F1-Score (computed from thresholded predictions)
- ROC-AUC for ranking quality (threshold-independent)
- Business impact analysis
- Feature importance analysis
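
As a rough illustration of what the classification metrics above measure, here is a minimal, dependency-free sketch. The labels and scores are made-up toy values (not results from this notebook), the 0.5 threshold is an assumption, and the helper names `classification_metrics` and `roc_auc` are hypothetical. The notebook itself relies on the scikit-learn functions imported below.

```python
# Toy data (hypothetical): 1 = payment delayed, 0 = paid on time.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1]  # predicted P(delay)

# Assumed decision threshold; scores at or above it are flagged as delays.
THRESHOLD = 0.5
y_pred = [1 if s >= THRESHOLD else 0 for s in y_score]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (delayed, on-time) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg), which is
    fine for a toy example but not for large datasets.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(metrics)
print(f"ROC-AUC: {auc:.3f}")
```

Precision, recall and F1 change with the threshold, while ROC-AUC depends only on how the scores rank delayed payments against on-time ones. That is why both kinds of metric are reported.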

In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os
sys.path.append('../src')

# Machine Learning Libraries
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    classification_report, confusion_matrix, roc_auc_score, 
    accuracy_score, precision_score, recall_score, f1_score,
    roc_curve, precision_recall_curve
)
from sklearn.preprocessing import StandardScaler

# Handle class imbalance
from imblearn.over_sampling import SMOTE

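# Gradient Boosting (optional; install with: pip install xgboost lightgbm)
# Record availability so later cells can skip a model instead of raising NameError.
try:
    import xgboost as xgb
    XGBOOST_AVAILABLE = True
    print("✅ XGBoost imported successfully")
except ImportError:
    xgb = None
    XGBOOST_AVAILABLE = False
    print("❌ XGBoost not installed. Run: pip install xgboost")

try:
    import lightgbm as lgb
    LIGHTGBM_AVAILABLE = True
    print("✅ LightGBM imported successfully")
except ImportError:
    lgb = None
    LIGHTGBM_AVAILABLE = False
    print("❌ LightGBM not installed. Run: pip install lightgbm")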

# Our custom modules
from data_generator import VendorPaymentDataGenerator
from preprocessing import DataPreprocessor

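# Silence library warnings to keep notebook output readable.
# Note: this also hides deprecation and convergence warnings.
import warnings
warnings.filterwarnings('ignore')

# Optional packages report their own status above, so only claim the core imports here.
print("Core libraries imported successfully!")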