# CLASS BALANCING

In this notebook, we address the class imbalance present in our stockout dataset. 

Since stockout events are significantly less frequent than non-stockouts, a model trained without balancing may ignore the minority class and fail to detect critical stockout risks.
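To make that risk concrete, here is a minimal sketch using the class counts that appear later in this notebook's test split (341 minority rows and 5,659 majority rows). It assumes a hypothetical classifier that always predicts the majority class; the function name `always_majority_scores` is illustrative only.

```python
def always_majority_scores(n_minority: int, n_majority: int) -> dict:
    """Accuracy and minority recall for a classifier that always predicts the majority class."""
    total = n_minority + n_majority
    accuracy = n_majority / total      # every majority row counts as correct
    minority_recall = 0 / n_minority   # no minority row is ever detected
    return {"accuracy": accuracy, "minority_recall": minority_recall}


scores = always_majority_scores(n_minority=341, n_majority=5659)
print(f"accuracy={scores['accuracy']:.3f}, minority recall={scores['minority_recall']:.3f}")
```

Such a classifier would reach roughly 94% accuracy while detecting none of the rare events, which is why accuracy alone is not used to judge the models below.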

The goal is to determine:

- whether balancing improves predictive performance on this dataset
- which balancing approach gives the best trade-off between detecting stockouts and minimizing false alarms, leading to a more reliable inventory risk model.

We will:

- train a baseline model with **no balancing**

- apply balancing techniques:
    - **Random Undersampling**
    - **Random Oversampling**
    - **SMOTE-Tomek**

- compare performance using metrics suited for imbalanced classification: 
    - **Recall (priority in this risk-scoring case)**
    - **Precision**
    - **F1-score**
    - **ROC-AUC**
    - **Precision-Recall curves**
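
When the class of interest is not the label scikit-learn treats as positive by default (`1`), the ranking metrics listed above need to be pointed at it explicitly. The sketch below uses synthetic stand-in data, not this project's dataset, and assumes the rare class is label `0`; the helper name `pr_auc` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): label 0 is the rare class, about 6% of rows.
X_demo, y_demo = make_classification(
    n_samples=4000, n_features=10, weights=[0.06, 0.94], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

model = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)  # columns follow model.classes_, here [0, 1]


def pr_auc(y_true, scores, pos_label):
    """Area under the precision-recall curve for the chosen positive label."""
    prec, rec, _ = precision_recall_curve(y_true, scores, pos_label=pos_label)
    return auc(rec, prec)


pr_auc_majority = pr_auc(y_te, proba[:, 1], pos_label=1)  # default view
pr_auc_minority = pr_auc(y_te, proba[:, 0], pos_label=0)  # rare-class view

# ROC-AUC is unchanged when labels and scores are both flipped.
roc_majority = roc_auc_score(y_te, proba[:, 1])
roc_minority = roc_auc_score(1 - y_te, proba[:, 0])

print("PR-AUC (label 1 positive):", pr_auc_majority)
print("PR-AUC (label 0 positive):", pr_auc_minority)
print("ROC-AUC (either view):", roc_majority, roc_minority)
```

ROC-AUC gives the same value from either side, while PR-AUC depends on which label is treated as positive, so it is worth stating the positive label next to any PR-AUC figure.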
    
We use Logistic Regression (LR) because the goal at this stage is not final model performance. We are comparing balancing techniques, not algorithms, and LR is:
- Simple
- Fast
- Sensitive to class imbalance
- Easy to compare across sampling methods

## IMPORT LIBRARIES

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import TomekLinks
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTETomek

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import (
    classification_report,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
    precision_recall_curve,
    auc
)

# Autocomplete
%config IPCompleter.greedy=True

# Ignore warnings for cleaner output
import warnings
warnings.filterwarnings("ignore")

## IMPORT DATA

In [2]:
project_path = '/Users/rober/retail-stockout-risk-scoring/'

name_X = 'X_preselected.pickle'
name_y = 'y_preselected.pickle'

X = pd.read_pickle(project_path + '/02_Data/03_Working/' + name_X)
y = pd.read_pickle(project_path + '/02_Data/03_Working/' + name_y)

## No balancing

### Instantiate, train, apply and predict

In [3]:
# stratify=y to preserve imbalance ratio
# random_state=42 for reproducibility
# max_iter=500, solver='lbfgs' to fix warning

train_X,test_X,train_y,test_y = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
rl_no_balancing = LogisticRegression(n_jobs = -1, max_iter=500, solver='lbfgs')
rl_no_balancing.fit(train_X,train_y)
pred_rl_no_balancing = rl_no_balancing.predict_proba(test_X)[:,1]

### Evaluate

In [4]:
# Class predictions using default 0.5 threshold, only to compute classification metrics
pred_class = (pred_rl_no_balancing >= 0.5).astype(int)

# Basic metrics
print("Confusion Matrix:\n", confusion_matrix(test_y, pred_class), "\n")
print("Classification Report:\n", classification_report(test_y, pred_class))

print("Precision:", precision_score(test_y, pred_class))
print("Recall:", recall_score(test_y, pred_class))
print("F1-Score:", f1_score(test_y, pred_class))

# ROC-AUC
roc_auc = roc_auc_score(test_y, pred_rl_no_balancing)
print("ROC-AUC:", roc_auc)

# PR-AUC (Precision-Recall Curve AUC)
prec, rec, _ = precision_recall_curve(test_y, pred_rl_no_balancing)
pr_auc = auc(rec, prec)
print("PR-AUC:", pr_auc)

Confusion Matrix:
 [[  74  267]
 [   2 5657]] 

Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.22      0.35       341
           1       0.95      1.00      0.98      5659

    accuracy                           0.96      6000
   macro avg       0.96      0.61      0.67      6000
weighted avg       0.96      0.96      0.94      6000

Precision: 0.9549291019581364
Recall: 0.9996465806679625
F1-Score: 0.9767763101096434
ROC-AUC: 0.9824264569090111
PR-AUC: 0.9989356095595938
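
The scalar `Precision`, `Recall` and `F1-Score` lines above use scikit-learn's default `pos_label=1`, which here is the majority class. As a hedged sketch, the snippet below rebuilds label vectors from the printed confusion matrix (it does not reload the data) and reports the same metrics for each label via `pos_label`. Treating label `0` as the stockout class is an assumption based on the introduction describing stockouts as the rare event.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Rebuild label vectors from the confusion matrix printed above
# (rows = true label, columns = predicted label): [[74, 267], [2, 5657]]
y_true = np.array([0] * 341 + [1] * 5659)
y_pred = np.array([0] * 74 + [1] * 267 + [0] * 2 + [1] * 5657)

per_label = {}
for label in (0, 1):
    per_label[label] = {
        "precision": precision_score(y_true, y_pred, pos_label=label),
        "recall": recall_score(y_true, y_pred, pos_label=label),
        "f1": f1_score(y_true, y_pred, pos_label=label),
    }
    m = per_label[label]
    print(f"label {label}: precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

The label `1` line reproduces the scalar values printed by the cell, and the label `0` line matches the first row of the classification report.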


Key metrics

The minority class in the output above is label `0` (341 of 6,000 test rows), which we treat as the stockout class because stockouts are the rare event. The scalar `Precision`, `Recall` and `F1-Score` lines use the default positive label `1`, so the stockout figures below are taken from the `0` row of the classification report.

| Metric                   | Result   | Interpretation                                                              |
| ------------------------ | -------- | --------------------------------------------------------------------------- |
| **Precision (stockout)** | 0.97     | When the model says “stockout”, it’s almost always right ✅                  |
| **Recall (stockout)**    | **0.22** | It detects only **22%** of stockouts ❌                                      |
| F1-Score (stockout)      | 0.35     | Poor overall detection                                                      |
| Accuracy                 | 0.96     | Misleading (majority class dominates) ❌                                     |
| ROC-AUC                  | 0.98     | Looks amazing ⚠️ but hides the Recall problem at the 0.5 threshold           |
| PR-AUC                   | 1.00     | Computed with label `1` (the majority class) as positive, so it says little about stockout detection |


**❌ The model almost never predicts stockouts (✅ but when it does, it’s usually correct)**

**We need balancing** to force the model to care about stockouts and improve Recall, even if Precision drops a little.

## Undersampling

### Instantiate, resample, split, train and predict

In [5]:
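# The full dataset is resampled before the split, so the test set is balanced too (341 vs 341)
# and its metrics are not directly comparable with the baseline test set.
# No random_state is set on the sampler, so the retained majority rows can change between runs.
# LogisticRegression keeps its default max_iter here, hence the convergence message below.
rus = RandomUnderSampler(sampling_strategy=1)
X_rus, y_rus = rus.fit_resample(X, y)
train_X_rus, test_X_rus, train_y_rus, test_y_rus = train_test_split(X_rus, y_rus, test_size=0.3, random_state=42, stratify=y_rus)
rl_rus = LogisticRegression(n_jobs=-1)
rl_rus.fit(train_X_rus, train_y_rus)
pred_rl_rus = rl_rus.predict_proba(test_X_rus)[:, 1]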

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


### Evaluate

In [6]:
# Class predictions using default 0.5 threshold
pred_class_rus = (pred_rl_rus >= 0.5).astype(int)

print("Confusion Matrix:")
print(confusion_matrix(test_y_rus, pred_class_rus))
print("\nClassification Report:")
print(classification_report(test_y_rus, pred_class_rus))

print("Precision:", precision_score(test_y_rus, pred_class_rus))
print("Recall:", recall_score(test_y_rus, pred_class_rus))
print("F1-Score:", f1_score(test_y_rus, pred_class_rus))

# ROC-AUC
roc_auc_rus = roc_auc_score(test_y_rus, pred_rl_rus)
print("ROC-AUC:", roc_auc_rus)

# PR-AUC
prec, rec, _ = precision_recall_curve(test_y_rus, pred_rl_rus)
pr_auc_rus = auc(rec, prec)
print("PR-AUC:", pr_auc_rus)

Confusion Matrix:
[[341   0]
 [ 65 276]]

Classification Report:
              precision    recall  f1-score   support

           0       0.84      1.00      0.91       341
           1       1.00      0.81      0.89       341

    accuracy                           0.90       682
   macro avg       0.92      0.90      0.90       682
weighted avg       0.92      0.90      0.90       682

Precision: 1.0
Recall: 0.8093841642228738
F1-Score: 0.8946515397082657
ROC-AUC: 0.9793431429038278
PR-AUC: 0.9840617279041673



| Metric                         | Baseline (No balancing) | Undersampling | Interpretation                                      |
| ------------------------------ | ----------------------- | ------------- | --------------------------------------------------- |
| **Recall (stockout, label 0)** | **0.22**                | **1.00**      | ✅ Huge improvement in detecting stockouts           |
| Precision (stockout, label 0)  | 0.97                    | 0.84          | ⚠️ Drops: 65 non-stockouts are flagged as stockouts  |
| F1-score (stockout, label 0)   | 0.35                    | 0.91          | ✅ Huge improvement                                  |
| ROC-AUC                        | 0.98                    | 0.98          | ✅ Still high                                        |
| PR-AUC (label 1 as positive)   | 1.00                    | 0.98          | ✅ Excellent                                         |
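
Because the undersampled test set is balanced (341 vs 341), its precision is measured at 50% prevalence. The worked arithmetic below is a rough sketch of what the same detection and false-alarm rates would imply at the class ratio of the baseline test split (341 of 6,000). It assumes label `0` is the stockout class and that both rates carry over unchanged to the original distribution; `precision_at_prevalence` is a hypothetical helper.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision implied by a true/false positive rate at a given positive-class share."""
    true_alerts = tpr * prevalence
    false_alerts = fpr * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


# Undersampling confusion matrix above, with label 0 as the positive class:
tpr = 341 / 341  # share of stockouts that were detected
fpr = 65 / 341   # share of non-stockouts flagged as stockouts

balanced = precision_at_prevalence(tpr, fpr, prevalence=0.5)         # resampled test set
original = precision_at_prevalence(tpr, fpr, prevalence=341 / 6000)  # baseline test split
print(f"precision at 50% prevalence:      {balanced:.2f}")
print(f"precision at original prevalence: {original:.2f}")
```

Under these assumptions, the 0.84 precision seen on the balanced test set would correspond to roughly 0.24 at the original class ratio, so the false-alarm cost of undersampling is likely larger than the balanced test set suggests.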


## Oversampling

### Instantiate, resample, split, train and predict

In [7]:
ros = RandomOverSampler(sampling_strategy=1, random_state=42)
X_ros, y_ros = ros.fit_resample(X, y)

train_X_ros, test_X_ros, train_y_ros, test_y_ros = train_test_split(
    X_ros, y_ros,
    test_size=0.3,
    random_state=42,
    stratify=y_ros
)

rl_ros = LogisticRegression(n_jobs=-1, max_iter=500, solver='lbfgs')
rl_ros.fit(train_X_ros, train_y_ros)

pred_rl_ros = rl_ros.predict_proba(test_X_ros)[:, 1]

### Evaluate

In [8]:
# Class predictions using default 0.5 threshold
pred_class_ros = (pred_rl_ros >= 0.5).astype(int)

print("Confusion Matrix:")
cm_ros = confusion_matrix(test_y_ros, pred_class_ros)
print(cm_ros, "\n")

print("Classification Report:")
print(classification_report(test_y_ros, pred_class_ros))

print("Precision:", precision_score(test_y_ros, pred_class_ros))
print("Recall:", recall_score(test_y_ros, pred_class_ros))
print("F1-Score:", f1_score(test_y_ros, pred_class_ros))

# ROC-AUC
roc_auc_ros = roc_auc_score(test_y_ros, pred_rl_ros)
print("ROC-AUC:", roc_auc_ros)

# PR-AUC
prec_ros, rec_ros, _ = precision_recall_curve(test_y_ros, pred_rl_ros)
pr_auc_ros = auc(rec_ros, prec_ros)
print("PR-AUC:", pr_auc_ros)

Confusion Matrix:
[[5624   36]
 [ 560 5099]] 

Classification Report:
              precision    recall  f1-score   support

           0       0.91      0.99      0.95      5660
           1       0.99      0.90      0.94      5659

    accuracy                           0.95     11319
   macro avg       0.95      0.95      0.95     11319
weighted avg       0.95      0.95      0.95     11319

Precision: 0.9929892891918208
Recall: 0.9010425870295106
F1-Score: 0.9447841393366685
ROC-AUC: 0.9843482379298869
PR-AUC: 0.9875473671178135


| Metric                        | Baseline | Undersampling | Oversampling | SMOTE-Tomek | Best?                                                                      |
| ----------------------------- | -------- | ------------- | ------------ | ----------- | -------------------------------------------------------------------------- |
| Recall (stockout, label 0)    | 0.22     | **1.00**      | 0.99         | …           | Recall is the priority → Undersampling is highest, Oversampling within 0.01 |
| Precision (stockout, label 0) | 0.97     | 0.84          | 0.91         | …           | Oversampling raises fewer false alarms than Undersampling                  |
| F1-score (stockout, label 0)  | 0.35     | 0.91          | **0.95**     | …           | Oversampling leading                                                       |
| ROC-AUC                       | 0.98     | 0.98          | 0.98         | …           | Similar                                                                    |
| PR-AUC (label 1 as positive)  | 1.00     | 0.98          | 0.99         | …           | Similar                                                                    |


## SMOTE-Tomek

### Instantiate, resample, split, train and predict

In [9]:
smt = SMOTETomek(random_state=42)

X_smt, y_smt = smt.fit_resample(X, y)

train_X_smt, test_X_smt, train_y_smt, test_y_smt = train_test_split(
    X_smt, y_smt,
    test_size=0.3,
    random_state=42,
    stratify=y_smt
)

rl_smt = LogisticRegression(n_jobs=-1, max_iter=500, solver='lbfgs')
rl_smt.fit(train_X_smt, train_y_smt)

pred_rl_smt = rl_smt.predict_proba(test_X_smt)[:, 1]

### Evaluate

In [10]:
pred_class_smt = (pred_rl_smt >= 0.5).astype(int)

print("Confusion Matrix:")
cm_smt = confusion_matrix(test_y_smt, pred_class_smt)
print(cm_smt, "\n")

print("Classification Report:")
print(classification_report(test_y_smt, pred_class_smt))

print("Precision:", precision_score(test_y_smt, pred_class_smt))
print("Recall:", recall_score(test_y_smt, pred_class_smt))
print("F1-Score:", f1_score(test_y_smt, pred_class_smt))

# ROC-AUC
roc_auc_smt = roc_auc_score(test_y_smt, pred_rl_smt)
print("ROC-AUC:", roc_auc_smt)

# PR-AUC
prec_smt, rec_smt, _ = precision_recall_curve(test_y_smt, pred_rl_smt)
pr_auc_smt = auc(rec_smt, prec_smt)
print("PR-AUC:", pr_auc_smt)


Confusion Matrix:
[[5638   20]
 [ 506 5151]] 

Classification Report:
              precision    recall  f1-score   support

           0       0.92      1.00      0.96      5658
           1       1.00      0.91      0.95      5657

    accuracy                           0.95     11315
   macro avg       0.96      0.95      0.95     11315
weighted avg       0.96      0.95      0.95     11315

Precision: 0.9961322761554825
Recall: 0.9105532968004243
F1-Score: 0.9514222386405616
ROC-AUC: 0.9869953128826275
PR-AUC: 0.98991466402955


| Method          | Recall (stockouts) | Precision | Data preserved               | Final Score        |
| --------------- | ------------------ | --------- | ---------------------------- | ------------------ |
| Baseline        | ❌ 22%              | ✔ 97%     | ✔ 100%                       | ❌ unusable         |
| Undersampling   | ⭐ 100%             | ⚠ 84%     | ❌ loses data                 | ⚠ high alert noise |
| Oversampling    | ⭐ 99%              | ✔ 91%     | ✔ keeps data                 | ✔ very good        |
| **SMOTE-Tomek** | ⭐ 99.6%            | ✔ 91.8%   | ✔ keeps data + removes noise | **🏆 Best choice** |
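
The recall and precision figures in this comparison can be recomputed directly from the four confusion matrices printed earlier, which avoids transcription slips. This is a small sketch that assumes label `0` (first row and column) is the stockout class; `label0_metrics` is a hypothetical helper.

```python
# Confusion matrices copied from the outputs above
# (rows = true label, columns = predicted label).
confusion = {
    "Baseline":      [[74, 267], [2, 5657]],
    "Undersampling": [[341, 0], [65, 276]],
    "Oversampling":  [[5624, 36], [560, 5099]],
    "SMOTE-Tomek":   [[5638, 20], [506, 5151]],
}


def label0_metrics(cm):
    """Recall and precision with label 0 as the positive class."""
    (hit, miss), (false_alarm, _) = cm
    recall = hit / (hit + miss)
    precision = hit / (hit + false_alarm)
    return recall, precision


summary = {name: label0_metrics(cm) for name, cm in confusion.items()}
for name, (recall, precision) in summary.items():
    print(f"{name:<14} recall={recall:6.1%}  precision={precision:6.1%}")
```

Keep in mind that the three resampled rows are measured on resampled test sets, so their precision is not on the same footing as the baseline's.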


## SAVE DATASET AFTER CLASS BALANCING

In [11]:
X_balanced = X_smt
y_balanced = y_smt

name_X_balanced = project_path + '/02_Data/03_Working/' + 'X_balanced.pickle'
name_y_balanced = project_path + '/02_Data/03_Working/' + 'y_balanced.pickle'

X_balanced.to_pickle(name_X_balanced)
y_balanced.to_pickle(name_y_balanced)