# Phase 6: WBF Ensemble & Post-Processing Tuning

The previous submission, a 5-fold YOLOv5s ensemble with NMS, did not medal (offline CV `0.3156`). With limited time remaining, retraining a larger model is not feasible, so this notebook focuses on improving the score by optimizing the post-processing and ensembling of the *existing* YOLOv5s models.

## Plan

### 1. Advanced Ensemble Technique
*   **Weighted Boxes Fusion (WBF):** Replace plain Non-Maximum Suppression (NMS) with WBF. NMS keeps only the highest-confidence box in each overlapping group and discards the rest; WBF instead merges the group into one box whose coordinates are a confidence-weighted average, which can yield more accurate boxes. WBF will be applied to the existing `oof_detector_yolov5.csv` predictions.
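
To make the difference from NMS concrete, here is a minimal sketch of the fusion step for a single group of overlapping boxes. It is an illustration only, not the `ensemble-boxes` implementation: the helper `fuse_cluster` and the toy boxes are hypothetical, and the sketch leaves out the IoU-based clustering, label handling, and per-model weighting that a full WBF routine performs.

```python
def fuse_cluster(boxes):
    """Fuse boxes given as [x_min, y_min, x_max, y_max, confidence] into one box.

    Coordinates are averaged with the confidences as weights; the fused
    confidence is the plain mean of the input confidences.
    """
    total_conf = sum(b[4] for b in boxes)
    fused = [sum(b[i] * b[4] for b in boxes) / total_conf for i in range(4)]
    fused.append(total_conf / len(boxes))
    return fused


# Two overlapping predictions for the same opacity (toy values).
cluster = [
    [100.0, 100.0, 200.0, 200.0, 0.9],
    [110.0, 110.0, 210.0, 210.0, 0.3],
]

fused_box = fuse_cluster(cluster)            # pulled toward the confident box
nms_box = max(cluster, key=lambda b: b[4])   # NMS would keep this box unchanged

print("WBF-style:", [round(v, 2) for v in fused_box])
print("NMS-style:", nms_box)
```

The fused box lands between the two inputs but closer to the high-confidence one, whereas NMS returns the top box untouched and ignores the information in the second prediction.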

### 2. Systematic Post-Processing Tuning
*   **Grid Search:** Perform a grid search on the OOF predictions to find the optimal combination of hyperparameters for post-processing:
    *   `conf_threshold`: The minimum detector confidence for a box to be kept.
    *   `wbf_iou_threshold`: The IoU above which WBF merges boxes into one.
    *   `negative_filter_threshold`: The classifier's 'Negative for Pneumonia' probability at or above which all boxes for that image are removed.
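
A grid over these three parameters can be enumerated with `itertools.product`. The sketch below only shows how the combinations are built; the candidate values are placeholders chosen for illustration, not the values tuned in this notebook.

```python
import itertools

# Hypothetical candidate values, for illustration only.
grid = {
    "conf_threshold": [0.05, 0.10, 0.15],
    "wbf_iou_threshold": [0.5, 0.6],
    "negative_filter_threshold": [0.5, 0.7, 1.0],
}

param_names = list(grid)
combinations = [
    dict(zip(param_names, values))
    for values in itertools.product(*grid.values())
]

print(len(combinations))   # 3 * 2 * 3 combinations
print(combinations[0])
```

Because the grid grows multiplicatively, keeping each list short matters when every combination requires a full pass over the OOF predictions.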

## Workflow
1.  **Load OOF Data:** Load `oof_detector_yolov5.csv` and `oof_classifier.csv`.
2.  **Install WBF:** Add a cell that runs `pip install ensemble-boxes`.
3.  **Implement Tuning Loop:** Create a CV calculation cell that:
    *   Loops through different hyperparameter values (`conf_threshold`, `wbf_iou_threshold`, `negative_filter_threshold`).
    *   For each combination:
        *   Filters detector predictions by `conf_threshold`.
        *   Applies classifier filtering using `negative_filter_threshold`.
        *   Applies WBF using `wbf_iou_threshold`.
        *   Calculates the blended mAP score.
4.  **Identify Best Parameters:** Select the parameter set that yields the highest offline blended CV score.
5.  **Generate New Submission:** Create a new submission notebook that uses these optimal parameters.
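
The per-combination steps can be sketched on toy data as follows. This is a simplified illustration under stated assumptions: `postprocess` and `blended_map` are hypothetical helper names, the predictions are made up, and the WBF step (which would sit between filtering and scoring) is omitted. The study mAP and the two image mAP values are the ones reported in the tuning output further down in this notebook.

```python
def postprocess(preds, neg_prob_by_image, conf_threshold, negative_filter_threshold):
    """Keep predictions that pass the confidence filter and the classifier filter."""
    kept = []
    for p in preds:
        if p["confidence"] <= conf_threshold:
            continue  # too weak a detection
        if neg_prob_by_image[p["image_id"]] >= negative_filter_threshold:
            continue  # classifier considers the image negative
        kept.append(p)
    return kept


def blended_map(study_map, image_map):
    """Blend study-level and image-level mAP with equal weight."""
    return (study_map + image_map) / 2


# Toy predictions: image "a" looks positive, image "b" looks negative.
preds = [
    {"image_id": "a", "confidence": 0.80},
    {"image_id": "a", "confidence": 0.05},
    {"image_id": "b", "confidence": 0.60},
]
neg_prob_by_image = {"a": 0.10, "b": 0.90}

kept = postprocess(preds, neg_prob_by_image,
                   conf_threshold=0.10, negative_filter_threshold=0.70)
print(kept)  # only the confident box on image "a" survives

# Selecting the best parameter set by blended score.
study_map = 0.3537
results = [
    {"conf_th": 0.10, "neg_th": 0.7, "image_map": 0.322200},
    {"conf_th": 0.15, "neg_th": 0.7, "image_map": 0.320503},
]
for r in results:
    r["blended_map"] = blended_map(study_map, r["image_map"])

best = max(results, key=lambda r: r["blended_map"])
print(best)
```

Since the study mAP is held constant, ranking by blended score is equivalent to ranking by image mAP; the blend is kept so the numbers stay comparable with the offline CV of earlier submissions.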

In [3]:
!pip install --quiet ensemble-boxes

🔧 Intercepting pip install command: pip install --quiet ensemble-boxes


✅ Package installation completed and import cache refreshed.

In [4]:
# --- Part 2: Hyperparameter Tuning (Confidence & Classifier Filter) ---

import ast
import itertools

import numpy as np
import pandas as pd
from tqdm import tqdm

# --- Helper Functions for mAP Calculation (from notebook 05) ---
def calculate_iou(box1, box2):
    """Return the IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    x1_inter = max(box1[0], box2[0])
    y1_inter = max(box1[1], box2[1])
    x2_inter = min(box1[2], box2[2])
    y2_inter = min(box1[3], box2[3])
    inter_area = max(0, x2_inter - x1_inter) * max(0, y2_inter - y1_inter)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area
    if union_area == 0: return 0.0
    return inter_area / union_area

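def calculate_ap_per_class(gt_boxes, pred_boxes, iou_threshold=0.5):
    """Return the AP for one image.

    gt_boxes: list of [x_min, y_min, x_max, y_max].
    pred_boxes: list of [x_min, y_min, x_max, y_max, confidence].
    With no predictions, the score is 1.0 if there are also no GT boxes, else 0.0.
    """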
    if not pred_boxes:
        return 0.0 if gt_boxes else 1.0
    pred_boxes = sorted(pred_boxes, key=lambda x: x[4], reverse=True)
    true_positives = np.zeros(len(pred_boxes))
    false_positives = np.zeros(len(pred_boxes))
    num_gt_boxes = len(gt_boxes)
    gt_matched = [False] * num_gt_boxes
    for i, pred_box in enumerate(pred_boxes):
        best_iou = 0
        best_gt_idx = -1
        for j, gt_box in enumerate(gt_boxes):
            iou = calculate_iou(pred_box[:4], gt_box)
            if iou > best_iou:
                best_iou = iou
                best_gt_idx = j
        if best_iou >= iou_threshold and best_gt_idx != -1 and not gt_matched[best_gt_idx]:
            true_positives[i] = 1
            gt_matched[best_gt_idx] = True
        else:
            false_positives[i] = 1
    cum_tp = np.cumsum(true_positives)
    cum_fp = np.cumsum(false_positives)
    recalls = cum_tp / num_gt_boxes if num_gt_boxes > 0 else np.zeros_like(cum_tp)
    precisions = cum_tp / (cum_tp + cum_fp)
    precisions = np.concatenate(([0.], precisions, [0.]))
    recalls = np.concatenate(([0.], recalls, [1.]))
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap = 0.0
    for i in range(len(recalls) - 1):
        if recalls[i+1] != recalls[i]:
            ap += (recalls[i+1] - recalls[i]) * precisions[i+1]
    return ap

# --- 1. Load Data ---
print("Loading OOF and ground truth data...")
df_gt = pd.read_csv('df_train_folds.csv')
# Strip the '_image' suffix to get the bare image id (literal match, not a regex)
df_gt['image_id'] = df_gt['id'].str.replace('_image', '', regex=False)
df_clf_oof = pd.read_csv('oof_classifier.csv')
df_det_oof = pd.read_csv('oof_detector_yolov5.csv')

# Pre-process GT data for faster lookup
image_gt = df_gt[['id', 'boxes', 'image_id']].rename(columns={'id': 'image_id_with_suffix'})
gt_boxes_map = {}
for _, row in tqdm(image_gt.iterrows(), total=len(image_gt), desc="Processing GT boxes"):
    gt_boxes = []
    if isinstance(row['boxes'], str) and row['boxes'].startswith('['):
        try:
            boxes_list = ast.literal_eval(row['boxes'])
            for box in boxes_list:
                gt_boxes.append([box['x'], box['y'], box['x'] + box['width'], box['y'] + box['height']])
        except (ValueError, SyntaxError):
            pass  # Malformed box strings are treated as "no boxes"
    gt_boxes_map[row['image_id']] = gt_boxes

# Pre-process classifier OOF and merge with detector OOF
classes = ['Negative for Pneumonia', 'Typical Appearance', 'Indeterminate Appearance', 'Atypical Appearance']
df_clf_oof_img = df_clf_oof.rename(columns={c: f'pred_{c}' for c in classes})
df_det_oof_with_clf = df_det_oof.merge(df_clf_oof_img[['image_id', 'pred_Negative for Pneumonia']], on='image_id', how='left')

# --- 2. Define Hyperparameter Grid ---
conf_thresholds = [0.01, 0.05, 0.1, 0.15, 0.2, 0.25]
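# 1.0 effectively disables the filter: the strict `<` comparison below only drops
# rows whose negative probability is exactly 1.0 or missing (NaN after the left merge)
neg_filter_thresholds = [0.4, 0.5, 0.6, 0.7, 0.8, 1.0]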

# --- 3. Run Tuning Loop ---
results = []
study_map = 0.3537 # From previous notebook run, this is constant
print(f"Using pre-calculated Study mAP: {study_map:.4f}")

param_grid = list(itertools.product(conf_thresholds, neg_filter_thresholds))

for conf_th, neg_th in tqdm(param_grid, desc="Tuning Hyperparameters"):
    # Filter by confidence
    df_det_conf_filtered = df_det_oof_with_clf[df_det_oof_with_clf['confidence'] > conf_th]
    
    # Filter by classifier prediction
    df_det_postprocessed = df_det_conf_filtered[df_det_conf_filtered['pred_Negative for Pneumonia'] < neg_th]
    
    # Group predictions by image_id for faster lookup
    preds_by_image = df_det_postprocessed.groupby('image_id')[['x_min', 'y_min', 'x_max', 'y_max', 'confidence']].apply(lambda x: x.values.tolist()).to_dict()

    # Calculate image mAP
    ap_scores = []
    for image_id in image_gt['image_id'].unique():
        gt_boxes = gt_boxes_map.get(image_id, [])
        pred_boxes = preds_by_image.get(image_id, [])
        ap = calculate_ap_per_class(gt_boxes, pred_boxes, iou_threshold=0.5)
        ap_scores.append(ap)
    
    image_map = np.mean(ap_scores)
    blended_map = (study_map + image_map) / 2
    
    results.append({
        'conf_th': conf_th,
        'neg_th': neg_th,
        'image_map': image_map,
        'blended_map': blended_map
    })

# --- 4. Analyze Results ---
df_results = pd.DataFrame(results)
best_result = df_results.loc[df_results['blended_map'].idxmax()]

print("\n--- Tuning Results ---")
print(df_results.sort_values('blended_map', ascending=False).to_string())

print("\n--- Best Parameters ---")
print(best_result)

Loading OOF and ground truth data...


Processing GT boxes: 100%|██████████| 5696/5696 [00:00<00:00, 21299.41it/s]




Using pre-calculated Study mAP: 0.3537


Tuning Hyperparameters: 100%|██████████| 36/36 [00:03<00:00, 10.97it/s]


--- Tuning Results ---
    conf_th  neg_th  image_map  blended_map
15     0.10     0.7   0.322200     0.337950
14     0.10     0.6   0.322112     0.337906
13     0.10     0.5   0.322112     0.337906
12     0.10     0.4   0.322112     0.337906
17     0.10     1.0   0.322024     0.337862
16     0.10     0.8   0.322024     0.337862
18     0.15     0.4   0.320503     0.337101
23     0.15     1.0   0.320503     0.337101
21     0.15     0.7   0.320503     0.337101
20     0.15     0.6   0.320503     0.337101
19     0.15     0.5   0.320503     0.337101
22     0.15     0.8   0.320503     0.337101
26     0.20     0.6   0.319698     0.336699
27     0.20     0.7   0.319698     0.336699
28     0.20     0.8   0.319698     0.336699
29     0.20     1.0   0.319698     0.336699
24     0.20     0.4   0.319698     0.336699
25     0.20     0.5   0.319698     0.336699
30     0.25     0.4   0.318996     0.336348
31     0.25     0.5   0.318996     0.336348
32     0.25     0.6   0.318996     0.336348
33     0


