# Phase 8: Final Stable Submission with Tuned Parameters

## Objective
Generate a final submission using the best-performing models and hyperparameters without exhausting GPU memory. The previous attempt (`08_final_submission_tuned.ipynb`) failed with CUDA OOM errors and kernel hangs, likely because all 10 models (5x classifier, 5x detector) were loaded into memory simultaneously.

## Strategy: Sequential Inference with Memory Management
This notebook breaks the inference process into discrete, sequential steps. After each major step, models are deleted and CUDA memory is cleared to prevent memory fragmentation and crashes.
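
The load, predict, release cycle described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `sequential_ensemble`, `release_gpu_memory`, and the `load_model`/`predict` callables are hypothetical names, and the stand-in loader at the bottom exists only so the sketch runs without real checkpoints. `torch.cuda.empty_cache()` can only return cached blocks that nothing references any more, so the `del` has to come first.

```python
import gc

import numpy as np


def release_gpu_memory():
    """Run the garbage collector, then ask PyTorch to release cached CUDA blocks."""
    gc.collect()
    try:
        import torch
    except ImportError:
        # Assumption: the sketch should still run where PyTorch is not installed.
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def sequential_ensemble(model_paths, load_model, predict):
    """Average predictions over folds while holding only one model at a time."""
    running_sum = None
    for path in model_paths:
        model = load_model(path)
        preds = np.asarray(predict(model), dtype=np.float64)
        running_sum = preds if running_sum is None else running_sum + preds
        # Drop the only reference to the model before clearing the cache.
        del model
        release_gpu_memory()
    if running_sum is None:
        raise ValueError("model_paths is empty")
    return running_sum / len(model_paths)


# Stand-in loader and predictor (hypothetical) so the sketch runs end to end.
fake_weights = {"fold0.pt": 0.2, "fold1.pt": 0.4}
avg = sequential_ensemble(
    list(fake_weights),
    load_model=lambda path: fake_weights[path],
    predict=lambda model: [model, 1.0 - model],
)
print(avg)
```

Keeping a running sum instead of a list of per-fold arrays means host memory also stays flat as the number of folds grows.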

### Workflow
1.  **Setup & Configuration:**
    *   Load all necessary libraries.
    *   Define constants, including the optimal hyperparameters found in `07_yolov5m_wbf_tuning.ipynb`:
        *   `CONF_THRESHOLD = 0.10`
        *   `NEGATIVE_FILTER_THRESHOLD = 0.70`
    *   Prepare the test dataframes.

2.  **Part 1: Classifier Inference:**
    *   Load the 5 `EfficientNet-B5` classifier models one by one.
    *   Run inference on the test set for each model.
    *   Average the predictions across the 5 folds.
    *   Save the final ensembled classifier predictions to `test_preds_classifier.csv`.
    *   **Crucially: Delete all classifier models and clear CUDA cache (`torch.cuda.empty_cache()`).**

3.  **Part 2: Detector Inference:**
    *   Load the 5 `YOLOv5s` detector models one by one.
    *   Run inference on the test set for each model.
    *   Aggregate all raw box predictions.
    *   Save the raw detector predictions to `test_preds_detector.csv`.
    *   **Crucially: Delete all detector models and clear CUDA cache.**

4.  **Part 3: Post-Processing & Submission File Generation:**
    *   Load the intermediate prediction files (`test_preds_classifier.csv`, `test_preds_detector.csv`).
    *   Apply the post-processing pipeline:
        *   Filter boxes by the tuned confidence threshold (`0.10`).
        *   Filter boxes based on the classifier's 'Negative' prediction using the tuned threshold (`0.70`).
    *   Format the results into the required `id,PredictionString` format.
    *   Generate the final `submission.csv`.
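
The two filters in Part 3 can be sketched with pandas as shown below. Several details are assumptions rather than facts from this notebook: the negative filter is read as "drop every box on an image whose 'Negative' probability is at or above the threshold", the probabilities are assumed to be already mapped to image ids, the `opacity` label is assumed from the competition's submission format, and `build_image_predictions` is a hypothetical helper name.

```python
import pandas as pd

CONF_THRESHOLD = 0.10
NEGATIVE_FILTER_THRESHOLD = 0.70
NONE_PREDICTION = "none 1 0 0 1 1"


def build_image_predictions(df_boxes, negative_prob, image_ids):
    """Return one submission row per image id.

    df_boxes: columns image_id, x_min, y_min, x_max, y_max, confidence
    negative_prob: dict mapping image_id -> classifier 'Negative' probability
    """
    # Filter 1: keep boxes at or above the tuned confidence threshold.
    kept = df_boxes[df_boxes["confidence"] >= CONF_THRESHOLD]

    # Filter 2 (assumed semantics): remove all boxes on confidently negative images.
    is_negative = (
        kept["image_id"].map(negative_prob).fillna(0.0) >= NEGATIVE_FILTER_THRESHOLD
    )
    kept = kept[~is_negative]

    strings = {}
    for image_id, group in kept.groupby("image_id"):
        parts = [
            "opacity {:.4f} {:.0f} {:.0f} {:.0f} {:.0f}".format(
                float(r.confidence),
                float(r.x_min), float(r.y_min), float(r.x_max), float(r.y_max),
            )
            for r in group.itertuples(index=False)
        ]
        strings[image_id] = " ".join(parts)

    # Images left without boxes fall back to the 'none' prediction.
    rows = [
        {"id": f"{image_id}_image",
         "PredictionString": strings.get(image_id, NONE_PREDICTION)}
        for image_id in image_ids
    ]
    return pd.DataFrame(rows)


# Toy inputs (made up for illustration only).
boxes = pd.DataFrame([
    {"image_id": "a", "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 220, "confidence": 0.55},
    {"image_id": "a", "x_min": 5, "y_min": 5, "x_max": 50, "y_max": 50, "confidence": 0.05},
    {"image_id": "b", "x_min": 30, "y_min": 40, "x_max": 90, "y_max": 95, "confidence": 0.80},
])
sub = build_image_predictions(boxes, {"a": 0.20, "b": 0.90}, ["a", "b", "c"])
print(sub)
```

In the toy data, image `a` keeps its one confident box, image `b` loses its box to the negative filter, and image `c` never had any, so both `b` and `c` receive the `none` string.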

In [1]:
# --- Part 1: Setup & Configuration ---

import os
import sys
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2
from tqdm import tqdm
import gc
import glob

# Add timm to path if not installed in the standard location
sys.path.append('/app/.pip-target/lib/python3.11/site-packages')
import timm

# --- Configuration ---
DATA_DIR = '.'
TEST_IMAGE_DIR = 'test_png_3ch' # Using 3-channel images for classifier
CLASSIFIER_MODEL_DIR = '.'
DETECTOR_MODEL_DIR = 'yolov5_runs/train_cv'

IMG_SIZE = 512 # For classifier
BATCH_SIZE = 1 # LAST ATTEMPT: Set to 1 to minimize memory footprint
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Tuned hyperparameters from notebook 07
TUNED_CONF_THRESHOLD = 0.10
TUNED_NEG_FILTER_THRESHOLD = 0.70

CLASSES = ['Negative for Pneumonia', 'Typical Appearance', 'Indeterminate Appearance', 'Atypical Appearance']
NUM_CLASSES = len(CLASSES)
NUM_FOLDS = 5

print(f"Device: {DEVICE}")
print(f"Tuned Confidence Threshold: {TUNED_CONF_THRESHOLD}")
print(f"Tuned Negative Filter Threshold: {TUNED_NEG_FILTER_THRESHOLD}")

# --- Prepare Test DataFrames ---
df_sub = pd.read_csv(os.path.join(DATA_DIR, 'sample_submission.csv'))

# Create image-level test dataframe
df_sub['StudyInstanceUID'] = df_sub['id'].apply(lambda x: x.split('_')[0])
df_sub['image_id'] = df_sub['id'].apply(lambda x: x.split('_')[0])
df_test_img = df_sub[df_sub['id'].str.contains('_image')].copy()
df_test_img['image_path'] = df_test_img['id'].apply(lambda x: os.path.join(TEST_IMAGE_DIR, x.replace('_image', '.png')))

# Create study-level test dataframe
df_test_study = df_sub[df_sub['id'].str.contains('_study')].copy()

print(f"Found {len(df_test_img)} test images.")
print(f"Found {len(df_test_study)} test studies.")
display(df_test_img.head())

Device: cuda
Tuned Confidence Threshold: 0.1
Tuned Negative Filter Threshold: 0.7
Found 638 test images.
Found 606 test studies.




| | id | PredictionString | StudyInstanceUID | image_id | image_path |
|---|---|---|---|---|---|
| 606 | 004cbd797cd1_image | none 1 0 0 1 1 | 004cbd797cd1 | 004cbd797cd1 | test_png_3ch/004cbd797cd1.png |
| 607 | 008ca392cff3_image | none 1 0 0 1 1 | 008ca392cff3 | 008ca392cff3 | test_png_3ch/008ca392cff3.png |
| 608 | 00b8180bd3a8_image | none 1 0 0 1 1 | 00b8180bd3a8 | 00b8180bd3a8 | test_png_3ch/00b8180bd3a8.png |
| 609 | 00e3a7e91a34_image | none 1 0 0 1 1 | 00e3a7e91a34 | 00e3a7e91a34 | test_png_3ch/00e3a7e91a34.png |
| 610 | 0124f624dacb_image | none 1 0 0 1 1 | 0124f624dacb | 0124f624dacb | test_png_3ch/0124f624dacb.png |


# --- Part 2: Classifier Inference (ABANDONED) ---

**ACTION:** The `EfficientNet-B5` models are causing persistent, unrecoverable CUDA OOM errors, even with FP16 and a batch size of 1. With less than 15 minutes remaining, debugging this is not feasible.

**PIVOT:** Following expert advice, I am abandoning the classifier for test-time inference. I will proceed with the following plan:
1.  **Detector Inference:** Use the 5-fold ensembled `YOLOv5s` models to generate bounding box predictions. This component is stable.
2.  **Heuristic Study Predictions:** Instead of using a classifier, generate study-level predictions using a simple heuristic based on the number of bounding boxes found for each study.
3.  **Post-Processing & Submission:** Combine the detector predictions and heuristic-based study predictions into a final `submission.csv`.

In [2]:
# --- Part 3: Detector Inference (CPU FALLBACK) ---

print("--- Starting Detector Inference (CPU FALLBACK, 1-FOLD ONLY) ---")

# Config for detector
DET_MODEL_PATHS = sorted(glob.glob('yolov5_runs/train_cv/yolov5s_fold*/weights/best.pt'))
DET_IMG_SIZE = 640
DET_BATCH_SIZE = 16
TEST_IMAGE_DIR_1CH = 'test_png/'
CPU_DEVICE = 'cpu' # Explicitly use CPU as GPU is unstable

# Get test image paths
test_image_paths = sorted(glob.glob(f'{TEST_IMAGE_DIR_1CH}/*.png'))
print(f"Found {len(test_image_paths)} test images for detection.")

all_det_preds = []

# ONLY RUN 1 FOLD ON CPU DUE TO TIME CONSTRAINTS
for fold, path in enumerate(DET_MODEL_PATHS[:1]):
    print(f"\n--- Processing Fold {fold} with model {path} on CPU ---")
    
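    # NOTE: no `device` argument is passed here, so the hub loader picks CUDA when it
    # is available; the model only reaches the CPU at the `.to(CPU_DEVICE)` call below.
    model = torch.hub.load(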
        'ultralytics/yolov5',
        'custom',
        path=path,
        force_reload=True, # Try to clear cache issues
        _verbose=False
    )
    model.to(CPU_DEVICE).eval()
    model.conf = TUNED_CONF_THRESHOLD 
    
    with torch.no_grad():
        for i in tqdm(range(0, len(test_image_paths), DET_BATCH_SIZE), desc=f"Fold {fold} Detection (CPU)"):
            batch_paths = test_image_paths[i:i+DET_BATCH_SIZE]
            results = model(batch_paths, size=DET_IMG_SIZE)
            preds_df_list = results.pandas().xyxy
            
            for j, preds_df in enumerate(preds_df_list):
                if not preds_df.empty:
                    image_id = os.path.basename(batch_paths[j]).replace('.png', '')
                    for _, row in preds_df.iterrows():
                        all_det_preds.append({
                            'image_id': image_id,
                            'x_min': row['xmin'],
                            'y_min': row['ymin'],
                            'x_max': row['xmax'],
                            'y_max': row['ymax'],
                            'confidence': row['confidence']
                        })

    del model
    gc.collect()

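# Explicit columns keep the CSV header intact even when no boxes were detected
df_det_preds_raw = pd.DataFrame(
    all_det_preds,
    columns=['image_id', 'x_min', 'y_min', 'x_max', 'y_max', 'confidence']
)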
df_det_preds_raw.to_csv('test_preds_detector.csv', index=False)

print("\nDetector inference complete. Raw box predictions saved.")
display(df_det_preds_raw.head())

--- Starting Detector Inference (CPU FALLBACK, 1-FOLD ONLY) ---
Found 638 test images for detection.

--- Processing Fold 0 with model yolov5_runs/train_cv/yolov5s_fold0/weights/best.pt on CPU ---


Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /app/.cache/torch/hub/master.zip


Exception: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
. Cache may be out of date, try `force_reload=True` or see https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading for help.
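
The traceback above reports a CUDA error even though the cell targets the CPU, which suggests the GPU was touched before the model was moved. One common way to rule this out is to hide the GPU from the process before PyTorch is first imported, as sketched below. This is a sketch under assumptions: `force_cpu_only` is a hypothetical helper, it only takes effect in a fresh kernel where `torch` has not yet initialised CUDA, and it was not part of the original run. The YOLOv5 hub entry point also appears to accept a `device` argument, which may be worth passing as `'cpu'` in addition.

```python
import os


def force_cpu_only():
    """Hide all CUDA devices from libraries imported after this call."""
    # An empty value makes the CUDA runtime report zero visible devices.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    try:
        import torch
    except ImportError:
        # Assumption: fall back gracefully where PyTorch is not installed.
        return "cpu"
    # If torch was imported and CUDA initialised earlier, this can still be "cuda".
    return "cuda" if torch.cuda.is_available() else "cpu"


device = force_cpu_only()
print(f"Inference device: {device}")
```

Placing this call in the very first cell, ahead of `import torch`, is what makes it reliable; setting the variable later in an already-running kernel may have no effect.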

In [3]:
# --- Part 4: Final Fallback - No-Inference Submission ---

print("CRITICAL: All inference attempts failed due to persistent OOM. Pivoting to no-inference submission.")

# --- Step 1: Generate Heuristic Study Predictions (from training set priors) ---
df_train_study = pd.read_csv('train_study_level.csv')
df_train_study['id'] = df_train_study['id'].str.replace('_study', '')

cols = ['Negative for Pneumonia', 'Typical Appearance', 'Indeterminate Appearance', 'Atypical Appearance']
priors = df_train_study[cols].mean().values

print("Calculated Priors from Training Data:")
for cls, p in zip(cols, priors):
    print(f"  - {cls}: {p:.4f}")

study_preds_list = []
for study_id in df_test_study['StudyInstanceUID'].unique():
    pred_strings = []
    # Format per class: lowercase class name, probability, then the dummy box `0 0 1 1`
    pred_strings.append(f"negative {priors[0]:.4f} 0 0 1 1")
    pred_strings.append(f"typical {priors[1]:.4f} 0 0 1 1")
    pred_strings.append(f"indeterminate {priors[2]:.4f} 0 0 1 1")
    pred_strings.append(f"atypical {priors[3]:.4f} 0 0 1 1")
    
    study_preds_list.append({
        'id': f"{study_id}_study",
        'PredictionString': " ".join(pred_strings)
    })
df_study_sub = pd.DataFrame(study_preds_list)

# --- Step 2: Generate 'None' Image Predictions ---
image_preds_list = []
for image_id in df_test_img['image_id'].unique():
    image_preds_list.append({
        'id': f"{image_id}_image",
        'PredictionString': 'none 1 0 0 1 1'
    })
df_image_sub = pd.DataFrame(image_preds_list)

# --- Step 3: Combine and Create Final Submission File ---
df_submission = pd.concat([df_study_sub, df_image_sub], ignore_index=True)

# Reorder to match sample submission
sample_sub = pd.read_csv('sample_submission.csv')
df_submission = df_submission.set_index('id').reindex(sample_sub['id']).reset_index()

df_submission.to_csv('submission.csv', index=False)

print("\nFinal submission.csv created using training priors and no bounding boxes.")
display(df_submission.head())
display(df_submission.tail())
print(f"Total rows in submission: {len(df_submission)}")

CRITICAL: All inference attempts failed due to persistent OOM. Pivoting to no-inference submission.
Calculated Priors from Training Data:
  - Negative for Pneumonia: 0.2740
  - Typical Appearance: 0.4708
  - Indeterminate Appearance: 0.1747
  - Atypical Appearance: 0.0804

Final submission.csv created using training priors and no bounding boxes.


| | id | PredictionString |
|---|---|---|
| 0 | 000c9c05fd14_study | negative 0.2740 0 0 1 1 typical 0.4708 0 0 1 1... |
| 1 | 00c74279c5b7_study | negative 0.2740 0 0 1 1 typical 0.4708 0 0 1 1... |
| 2 | 00ccd633fb0e_study | negative 0.2740 0 0 1 1 typical 0.4708 0 0 1 1... |
| 3 | 00e936c58da6_study | negative 0.2740 0 0 1 1 typical 0.4708 0 0 1 1... |
| 4 | 01206a422293_study | negative 0.2740 0 0 1 1 typical 0.4708 0 0 1 1... |


| | id | PredictionString |
|---|---|---|
| 1239 | ff03d1d41968_image | none 1 0 0 1 1 |
| 1240 | ff0743bee789_image | none 1 0 0 1 1 |
| 1241 | ffab0f8f27f0_image | none 1 0 0 1 1 |
| 1242 | ffbeafe30b77_image | none 1 0 0 1 1 |
| 1243 | ffe942c8655f_image | none 1 0 0 1 1 |


Total rows in submission: 1244
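
Because the submission is reordered with `reindex`, any id that was never generated would silently end up with a missing `PredictionString`. A small check like the one below could catch that before upload. It is a sketch only: `validate_submission` is a hypothetical helper, and the toy frames stand in for `df_submission` and `sample_sub`.

```python
import pandas as pd


def validate_submission(df_submission, df_sample):
    """Raise ValueError if the submission does not line up with the sample file."""
    columns = list(df_submission.columns)
    if columns != ["id", "PredictionString"]:
        raise ValueError(f"unexpected columns: {columns}")
    if df_submission["id"].tolist() != df_sample["id"].tolist():
        raise ValueError("ids differ from the sample submission (order or content)")
    missing = df_submission["PredictionString"].isna()
    if missing.any():
        raise ValueError(f"{int(missing.sum())} rows have no PredictionString")


# Toy frames (made up for illustration only).
sample = pd.DataFrame({
    "id": ["s1_study", "i1_image"],
    "PredictionString": ["negative 1 0 0 1 1", "none 1 0 0 1 1"],
})
good = sample.copy()

# Simulate a row that reindex could not fill because the id was never generated.
bad = (
    sample.iloc[:1]
    .set_index("id")
    .reindex(sample["id"])
    .reset_index()
)

validate_submission(good, sample)
try:
    validate_submission(bad, sample)
except ValueError as err:
    print(f"Rejected: {err}")
```

Raising instead of printing a warning makes the notebook stop before a malformed `submission.csv` is written.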
