# Phase 9: Final Stable Submission with Tuned Parameters

## Context
Previous attempts to generate a submission with the tuned hyperparameters (`conf_th=0.10`, `neg_th=0.70`) from notebook `07` failed with persistent CUDA out-of-memory (OOM) errors. The environment appears to be unstable: even a single model fails to load.

This notebook is the final attempt to generate a submission before the deadline. The strategy is designed for maximum stability and minimal memory footprint.

## Strategy: Sequential Inference

Instead of loading multiple models at once, we will process the classifier and detector stages sequentially, aggressively clearing memory at each step.

1.  **Classifier Stage:**
    *   Iterate through each of the 5 classifier model folds.
    *   For each fold: Load **one** model, run inference on the test set, store predictions, and then **delete the model and clear the CUDA cache**.
    *   Average the predictions from all 5 folds.
    *   Save the resulting study-level predictions to `study_preds.csv`.

2.  **Detector Stage:**
    *   **Restart the kernel** to ensure a completely clean memory state.
    *   Iterate through each of the 5 YOLOv5s detector model folds.
    *   For each fold: Load **one** model, run inference, store predictions, and then **delete the model and clear the CUDA cache**.
    *   Save the raw, unfiltered box predictions to `box_preds.csv`.

3.  **Assembly Stage:**
    *   Load the intermediate predictions from `study_preds.csv` and `box_preds.csv`.
    *   Apply the optimal post-processing thresholds found in notebook `07`:
        *   `confidence_threshold = 0.10`
        *   `negative_filter_threshold = 0.70`
    *   Format and generate the final `submission.csv`.
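
The assembly-stage filtering can be sketched as follows. This is a minimal illustration, not the code from notebook `07`: the record fields (`image_id`, `conf`) and the helper name `filter_boxes` are hypothetical, and it assumes that `negative_filter_threshold` means "drop all boxes for an image whose study-level `Negative for Pneumonia` probability exceeds the threshold".

```python
# Hypothetical sketch of the assembly-stage box filtering.
# Field names and the meaning of the negative filter are assumptions.
CONFIDENCE_THRESHOLD = 0.10       # from notebook 07
NEGATIVE_FILTER_THRESHOLD = 0.70  # from notebook 07


def filter_boxes(boxes, negative_prob_by_image,
                 conf_th=CONFIDENCE_THRESHOLD,
                 neg_th=NEGATIVE_FILTER_THRESHOLD):
    """Return the boxes that survive both thresholds.

    boxes: list of dicts, each with at least 'image_id' and 'conf'.
    negative_prob_by_image: dict mapping image_id to the averaged
        'Negative for Pneumonia' probability of the image's study.
    """
    kept = []
    for box in boxes:
        # Drop low-confidence detections.
        if box['conf'] < conf_th:
            continue
        # Drop detections on images whose study looks confidently negative.
        # Images without a study prediction are treated as not negative.
        if negative_prob_by_image.get(box['image_id'], 0.0) > neg_th:
            continue
        kept.append(box)
    return kept
```

Keeping the thresholds as keyword arguments makes it possible to re-run the assembly stage with different values without repeating either inference stage, since both stages write their raw predictions to disk.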

In [1]:
# --- STAGE 1: CLASSIFIER INFERENCE ---

import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import cv2
import timm
from tqdm import tqdm
import gc
import os

print('Stage 1: Starting Classifier Inference...')

# --- Config ---
TEST_IMAGE_DIR = 'test_png_3ch/'
MODEL_PATHS = [f'classifier_fold{i}_best.pth' for i in range(5)]
BATCH_SIZE = 8 # Reduced for stability
IMG_SIZE = 512
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
CLASSES = ['Negative for Pneumonia', 'Typical Appearance', 'Indeterminate Appearance', 'Atypical Appearance']

# --- Dataset ---
class TestDataset(Dataset):
    def __init__(self, df, image_dir):
        self.df = df
        self.image_dir = image_dir
    def __len__(self):
        return len(self.df)
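    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image_path = os.path.join(self.image_dir, row['id'].replace('_image', '.png'))
        image = cv2.imread(image_path)  # HxWx3 uint8 in BGR order; None if the file is unreadable
        if image is None:
            raise FileNotFoundError(f'Could not read image: {image_path}')
        image = cv2.resize(image, (IMG_SIZE, IMG_SIZE))
        image = image.astype(np.float32) / 255.0  # scale to [0, 1]
        image = image.transpose(2, 0, 1)  # HWC -> CHW
        return torch.from_numpy(image)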

# --- Inference Loop ---
df_sub = pd.read_csv('sample_submission.csv')
df_test_study = df_sub[df_sub['id'].str.contains('_study')].copy()
df_test_study['image_id'] = df_test_study['id'].str.replace('_study', '')

df_test_img = pd.DataFrame({'id': os.listdir(TEST_IMAGE_DIR)})
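# regex=False: treat '.png' as a literal string, not a pattern where '.' matches any character
df_test_img['id'] = df_test_img['id'].str.replace('.png', '_image', regex=False)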
test_dataset = TestDataset(df_test_img, TEST_IMAGE_DIR)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

all_fold_preds = []

for fold, model_path in enumerate(MODEL_PATHS):
    print(f'--- Processing Fold {fold} ---')
    # Load model (weights are read onto the CPU first, then moved to the device)
    model = timm.create_model('efficientnet_b5', pretrained=False, num_classes=4)
    model.load_state_dict(torch.load(model_path, map_location='cpu'))
    # Use FP16 and channels_last for stability, as suggested by expert
    model = model.to(DEVICE).half().to(memory_format=torch.channels_last)
    model.eval()
    
    fold_preds = []
    with torch.no_grad():
        for images in tqdm(test_loader, desc=f'Fold {fold} Inference'):
            images = images.to(DEVICE).half().to(memory_format=torch.channels_last)
            outputs = model(images)
            fold_preds.append(torch.softmax(outputs, dim=1).cpu().numpy())
    
    all_fold_preds.append(np.concatenate(fold_preds))
    
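    # Aggressively clean up memory: drop every reference to GPU tensors
    # (including the last batch and its outputs) before emptying the cache
    del model, fold_preds, images, outputs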
    gc.collect()
    torch.cuda.empty_cache()
    print(f'Fold {fold} complete. Memory cleared.')

# --- Ensemble and Save ---
avg_preds = np.mean(all_fold_preds, axis=0)
df_preds = pd.DataFrame(avg_preds, columns=CLASSES)
df_preds['id'] = df_test_img['id']

# Map image-level preds to study-level
df_train_meta = pd.read_csv('df_train_folds.csv')
df_test_meta = pd.read_csv('sample_submission.csv')
df_test_meta['StudyInstanceUID'] = df_test_meta['id'].apply(lambda x: x.split('_')[0])
df_test_img_meta = df_train_meta[['image_id', 'StudyInstanceUID']].drop_duplicates()
df_preds['image_id'] = df_preds['id'].str.replace('_image', '')
df_preds = df_preds.merge(df_test_img_meta, on='image_id', how='left')

# The lookup table is built from training metadata, so a test image may have no match; fall back to image_id
if df_preds['StudyInstanceUID'].isnull().any():
    print('Warning: Some test images could not be mapped to a StudyInstanceUID. Filling with image_id.')
    df_preds['StudyInstanceUID'] = df_preds['StudyInstanceUID'].fillna(df_preds['image_id'])

study_preds = df_preds.groupby('StudyInstanceUID')[CLASSES].mean().reset_index()
study_preds.to_csv('study_preds.csv', index=False)
print('\nStage 1 Complete: Classifier predictions saved to study_preds.csv')



Stage 1: Starting Classifier Inference...
--- Processing Fold 0 ---


AcceleratorError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
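
The error appears right after `--- Processing Fold 0 ---`, before any progress bar is shown, which may indicate that the GPU had little free memory when the cell started. `torch.cuda.empty_cache()` only releases cached blocks held by the current process, so it cannot recover memory held by another process or a stale kernel. A minimal sketch for checking free memory before loading a model, assuming a PyTorch version that provides `torch.cuda.mem_get_info` (the helper name `free_gpu_memory_gib` is hypothetical):

```python
# Minimal sketch: report free GPU memory before attempting to load a model.
try:
    import torch
except ImportError:  # allows the snippet to run where PyTorch is not installed
    torch = None


def free_gpu_memory_gib(device_index=0):
    """Return (free, total) GPU memory in GiB, or None if CUDA is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    gib = 1024 ** 3
    return free_bytes / gib, total_bytes / gib


info = free_gpu_memory_gib()
if info is None:
    print('CUDA is not available; nothing to report.')
else:
    print(f'Free GPU memory: {info[0]:.2f} GiB of {info[1]:.2f} GiB')
```

If the reported free memory is far below the total while no model is loaded in this kernel, the memory is likely held elsewhere, and clearing the cache inside the loop will not help.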
