# Vesuvius Challenge - Ink Detection: Medal-Winning Plan

This notebook outlines a refined plan to tackle the Vesuvius Challenge, incorporating expert advice to target a medal-winning F0.5 score.

## 1. Initial Setup & Focused EDA
*   **Goal:** Quickly understand the data and set up the project structure.
*   **Actions:**
    1.  List file structure of `train/` and `test/` to confirm fragments.
    2.  Load a sample fragment: `inklabels.png` (ink labels), `mask.png` (ROI), `ir.png`, and the `surface_volume/` TIFs.
    3.  **Key EDA Task:** For each z-slice, plot the mean intensity of the pixels labeled as ink (`inklabels.png`) against the slice index to identify the most informative slice range for our 2.5D model. This will determine which slices to stack.
    4.  **ROI Definition:** The `mask.png` defines the Region of Interest (ROI). All training, validation, and inference will be strictly confined to this mask. We will crop images to the bounding box of the mask to save compute.
    5.  Verify `sample_submission.csv` format.
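
The bounding-box crop from the ROI step could look roughly like this. It is a minimal sketch, assuming the ROI mask is already loaded as a 2D array that is nonzero inside the ROI; `roi_bbox` is a hypothetical helper name, not a library function.

```python
import numpy as np

def roi_bbox(roi_mask):
    """Return (y0, y1, x0, x1) of the tight box around nonzero mask pixels.

    y1 and x1 are exclusive, so the crop is array[y0:y1, x0:x1].
    """
    rows = np.flatnonzero(roi_mask.any(axis=1))
    cols = np.flatnonzero(roi_mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("ROI mask is empty")
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1

# Toy mask standing in for a real mask.png
roi = np.zeros((6, 8), dtype=np.uint8)
roi[2:5, 3:7] = 255

y0, y1, x0, x1 = roi_bbox(roi)
cropped = roi[y0:y1, x0:x1]
```

The same four offsets would be applied to every slice, the IR image, and the ink labels so that all channels stay aligned.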

## 2. Data Pipeline (2.5D Approach)
*   **Input Representation:** Treat the 3D volume as a multi-channel 2D image.
    *   **Slices:** Stack `N=24` adjacent slices from the surface volume (e.g., z-indices `i-12` to `i+11` around a center slice `i`). The exact range will be informed by EDA.
    *   **IR Channel:** Add the `ir.png` as an additional channel.
    *   **Total Channels:** `24 (slices) + 1 (IR) = 25` input channels.
*   **Tiling Strategy:**
    *   **Tile Size:** `320x320` with 50% overlap (a stride of 160 pixels) for both training and inference.
    *   **Sampling:** Create a custom PyTorch `Dataset` that generates tiles.
        *   **ROI-Only:** Only sample tiles that are within the `mask.png` ROI.
        *   **Balanced Sampling:** To combat class imbalance, sample ~50% of tiles that contain ink pixels and ~50% that do not (but are still within the ROI).
*   **Normalization:**
    *   **Per-Fragment:** Normalize each fragment independently. Do not use global statistics.
    *   **Method:** Use robust percentile scaling. Convert 16-bit TIFs to `float32`, clip values to the range between the 1st and 99.5th percentiles, then scale to `[0, 1]`.
*   **Augmentations:**
    *   **Geometric:** Horizontal/Vertical flips, 90-degree rotations.
    *   **Photometric (light):** Random brightness/contrast.
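
The normalization and tile-sampling bullets above can be sketched as follows. This is an illustration under stated assumptions, not the final `Dataset`: the function names are hypothetical, "within the ROI" is read here as "at least `min_roi_frac` of the tile lies inside the ROI" (an assumed parameter), and edge remainders smaller than one stride are ignored.

```python
import numpy as np

def percentile_normalize(volume, lo=1.0, hi=99.5):
    """Clip one fragment's volume to its [lo, hi] percentiles, then rescale to [0, 1]."""
    volume = volume.astype(np.float32)
    # Plain Python floats keep the arithmetic below in float32.
    p_lo, p_hi = (float(p) for p in np.percentile(volume, [lo, hi]))
    volume = np.clip(volume, p_lo, p_hi)
    return (volume - p_lo) / max(p_hi - p_lo, 1e-6)

def roi_tile_origins(roi_mask, tile=320, stride=160, min_roi_frac=0.5):
    """Top-left corners of tiles whose ROI coverage is at least min_roi_frac."""
    h, w = roi_mask.shape
    roi = roi_mask > 0
    origins = []
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            if roi[y:y + tile, x:x + tile].mean() >= min_roi_frac:
                origins.append((y, x))
    return origins

def split_by_ink(origins, ink_mask, tile=320):
    """Partition tile origins into ink-containing and ink-free lists."""
    has_ink = [bool((ink_mask[y:y + tile, x:x + tile] > 0).any()) for y, x in origins]
    pos = [o for o, flag in zip(origins, has_ink) if flag]
    neg = [o for o, flag in zip(origins, has_ink) if not flag]
    return pos, neg

# Toy data standing in for a real fragment
volume = (np.arange(1000, dtype=np.uint16) * 60).reshape(10, 10, 10)
normalized = percentile_normalize(volume)

roi_mask = np.zeros((640, 640), dtype=np.uint8)
roi_mask[:320, :320] = 255
ink_mask = np.zeros_like(roi_mask)
ink_mask[10:20, 10:20] = 255

origins = roi_tile_origins(roi_mask)
pos, neg = split_by_ink(origins, ink_mask)
```

A balanced sampler would then draw roughly half of each batch from `pos` and half from `neg`.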

## 3. Model & Training
*   **Framework:** PyTorch with `segmentation-models-pytorch` (SMP).
*   **Architecture:** `FPN` (Feature Pyramid Network) or `U-Net`.
*   **Backbone:** `tu-tf_efficientnetv2_s` (recommended; SMP loads generic timm backbones through the `tu-` prefix) or `timm-efficientnet-b4`.
*   **Loss Function:** Start with a 50/50 combination of `BCEWithLogitsLoss` and `DiceLoss`. Consider upgrading to Focal Tversky loss, weighted to penalize false positives more than false negatives, since F0.5 favors precision over recall.
*   **Optimizer:** AdamW (e.g., `lr=3e-4`, `weight_decay=1e-4`).
*   **Scheduler:** `CosineAnnealingLR` with a warmup phase.
*   **Hardware Optimization:** Use Automatic Mixed Precision (AMP) to accelerate training and reduce memory usage.
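
"`CosineAnnealingLR` with a warmup phase" leaves the exact schedule open. One possible reading is a linear warmup followed by cosine decay, written as a multiplier on the base learning rate. The function name, step counts, and `min_factor` below are assumptions chosen for illustration.

```python
import math

def warmup_cosine_factor(step, total_steps, warmup_steps, min_factor=0.0):
    """Multiplier applied to the base LR at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp that reaches 1.0 on the last warmup step.
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return min_factor + (1.0 - min_factor) * cosine

base_lr = 3e-4
schedule = [
    base_lr * warmup_cosine_factor(s, total_steps=100, warmup_steps=10)
    for s in range(100)
]
```

In PyTorch, a function like this can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, with the scheduler stepped once per optimizer step.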

## 4. Validation Strategy
*   **Method:** Leave-One-Fragment-Out Cross-Validation (3 folds). Train on two fragments, validate on the third.
*   **Metric:** Calculate the F0.5 score on the validation set. **Crucially, the metric should only be computed on pixels within the validation fragment's ROI mask.**
*   **Model Selection:** Use early stopping based on the validation F0.5 score, saving the checkpoint with the best performance.
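
As a sketch of the ROI-restricted metric, the function below computes F-beta from binary arrays using only the pixels inside the ROI. The name `fbeta_in_roi` and the toy arrays are illustrative assumptions.

```python
import numpy as np

def fbeta_in_roi(pred, target, roi_mask, beta=0.5, eps=1e-8):
    """F-beta over ROI pixels only; pred and target are binary arrays."""
    roi = roi_mask > 0
    p = pred[roi].astype(bool)
    t = target[roi].astype(bool)
    tp = np.sum(p & t)
    fp = np.sum(p & ~t)
    fn = np.sum(~p & t)
    b2 = beta ** 2
    return float((1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps))

# Toy example: the last pixel is a false positive outside the ROI
target = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 1, 1]])
roi = np.array([[1, 1, 1, 0]])

score_roi = fbeta_in_roi(pred, target, roi)
score_all = fbeta_in_roi(pred, target, np.ones_like(roi))
```

In the toy example the ROI-restricted score is higher than the score over all pixels, because the prediction outside the ROI is not counted as a false positive.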

## 5. Inference & Post-processing
*   **Tiling & Blending:** Perform inference on overlapping tiles (`320x320` or `512x512`) and stitch the predictions using a smooth blending function (e.g., Gaussian or cosine weights) to eliminate seam artifacts.
*   **Test-Time Augmentation (TTA):** Apply flips (identity, horizontal, vertical, and both) for a 4x TTA. Undo each flip on the model output, then average the resulting *logits* before applying the sigmoid function.
*   **Z-Ensembling:** (Optional, if time permits) Predict on slightly shifted z-stacks (e.g., centered at `z-2`, `z`, `z+2`) and average the logits.
*   **ROI Masking:** **Multiply the final probability map by the test fragment's `mask.png`** to zero out any predictions outside the valid area.
*   **Thresholding:** **Tune the probability threshold per fragment.** Search for the optimal threshold (e.g., in the 0.35-0.75 range) on the validation set to maximize the F0.5 score.
*   **Cleaning:** Apply post-processing to the binary mask. **Remove small connected components** (e.g., fewer than 20-100 pixels). Tune this size threshold on the validation set.
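
The tile blending step can be illustrated with a Gaussian-weighted average. This is a minimal NumPy sketch, assuming tile logits are already computed; `gaussian_window`, `blend_tiles`, and `sigma_frac` are hypothetical names and values, not tuned settings.

```python
import numpy as np

def gaussian_window(tile, sigma_frac=0.25):
    """2D Gaussian weight map that peaks at the tile center."""
    coords = np.arange(tile) - (tile - 1) / 2.0
    g = np.exp(-0.5 * (coords / (sigma_frac * tile)) ** 2)
    return np.outer(g, g).astype(np.float32)

def blend_tiles(tile_logits, origins, out_shape, tile):
    """Weighted average of overlapping tile predictions.

    Pixels not covered by any tile come out as 0.
    """
    acc = np.zeros(out_shape, dtype=np.float32)
    wsum = np.zeros(out_shape, dtype=np.float32)
    w = gaussian_window(tile)
    for logits, (y, x) in zip(tile_logits, origins):
        acc[y:y + tile, x:x + tile] += logits * w
        wsum[y:y + tile, x:x + tile] += w
    return acc / np.maximum(wsum, 1e-6)

# Two constant tiles with 50% horizontal overlap
tile = 8
tiles = [np.full((tile, tile), 1.0, dtype=np.float32),
         np.full((tile, tile), 3.0, dtype=np.float32)]
blended = blend_tiles(tiles, [(0, 0), (0, 4)], out_shape=(8, 12), tile=tile)
```

In the overlap region the output moves smoothly from the first tile's value to the second's, which is what removes visible seams. ROI masking, thresholding, and component cleaning would follow on the blended map.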

## 6. Submission
*   **Encoding:** Convert the final, cleaned binary masks into Run-Length Encoding (RLE) format.
*   **Verification:** Double-check the RLE output against the `sample_submission.csv` format to ensure correctness (row-major order, `Id` format).
*   **Ensembling:** For a final push, average the logits from the models trained on each of the 3 folds before post-processing.
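
A common way to write the run-length encoder is sketched below. It assumes row-major flattening (as stated above) and 1-indexed start positions emitted as `start length` pairs; the indexing convention is an assumption that should be checked against `sample_submission.csv`.

```python
import numpy as np

def rle_encode(binary_mask):
    """Encode a binary mask as 'start length' pairs over the row-major flattening.

    Assumption: start positions are 1-indexed.
    """
    flat = (np.asarray(binary_mask) > 0).ravel().astype(np.uint8)
    # Pad with zeros so runs touching either end still produce a transition.
    padded = np.concatenate([[0], flat, [0]])
    changes = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

toy_mask = np.array([[0, 1, 1, 0],
                     [1, 0, 0, 1]])
encoded = rle_encode(toy_mask)
empty = rle_encode(np.zeros((2, 2)))
```

An all-zero mask encodes to an empty string. Whether the submission checker accepts that is worth verifying separately.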

In [1]:
import sys
import subprocess
import importlib

# The environment's !pip magic is broken. Using subprocess directly.
command = [
    sys.executable,
    '-m', 'pip', 'install', '-q',
    'opencv-python-headless',
    'segmentation-models-pytorch',
    'timm',
    'albumentations'
]

result = subprocess.run(command, capture_output=True, text=True)

if result.returncode == 0:
    print("✅ Packages installed successfully.")
    # Manually invalidate import caches to ensure new packages are discoverable
    importlib.invalidate_caches()
else:
    print("❌ Package installation failed.")
    print("--- stdout ---")
    print(result.stdout)
    print("--- stderr ---")
    print(result.stderr)

✅ Packages installed successfully.
Error in callback <function _enable_matplotlib_integration.<locals>.configure_once at 0x7cc28c63b560> (for post_run_cell), with arguments args (<ExecutionResult object at 7cc282385710, execution_count=14 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7cc282420b10, raw_cell="import sys
import subprocess
import importlib

# T.." transformed_cell="import sys
import subprocess
import importlib

# T.." store_history=True silent=False shell_futures=True cell_id=None> result=None>,),kwargs {}:


AttributeError: module 'matplotlib' has no attribute 'backend_bases'

In [2]:
import os
import glob
import numpy as np
import pandas as pd
import cv2
from PIL import Image
# import matplotlib
# matplotlib.use('Agg') # Commenting out to see if default backend works
import matplotlib.pyplot as plt
from tqdm import tqdm # Use standard tqdm

# --- 1. Initial Setup & File Exploration ---

TRAIN_PATH = 'train/'
TEST_PATH = 'test/'

train_fragments = sorted(os.listdir(TRAIN_PATH))
test_fragments = sorted(os.listdir(TEST_PATH))

print(f"Training fragments: {train_fragments}")
print(f"Test fragments: {test_fragments}")

# Let's inspect the first training fragment
fragment_id = train_fragments[0]
fragment_path = os.path.join(TRAIN_PATH, fragment_id)

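# Load ink labels (binary ground truth) from inklabels.png.
# Note: the variable is named `mask`, but this is not the ROI file mask.png.
mask_path = os.path.join(fragment_path, 'inklabels.png')
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
assert mask is not None, f"Could not read {mask_path}"  # cv2.imread returns None on failure

# Load IR image
ir_path = os.path.join(fragment_path, 'ir.png')
ir_image = cv2.imread(ir_path, cv2.IMREAD_GRAYSCALE)
assert ir_image is not None, f"Could not read {ir_path}"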

# Load surface volume slices (TIF files)
surface_volume_path = os.path.join(fragment_path, 'surface_volume')
slice_paths = sorted(glob.glob(os.path.join(surface_volume_path, '*.tif')))
num_slices = len(slice_paths)

print(f"\n--- Inspecting Fragment '{fragment_id}' --- ")
print(f"Mask shape: {mask.shape}, dtype: {mask.dtype}")
print(f"IR image shape: {ir_image.shape}, dtype: {ir_image.dtype}")
print(f"Number of slices in surface volume: {num_slices}")

# Load one slice to check its properties
if num_slices > 0:
    sample_slice = cv2.imread(slice_paths[0], cv2.IMREAD_UNCHANGED)
    print(f"Sample slice shape: {sample_slice.shape}, dtype: {sample_slice.dtype}")
    print(f"Sample slice value range: [{np.min(sample_slice)}, {np.max(sample_slice)}]")

Training fragments: ['1', '2']
Test fragments: ['a']



--- Inspecting Fragment '1' --- 
Mask shape: (8181, 6330), dtype: uint8
IR image shape: (8181, 6330), dtype: uint8
Number of slices in surface volume: 65
Sample slice shape: (8181, 6330), dtype: uint16
Sample slice value range: [0, 65535]

In [3]:
# --- 2. EDA: Find Informative Z-Slice Range (Calculation Only) ---

print("Starting EDA to find informative z-slice range...")

# The matplotlib installation is corrupted, but the data loading in the previous cell worked.
# We will proceed with the calculations and print the results, but comment out the plotting.

# Convert mask to boolean and find where the ink is
ink_mask = mask > 0
ink_pixels_count = np.sum(ink_mask)

if ink_pixels_count == 0:
    print(f"Warning: No ink pixels found in the mask for fragment {fragment_id}.")
else:
    print(f"Found {ink_pixels_count} ink pixels in fragment {fragment_id}.")
    mean_ink_intensities = []
    
    for slice_path in tqdm(slice_paths, desc=f"Analyzing slices for fragment {fragment_id}"):
        slice_img = cv2.imread(slice_path, cv2.IMREAD_UNCHANGED)
        ink_intensity = slice_img[ink_mask].mean()
        mean_ink_intensities.append(ink_intensity)

    # --- Plotting is disabled due to environment errors ---
    # plt.figure(figsize=(15, 7))
    # plt.plot(range(num_slices), mean_ink_intensities, marker='o', linestyle='-')
    # plt.title(f'Mean Intensity under Ink Mask vs. Z-Slice Index (Fragment {fragment_id})')
    # plt.xlabel('Slice Index (Z-depth)')
    # plt.ylabel('Mean Pixel Intensity')
    # plt.grid(True)

    best_slice_idx = np.argmax(mean_ink_intensities)
    z_center = num_slices // 2
    z_start = z_center - 12
    z_end = z_center + 12
    
    print("\n--- EDA Results ---")
    print(f"Total slices available: {num_slices}")
    print(f"Slice with highest intensity under mask: {best_slice_idx}")
    print(f"Default centered slice range for 24 slices: {z_start} to {z_end-1}")
    print("Based on this, the middle slices seem to be a good starting point as per the plan.")

Starting EDA to find informative z-slice range...
Found 5339362 ink pixels in fragment 1.


Analyzing slices for fragment 1: 100%|██████████| 65/65 [00:03<00:00, 16.40it/s]





--- EDA Results ---
Total slices available: 65
Slice with highest intensity under mask: 28
Default centered slice range for 24 slices: 20 to 43
Based on this, the middle slices seem to be a good starting point as per the plan.
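
Mean intensity under the ink labels does not account for how bright each slice is overall, so a bright slice can rank first without separating ink from papyrus. One possible complement, not something this notebook ran, is to compare ink pixels with non-ink pixels inside the ROI for each slice. The sketch below uses a hypothetical helper and synthetic data, and assumes the volume is stacked as `(Z, H, W)`.

```python
import numpy as np

def ink_contrast_per_slice(volume, ink_mask, roi_mask):
    """Mean(ink) minus mean(non-ink inside ROI) for each slice of a (Z, H, W) volume."""
    ink = ink_mask > 0
    background = (roi_mask > 0) & ~ink
    return np.array([sl[ink].mean() - sl[background].mean() for sl in volume])

# Synthetic stand-in: 5 slices, ink is brighter in slices 2 and 3 only
toy_volume = np.full((5, 4, 4), 100.0)
toy_ink = np.zeros((4, 4), dtype=np.uint8)
toy_ink[1:3, 1:3] = 255
toy_roi = np.full((4, 4), 255, dtype=np.uint8)
toy_volume[2][toy_ink > 0] += 50
toy_volume[3][toy_ink > 0] += 10

contrast = ink_contrast_per_slice(toy_volume, toy_ink, toy_roi)
best_contrast_slice = int(np.argmax(np.abs(contrast)))
```

On real data the slices would be read one at a time, as in the loop above, rather than stacked in memory.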