# Image Preprocessing - X-ray Bone Fracture Detection

This notebook compares OpenCV preprocessing techniques on X-ray images for bone fracture detection and combines the selected steps into a single preprocessing pipeline.

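## Steps:
1. Import libraries
2. Load a sample image
3. Test individual preprocessing steps (resizing, denoising, contrast enhancement, sharpening, normalization)
4. Run the complete preprocessing pipeline
5. Test the pipeline on multiple images
6. Test batch processing
7. Compare original vs. preprocessed images
8. Process the full dataset (optional)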

## 1. Import Libraries

In [None]:
import sys
sys.path.append('..')

import numpy as np
import cv2
import matplotlib.pyplot as plt
from pathlib import Path

# Import our utilities
from utils.preprocess import XRayPreprocessor
from utils.data_loader import DatasetLoader
from utils.visualization import XRayVisualizer

%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')

print("✅ Libraries imported!")

## 2. Load Sample Image

In [None]:
# Collect training image paths and labels
loader = DatasetLoader('../data')
train_paths, train_labels = loader.load_data_paths('train')

# Pick the first fractured image (label 1); fall back to the first image if there are none
fractured_indices = [i for i, label in enumerate(train_labels) if label == 1]
sample_path = train_paths[fractured_indices[0]] if fractured_indices else train_paths[0]

print(f"Sample image: {sample_path}")

In [None]:
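# Load and display original (cv2.imread returns None instead of raising on failure)
original = cv2.imread(str(sample_path), cv2.IMREAD_GRAYSCALE)
if original is None:
    raise FileNotFoundError(f"Could not read image: {sample_path}")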

viz = XRayVisualizer()
viz.show_image(original, "Original X-ray")

print(f"Image shape: {original.shape}")
print(f"Data type: {original.dtype}")
print(f"Min pixel: {original.min()}")
print(f"Max pixel: {original.max()}")
print(f"Mean pixel: {original.mean():.2f}")

## 3. Test Individual Preprocessing Steps

### 3.1 Resizing

In [None]:
preprocessor = XRayPreprocessor(target_size=(224, 224))

resized = preprocessor.resize_image(original)
viz.compare_images(
    [original, resized],
    [f'Original {original.shape}', f'Resized {resized.shape}']
)

### 3.2 Denoising Techniques

In [None]:
# Test different denoising methods
gaussian_denoised = preprocessor.denoise_gaussian(resized)
bilateral_denoised = preprocessor.denoise_bilateral(resized)
nlm_denoised = preprocessor.denoise_nlm(resized)

viz.show_grid(
    [resized, gaussian_denoised, bilateral_denoised, nlm_denoised],
    ['Original', 'Gaussian', 'Bilateral', 'NLM Denoising'],
    rows=2, cols=2
)

### 3.3 Contrast Enhancement

In [None]:
# Test contrast enhancement methods
clahe_enhanced = preprocessor.enhance_contrast_clahe(resized)
hist_enhanced = preprocessor.enhance_contrast_histogram(resized)

viz.show_grid(
    [resized, clahe_enhanced, hist_enhanced],
    ['Original', 'CLAHE', 'Histogram Equalization'],
    rows=1, cols=3
)

In [None]:
# Compare histograms
viz.compare_histograms(
    [resized, clahe_enhanced, hist_enhanced],
    ['Original', 'CLAHE', 'Histogram Eq.'],
    title='Contrast Enhancement Comparison'
)

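Global histogram equalization maps each gray level through the image's normalized cumulative distribution function (CDF). The histograms above can also be computed with NumPy alone, which makes it easy to quantify how much of the 0 to 255 range each method uses. A minimal sketch; `histogram_stats` and `equalize_via_cdf` are hypothetical helpers, not part of `XRayVisualizer`, and the second is the textbook formula rather than OpenCV's exact implementation.

```python
import numpy as np


def histogram_stats(image: np.ndarray) -> dict:
    """Histogram, normalized CDF, and range usage of an 8-bit grayscale image."""
    hist = np.bincount(image.ravel(), minlength=256)  # pixel count per gray level
    cdf = np.cumsum(hist) / hist.sum()                # normalized CDF, ends at 1.0
    return {
        "hist": hist,
        "cdf": cdf,
        "levels_used": int(np.count_nonzero(hist)),   # distinct gray levels present
        "p1_p99": (int(np.searchsorted(cdf, 0.01)),   # 1st-percentile gray level
                   int(np.searchsorted(cdf, 0.99))),  # 99th-percentile gray level
    }


def equalize_via_cdf(image: np.ndarray) -> np.ndarray:
    """Textbook global histogram equalization: map each level through the CDF."""
    cdf = histogram_stats(image)["cdf"]
    lut = np.round(cdf * 255).astype(np.uint8)  # lookup table, one entry per level
    return lut[image]


img = np.random.default_rng(0).integers(100, 141, size=(224, 224), dtype=np.uint8)
print(histogram_stats(img)["levels_used"], histogram_stats(equalize_via_cdf(img))["p1_p99"])
```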
### 3.4 Sharpening

In [None]:
# Test sharpening
sharpened = preprocessor.sharpen_image(resized, strength=1.0)

viz.compare_images(
    [resized, sharpened],
    ['Original', 'Sharpened']
)

### 3.5 Normalization

In [None]:
# Test normalization
normalized_minmax = preprocessor.normalize_image(resized, method='minmax')
normalized_standard = preprocessor.normalize_image(resized, method='standard')

viz.show_grid(
    [resized, normalized_minmax, normalized_standard],
    ['Original', 'MinMax Normalized', 'Standard Normalized'],
    rows=1, cols=3
)

print("MinMax normalized range:", normalized_minmax.min(), "to", normalized_minmax.max())
print("Standard normalized range:", normalized_standard.min(), "to", normalized_standard.max())

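The two methods correspond to min-max scaling, `x' = (x - min) / (max - min)`, which maps intensities into [0, 1], and z-score standardization, `x' = (x - mean) / std`, which gives zero mean and unit variance but no fixed range. A NumPy sketch, assuming `normalize_image` computes these statistics per image (training pipelines often use statistics from the whole training set instead). The small `eps` that prevents division by zero on a constant image is an added safeguard, not necessarily part of the original helper.

```python
import numpy as np


def normalize_minmax(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale to [0, 1] using this image's own min and max."""
    img = image.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min() + eps)


def normalize_standard(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score: zero mean, unit standard deviation (output range is unbounded)."""
    img = image.astype(np.float32)
    return (img - img.mean()) / (img.std() + eps)


img = np.random.default_rng(0).integers(0, 256, size=(224, 224), dtype=np.uint8)
mm, zs = normalize_minmax(img), normalize_standard(img)
print(f"minmax: [{mm.min():.3f}, {mm.max():.3f}]  z-score: mean={zs.mean():.3f}, std={zs.std():.3f}")
```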
## 4. Complete Preprocessing Pipeline

In [None]:
# Visualize the complete preprocessing pipeline
preprocessor.visualize_preprocessing_steps(sample_path)

## 5. Test Pipeline on Multiple Images

In [None]:
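# Test on up to 5 random images (fixed seed so reruns show the same images)
rng = np.random.default_rng(42)
n_samples = min(5, len(train_paths))
test_indices = rng.choice(len(train_paths), n_samples, replace=False)
test_paths = [train_paths[i] for i in test_indices]

preprocessed_images = []
for path in test_paths:
    img = preprocessor.preprocess_full_pipeline(path)
    if img is not None:  # skip images that could not be read
        preprocessed_images.append(img)

# Display one column per successfully preprocessed image
viz.show_grid(
    preprocessed_images,
    [f'Image {i+1}' for i in range(len(preprocessed_images))],
    rows=1, cols=max(1, len(preprocessed_images))
)

print(f"✅ Successfully preprocessed {len(preprocessed_images)}/{len(test_paths)} images")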

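Visual inspection of a few images can miss silent failures such as NaNs, a wrong dtype, or values outside [0, 1]. A small check like the hypothetical `check_preprocessed` below can be run on `preprocessed_images`; the expected 224×224 shape and [0, 1] range follow the pipeline summary and are assumptions about what `preprocess_full_pipeline` returns.

```python
from typing import List, Sequence, Tuple

import numpy as np


def check_preprocessed(images: Sequence[np.ndarray],
                       expected_shape: Tuple[int, int] = (224, 224)) -> List[str]:
    """Return a list of problems found; an empty list means every image passed."""
    problems = []
    for i, img in enumerate(images):
        if img.shape != expected_shape:
            problems.append(f"image {i}: shape {img.shape}, expected {expected_shape}")
        if not np.issubdtype(img.dtype, np.floating):
            problems.append(f"image {i}: dtype {img.dtype}, expected a float type")
            continue  # the range checks below assume floating-point data
        if not np.all(np.isfinite(img)):
            problems.append(f"image {i}: contains NaN or inf")
        elif img.min() < 0.0 or img.max() > 1.0:
            problems.append(f"image {i}: range [{img.min():.3f}, {img.max():.3f}] outside [0, 1]")
    return problems


good = [np.random.default_rng(s).random((224, 224), dtype=np.float32) for s in range(3)]
print(check_preprocessed(good))  # []
```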
## 6. Batch Processing Test

In [None]:
# Test batch processing on a small subset
subset_paths = train_paths[:100]  # First 100 images

print(f"Processing {len(subset_paths)} images...")
processed = preprocessor.preprocess_batch(subset_paths, show_progress=True)

print(f"\n✅ Processed {len(processed)}/{len(subset_paths)} images")
print(f"Success rate: {100*len(processed)/len(subset_paths):.1f}%")

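The success rate shows how many images failed but not which ones. A minimal sketch of a batch loop that keeps the failing paths for inspection follows. `preprocess_batch_with_failures` is a hypothetical helper, not the project's `preprocess_batch`, and it assumes the per-image function returns `None` on failure, as the `is not None` check in Section 5 suggests `preprocess_full_pipeline` does.

```python
from typing import Callable, List, Optional, Sequence, Tuple

import numpy as np


def preprocess_batch_with_failures(
    paths: Sequence[str],
    preprocess_fn: Callable[[str], Optional[np.ndarray]],
) -> Tuple[np.ndarray, List[str]]:
    """Apply preprocess_fn to each path; return stacked successes and failed paths."""
    images, failed = [], []
    for path in paths:
        try:
            img = preprocess_fn(path)
        except Exception as exc:  # treat unexpected errors like unreadable files
            print(f"Error on {path}: {exc}")
            img = None
        if img is None:
            failed.append(path)
        else:
            images.append(img)
    # Stack into one (N, H, W) array; an empty array if nothing succeeded
    batch = np.stack(images) if images else np.empty((0,), dtype=np.float32)
    return batch, failed


def fake_preprocess(path: str) -> Optional[np.ndarray]:
    return None if path.endswith("corrupt.png") else np.zeros((224, 224), np.float32)


batch, failed = preprocess_batch_with_failures(["a.png", "corrupt.png", "b.png"], fake_preprocess)
print(batch.shape, failed)  # (2, 224, 224) ['corrupt.png']
```

In the notebook, `preprocessor.preprocess_full_pipeline` would take the place of `fake_preprocess`.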
## 7. Compare Original vs Preprocessed

In [None]:
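# Load the original and run the full pipeline on the same image
comparison_path = train_paths[0]
original_img = cv2.imread(str(comparison_path), cv2.IMREAD_GRAYSCALE)
processed_img = preprocessor.preprocess_full_pipeline(comparison_path)
if original_img is None or processed_img is None:
    raise FileNotFoundError(f"Could not read or preprocess: {comparison_path}")

# Convert the [0, 1] float output back to uint8 for display
processed_display = (np.clip(processed_img, 0, 1) * 255).astype(np.uint8)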

viz.compare_images(
    [original_img, processed_display],
    [f'Original\n{original_img.shape}', f'Preprocessed\n{processed_display.shape}']
)

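Converting the pipeline's float output back to 8-bit for display needs clipping and rounding: a bare `astype(np.uint8)` does not clip out-of-range floats (the result for them is platform-dependent) and truncates every value toward zero. A sketch of a safer conversion; `to_display_uint8` is a hypothetical name.

```python
import numpy as np


def to_display_uint8(img: np.ndarray) -> np.ndarray:
    """Map a float image in [0, 1] to uint8 [0, 255] with clipping and rounding."""
    return np.round(np.clip(img, 0.0, 1.0) * 255.0).astype(np.uint8)


print(to_display_uint8(np.array([-0.2, 0.0, 0.5, 1.0, 1.3])))  # [  0   0 128 255 255]
```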
## 8. Process Full Dataset (Optional)

In [None]:
# WARNING: Processing the entire dataset may take several hours,
# depending on dataset size. Set the flag to True to run it.
PROCESS_FULL_DATASET = False

if PROCESS_FULL_DATASET:
    from utils.preprocess import preprocess_directory

    print("Processing training set...")
    preprocess_directory(
        input_dir='../data/train',
        output_dir='../data/preprocessed/train',
        target_size=(224, 224)
    )

    print("\nProcessing validation set...")
    preprocess_directory(
        input_dir='../data/validation',
        output_dir='../data/preprocessed/validation',
        target_size=(224, 224)
    )

    print("\n✅ Full dataset preprocessed!")
else:
    print("⚠️ Set PROCESS_FULL_DATASET = True to process the entire dataset")
    print("This may take several hours depending on dataset size!")

## Summary

### Preprocessing Pipeline:
1. **Load Image** → Grayscale
2. **Remove Borders** → Remove black edges
3. **Resize** → 224x224 pixels
4. **Denoise** → Gaussian blur (kernel=5)
5. **Enhance** → CLAHE (clip_limit=2.0)
6. **Normalize** → [0, 1] range

### Key Findings:
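- CLAHE gave the best contrast enhancement in the visual comparison (Section 3.3); because it equalizes locally with a clip limit, it avoids the over-amplified background that global histogram equalization can produce
- Gaussian denoising is the fastest of the three denoisers tested and was effective on these images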
- Normalization to [0,1] prepares data for neural networks

### Next Steps:
1. Move to `03_data_augmentation.ipynb` for augmentation
2. Or proceed to model training with preprocessed data

### Notes:
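- The same pipeline is applied to every image, so training and validation data are preprocessed identically
- Pipeline parameters can be adjusted in `config.py`
- Preprocessed images can be saved to disk (Section 8) or generated on the fly during training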
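
For the on-the-fly option, preprocessing can run inside a batch generator so that nothing is written to disk. A framework-agnostic sketch follows; `batch_generator` is hypothetical, and `preprocess_fn` stands in for any per-image function that returns a float array or `None` (for example, `XRayPreprocessor.preprocess_full_pipeline`). Failed images are dropped from their batch, so some batches can be smaller than `batch_size`.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

import numpy as np


def batch_generator(paths: Sequence[str], labels: Sequence[int],
                    preprocess_fn: Callable[[str], Optional[np.ndarray]],
                    batch_size: int = 32, shuffle: bool = True,
                    seed: Optional[int] = None) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (images, labels) batches, preprocessing each image only when needed."""
    order = np.arange(len(paths))
    if shuffle:
        # With seed=None, every new generator (e.g. each epoch) gets a different order
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        imgs, labs = [], []
        for idx in order[start:start + batch_size]:
            img = preprocess_fn(paths[idx])
            if img is not None:
                imgs.append(img[..., np.newaxis])  # add a channel axis: (H, W) -> (H, W, 1)
                labs.append(labels[idx])
        if imgs:  # skip batches in which every image failed
            yield np.stack(imgs), np.asarray(labs)


fake_paths = [f"img_{i}.png" for i in range(10)]
fake_labels = [i % 2 for i in range(10)]
fake_fn = lambda p: np.zeros((224, 224), np.float32)
shapes = [x.shape for x, _ in batch_generator(fake_paths, fake_labels, fake_fn, batch_size=4, seed=0)]
print(shapes)  # [(4, 224, 224, 1), (4, 224, 224, 1), (2, 224, 224, 1)]
```

The trade-off versus saving to disk is CPU time per epoch in exchange for no storage and the freedom to change preprocessing parameters without regenerating files.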