# 📸 Data Preprocessing

This notebook details the steps taken to preprocess the data for the project. The raw image pairs are stored in `data/raw`, and the processed data is written to `data/processed`. Each sample is a pair of images, one digital and one film, that capture the same scene.

## Setup

---

Let's enable autoreloading, set the working directory to the project root, and import the necessary dependencies.

In [None]:
%reload_ext autoreload
%autoreload 2

In [None]:
# Autoroot
import autorootcwd

In [None]:
# Imports
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
from tqdm import tqdm

# Local modules
from src.utils.load import load_image_pair, load_metadata
from src.utils.preprocess import keypoint_align, luminance_align

## Processing Pipeline

---

We load all images and perform the following processing steps (identical to the steps in [preprocess.py](https://github.com/mikasenghaas/sillystill/blob/main/src/preprocess.py)):

1) [Keypoint Alignment](https://github.com/mikasenghaas/sillystill/blob/main/notebooks/2.1-keypoint-align.ipynb): Transform and crop the digital image to match the scene of the film image, based on keypoint detection and matching
2) [Luminance Alignment](https://github.com/mikasenghaas/sillystill/blob/main/notebooks/2.2-luminance-align.ipynb): Adjust the luminance levels of the digital and film images so they are comparable
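Step 1 ultimately comes down to estimating a geometric transform from matched keypoints. Assuming that transform is a planar homography (this notebook does not say which model `keypoint_align` fits), the core estimation can be sketched in NumPy with the direct linear transform (DLT). The sketch starts from already-matched point pairs, so SIFT extraction and FLANN matching are out of scope, and it omits the coordinate normalisation and RANSAC outlier rejection that a practical implementation such as OpenCV's `cv.findHomography(src, dst, cv.RANSAC)` would add. The helper names `estimate_homography_dlt` and `apply_homography` are hypothetical.

```python
import numpy as np


def estimate_homography_dlt(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate a 3x3 homography H such that dst ~ H @ src (homogeneous).

    src, dst: (N, 2) arrays of matched (x, y) keypoint locations, N >= 4.
    Hypothetical helper: no coordinate normalisation, no outlier rejection.
    """
    if src.shape != dst.shape or src.ndim != 2 or src.shape[0] < 4:
        raise ValueError("need at least 4 matched (x, y) point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear constraints on the 9 entries of H
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # h is the right singular vector belonging to the smallest singular value of A
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale so that H[2, 2] == 1


def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through H via homogeneous coordinates."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Usage: recover a known transform from five exact correspondences
H_true = np.array([[1.02, 0.01, 5.0], [-0.02, 0.98, -3.0], [1e-4, 2e-4, 1.0]])
src_pts = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [30, 60]], dtype=float)
dst_pts = apply_homography(H_true, src_pts)
H_est = estimate_homography_dlt(src_pts, dst_pts)
```

With exact correspondences the estimate recovers `H_true` up to floating-point error. On real SIFT/FLANN matches a fraction of the pairs are wrong, which is why robust estimation such as RANSAC matters in practice.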

In [None]:
# Load all metadata
meta = load_metadata()
image_pair_idxs = list(meta.keys())

print(f"✅ Loaded metadata of {len(meta)} image pairs")

In [None]:
# Seed OpenCV's global RNG for reproducible results
cv.setRNGSeed(42)

In [None]:
for idx in image_pair_idxs:
    # Load raw image pair
    film, digital, _ = load_image_pair(idx, processing_state="raw", as_array=True)

    # 1) Keypoint alignment
    try:
        film, digital = keypoint_align(
            query=film,
            train=digital,
            extract_method="sift",
            match_method="flann",
        )
    except Exception as e:
        print(f"[ERROR] Keypoint alignment failed for image pair {idx}: {e}")
        continue

    # 2) Luminance alignment
    try:
        digital, film = luminance_align(template=digital, source=film)
    except Exception as e:
        print(f"[ERROR] Luminance alignment failed for image pair {idx}: {e}")
        continue

    # Plot processed images
    fig, axs = plt.subplots(ncols=2, figsize=(15, 5))
    fig.suptitle(f"Processed Image Pair {idx}", fontsize=16)
    axs[0].imshow(digital)
    axs[1].imshow(film)
    axs[0].set_title("Digital Image")
    axs[1].set_title("Film Image")
    plt.show()

Looks good. Let's run the processing script, which applies the same steps to every pair and saves the results to `data/processed`.

In [None]:
!python src/preprocess.py

## Inspect Processed Images

---

In [None]:
for idx in image_pair_idxs:
    # Load raw and processed image pair
    film_raw, digital_raw, _ = load_image_pair(idx, processing_state="raw", as_array=True)
    film_proc, digital_proc, _ = load_image_pair(idx, processing_state="processed", as_array=True)

    # Plot raw and processed images
    fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))
    fig.suptitle(f"Image Pair {idx}", fontsize=16)
    axs[0, 0].imshow(digital_raw)
    axs[0, 1].imshow(digital_proc)
    axs[1, 0].imshow(film_raw)
    axs[1, 1].imshow(film_proc)
    axs[0, 0].set_title("Digital Image (Raw)")
    axs[0, 1].set_title("Digital Image (Processed)")
    axs[1, 0].set_title("Film Image (Raw)")
    axs[1, 1].set_title("Film Image (Processed)")
    plt.show()
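The `template`/`source` argument names of `luminance_align` follow the usual terminology of histogram matching, where the intensity distribution of the source image is remapped onto that of the template. Assuming the luminance alignment step works along those lines (its implementation in `src/utils/preprocess.py` is not shown here), a minimal NumPy sketch is below. The function names, the Rec. 601 luma weights, and the float RGB input in [0, 1] are all assumptions for illustration.

```python
import numpy as np


def match_histogram(source: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Remap the values of `source` so its histogram matches that of `template`.

    Both inputs are single-channel arrays (e.g. luminance); shapes may differ.
    """
    _, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    tmpl_values, tmpl_counts = np.unique(template.ravel(), return_counts=True)
    # Empirical CDFs of both images, normalised to [0, 1]
    src_cdf = np.cumsum(src_counts) / source.size
    tmpl_cdf = np.cumsum(tmpl_counts) / template.size
    # Each source value takes the template value found at the same quantile
    matched = np.interp(src_cdf, tmpl_cdf, tmpl_values)
    return matched[src_idx].reshape(source.shape)


def luma(rgb: np.ndarray) -> np.ndarray:
    """Rec. 601 luma of an (H, W, 3) float RGB image in [0, 1] (assumed format)."""
    return rgb @ np.array([0.299, 0.587, 0.114])


def luminance_match_rgb(source_rgb, template_rgb, eps=1e-6):
    """Hypothetical helper: match the source's luma histogram to the template's."""
    y_src = luma(source_rgb)
    y_matched = match_histogram(y_src, luma(template_rgb))
    # One per-pixel gain for R, G and B preserves the channel ratios (until clipping)
    gain = y_matched / np.maximum(y_src, eps)
    return np.clip(source_rgb * gain[..., None], 0.0, 1.0)


# Usage with random stand-in images (not real project data)
rng = np.random.default_rng(0)
film = rng.random((4, 5, 3)) * 0.5  # darker "source"
digital = rng.random((6, 6, 3))  # "template"
film_matched = luminance_match_rgb(film, digital)
```

Matching only the luma channel and applying it as a common gain keeps the colour balance of the source image intact, which matters here because the colour differences between film and digital are presumably what the project wants to preserve.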