# Image Characteristic Extraction for Image Deblurring Performance Analysis

- **Project**: Statistical Analysis of Image Deblurring Methods Performance
- **Dataset**: ~1000 high-resolution and diverse images from HQ-50K
- **Goal**: Extract image-level features (contrast and edge density) to enable statistical correlations between image characteristics, blur types, deblurring methods (classical and AI), and output quality metrics.

## Scope of this script:  
This module is part of the "Original Images Processing" phase and focuses on:  

- image_contrast: global contrast of the original image
- edge_density: edge detail intensity of the original image

These are **independent variables** in our dataset and are critical for:  

- understanding how image content affects perceived blur
- analyzing performance variation across deblurring methods
- grouping images by visual complexity and texture

## Chosen methods:

1. **RMS Contrast**:
   - **What it measures**: global variation in pixel intensity (grayscale)
   - **Why it's appropriate**: blur reduces intensity variance; RMS contrast is continuous, needs no thresholds, and is unaffected by uniform brightness offsets (it does scale with multiplicative gain changes)
   - **Chosen over Michelson/local contrast due to**:
        - better stability across natural scenes
        - no assumption of foreground/background separation

2. **Sobel Gradient Magnitude (Edge Density)**:
   - **What it measures**: average strength of image gradients (edges)
   - **Why it's appropriate**: blur directly weakens gradient transitions; this metric quantifies edge loss
   - **Chosen over Canny or binary edge maps because**:
      - no threshold tuning
      - produces continuous values for statistical modeling
      - better suited for noisy or textured images
   - Gaussian smoothing is applied before Sobel to reduce spurious gradients
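
Both rationales above rest on the same effect: blurring lowers intensity variance and flattens gradients. The following is a minimal NumPy-only sketch on a synthetic random texture that illustrates this. It is not the extraction code used in this notebook: the box blur and the `np.gradient` central differences are stand-ins for a real blur kernel and the Sobel operator, and all helper names are hypothetical.

```python
import numpy as np


def rms_contrast(img: np.ndarray) -> float:
    """Standard deviation of intensities after scaling 8-bit values to [0, 1]."""
    return float((img.astype(np.float64) / 255.0).std())


def mean_gradient_magnitude(img: np.ndarray) -> float:
    """Mean gradient magnitude using central differences (stand-in for Sobel)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))


def box_blur(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Simple k x k mean filter with edge padding (stand-in for a blur kernel)."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


# Synthetic high-detail "image": uniform random texture
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
blurred = box_blur(sharp)

print("RMS contrast:", rms_contrast(sharp), "->", rms_contrast(blurred))
print("Mean gradient:", mean_gradient_magnitude(sharp), "->", mean_gradient_magnitude(blurred))
```

Both printed values drop after blurring, which is the behaviour the two features are meant to capture.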

These features are extracted from **original high-resolution images** (before blur or deblurring), aligned with the project roadmap.

**Expected ranges**:
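- **RMS contrast** (intensities normalized to [0, 1]): 0.1–0.3 for typical photos, >0.4 for highly contrasted scenes
- **Edge density** (mean 3×3 Sobel magnitude on 0–255 grayscale, unnormalized): 10–50 for natural images; highly textured images can exceed this (the run below reaches about 204)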

**Output**: a `.parquet` file with the columns *rms_contrast* and *sobel_edge_density* added, to be used later in correlation analysis and performance prediction models.
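
Once the features are extracted, the expected ranges can be compared against what was actually measured. The sketch below is a hedged illustration rather than part of the pipeline: `share_in_range` is a hypothetical helper, and the five rows are copied from the output table shown later in this notebook instead of being read from the `.parquet` file.

```python
import pandas as pd

# Feature values copied from the rows displayed in this notebook's output
features = pd.DataFrame({
    "rms_contrast": [0.373365, 0.331964, 0.221090, 0.253243, 0.241754],
    "sobel_edge_density": [54.414061, 60.298111, 99.311477, 53.379880, 64.152358],
})


def share_in_range(values: pd.Series, low: float, high: float) -> float:
    """Fraction of non-missing values that fall inside [low, high]."""
    valid = values.dropna()
    return float(valid.between(low, high).mean())


contrast_share = share_in_range(features["rms_contrast"], 0.1, 0.3)
edge_share = share_in_range(features["sobel_edge_density"], 10, 50)

print(f"RMS contrast within 0.1-0.3: {contrast_share:.0%}")
print(f"Edge density within 10-50:   {edge_share:.0%}")
```

On the full dataset, `features` would instead be loaded with `pd.read_parquet` from the output file; a low share for either feature suggests the expected range should be revisited for this image collection.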

In [1]:
import os
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import logging

# Gaussian filter to suppress noise before edge detection
GAUSSIAN_PARAMS = {
    'ksize': (3, 3),
    'sigma': 1.0
}

# Sobel kernel size
SOBEL_PARAMS = {
    'ksize': 3
}

# Configure logging
logging.basicConfig(
    filename='2_image_characteristic_extraction.log',
    level=logging.ERROR,
    format='%(asctime)s - %(levelname)s - %(message)s'
)


def compute_rms_contrast(image: np.ndarray) -> float:
    """
    Calculates Root Mean Square (RMS) contrast for a grayscale image.

    Args:
        image: Grayscale image as NumPy array

    Returns:
        float: RMS contrast value
    """
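    # Scale 8-bit intensities to [0, 1] so the result does not depend on the 0-255 range
    img_float = image.astype(np.float32) / 255.0
    mean_intensity = np.mean(img_float)
    contrast = np.sqrt(np.mean((img_float - mean_intensity) ** 2))
    # Convert the NumPy scalar to a built-in float to match the annotation
    return float(contrast)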


def compute_sobel_edge_density(
    image: np.ndarray,
    gaussian_params: dict,
    sobel_params: dict
) -> float:
    """
    Calculates average gradient magnitude using Sobel filters.

    Args:
        image: Grayscale image as NumPy array
        gaussian_params: Gaussian blur settings
        sobel_params: Sobel kernel size

    Returns:
        float: Average gradient (edge strength)
    """
    img_blur = cv2.GaussianBlur(
        image,
        gaussian_params['ksize'],
        sigmaX=gaussian_params['sigma']
    )

    sobel_x = cv2.Sobel(
        img_blur,
        cv2.CV_64F,
        1, 0,
        ksize=sobel_params['ksize']
    )

    sobel_y = cv2.Sobel(
        img_blur,
        cv2.CV_64F,
        0, 1,
        ksize=sobel_params['ksize']
    )

    # Per-pixel gradient magnitude from the horizontal and vertical responses
    gradient_magnitude = np.sqrt(sobel_x**2 + sobel_y**2)
    return float(np.mean(gradient_magnitude))


def extract_features_from_dataset(
    dataset_parquet_path: "str | os.PathLike",
    dir_images_path: "str | os.PathLike",
    save: bool = True,
    subset: "int | None" = None,
    backup: bool = False
) -> pd.DataFrame:
    """
    Extracts rms_contrast and sobel_edge_density for images listed in a dataset .parquet file.

    Args:
        dataset_parquet_path: .parquet file containing at least a 'key' column
        dir_images_path: Directory where image files are stored
        save (optional): Whether to write the updated dataset back to dataset_parquet_path
        subset (optional): Number of leading rows to process (None processes all rows);
            with save=True only these rows are written back
        backup (optional): Whether to write a backup copy (prefixed with "bck_") next to
            dataset_parquet_path before it is modified

    Returns:
        Updated DataFrame from dataset_parquet_path with added rms_contrast, sobel_edge_density
    """
    # Normalize to a plain string so both str and pathlib.Path inputs work
    dataset_parquet_path = os.fspath(dataset_parquet_path)
    df = pd.read_parquet(dataset_parquet_path)

    if backup:
        backup_path = os.path.join(
            os.path.dirname(dataset_parquet_path),
            f"bck_{os.path.basename(dataset_parquet_path)}"
        )
        df.to_parquet(backup_path, index=False)

    # Copy the slice so the per-row assignments below write to df itself
    df = df.iloc[:subset].copy()
    df["rms_contrast"] = np.nan
    df["sobel_edge_density"] = np.nan

    for idx, row in tqdm(df.iterrows(), total=len(df), desc="Extracting features"):
        image_id = row['key']
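        # Fall back to PNG when the row has no usable 'format' value
        has_format = 'format' in row and pd.notna(row['format'])
        img_format = str(row['format']).lower() if has_format else "png"

        # Files in 'jpeg' format are looked up with the .jpg extension
        ext = '.jpg' if img_format == 'jpeg' else f'.{img_format}'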
        image_path = os.path.join(dir_images_path, f"{image_id}{ext}")

        if not os.path.exists(image_path):
            logging.error(f"File not found: {image_path}")
            continue

        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

        if img is None:
            logging.error(f"Failed to load image ID {image_id}")
            continue

        try:
            contrast = compute_rms_contrast(img)
            edge_density = compute_sobel_edge_density(
                img,
                GAUSSIAN_PARAMS,
                SOBEL_PARAMS
            )

            df.at[idx, "rms_contrast"] = contrast
            df.at[idx, "sobel_edge_density"] = edge_density

        except Exception as e:
            logging.error(f"Error processing image ID {image_id}: {e}")

    if save:
        df.to_parquet(dataset_parquet_path, index=False)

    print("\nFeature extraction complete.")
    return df

In [2]:
from pathlib import Path

ROOT_PATH = Path.cwd().parent.parent
DIR_DATASET_PATH = ROOT_PATH / "data" / "image-deblurring-performance-analysis"
ORIGINAL_DATASET_PATH = DIR_DATASET_PATH / "original"
MAIN_DATASET_PATH = DIR_DATASET_PATH / "image_deblurring_dataset.parquet"

df = extract_features_from_dataset(
    dataset_parquet_path=MAIN_DATASET_PATH, 
    dir_images_path=ORIGINAL_DATASET_PATH / "00000",
    backup=True
)

Extracting features: 100%|██████████| 1250/1250 [01:37<00:00, 12.87it/s]


Feature extraction complete.





In [3]:
rms_contrast_max = df["rms_contrast"].max()
rms_contrast_min = df["rms_contrast"].min()
sobel_edge_density_max = df["sobel_edge_density"].max()
sobel_edge_density_min = df["sobel_edge_density"].min()

print(f"RMS Contrast - Max: {rms_contrast_max}, Min: {rms_contrast_min}")
print(f"Sobel Edge Density - Max: {sobel_edge_density_max}, Min: {sobel_edge_density_min}")
display(df)

RMS Contrast - Max: 0.43064895272254944, Min: 0.06042444705963135
Sobel Edge Density - Max: 203.72249217321888, Min: 8.266673319995519
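
Rows whose image file was missing or unreadable keep NaN in both feature columns, and the reason is written only to the log file; the max/min values above skip those rows. A quick count of the gaps makes this visible. The following is a minimal sketch on a few rows copied from the table output, with `n_missing` and `missing_by_category` as hypothetical names; on the real data the same expressions would be applied to `df`.

```python
import numpy as np
import pandas as pd

# A few rows copied from the displayed table (NaN = feature not extracted)
sample = pd.DataFrame({
    "category": ["people", "indoor_scene", "poster", "architecture", "architecture", "complex"],
    "key": ["000000291", "000000987", "000000382", "000000058", "000000001", "000000564"],
    "rms_contrast": [np.nan, np.nan, 0.373365, np.nan, np.nan, 0.331964],
})

# Total number of rows without an extracted feature
n_missing = int(sample["rms_contrast"].isna().sum())

# Breakdown of the missing rows by image category
missing_by_category = (
    sample.loc[sample["rms_contrast"].isna(), "category"].value_counts()
)

print(f"Rows without features: {n_missing} of {len(sample)}")
print(missing_by_category)
```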


| | url | category | key | width | height | exif | aspect_ratio | size | rms_contrast | sobel_edge_density |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://100500foto.com/wp-content/uploads/2016/... | people | 000000291 | | | | | | | |
| 1 | http://2gfsl7am0og1m91u0pwpiehl.wpengine.netdn... | indoor_scene | 000000987 | | | | | | | |
| 2 | http://411posters.com/wp-content/uploads/2011/... | poster | 000000382 | 1300.0 | 1728.0 | {} | 0.752315 | 3169036.0 | 0.373365 | 54.414061 |
| 3 | http://RealEstateAdminImages.gabriels.net/170/... | architecture | 000000058 | | | | | | | |
| 4 | http://RealEstateAdminImages.gabriels.net/170/... | architecture | 000000001 | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1245 | https://www.yamaha.com/en/musical_instrument_g... | complex | 000000564 | 1800.0 | 1042.0 | {} | 1.727447 | 3199285.0 | 0.331964 | 60.298111 |
| 1246 | https://www.yellowmaps.com/usgs/topomaps/drg24... | map | 000001225 | 1509.0 | 2026.0 | {"Image Tag 0x5100": "0"} | 0.744817 | 5742001.0 | 0.221090 | 99.311477 |
| 1247 | https://www.zappos.com/images/z/2/5/1/8/8/7/25... | furniture | 000000481 | 1920.0 | 1440.0 | {} | 1.333333 | 4276346.0 | 0.253243 | 53.379880 |
| 1248 | https://ycdn.space/h/2015/02/Capitol-Hill-Loft... | indoor_scene | 000000954 | 1050.0 | 1575.0 | {} | 0.666667 | 2000861.0 | 0.241754 | 64.152358 |
