In [None]:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Load the dataset from Hugging Face if it's your first time using it

# dataset = fouh.load_from_hub(
#     "Voxel51/Coursera_lecture_dataset_train", 
#     dataset_name="lecture_dataset_train", 
#     persistent=True
#     )

In [None]:
# Because I already have the dataset saved locally, I load it by name instead
cloned_dataset = fo.load_dataset("lecture_dataset_train_clone")

You can use the [Image Quality Issues](https://github.com/jacobmarks/image-quality-issues) plugin in FiftyOne to find common issues in your image dataset.

With this plugin, you can find the following issues:

- **📏 Aspect ratio (`compute_aspect_ratio`):** find images with weird aspect ratios

- **🌫️ Blurriness (`compute_blurriness`):** find blurry images

- **☀️ Brightness (`compute_brightness`):** find bright and dark images

- **🌓 Contrast (`compute_contrast`):** find images with high or low contrast

- **🔀 Entropy (`compute_entropy`):** find images with low entropy

- **📸 Exposure (`compute_exposure`):** find overexposed and underexposed images

- **🕯️ Illumination (`compute_vignetting`):** find images with uneven illumination

- **🧂 Noise (`compute_salt_and_pepper`):** find images with high salt and pepper noise

- **🌈 Saturation (`compute_saturation`):** find images with low and high saturation

To use the plugin, you'll need to download it and install its requirements:


Download the plugin:


In [None]:
from fiftyone import plugins

plugins.download_plugin(
    url_or_gh_repo="https://github.com/jacobmarks/image-quality-issues/"
)

In [None]:
plugins.list_downloaded_plugins()

Install the requirements:

In [None]:
plugins.install_plugin_requirements(
    plugin_name="@jacobmarks/image_issues"
)

Once you have the plugin and its dependencies installed, you can use the plugin directly through the app or via the SDK. When using the plugin via the SDK, you'll use it as an Operator. In FiftyOne, an Operator is a user-facing operation that allows you to interact with the data in your dataset.

You access the operator via its URI (plugin name + operator name):


In [None]:
import fiftyone.operators as foo

compute_brightness = foo.get_operator(
    "@jacobmarks/image_issues/compute_brightness"
)

Under the hood, the `compute_brightness` operator executes a function that takes an image and [calculates how bright it appears to the human eye](https://www.nbdtech.com/Blog/archive/2008/04/27/calculating-the-perceived-brightness-of-a-color.aspx). It does this by looking at the colors in the image and applying a formula that mimics how our eyes perceive brightness.

The function considers that our eyes are more sensitive to some colors than others. For example, we perceive green as brighter than blue, even if they have the same intensity. The function takes this into account when calculating the overall brightness of the image.

In simple terms, it's like the function is "squinting" at the image and giving it a single number that represents how bright the image looks overall. This can be useful for things like automatically adjusting image contrast or identifying images that might be too dark or too bright.
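
As a rough illustration of that idea, here is a minimal sketch of the perceived-brightness formula from the linked article, applied to the mean color of an image. The `perceived_brightness` helper, the use of per-channel means, and the division by 255 are assumptions made for illustration; the plugin's actual implementation may differ.

```python
import numpy as np

def perceived_brightness(rgb_image):
    """Hypothetical helper: perceived brightness of an RGB image in [0, 1].

    Expects an array of shape (height, width, 3) with values in 0..255.
    """
    # Average each color channel over all pixels
    r, g, b = rgb_image.reshape(-1, 3).astype(np.float64).mean(axis=0)
    # Weighted root-sum-of-squares: green counts most, blue least
    brightness = np.sqrt(0.241 * r**2 + 0.691 * g**2 + 0.068 * b**2)
    return float(brightness / 255.0)

# Two flat test images with the same intensity in a single channel
green = np.zeros((4, 4, 3), dtype=np.uint8)
green[..., 1] = 255
blue = np.zeros((4, 4, 3), dtype=np.uint8)
blue[..., 2] = 255

# Green should score as brighter than blue
print(perceived_brightness(green) > perceived_brightness(blue))
```

Because the three weights sum to 1, a pure white image scores 1 and a pure black image scores 0 under this sketch.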

You can apply the `compute_brightness` operator to the entire dataset, like so:

In [None]:
compute_brightness(cloned_dataset)

You'll notice that the dataset now has a field called `brightness`.

In [None]:
cloned_dataset

Let's explore the dataset in the app: 

In [None]:
session = fo.launch_app(dataset=cloned_dataset)

You can also use the SDK to get some more insight:

In [None]:
cloned_dataset.bounds("brightness")

In [None]:
cloned_dataset.std("brightness")

In [None]:
cloned_dataset.mean("brightness")

In [None]:
cloned_dataset.quantiles("brightness", quantiles=[0.1, 0.5, 0.9])

You can also construct a view of images based on a threshold value:

In [None]:
from fiftyone import ViewField as F

low_brightness_view = cloned_dataset.filter_field("brightness", F() < 0.3214)
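
The threshold in the cell above is hardcoded. One way to choose it, sketched below with plain NumPy, is to take a low quantile of the brightness values so that the view always captures the darkest fraction of the dataset. The brightness values and the `darkest_fraction_threshold` helper are made up for illustration; with a real dataset you could read the values with `cloned_dataset.values("brightness")`.

```python
import numpy as np

def darkest_fraction_threshold(values, fraction=0.1):
    """Hypothetical helper: brightness below which roughly `fraction` of samples fall."""
    return float(np.quantile(np.asarray(values, dtype=np.float64), fraction))

# Made-up brightness scores standing in for real per-sample values
brightness_values = [0.12, 0.35, 0.41, 0.48, 0.52, 0.55, 0.61, 0.67, 0.73, 0.90]

threshold = darkest_fraction_threshold(brightness_values, fraction=0.1)
num_dark = sum(v < threshold for v in brightness_values)
print(f"threshold={threshold:.4f}, samples below it: {num_dark}")
```

The resulting value can then be used in place of the literal in the `F() < ...` expression.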

You can also compute image quality metrics on the patch level:

In [None]:
compute_brightness(cloned_dataset, patches_field="ground_truth")

In [None]:
cloned_dataset.first()

You can then create a patches view of the dataset and perform analysis at the detection level:

In [None]:
patches_view = cloned_dataset.to_patches("ground_truth", other_fields=True)

In [None]:
sunglasses_patches = patches_view.filter_labels("ground_truth", F("label")=="sunglasses")

In [None]:
fo.launch_app(sunglasses_patches)

### JPEG Compression

Next, let's compute a metric that quantifies JPEG compression.

We'll write code that exploits the fact that JPEG compression tends to create noticeable differences at the edges of 8x8 pixel blocks, especially at higher compression levels. By comparing these block-edge differences to general pixel differences across the image, we can estimate how much compression has been applied.

We can implement the following functions:

1. `compute_channel_metric`:
   This function analyzes a single color channel of an image to detect JPEG compression artifacts. It does this by:
   - Calculating average differences between adjacent pixels horizontally and vertically.
   - Calculating average differences between pixels at the edges of 8x8 blocks (JPEG uses 8x8 pixel blocks for compression).
   - Comparing these differences to detect the presence of blocking artifacts.
   - The function will work with images of any size, including those not divisible by 8.

2. `estimate_jpeg_quality`:
   This function estimates the overall JPEG compression level of an image by:
   - Reading the image and converting it to the YCrCb color space (Y: luminance, Cr and Cb: chrominance).
   - Applying the `compute_channel_metric` to each channel (Y, Cr, Cb) separately.
   - Combining these metrics with a weighted average, giving more importance to the luminance channel as it's more perceptually significant to human vision.

The resulting metric is a single floating-point number where higher values indicate more compression artifacts (and thus, lower image quality). This metric is relative and most useful for comparing different images or different versions of the same image, rather than as an absolute measure of quality.

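- Because the metric is a ratio of block-edge differences to overall pixel differences, values near 1 suggest that block edges look like any other pair of adjacent pixels, meaning little visible blocking and higher image quality.
- Higher values suggest more compression artifacts and lower image quality.
- Very high values (e.g., > 1.5) might indicate heavy compression with noticeable artifacts.
- Values well below 1 are possible but say little about compression; they more likely reflect the image content.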

Factors affecting the metric:

 - **Image content:** Smooth areas tend to compress better than areas with lots of detail.

 - **Original image quality:** Starting with a higher quality image generally results in a lower metric even after compression.

 - **Compression algorithm:** Different JPEG encoders might produce slightly different results.

In [None]:
import cv2
import numpy as np

def compute_channel_metric(channel):
    """
    Compute quality metric for a single channel, handling any image size.

    Args:
        channel (numpy.ndarray): 2D array representing an image channel.

    Returns:
        float: Quality metric for the channel.
    """
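    height, width = channel.shape

    # Cast to float so that subtracting uint8 pixel values cannot wrap around
    channel = channel.astype(np.float64)

    # Compute general pixel differences
    diff_h = np.abs(channel[:, 1:] - channel[:, :-1]).mean()
    diff_v = np.abs(channel[1:, :] - channel[:-1, :]).mean()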
    
    # Compute block differences, adjusting for image size
    block_size = 8
    h_blocks = (height - 1) // block_size
    w_blocks = (width - 1) // block_size
    
    if h_blocks > 0 and w_blocks > 0:
        diff_h_block = np.abs(channel[:, block_size::block_size] - 
                              channel[:, block_size-1:-1:block_size]).mean()
        diff_v_block = np.abs(channel[block_size::block_size, :] - 
                              channel[block_size-1:-1:block_size, :]).mean()
    else:
        # Fallback for very small images
        diff_h_block = diff_h
        diff_v_block = diff_v
    
    # Compute relative differences
    rel_diff_h = diff_h_block / (diff_h + 1e-6)  # Avoid division by zero
    rel_diff_v = diff_v_block / (diff_v + 1e-6)
    
    return (rel_diff_h + rel_diff_v) / 2

def estimate_jpeg_quality(image_path):
    """
    Estimate the JPEG compression level of an image based on blocking artifacts,
    considering both luminance and chrominance information.

    Args:
        image_path (str): Path to the JPEG image file.

    Returns:
        float: A metric representing the estimated compression level.
               Higher values indicate more compression.
    """
    # Read the image in color
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Unable to read image at {image_path}")

    # Convert to YCrCb color space
    img_ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

    # Split into Y, Cr, and Cb channels
    y, cr, cb = cv2.split(img_ycrcb)

    # Compute metrics for each channel
    y_metric = compute_channel_metric(y)
    cr_metric = compute_channel_metric(cr)
    cb_metric = compute_channel_metric(cb)

    # Weighted average of channel metrics
    # We give more weight to luminance (Y) as it's more perceptually important
    final_metric = 0.6 * y_metric + 0.2 * cr_metric + 0.2 * cb_metric

    return final_metric
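
To build some intuition for what the metric responds to, here is a small self-contained check on synthetic data. It uses a compact restatement of the block-edge ratio: the `block_edge_ratio` helper is hypothetical, only handles arrays larger than 8 pixels per side, and casts to float first because subtracting `uint8` arrays directly would wrap around. The synthetic images are assumptions chosen for illustration, not real JPEGs.

```python
import numpy as np

def block_edge_ratio(channel, block_size=8):
    """Hypothetical helper: block-edge differences relative to all differences."""
    c = channel.astype(np.float64)  # avoid uint8 wraparound on subtraction
    # Mean absolute difference between all adjacent pixels
    diff_h = np.abs(c[:, 1:] - c[:, :-1]).mean()
    diff_v = np.abs(c[1:, :] - c[:-1, :]).mean()
    # Mean absolute difference between pixels that straddle a block boundary
    edge_h = np.abs(c[:, block_size::block_size] - c[:, block_size - 1:-1:block_size]).mean()
    edge_v = np.abs(c[block_size::block_size, :] - c[block_size - 1:-1:block_size, :]).mean()
    return (edge_h / (diff_h + 1e-6) + edge_v / (diff_v + 1e-6)) / 2

rng = np.random.default_rng(0)

# Pure noise: block edges look like any other pixel pair
noise = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Blocky image: each 8x8 block is flat, so all change sits on block edges
block_values = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
blocky = np.kron(block_values, np.ones((8, 8), dtype=np.uint8))

print(f"noise:  {block_edge_ratio(noise):.2f}")
print(f"blocky: {block_edge_ratio(blocky):.2f}")
```

In this construction the noise image should land near 1, while the blocky image should score far higher, since every nonzero difference falls on a block boundary.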

In [None]:
image_filepaths = cloned_dataset.values("filepath")

In [None]:
estimated_compression_scores = [estimate_jpeg_quality(fp) for fp in image_filepaths]

In [None]:
cloned_dataset.set_values("estimated_compression_score", estimated_compression_scores)

In [None]:
cloned_dataset

In [None]:
cloned_dataset.bounds("estimated_compression_score")

You can also use the `find_issues` operator (only through the app), which lets you flag images (or detections) that have specific issues.

You can run the issue-finding operator in single-issue or multi-issue mode, and you can specify the threshold for each issue at execution time.

Any required computations that have not yet been run are run automatically.

In [None]:
fo.launch_app(cloned_dataset)

Notice that the images will have a tag indicating the issues present:


In [None]:
cloned_dataset.first()


If you ever need assistance, have more complex questions, or want to keep in touch, feel free to join the Voxel51 community Discord server [here](https://discord.gg/QAyfnUhfpw).