
BIQ2021

A Large-Scale Blind Image Quality Assessment Database

Overview

This repository provides the BIQ2021 database of images with a Mean Opinion Score (MOS) for each image. The dataset is partitioned into three subsets, as discussed in "A Large-Scale Blind Image Quality Assessment Database". The MOS for each image is scaled to the range 0-1, and the images in the database are divided into three categories according to the type of content they cover. Although the images are divided into three categories, their use for training and validation does not distinguish an image on the basis of its content category. The name of each image contains ss01, ss02, or ss03, denoting Subset-01, Subset-02, and Subset-03 respectively.
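As an illustration of this naming convention, the following MATLAB sketch groups image files by their subset tag; the folder name and file extension are assumptions, not part of the dataset specification.

```matlab
% Hypothetical sketch: group BIQ2021 images by the subset tag in the filename.
files = dir(fullfile('BIQ2021', '*.jpg'));            % assumed folder and extension
names = {files.name};
tags  = regexp(names, 'ss0[1-3]', 'match', 'once');   % extract ss01/ss02/ss03
subset01 = names(strcmp(tags, 'ss01'));
subset02 = names(strcmp(tags, 'ss02'));
subset03 = names(strcmp(tags, 'ss03'));
```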

Subset-01

The first subset contains 2,000 images chosen from a gallery of images captured by Nisar Ahmed between 2007 and 2020. The images were captured with a variety of acquisition devices and exhibit varying degrees and types of distortion, arising both from camera and photographic error and from processing and storage. Because these images were not captured for the purpose of IQA, they serve as a realistic sample for evaluating IQA algorithms.
Figure 1: Randomly selected images from Subset-01

Subset-02

The second subset contains images that were taken with the intention of being used for IQA. This subset contains 2,000 images and ensures that the entire spectrum of quality scores is covered, with images ranging from the worst to the best. The distortions in this subset were intentionally introduced during the acquisition process, similar to CID2013, but without fixed distortion levels. The images were captured by varying the ISO sensitivity from 50 to 3200 and the shutter speed from 4 s to 1/1250 s. Both auto and manual focus were used, and lens blur and motion blur were deliberately introduced during acquisition. The ambient light was also varied to introduce the effect of lighting when photographing indoor environments. Because images with extreme distortion levels are otherwise rare, this subset was created to balance the different distortion levels.
Figure 2: Randomly selected images from Subset-02

Subset-03

The third subset of 8,000 images was acquired from Unsplash.com, where the use of images for scientific or commercial purposes is permitted. The downloaded images were retrieved using a variety of search keywords to introduce content diversity and were specifically chosen for the purpose of IQA. The keywords used for the search were animals, wildlife, pets, birds, zoo, vegetables, fruits, food, cooking, architecture, cityscape, night, indoor, outdoor, scenery, mountain, lake, candid, close-up, experimental, texture, people, men, women, model, kids, babies, boy, girl, fashion, culture, vintage, sports, and swimming. It should be noted that many of these images have been postprocessed, making them excellent candidates for learning the effect of postprocessing on perceptual quality. Furthermore, because it contains images with a wide range of content, this subset contributes to the database's diversity.
Figure 3: Randomly selected images from Subset-03

Mean Opinion Score (MOS)

Image Quality Assessment (IQA) is performed by supplying a quality score for each image, which is subsequently used for supervised training (regression). These quality scores are obtained from human subjects (observers) who rate the images on a scale from 1 to 5, labelled "excellent", "good", "fair", "bad", and "very bad". ITU-T P.910 provides recommendations for absolute category rating on a discrete scale and states that up to 30 subjects with diverse backgrounds provide a reliable judgment. The experiments were conducted in a laboratory environment under the supervision of the authors, and the quality ratings from the 30 observers were averaged to obtain the MOS.
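A minimal sketch of this MOS computation, assuming the ratings are stored one row per image and that the 0-1 scaling is a linear mapping of the 1-5 scale (the exact mapping is an assumption, not stated here):

```matlab
% Minimal sketch of the MOS computation described above.
ratings = randi([1 5], 4, 30);     % stand-in for real scores: 4 images x 30 observers
mos = mean(ratings, 2);            % average each image's ratings over the 30 observers
mosScaled = (mos - 1) / 4;         % assumed linear rescaling from [1, 5] to [0, 1]
```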

Train-Test Split

To enable benchmarking of various approaches on the BIQ2021 dataset, it is imperative to use a consistent train-test split. For this purpose, the dataset is partitioned into two splits: a training set of 10,000 images and a testing set of 2,000 images. Note that a validation split is not provided with the data; it is up to the user to partition a suitable portion of the training data for validation, if required.
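For example, a validation set could be held out of the 10,000 training images as follows; the 10% holdout fraction and the fixed seed are arbitrary choices, not part of the dataset specification:

```matlab
% Sketch: hold out a validation set from the 10,000 training images.
nTrain = 10000;
rng(0);                       % fixed seed so the holdout is reproducible
idx = randperm(nTrain);
valIdx   = idx(1:1000);       % assumed 10% holdout for validation
trainIdx = idx(1001:end);     % remaining 9,000 images for training
```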

Model Evaluation

To evaluate the performance of an image quality assessment method on the BIQ2021 dataset, Pearson's Linear Correlation Coefficient (PLCC), Spearman's Rank Order Correlation Coefficient (SROCC), and the Root Mean Square Error (RMSE) are reported for various methods. Note that the training of these models, where required, is performed on the training set, and the reported results for every method are obtained on the testing set.

| Sr. | Technique/Model     | PLCC   | SROCC  | RMSE   |
|-----|---------------------|--------|--------|--------|
| 1   | BRISQUE             | 0.6941 | 0.6039 | 0.1588 |
| 2   | NIQE                | 0.2981 | 0.2674 | 0.5402 |
| 3   | PIQE                | 0.2088 | 0.1796 | 0.3935 |
| 4   | ResNet-18           | 0.0000 | 0.0000 | 0.0000 |
| 5   | ResNet-50           | 0.6862 | 0.6468 | 0.1457 |
| 6   | ResNet-101          | 0.0000 | 0.0000 | 0.0000 |
| 7   | MobileNet-V2        | 0.6613 | 0.6189 | 0.1643 |
| 8   | DenseNet-201        | 0.6787 | 0.6364 | 0.1520 |
| 9   | Inception-ResNet-V2 | 0.7002 | 0.6624 | 0.1328 |
| 10  | Xception            | 0.6772 | 0.6369 | 0.1620 |
| 11  | EfficientNet-b0     | 0.6143 | 0.5721 | 0.2100 |
| 12  | Vgg16               | 0.6415 | 0.6001 | 0.1820 |
| 13  | NASNet-Large        | 0.7083 | 0.6725 | 0.1259 |
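For reference, the three metrics in the table above could be computed in MATLAB as follows; the vector names are assumptions, and corr requires the Statistics and Machine Learning Toolbox:

```matlab
% Sketch: evaluation metrics for predicted vs. ground-truth MOS on the test set.
% predMOS and trueMOS are assumed column vectors of length 2,000.
plcc  = corr(predMOS, trueMOS, 'Type', 'Pearson');    % linear correlation
srocc = corr(predMOS, trueMOS, 'Type', 'Spearman');   % rank-order correlation
rmse  = sqrt(mean((predMOS - trueMOS).^2));           % root mean square error
```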

Moreover, the code used for training a CNN model on BIQ2021 is provided in the main directory for DenseNet-201. It can be modified for use with any pretrained model in MATLAB.
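A hypothetical sketch of the kind of modification involved, swapping DenseNet-201's classification head for a single-output regression head; the layer names follow MATLAB's pretrained densenet201 and should be checked against the actual network, and the hyperparameters are placeholders:

```matlab
% Hypothetical sketch: adapt pretrained DenseNet-201 for MOS regression.
net    = densenet201;                     % Deep Learning Toolbox pretrained model
lgraph = layerGraph(net);
% Replace the 1000-way classification head with a single regression output
% (layer names 'fc1000' etc. are assumed from MATLAB's densenet201).
lgraph = replaceLayer(lgraph, 'fc1000', fullyConnectedLayer(1, 'Name', 'fc_mos'));
lgraph = removeLayers(lgraph, {'fc1000_softmax', 'ClassificationLayer_fc1000'});
lgraph = addLayers(lgraph, regressionLayer('Name', 'mos_out'));
lgraph = connectLayers(lgraph, 'fc_mos', 'mos_out');
opts = trainingOptions('adam', 'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10, 'MiniBatchSize', 16);             % assumed hyperparameters
% trainedNet = trainNetwork(trainTbl, lgraph, opts);   % trainTbl: image paths + MOS
```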

- Trained Model (Inception-ResNet-V2): https://www.mathworks.com/matlabcentral/fileexchange/116410-inceptionresnetv2
- Trained Model (Xception): https://www.mathworks.com/matlabcentral/fileexchange/116415-pre-trained-xception-on-koniq-10k
- Trained Model (NASNet-Large): https://drive.google.com/file/d/1VAi6Kk5nka1ODByoB-yWgBlRw-PLyLHQ/view?usp=sharing

Comparison with Existing Methods

To enable effective comparison and benchmarking of image quality assessment methods on the BIQ2021 dataset, a comprehensive evaluation was conducted by Verga [1] and Ahmed et al. [2]. The evaluated methods, along with their performance metrics, are summarized below. The comparison is reported in terms of Pearson's (PLCC) and Spearman's (SROCC) correlation coefficients, which are commonly used evaluation measures in the field. It is important to note that the evaluation was performed using the public train/test split provided with the dataset, ensuring a consistent and fair comparison.

| Sr. | Method         | Year | PLCC   | SROCC  |
|-----|----------------|------|--------|--------|
| 1   | BIQI           | 2010 | 0.564  | 0.564  |
| 2   | BLIINDS-II     | 2012 | 0.496  | 0.496  |
| 3   | BRISQUE        | 2012 | 0.603  | 0.603  |
| 4   | DIIVINE        | 2012 | 0.617  | 0.617  |
| 5   | NIQE           | 2012 | 0.356  | 0.356  |
| 6   | Robust BRISQUE | 2012 | 0.605  | 0.605  |
| 7   | CurveletQA     | 2014 | 0.63   | 0.63   |
| 8   | GM-LOG-BIQA    | 2014 | 0.617  | 0.617  |
| 9   | SSEQ           | 2014 | 0.528  | 0.528  |
| 10  | PIQE           | 2015 | 0.213  | 0.213  |
| 11  | GWH-GLBP       | 2016 | 0.602  | 0.602  |
| 12  | OG-IQA         | 2016 | 0.371  | 0.371  |
| 13  | BMPRI          | 2018 | 0.494  | 0.494  |
| 14  | ENIQA          | 2019 | 0.634  | 0.634  |
| 15  | IL-NIQE        | 2019 | 0.461  | 0.461  |
| 16  | NBIQA          | 2019 | 0.642  | 0.642  |
| 17  | NASNet-Large   | 2021 | 0.7083 | 0.6725 |
| 18  | PIQI           | 2021 | 0.6721 | 0.6698 |
| 19  | DeepEns        | 2022 | 0.8098 | 0.7922 |
| 20  | SGL-IQA        | 2023 | 0.71   | 0.71   |
