# Statistical Analysis of Image Deblurring Methods Performance Across Classical and AI-Based Approaches

## Problem statement

In image processing, the degradation of image quality caused by various types of distortions poses significant challenges for both human interpretation and machine learning applications. This project aims to systematically investigate blur distortions by statistically analyzing the relationships between parameters of the blurred images and performance metrics of deblurring methods, both classical and AI-based. We will build the dataset by applying various blur types to real-world image datasets and then evaluating multiple AI models alongside classical deblurring methods on the blurred images.
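
As a rough illustration of the "applying various blur types" step, the sketch below generates two synthetic blurs with SciPy. The function names, the choice of Gaussian and horizontal motion blur, and the grayscale float input are assumptions made for this example; the project itself may use different kernels and parameters.

```python
import numpy as np
from scipy import ndimage


def gaussian_blur(img, sigma):
    """Blur a 2-D float image with an isotropic Gaussian (an out-of-focus-like blur)."""
    return ndimage.gaussian_filter(img, sigma=sigma)


def horizontal_motion_blur(img, length):
    """Blur a 2-D float image with a 1 x length box kernel (simple linear motion)."""
    kernel = np.full((1, length), 1.0 / length)
    return ndimage.convolve(img, kernel, mode="reflect")


# Demo on a single bright pixel so the effect of each kernel is easy to inspect.
impulse = np.zeros((9, 9), dtype=np.float64)
impulse[4, 4] = 1.0

defocused = gaussian_blur(impulse, sigma=1.5)
streaked = horizontal_motion_blur(impulse, length=5)
```

The blur parameters (`sigma`, `length`) can then be recorded next to each blurred image so they are available as explanatory variables in the later analysis.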

The results will be analyzed statistically to identify the most effective deblurring method and to understand the relationship between blur types, image parameters, and method performance. This analysis will provide insights into the robustness of deblurring algorithms. All code and the dataset generated during this project will be open-sourced to foster further research and development in the field of image restoration.

## Description of the dataset

> 1. **HQ-50K Dataset**
>
>    - **Source:** HQ-50K is a large-scale dataset designed for image restoration tasks.
>    - **Number of Observations:** 50,000 high-quality images for training and 1,250 test images.
>    - **Variables per Observation:** Includes metadata such as texture details, semantic categories, and degradation levels.
>    - **Type of Variables:** Image resolution, blur intensity (quantitative), noise level (quantitative), and semantic category (categorical)[[paper]](https://arxiv.org/abs/2306.05390)[[github]](https://github.com/littleYaang/HQ-50K)[[huggingface]](https://huggingface.co/datasets/YangQiee/HQ-50K).
>
> 2. **MC-Blur Dataset**
>
>    - **Source:** A dataset specifically constructed for image deblurring with four types of blur: uniform, motion, heavy defocus, and real-world blurs.
>    - **Number of Observations:** Images collected from over 1,000 diverse scenes.
>    - **Variables per Observation:** Blur type (categorical), blur intensity (quantitative), scene type (categorical)[[github]](https://github.com/HDCVLab/MC-Blur-Dataset).
>
> 3. **LSDIR Dataset**
>    - **Source:** A large-scale dataset for image restoration tasks collected from Flickr.
>    - **Number of Observations:** 84,991 training images and 2,000 validation/test images.
>    - **Variables per Observation:** Noise level (quantitative), blur score (quantitative), resolution (quantitative)[[paper]](https://openaccess.thecvf.com/content/CVPR2023W/NTIRE/html/Li_LSDIR_A_Large_Scale_Dataset_for_Image_Restoration_CVPRW_2023_paper.html)[[github]](https://github.com/ofsoundof/LSDIR).


## Data cleanup

> - Address formatting errors in metadata or file paths.
> - Handle missing or repeated values in variables such as blur intensity or noise level.
> - Remove noisy observations that might distort analysis results.
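
The cleanup steps above could look roughly like the following pandas sketch. The table contents, the column names (`path`, `blur_intensity`, `noise_level`), and the noise cap of 0.5 are hypothetical and only serve to show the shape of the operation.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata table; real column names depend on the chosen dataset.
meta = pd.DataFrame({
    "path": ["a.png", "b.png", "b.png", "c.png", "d.png"],
    "blur_intensity": [1.5, 2.0, 2.0, np.nan, 3.0],
    "noise_level": [0.01, 0.02, 0.02, 0.03, 0.9],
})


def clean_metadata(df, max_noise=0.5):
    """Drop repeated files, rows with missing values, and rows above a noise cap."""
    out = df.drop_duplicates(subset="path")
    out = out.dropna(subset=["blur_intensity", "noise_level"])
    out = out[out["noise_level"] <= max_noise]
    return out.reset_index(drop=True)


cleaned = clean_metadata(meta)
```

Whether very noisy observations should be removed or merely flagged is a judgment call; a fixed cap is used here only to keep the example short.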


## Preprocessing

> - Merge subsets if using multiple datasets (e.g., training and test sets).
> - Convert units where necessary (e.g., pixel intensity normalization).
> - Define derived variables like restoration error metrics or blur-to-noise ratios.
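
One possible derived variable is the peak signal-to-noise ratio (PSNR) between a sharp reference image and a restored one, computed on normalized intensities. The sketch below assumes 8-bit input images and uses PSNR purely as an example of a restoration error metric; the helper names are not taken from the project.

```python
import numpy as np


def normalize(img_uint8):
    """Map 8-bit pixel intensities to floats in [0, 1]."""
    return img_uint8.astype(np.float64) / 255.0


def psnr(reference, restored, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


# Toy example: a black reference and a restored image that is off by 0.1 everywhere.
reference = normalize(np.zeros((4, 4), dtype=np.uint8))
restored = reference + 0.1
score = psnr(reference, restored)
```

With a constant error of 0.1 on a unit range, the mean squared error is 0.01, so the score is 20 dB.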


## Data exploration

> 1. Perform descriptive statistics on key variables:
>
> - Mean and standard deviation of blur intensities.
> - Distribution of noise levels across different categories.
>
> 2. Visualize data using:
>
> - Histograms for noise levels.
> - Scatterplots showing relationships between blur intensity and restoration error.
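
A minimal sketch of the descriptive statistics, using synthetic data in place of the real measurements. The column names and value ranges are assumptions; the histogram is computed as bin counts so that any plotting library can draw it afterwards.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real per-image measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "blur_type": rng.choice(["uniform", "motion", "defocus"], size=300),
    "blur_intensity": rng.uniform(0.5, 5.0, size=300),
    "noise_level": rng.uniform(0.0, 0.1, size=300),
})

# Mean and standard deviation of blur intensity per blur type.
summary = df.groupby("blur_type")["blur_intensity"].agg(["mean", "std", "count"])

# Histogram of noise levels as bin counts and bin edges.
counts, edges = np.histogram(df["noise_level"], bins=10, range=(0.0, 0.1))
```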


## Hypothesis testing

> Examples of hypotheses to test:
>
> - Does motion blur result in higher restoration errors compared to uniform blur?
> - Is there a significant difference in restoration accuracy between neural networks and classical methods?
> - Does a higher noise level correlate with lower restoration accuracy?
>
> Steps:
>
> - Clearly state null and alternative hypotheses.
> - Perform t-tests or ANOVA where applicable.
> - Conduct power analysis to ensure sufficient sample size.
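
The steps above can be sketched with SciPy as follows. The error samples are simulated, so the means and spreads are invented for illustration. The example uses Welch's t-test for the motion-versus-uniform question, a one-way ANOVA across three blur types, and a normal-approximation formula for the per-group sample size of a two-sided two-sample test; a dedicated power-analysis routine would give slightly larger values.

```python
import numpy as np
from scipy import stats

# Simulated restoration errors per blur type (placeholder values).
rng = np.random.default_rng(42)
err_motion = rng.normal(0.30, 0.05, 200)
err_uniform = rng.normal(0.25, 0.05, 200)
err_defocus = rng.normal(0.27, 0.05, 200)

# H0: mean error is equal for motion and uniform blur.
# H1: mean error is higher for motion blur (one-sided Welch t-test).
t_stat, p_value = stats.ttest_ind(
    err_motion, err_uniform, equal_var=False, alternative="greater"
)

# H0: all three blur types share the same mean error (one-way ANOVA).
f_stat, p_anova = stats.f_oneway(err_motion, err_uniform, err_defocus)


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-sample test.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d).
    """
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return int(np.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2))


needed = n_per_group(0.5)
```

ANOVA and the t-test both assume roughly normal errors within each group; if the restoration errors turn out to be strongly skewed, a rank-based alternative may be more appropriate.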


## Confidence intervals

> - Build confidence intervals for parameters like mean restoration error or average noise level.
> - Provide interpretations specific to the context of image deblurring.
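
A confidence interval for a mean can be built from the Student t distribution, as in the sketch below. The function name and the sample values are illustrative only.

```python
import numpy as np
from scipy import stats


def mean_ci(values, confidence=0.95):
    """Two-sided t-based confidence interval for the mean of a sample."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    half_width = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * sem
    return mean - half_width, mean + half_width


# Placeholder restoration errors for one method.
low, high = mean_ci([1.0, 2.0, 3.0, 4.0, 5.0])
```

In the deblurring context, a 95% interval for the mean restoration error of a method means that the procedure, repeated on new samples of images, would cover the true mean error about 95% of the time.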


## Linear regression and correlation analysis

> 1. Analyze correlations between variables:
>
> - Blur intensity vs. restoration error.
> - Noise level vs. restoration accuracy.
>
> 2. Build a multiple linear regression model:
>
> - Response Variable: Restoration error.
> - Predictors: Blur intensity, noise level, and blur type.
>
> 3. Validate assumptions using residual analysis.
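
The correlation and regression steps could be prototyped with NumPy alone, as sketched below on simulated data. The coefficients used to generate the response are invented so that the fit can be checked against known values, and blur type is reduced to a single 0/1 indicator (motion versus a uniform baseline) for brevity.

```python
import numpy as np

# Simulated predictors (placeholder ranges).
rng = np.random.default_rng(1)
n = 500
blur = rng.uniform(0.5, 5.0, n)
noise = rng.uniform(0.0, 0.1, n)
is_motion = rng.integers(0, 2, n)  # 1 = motion blur, 0 = uniform blur (baseline)

# Simulated response with known coefficients plus Gaussian noise.
error = 0.05 + 0.04 * blur + 0.8 * noise + 0.03 * is_motion + rng.normal(0, 0.01, n)

# Pearson correlation between blur intensity and restoration error.
r = np.corrcoef(blur, error)[0, 1]

# Multiple linear regression via ordinary least squares.
X = np.column_stack([np.ones(n), blur, noise, is_motion])
coef, *_ = np.linalg.lstsq(X, error, rcond=None)

# Residuals for assumption checks (e.g. plot them against fitted values).
fitted = X @ coef
residuals = error - fitted
r_squared = 1 - residuals.var() / error.var()
```

With more than two blur types, one indicator column per non-baseline type would be needed; a statistics package that reports standard errors and p-values for each coefficient would be a natural next step.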


## Conclusions

> Summarize findings such as:
>
> - Key factors influencing restoration accuracy.
> - Statistical evidence supporting the superiority of certain methods over others.


