
Metric: CLIPIQA #348

Merged: 49 commits merged into master on Jun 5, 2023

Conversation

@snk4tr (Contributor) commented Feb 28, 2023

This PR implements the CLIP-IQA metric described in Wang et al. 2022.
Closes #331

The main reason to implement CLIP-IQA here is to let users compute the metric without bringing additional dependencies (mmcv/mmedit) into the project.

Note that CLIP-IQA+ won't be implemented here because its CLIP weights were fine-tuned with mmcv and hence cannot be loaded and run without it.

Note that the values produced by this implementation match those produced by the official CLIP-IQA implementation. SRCC scores from evaluations on public benchmarks may not match the ones listed in the paper. We consider the official code and weights to be the ultimate source of truth and hence stick with them.

Proposed Changes

  • Implementation of the main version of the metric (CLIP-IQA); a usage sketch is given after this list;
  • Evaluation on TID2013, KADID-10k, PIPAL, KonIQ-10k and LIVE-itW datasets (in results_benchmark.py);
  • Required tests. Note that value tests on grayscale images are not provided because the official single-image inference script does not support grayscale data;
  • Updates to the minimal required torchvision version in the requirements and the CI testing pipeline.
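
A minimal usage sketch, assuming the metric is exposed as piq.CLIPIQA and follows the library's usual metric-module conventions (the exact class name and constructor arguments are assumptions and may differ):

```python
import torch
import piq

# Assumed interface: a torch.nn.Module that scores RGB images in [0, 1].
clip_iqa = piq.CLIPIQA(data_range=1.0)

# Batch of four random RGB images, N x 3 x H x W.
x = torch.rand(4, 3, 224, 224)

# A higher score is assumed to mean better perceived quality.
scores = clip_iqa(x)
print(scores)
```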

Some decisions that may be questioned in the future

  1. We do not use the sets of prompt pairs that (as reported in the original paper) may boost the performance of the initial CLIP-IQA metric. The reason is that the original implementation does not use them. Hence, adding them would mean proposing a new metric (a modification of the original one), which we try to avoid here.
  2. We do not allow users to provide their own prompts for the same reason.
  3. Tokens for the prompt pairs are pre-computed, which allows us to avoid the additional dependencies brought by the tokenizer code (see the sketch after this list).
  4. We had to raise the minimal torchvision version because CLIP models do not work with torchvision < 0.9.1. Also, tensors computed with newer versions cannot be loaded with the older ones.
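
A rough sketch of what such an offline pre-computation step could look like, using the tokenizer from the openai/CLIP package (the prompt texts and the output path are illustrative, not necessarily the exact ones used in this PR):

```python
import torch
import clip  # openai/CLIP package, needed only for this offline step

# Antonym prompt pair in the spirit of the CLIP-IQA paper; illustrative only.
prompts = ["Good photo.", "Bad photo."]

# clip.tokenize returns a LongTensor of shape (num_prompts, context_length).
tokens = clip.tokenize(prompts)

# Persist the tokens so the metric can load them at runtime
# without depending on the tokenizer code.
torch.save(tokens, "clipiqa_tokens.pt")
```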

@snk4tr snk4tr self-assigned this Feb 28, 2023
@snk4tr snk4tr added the feature (New feature or request) label Feb 28, 2023
sonarcloud bot commented Mar 1, 2023

Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 7 Code Smells, no coverage information, 0.0% duplication.

@snk4tr snk4tr changed the title from "Metric: CLIPIQA and CLIPIQA+" to "Metric: CLIPIQA" on Apr 23, 2023
@snk4tr snk4tr requested review from denproc and zakajd May 31, 2023 12:15
@snk4tr (Contributor, Author) commented May 31, 2023

Ready for re-review.

@denproc (Collaborator) left a comment

Some comments for discussion.

Review threads (resolved): piq/clip_iqa.py, piq/clip_iqa.py (outdated), piq/utils/common.py, piq/feature_extractors/clip.py (outdated)
@snk4tr (Contributor, Author) commented Jun 3, 2023

For some reason I cannot reply directly to this comment, so I'll do it here. @denproc, nice catch. This one actually lets me guess with high probability why the original implementation had this type conversion. It turns out that:

  1. Conv2d for half precision (i.e. float16) is implemented only for CUDA operations. Hence, we cannot really support it here because we want to allow both CPU and GPU computation for all our metrics.
  2. There were initially two type conversions, and if I remove both of them (including the one @denproc just mentioned), the absolute difference in metric values per sample becomes noticeable (~1e-3).

As a result, I think it is fair to allow computation of the metric only in float32. However, nothing stops us from working on a copy of the input tensor so that the dtype of the tensor passed by the user is not changed. I will add a comment about float32 to the code as well.
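
A minimal sketch of working on a float32 copy so the caller's tensor keeps its original dtype (the helper name is illustrative, not the one used in the PR):

```python
import torch

def _to_float32_copy(x: torch.Tensor) -> torch.Tensor:
    # torch.Tensor.to returns a new tensor when the dtype changes;
    # copy=True forces a copy even if the input is already float32.
    return x.to(dtype=torch.float32, copy=True)

x = torch.rand(1, 3, 224, 224).to(torch.float16)
x32 = _to_float32_copy(x)
assert x.dtype == torch.float16 and x32.dtype == torch.float32
```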

@denproc (Collaborator) left a comment

I must have missed some of the entries of the pos_embedding parameter. I think we should change it to True as well to match the original CLIP behaviour.

Review threads (resolved, outdated): piq/feature_extractors/clip.py (4 threads)
@denproc (Collaborator) commented Jun 5, 2023

In addition, we have to add CLIP-IQA to the documentation. This also makes me wonder whether all our metrics are covered in the documentation. I might check it later.
UPD: Added #366

@snk4tr snk4tr requested a review from denproc June 5, 2023 17:05
sonarcloud bot commented Jun 5, 2023

Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 5 Code Smells, no coverage information, 0.0% duplication.

@denproc (Collaborator) left a comment

LGTM!

@snk4tr snk4tr merged commit 01e16b7 into master on Jun 5, 2023
8 checks passed
Labels: feature (New feature or request)
Linked issue: Metric: CLIP-IQA and CLIP-IQA+
3 participants