Use mean absolute deviation for pooling #89

Merged: 1 commit into kornelski:main on Jan 29, 2021

Conversation

@igv (Contributor) commented Dec 28, 2020

No description provided.

@kornelski (Owner):

That works great, but why does it have .powf(1/2^n)?

@kornelski (Owner):

That per-scale tweak to the score hinted that perhaps the original scale weights were non-optimal. I've adjusted them, and that gave a decent improvement.

@igv (Contributor, Author) commented Dec 30, 2020

> but why does it have .powf(1/2^n)?

My tweak. Initially it was .powf(0.5) (MAD is used in the MDSI metric, where they set it to 0.25 (EDIT: I forgot, that's a different parameter in MDSI)), but then I found out that when comparing only at the first scale (plain SSIM), the best accuracy is with .powf(1) (almost on par with VIF, try it), so I tested it with each scale separately.
It's called mean absolute deviation, but you can also use the median or the mode, or your own weighted mean.
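
As a minimal sketch in Python (the name mad_pool is mine; this mirrors the mad() function in the script later in this thread):

import numpy as np

def mad_pool(ssim_map, n):
    # Mean absolute deviation around the mean, with the mean biased
    # towards 1 by the per-scale exponent 0.5**n (the .powf(1/2^n) tweak)
    biased_mean = np.mean(ssim_map) ** (0.5 ** n)
    return np.mean(np.abs(ssim_map - biased_mean))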

> I've adjusted them, and that gave a decent improvement.

I wouldn't change the default scale weights based on only one dataset. Also, Full accuracy on TID2013 is so high with these weights because of the Exotic group (which is pretty much irrelevant).

Finally, according to the TID2013 paper, SSIM is among the best metrics for the "Good quality" subset (Table 6), so maybe add an -ssim switch to calculate SSIM instead of MS-SSIM.

@igv (Contributor, Author) commented Dec 30, 2020

AFAIK, this is how these scale weights are calculated:
[image: cs]

cpd represents cycles per degree of visual angle, which is determined by the viewing-resolution factor.
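
As a rough illustration of the idea (the Mannos-Sakrison CSF model and the cpd value below are my assumptions, not necessarily what the figure above uses), per-scale weights could be obtained by sampling a CSF at each scale's frequency and normalizing:

import numpy as np

def csf(f):
    # Mannos-Sakrison contrast sensitivity model, f in cycles per degree
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

cpd = 16.0                                # assumed viewing-resolution factor
freqs = [cpd / 2 ** n for n in range(5)]  # each downsampling halves the frequency
w = np.array([csf(f) for f in freqs])
print(w / w.sum())                        # normalized per-scale weights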

@kornelski (Owner):

Yes, indeed I'm worried about overfitting to TID2013. Can you recommend other similar datasets that I could use to verify accuracy?

@igv (Contributor, Author) commented Dec 30, 2020

The most similar is KADID-10k. However, they made a mistake and placed reference images instead of first-level distorted images for 3 distortion types (the 1st, 3rd and 8th).
The most popular are LIVE and CSIQ.
For some reason not very popular, but it seems good: ESPL.

@kornelski (Owner):

With my max+avg pooling method my thinking was:

  1. If there's a small but significant distortion in the image (e.g. a single bad pixel), it actually affects a larger area than itself, because it's likely to spoil the entire object/area it's in. This is why there's a "max" pooling step.

  2. But an image likely contains multiple objects, and one flaw doesn't disqualify the whole image, which is why there's an averaging step too.

That solution is not mathematically elegant, but I like it, because it intuitively makes sense to me.
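
Roughly, as a sketch (the block size and names are illustrative, not the actual dssim code):

import numpy as np

def max_avg_pool(error_map, k=8):
    # Worst (max) error within each k x k block, then the average of the
    # block maxima: one flaw spoils its whole area, but not the whole image.
    h, w = error_map.shape
    blocks = error_map[:h//k*k, :w//k*k].reshape(h//k, k, w//k, k)
    return blocks.max(axis=(1, 3)).mean()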

I've read how MAD works, but I'm not quite sure why this particular formula, especially with the "tweak", happens to work for pooling SSIM.

My hypothesis is:

  • For small n, the exponent in pow(1/2^n) stays close to 1, so MAD behaves like standard deviation: it gives worse scores to images with multiple bad areas, but is insensitive to uniformly bad images.

  • For bigger n it pushes avg towards 1, which makes MAD behave more like a simple average, letting the other scales still catch uniformly bad images.

That makes me wonder whether this tweaked MAD just happens to be another way of mixing "worst" vs "average" error pooling, just not within each scale, but across scales.

And I wonder whether the proportion from pow(1/2^n) is ideal, or whether it should be based on the scale weights. This is something to experiment with.

@igv (Contributor, Author) commented Dec 31, 2020

It doesn't behave like std dev (I edited my comment above about MAD in MDSI), and it always makes avg approach 1 (that's the idea), except at the 1st scale. pow(1/2^n) is just shorthand for [1, 0.5, 0.25, 0.125, 0.0625][n]; it has no relation to the scale weights, and its impact on accuracy and SSIM scores is minimal (vs., for example, [1.0, 0.5, 0.5, 0.5, 0.5][n]).

At first, after the arithmetic mean, I tried geometric and harmonic means. They made accuracy worse (harmonic was the worst). From Wikipedia: "For all positive data sets containing at least one pair of nonequal values, the harmonic mean is always the least of the three means, while the arithmetic mean is always the greatest of the three and the geometric mean is always in between." So I added that exponent to bias avg towards good-quality scores. The bigger the downsampling factor, the higher and less spread out the per-pixel SSIM scores are (except for some Exotic distortion types, probably), so the exponent value must be lower to be effective.
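
To illustrate the bias numerically (example values, not measurements): with a per-pixel mean of 0.9, the biased reference mean ** (0.5 ** n) climbs towards 1 at coarser scales:

>>> mean = 0.9
>>> [round(mean ** (0.5 ** n), 4) for n in range(5)]
[0.9, 0.9487, 0.974, 0.9869, 0.9934]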

> That makes me wonder whether this tweaked MAD just happens to be another way of mixing "worst" vs "average" error pooling, just not within each scale, but across scales.

It's still within each scale, not across scales (cross-scale pooling is in this block).

> And I wonder whether the proportion from pow(1/2^n) is ideal

Maybe not ideal, but I tried different constants (like 1.5, 1.75, 2.25, 2.5) and also the inverse-square-law formula (1/n^2).

@igv (Contributor, Author) commented Jan 2, 2021

Tested on KADID-10k using scripts from here (I only modified kadid10k.ROCC.py to show per-type and per-level accuracy).

The first value in each pair is max+avg, the second is MAD with non-custom scale weights.

Full SROCC:
max+avg: -0.8525045482307988
MAD:     -0.8562234657395806

Per type (max+avg / MAD):
 1: -0.9482709952684674 / -0.9524144401242203
 2: -0.888920125560901 / -0.9036008877848952
 3: -0.9361601718850355 / -0.942732720555493
 4: -0.9035452008410343 / -0.9054443006074244
 5: -0.7505439463819739 / -0.7721170099865738
 6: -0.877401725595138 / -0.8779770694488326
 7: -0.6083002027988245 / -0.5992010191419064
 8: -0.9202345152541264 / -0.9192436523541861
 9: -0.9316652477020476 / -0.9389697724114447
10: -0.8725177456252896 / -0.8922033461740347
11: -0.8973088936601757 / -0.8921109535136115
12: -0.9312734837315821 / -0.9300191542963452
13: -0.8081662575202928 / -0.8028804197128389
14: -0.940544646168691 / -0.9413759571962214
15: -0.8984371485985785 / -0.8985430184742703
16: -0.9182612634610633 / -0.9197175157873034
17: -0.8913491941768678 / -0.8932455426471694
18: -0.7350325589908322 / -0.7442556609995762
19: -0.9281146136611702 / -0.931734634444742
20: -0.36244064860224723 / -0.37055703484238806
21: -0.4742315253111955 / -0.48728508844891577
22: -0.8585757413960478 / -0.8672451350219674
23: -0.5884619420813891 / -0.5820224099228524
24: -0.8857312844251738 / -0.8900615092376575
25: -0.75146916282513 / -0.7677897943743471

Per level (max+avg / MAD):
1: -0.7594226220260838 / -0.7568424979839856
2: -0.7859239032568683 / -0.7905589936285138
3: -0.7645387478162078 / -0.7726706920108036
4: -0.6711213371339377 / -0.6899222144730359
5: -0.598736695180533 / -0.6224275615532653

[image: kadid]

@igv (Contributor, Author) commented Jan 2, 2021

kadid10k.ROCC.py:

import pandas as pd

# DSSIM results for the two pooling variants: 1 = max+avg, 2 = MAD
df = pd.read_csv('kadid10k.dssim1.csv')
df2 = pd.read_csv('kadid10k.dssim2.csv')

print('Full SROCC:')
print(df[['dmos', 'DSSIM']].corr('spearman').loc["DSSIM"][0])
print(df2[['dmos', 'DSSIM']].corr('spearman').loc["DSSIM"][0])

print('\nFull KROCC:')
print(df[['dmos', 'DSSIM']].corr('kendall').loc["DSSIM"][0])
print(df2[['dmos', 'DSSIM']].corr('kendall').loc["DSSIM"][0])

# KADID-10k filenames look like I01_13_04.png:
# image id _ distortion type _ distortion level
print("\nPer type:")
for i in range(25):
    a = df['dist_img'].str.split("_", expand=True)[1].astype(int).isin([i+1])
    print("{}:".format(i+1))
    print(df[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
    print(df2[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
    print("\n------------------")

print("\nPer level:")
for i in range(5):
    a = df['dist_img'].str.split("_", expand=True)[2].str.split(".", expand=True)[0].astype(int).isin([i+1])
    print(df[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
    print(df2[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
    print("\n------------------")

@igv (Contributor, Author) commented Jan 6, 2021

Btw, here is my implementation of MDSI in Python. I removed it from my repo because I didn't like this metric very much. But maybe it's actually good?

@kornelski (Owner):

Thanks for the data. It looks good.

It makes DSSIM output scores of a different magnitude than previously, so I'll probably add some rescaling fudge in order to avoid breaking users too much.

@igv (Contributor, Author) commented Jan 11, 2021

Thinking a 5x5 binomial / Gaussian (std=1) downsampling filter might work better with MAD than the 2x2 average.
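
A sketch of what that could look like (function name is mine; the 5-tap binomial kernel closely approximates a Gaussian with std = 1):

import numpy as np
from scipy.ndimage import convolve

def binomial_downsample(img):
    # 5x5 binomial kernel: outer product of [1, 4, 6, 4, 1] / 16,
    # followed by 2x decimation
    k = np.array([1., 4., 6., 4., 1.]) / 16.
    return convolve(img, np.outer(k, k), mode='nearest')[::2, ::2]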

@kornelski (Owner):

I guess it could help a little. IIRC, changing the 2x box blur to a proper Gaussian kernel in the SSIM blurring step helped a little too.

@igv (Contributor, Author) commented Jan 26, 2021

Made a Python version of this metric (grayscale only) for testing, with an alternative version of MAD.

import sys
from PIL import Image
import numpy as np
from scipy.ndimage import gaussian_filter

# Full MS-SSIM scale weights; all but the first are commented out here to test single-scale (plain SSIM) behavior
WEIGHTS = [0.0448]#, 0.2856, 0.3001, 0.2363, 0.1333]

def msssim(file1, file2):
    img1 = Image.open(file1).convert('RGB')
    img2 = Image.open(file2).convert('RGB')

    width, height = img1.size
    img1 = np.frombuffer(img1.tobytes(), dtype=np.uint8).reshape(height, width, 3) / 255
    img2 = np.frombuffer(img2.tobytes(), dtype=np.uint8).reshape(height, width, 3) / 255
    
    # sRGB -> linear light
    img1 = np.where(img1 > 0.04045, np.power((img1 + 0.055) / 1.055, 2.4),  img1 / 12.92)
    img2 = np.where(img2 > 0.04045, np.power((img2 + 0.055) / 1.055, 2.4),  img2 / 12.92)

    # Rec. 709 luma coefficients -> single grayscale channel
    img1 = 0.2126 * img1[:,:,0] + 0.7152 * img1[:,:,1] + 0.0722 * img1[:,:,2]
    img2 = 0.2126 * img2[:,:,0] + 0.7152 * img2[:,:,1] + 0.0722 * img2[:,:,2]

    mssim = []
    for i in range(len(WEIGHTS)):
        # Compare gamma-encoded (1/2.2) images; use the contrast/structure map (cs_map) on all scales except the last
        mssim.append(ssim(pow(img1,1./2.2), pow(img2,1./2.2), i, i<len(WEIGHTS)-1))
        #img1 = avgpooling(img1, 2)
        #img2 = avgpooling(img2, 2)
        # Gaussian blur + 2x decimation to get the next scale
        img1 = gaussian_filter(img1, 1.08, truncate=1.5)[::2,::2]
        img2 = gaussian_filter(img2, 1.08, truncate=1.5)[::2,::2]

    return np.sum(np.multiply(np.stack(mssim), WEIGHTS)) / np.sum(WEIGHTS)

def mad(x, l):
    # Mean absolute deviation around a mean biased towards 1 by the per-scale exponent 0.5**l.
    # The commented-out alternative deviates from the mean of the top ~2/3 of sorted scores instead.
    return np.mean(np.absolute(x - np.power(np.mean(x), np.power(.5, l)))) # np.mean(np.absolute(x - np.mean(x if l==0 else np.sort(x, axis=None)[-int(x.size//1.5):])))

def ssim(L1, L2, lvl, cs_map):
    C1=(0.01)**2
    C2=(0.03)**2
    sd, t = 1.5, 3 #kernel radius = round(sd * truncate)

    mu1 = gaussian_filter(L1, sd, truncate=t)
    mu2 = gaussian_filter(L2, sd, truncate=t)
    mu1_sq = mu1 * mu1
    mu2_sq = mu2 * mu2
    mu1_mu2 = mu1 * mu2
    sigma1_sq = gaussian_filter(L1 * L1, sd, truncate=t) - mu1_sq
    sigma2_sq = gaussian_filter(L2 * L2, sd, truncate=t) - mu2_sq
    sigma12 = gaussian_filter(L1 * L2, sd, truncate=t) - mu1_mu2

    if cs_map:
        value = (2.0*sigma12 + C2)/(sigma1_sq + sigma2_sq + C2)
    else:
        value = ((2.0*mu1_mu2 + C1)*(2.0*sigma12 + C2))/((mu1_sq + mu2_sq + C1)*
                    (sigma1_sq + sigma2_sq + C2))

    return mad(value, lvl)  # MAD pooling instead of the usual mean

def avgpooling(mat, ksize):
    # Plain ksize x ksize average pooling (unused; kept for comparison with the Gaussian path)
    m, n = mat.shape[:2]
    ny = m // ksize
    nx = n // ksize
    mat_pad = mat[:ny*ksize, :nx*ksize, ...]
    new_shape = (ny, ksize, nx, ksize) + mat.shape[2:]
    result = np.nanmean(mat_pad.reshape(new_shape), axis=(1,3))
    return result

def main():
    # sys.argv[1] is the reference image; every later argument is scored against it
    for arg in sys.argv[2:]:
        score = msssim(sys.argv[1], arg)
        print(str(score) + "\t" + arg)

if __name__ == '__main__':
    main()
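
For reference, the script takes the reference image first and scores every following file against it, e.g. (script name assumed):

python msssim.py reference.png distorted1.png distorted2.png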

@kornelski merged commit 1a40e97 into kornelski:main on Jan 29, 2021