Evaluate butteraugli on the CLIC-2021 perceptual quality task #202
Comments
FWIW I get a 403 Forbidden on that last URL. Rgds Damon
That URL is not public yet (as of today); it should be public starting Monday. For now you can see: The important bit is the instructions at:
For clarity: I am only an ignorant lurker, and not one of the fine people actually making JXL happen! B^> Rgds Damon
If someone wants to do this, here's a simple bash script to produce the desired data. You'll have to download about 50 GB of images, though, and it'll take quite a long time to compute all the scores.
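The script itself didn't survive in this thread. As a rough illustration only, here is a small Python sketch of the idea; the `pairs.csv` layout, all file names, and the assumption that a `butteraugli` binary is on PATH and prints a single score on stdout are mine, not from the original script:

```python
import csv
import subprocess

def butteraugli_score(original, distorted):
    """Run the butteraugli binary (assumed on PATH, assumed to print one score)."""
    out = subprocess.run(["butteraugli", original, distorted],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.split()[0])

def decide(score_a, score_b):
    """Pick the image the metric considers closer to the original."""
    return "A" if score_a < score_b else "B"

def score_pairs(pairs_csv, out_csv, score_fn=butteraugli_score):
    """For each row (original, image_a, image_b), append the metric's choice."""
    with open(pairs_csv) as fin, open(out_csv, "w", newline="") as fout:
        writer = csv.writer(fout)
        for original, a, b in csv.reader(fin):
            choice = decide(score_fn(original, a), score_fn(original, b))
            writer.writerow([original, a, b, choice])
```

Passing `score_fn` as a parameter makes it easy to swap in ssimulacra or a different butteraugli norm without touching the CSV plumbing.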
I tried it on the smaller example validation.csv set and got an accuracy of 0.595 for ssimulacra and 0.671 for Butteraugli 3-norm. I haven't tried it on the larger test set yet, though.

@gtoderici I think it would be interesting to look not just at overall accuracy, but also at accuracy for various operating points. E.g. if some of the stimuli correspond to low-bpp encoding and others to higher-bpp encoding, maybe segment the data accordingly into a few buckets (e.g. six buckets: low vs low, low vs medium, low vs high, medium vs medium, medium vs high, high vs high) and compute accuracies per bucket. I expect some metrics to be better at some of those tasks but worse at others, and this would be very useful information (more so than "which does best overall", since afaik we don't really have any metric that does great overall).
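The bucketing idea can be sketched as follows; the bpp thresholds, the level names, and the row layout here are purely illustrative, not from the CLIC devkit:

```python
from collections import defaultdict

def bpp_level(bpp, low=0.15, high=0.45):
    """Map a bitrate to a coarse level; thresholds are illustrative."""
    if bpp < low:
        return "low"
    if bpp < high:
        return "medium"
    return "high"

def bucketed_accuracy(rows):
    """rows: iterable of (bpp_a, bpp_b, metric_choice, human_choice).

    Returns per-bucket accuracy, where a bucket is the unordered pair of
    bitrate levels, e.g. ('low', 'high')."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for bpp_a, bpp_b, predicted, actual in rows:
        bucket = tuple(sorted((bpp_level(bpp_a), bpp_level(bpp_b))))
        totals[bucket] += 1
        hits[bucket] += (predicted == actual)
    return {b: hits[b] / totals[b] for b in totals}
```

A metric could then be reported as a small table of per-bucket accuracies rather than a single number.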
@jonsneyers - I agree with your assessment about evaluating the accuracy (and more) at various bitrates. The evaluation code already does this, but I haven't had time to do the graphing work required. However, I have only done this on the test set so far, since we have more human ratings there. I noticed that for all metrics there is some discrepancy in performance between validation and test, but not by much. What is more interesting to me is that if you look at ranking results, things can vary quite a bit between perceptual quality methods despite similar accuracy. Once I get the butteraugli CSV file for the test set, I will ping this thread.
Let's try the 6-norm of butteraugli. I often use a lower norm than what actually works psychovisually because it is easier to optimize for. Here, we don't get the benefit of using a lower norm, since this is pure psychovisuals with no optimization involved.
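For reference on what the norm choice means: butteraugli produces a per-pixel distortion map, and the reported score aggregates that map with a p-norm. A minimal sketch (this uses the mean-normalized form; the actual implementation may normalize differently):

```python
def p_norm(values, p):
    """Mean-normalized p-norm of a distortion map: (mean(|v|^p))^(1/p).
    As p grows, the worst-case pixels dominate the score."""
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)
```

With a higher p such as 6, the worst regions of the image dominate the score, which tracks artifact visibility more closely but makes the metric a harder optimization target than a lower norm such as 3.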
Is your feature request related to a problem? Please describe.
It would be good to evaluate butteraugli on the CLIC-2021 perceptual quality task. This should provide additional information to the community with respect to its performance characteristics when compared to other potentially usable perceptual quality metrics that JPEG XL could optimize for.
Describe the solution you'd like
Please use the test data from the CLIC 2021 perceptual challenge to generate a CSV file with the decisions (see link below for the exact instructions):
https://github.com/fab-jul/clic2021-devkit/blob/main/README.md#perceptual-challenge
Please email them to me to get the final results / ranks. We'll publish these at: http://compression.cc/leaderboard/perceptual/test/
...and of course update this bug tracker.