Add extra precision configure switch #449

Open
wants to merge 1 commit into base: master
Conversation

MrSmile
Member

@MrSmile MrSmile commented Nov 1, 2020

This is my version of PR #329.

I've updated the visual quality assessment tests (more tests, a better reference target) and checked additional parameter values, so the table now looks as follows:

LT div RP PP err 360 720 1080
16 8 2.691 25.756 35.071 44.928 old default
16 4 1.758 25.702 35.107 45.789 new default (no asm)
8 8 2.675 27.556 39.706 51.864
8 4 1.557 27.589 39.656 51.748
+ 16 8 2.680 28.269 37.872 47.693
+ 16 4 1.644 28.192 37.971 47.886
+ 16 2 1.533 28.002 37.462 47.024
+ 8 8 2.670 30.336 42.981 55.318
+ 8 4 1.497 30.488 42.790 55.233
+ 8 2 1.488 30.140 42.137 54.447 extra precision
+ 4 8 2.090 34.755 51.438 66.953
+ 4 4 1.473 35.042 52.274 67.621
+ 4 2 1.399 34.442 51.237 66.674
+ 20 8 2.395 22.647 30.737 39.008
+ 20 4 2.306 22.611 30.729 38.873
+ 16 8 2.358 22.747 31.262 40.384
+ 16 4 2.029 22.757 31.260 40.258 new default (asm)
+ 16 2 1.638 22.605 30.901 39.465
+ 8 8 2.270 24.488 35.683 46.670
+ 8 4 1.817 24.461 35.508 46.612
+ + 16 8 2.589 25.417 34.336 43.409
+ + 16 4 1.680 25.479 34.299 43.320
+ + 8 8 2.602 27.306 38.843 50.031
+ + 8 4 1.598 27.364 38.938 50.172

Legend:

  • LT = large tiles;
  • div = exact division;
  • RP = RASTERIZER_PRECISION;
  • PP = POSITION_PRECISION;
  • err = average error over a set of compare tests;
  • 360/720/1080 = rendering time of my rasterizer-heavy clip at different screen sizes.

Large tiles are now enabled by default only if assembly is present, since without good vector assembly they are actually slower.

Large tiles are now enabled in the default precision mode with assembly.
@astiob
Member

astiob commented Dec 18, 2020

LT div RP PP err 360 720 1080  
+ 16 4 2.029 22.757 31.260 40.258 new default (asm)
+ 16 2 1.638 22.605 30.901 39.465  

Perhaps I’m misunderstanding something, but isn’t the latter option both more precise and faster across the board?

I trust you, but I have a few comments regarding scientific rigour:

In general, with any benchmark, it would be great to have the data and methodology public so that other people could (if they wanted) replicate the results on their own hardware and platforms, as well as better understand the impact and potentially spot methodology flaws and suggest improvements. Although admittedly, I doubt any of us actually has the time and desire to scrutinize this, as much as it would be a good practice.

Also, you suggest the results with and without assembler code are different, but the table shows only one set.

@MrSmile
Member Author

MrSmile commented Dec 20, 2020

Perhaps I’m misunderstanding something, but isn’t the latter option both more precise and faster across the board?

That benchmark doesn't measure performance in the relevant case (slow animation). On the other hand, simple math considerations say that for sufficiently slow animations performance should be proportional to POSITION_PRECISION, i.e. PP = 2 is two times slower than PP = 4. And the relevant precision difference is invisible to the naked eye (PP = 4 equals 1/16 pixel).
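
To make the arithmetic concrete, here is a small runnable sketch that restates the claim above. Its only inputs are the two statements from this comment (per-frame work roughly proportional to 1/PP for slow animations, and PP = 4 corresponding to a 1/16-pixel step); nothing here is taken from the libass source:

    /* Back-of-the-envelope restatement of the argument above. The only
     * inputs are the two claims from this comment: for slow animations the
     * per-frame work scales roughly with 1/POSITION_PRECISION (so PP = 2 is
     * about twice as slow as PP = 4), and PP = 4 corresponds to a 1/16-pixel
     * position step. Nothing here is taken from the libass source. */
    #include <stdio.h>

    int main(void)
    {
        const double baseline_pp = 4.0;              /* proposed default PP */
        const double pp_values[] = { 8.0, 4.0, 2.0 };

        for (int i = 0; i < 3; i++) {
            double pp = pp_values[i];
            double relative_cost = baseline_pp / pp; /* cost relative to PP = 4 */
            printf("PP = %g -> relative cost vs PP = 4: %.2fx\n",
                   pp, relative_cost);
        }
        printf("Per the comment above, PP = 4 already means a 1/16-pixel step.\n");
        return 0;
    }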

I trust you, but I have a few comments regarding scientific rigour

These results are not of scientific quality; I grabbed some tests lying around and pasted them together. That is also why I don't want to make too drastic changes based on them. But I can share the test sources over IRC with anyone interested.

Also, you suggest the results with and without assembler code are different, but the table shows only one set.

I don't have relevant hardware for such tests (ARM-based or some outdated 32-bit Atom), so that decision is mostly guesswork. I've done some tests with assembly disabled and a lowered architecture target, but those are even less relevant than the other tests.

@TheOneric
Member

To verify the new defaults against some more scenarios, I ran the error tests against these regression tests. The reference images the errors are calculated against were created from current master 5c30976, built with --disable-asm and with the following changes (collected in the sketch after this list):

  • line->scale in ass_rasterizer.c was set to ((uint64_t) 1 << 61) / max_ab
  • in ass_render.c these macros were redefined:
    • RASTERIZER_PRECISION to 1
    • POSITION_PRECISION to 1.0
    • SUBPIXEL_ORDER to 6
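
For convenience, here is the same configuration collected in one place as a C-style sketch; it merely restates the bullet points above and is not a patch taken from this PR:

    /* Reference-image configuration described above (libass master 5c30976,
     * built with --disable-asm). This only restates the bullet points; it is
     * not an actual patch from this PR. */

    /* ass_render.c: macros redefined to */
    #define RASTERIZER_PRECISION 1
    #define POSITION_PRECISION   1.0
    #define SUBPIXEL_ORDER       6

    /* ass_rasterizer.c: the line scale set to */
    /* line->scale = ((uint64_t) 1 << 61) / max_ab; */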

bord_tiny and bord_normal are derived from the SBAS default test: bord_tiny covers the tests with SBAS disabled, while bord_normal has SBAS enabled. The other test groups are named after their directories.

I only did precision tests, no runtime tests, since getting good results for such short tests with significant run-to-run variance would require much more work.

First here are the results for unmodified master:

With x86_64 ASM:

#testgroup      stddev  arith-mean
zero-area   	  -  	0.870
karaoke     	0.180	3.606
implicit-pos	0.138	0.311
bord_normal 	  -  	12.000
draw-images 	  -  	2.223
bord_tiny   	  -  	1.983

Without ASM:

zero-area   	  -  	0.870
karaoke     	0.180	3.606
implicit-pos	0.127	0.300
bord_normal 	  -  	10.000
draw-images 	  -  	2.223
bord_tiny   	  -  	1.983

Now the new defaults as per this PR's 5f12e9c:

New x86_64 ASM default:

zero-area   	  -  	0.364
karaoke     	0.464	4.572
implicit-pos	0.132	0.271
bord_normal 	  -  	10.000
draw-images 	  -  	1.092
bord_tiny   	  -  	1.200

New default with --disable-asm:

zero-area   	  -  	0.241
karaoke     	0.610	4.791
implicit-pos	0.110	0.250
bord_normal 	  -  	8.000
draw-images 	  -  	1.092
bord_tiny   	  -  	1.126

Extra precision with --disable-asm:

zero-area   	  -  	0.280
karaoke     	0.222	3.540
implicit-pos	0.092	0.209
bord_normal 	  -  	3.202
draw-images 	  -  	0.000
bord_tiny   	  -  	0.754

In most scenarios the new defaults do decrease the deviation from the reference images, but the karaoke tests are a notable exception. This is not necessarily due to the karaoke effect itself, as the current version of the karaoke tests also contains blurs and borders.

Here are the full compare results: errlogs.zip. To accurately reproduce the logs, or to process them with scripts afterwards, it is best to first patch compare to print libass messages to stderr instead of stdout.


As a side note, the image difference between ASM and non-ASM configs might cause some additional work for our future test suite (#108), as we might end up needing multiple sets of reference images: one without ASM and one more per supported CPU architecture.
Even on current master, the x86_64-ASM vs non-ASM difference causes some tests to be marked as BAD if not run with the same config.
Non-ASM compared against x86_64-ASM on master:

zero-area   	  -  	0.052
karaoke     	1.854	1.094
implicit-pos	0.012	0.036
bord_normal 	  -  	0.348
draw-images 	  -  	0.000
bord_tiny   	  -  	0.105

@MrSmile
Member Author

MrSmile commented Dec 21, 2020

In most scenarios the new defaults do decrease deviation from the reference images, but the karaoke tests are a notable exception.

It looks like the karaoke and bord_normal tests have some rendering problem unrelated to the precision constants. At least it's known that \be is not scale-invariant; there could be similar problems in other cases too (rounding issues, inaccurate blending).

Even on current master the x86_64-ASM and non-ASM difference causes some tests to be marked as BAD if not run with the same config.

I think I checked that my assembly output is bit-identical to the corresponding C code output. I wonder where that difference arises.

@TheOneric
Member

Looks like karaoke and bord_normal tests have some rendering problem unrelated to precision constants. At least it's known that \be is not scale invariant, there could be similar problems in other cases too (rounding issues, inaccurate blending).

Both bord_* tests are just static text with a border and no shadow; only the border size differs. Karaoke test 357_… is also just text+border with some \k and \kf tags, while karaoke test 216_vertical has text+border+rotation+\kf+\fad+blur (\blur, not \be).
bord_* is rendered at a frame larger than the script resolution, while karaoke/216_vertical is rendered at a smaller frame and karaoke/357_… is rendered at script resolution. compare's scale parameter is not used.

@TheOneric
Member

TheOneric commented May 24, 2021

Now that the C-ASM differences have been dealt with, let's take a new look at this.
I used ef76434067d435fa83bd367b0b4b51d04005bb1c as a basis. As before, I generated reference images for the ART samples by setting:

  • line->scale to ((uint64_t) 1 << 61) / max_ab
  • RASTERIZER_PRECISION to 1
  • POSITION_PRECISION to 1.0
  • SUBPIXEL_ORDER to 6

and measured the deviation for the current and proposed settings (always with ASM enabled). I did not do any performance measurements.

Full logs and GNU Awk script used for generating these tables attached: 449_logs_20210524.zip
Note that the karaoke tests 216 and 357 have relatively high deviations on all settings (they also include borders and, in the case of 216, blur).
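
For reference, the avg.deviation and stddev columns in the tables below are just the arithmetic mean and standard deviation of the per-test deviations. A minimal C sketch of that aggregation, with made-up numbers; this is not the attached awk script, which may e.g. use a different stddev convention:

    /* Minimal sketch of the aggregation shown in the tables: arithmetic mean
     * and standard deviation of per-test deviations. The deviation values
     * here are made up; this is not the attached GNU Awk script. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double dev[] = { 0.214, 0.198, 0.231 };  /* hypothetical values */
        const int n = sizeof(dev) / sizeof(dev[0]);

        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += dev[i];
        double mean = sum / n;

        double var = 0.0;
        for (int i = 0; i < n; i++)
            var += (dev[i] - mean) * (dev[i] - mean);
        double stddev = sqrt(var / n);  /* population form; the script may differ */

        printf("avg.deviation = %.3f, stddev = %.6f\n", mean, stddev);
        return 0;
    }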


Current Default

    test-sample   avg.deviation  stddev
---------------------------------------
    small-border    1.983       -
   normal-border    10.000      -
 bordercollision    0.364       -
       shift-dir    0.294   0.125542
           efont    0.272       -
             216    3.877       -
             357    3.516       -
         drawImg    1.120       -
       zero-area    0.870       -

Current --enable-large-tiles

    small-border    2.084       -
   normal-border    7.000       -
 bordercollision    0.412       -
       shift-dir    0.337   0.174349
           efont    0.314       -
             216    3.877       -
             357    3.516       -
         drawImg    1.120       -
       zero-area    0.870       -

Note that, for some reason, this has a lower deviation for normal-border than the default.


Proposed default for non-ASM:

    small-border    1.126       -
   normal-border    8.000       -
 bordercollision    0.364       -
       shift-dir    0.239   0.103569
           efont    0.104       -
             216    3.877       -
             357    5.096       -
         drawImg    1.120       -
       zero-area    0.241       -

Significantly bigger deviation for 357 than currently.

Proposed default for ASM:

    small-border    1.200       -
   normal-border    8.000       -
 bordercollision    0.412       -
       shift-dir    0.257   0.123622
           efont    0.104       -
             216    3.877       -
             357    4.804       -
         drawImg    1.120       -
       zero-area    0.364       -

Sometimes more, sometimes less deviation than no-ASM.
Notably, this has a greater deviation than the current default for 357 and bordercollision, though for the latter the difference is relatively small.

Proposed --enable-extra-precision:

    small-border    0.754       -
   normal-border    3.202       -
 bordercollision    0.159       -
       shift-dir    0.214   0.0907621
           efont    0.084       -
             216    3.872       -
             357    3.429       -
         drawImg    0.000       -
       zero-area    0.280       -

@TheOneric
Member

I'm not sure about removing the "lower quality, but faster" large-tiles setting without a replacement; it is e.g. used by some of the JSO forks.
We could instead use a non-boolean precision configure setting, with e.g. default as the default (aiming to balance accuracy and speed with a bias towards accuracy), high for high accuracy at the cost of speed, and low which favours speed as long as the accuracy doesn't get too bad.
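
A rough sketch of what such a setting could look like internally; the names below are invented for illustration and nothing like this exists in libass:

    /* Hypothetical illustration of the non-boolean precision setting
     * suggested above; these names are invented and do not exist in libass. */
    typedef enum {
        PRECISION_LOW,      /* favour speed while accuracy stays acceptable   */
        PRECISION_DEFAULT,  /* balance accuracy and speed, biased to accuracy */
        PRECISION_HIGH      /* maximum accuracy at the cost of speed          */
    } PrecisionProfile;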

@MrSmile
Copy link
Member Author

MrSmile commented May 26, 2021

I generated references images for ART samples by setting

You should use -s 8 to mitigate aliasing errors; I think the karaoke differences are due to that. There is a problem with \be not being scale-invariant, but it's a known bug without a good solution.

I'm not sure about removing the "lower quality, but faster" large-tiles setting without a replacement; it is eg used by some of the JSO forks.

large-tiles is a trade-off between complex logic and streamlined, branchless brute-force code. Usually the brute-force approach isn't faster without sufficiently effective SIMD (hand-coded assembly or at least intrinsics). I suspect that using it for JS would actually be slower.
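
To illustrate the general idea with a toy sketch (this is not libass code): the "complex logic" path special-cases trivial tiles at the cost of branches, while the brute-force path does the same uniform work for every tile and only wins once its inner loop is vectorized.

    /* Toy sketch of the trade-off described above; NOT libass code. */
    #include <stdint.h>
    #include <string.h>

    enum { TILE = 16 };

    /* Complex logic: shortcut trivial tiles before doing the real work. */
    void compose_tile_smart(uint8_t *dst, const uint8_t *coverage,
                            int is_empty, int is_solid)
    {
        if (is_empty)
            return;                        /* nothing to draw */
        if (is_solid) {
            memset(dst, 255, TILE * TILE); /* fill without reading coverage */
            return;
        }
        for (int i = 0; i < TILE * TILE; i++)
            dst[i] = coverage[i];          /* general case */
    }

    /* Branchless brute force: always run the same uniform loop, which
     * vectorizes trivially but wastes work on empty or solid tiles. */
    void compose_tile_brute(uint8_t *dst, const uint8_t *coverage)
    {
        for (int i = 0; i < TILE * TILE; i++)
            dst[i] = coverage[i];
    }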

@TheOneric
Member

large-tiles is a trade-off between complex logic and streamlined branchless brute-force code. Usually brute-force approach isn't faster without sufficiently effective SIMD—hand-coded assembly or at least intrinsics.

Right, though the current description makes it sound like a flat improvement. I think there may be some use for a real "lower quality, but faster" setting for some use cases, independent of whether large-tiles stays with an updated description or gets removed. (As implemented now, the only way to get the same result from asm and no-asm builds is to use extra-precision, which won't be a common config; I'm not sure whether requiring extra-precision is a potential problem for our regression tests.)

Your results in the original post suggest there is a potential time reduction of up to 13% with lowered precision settings; given that you weren't actually trying to lower the quality in those tests, this percentage may increase further with different, but for this use case still acceptable, settings. But I also don't know how representative of real-world content the test sample was.

I think that karaoke differences is due to [aliasing errors]

This made me remember that one of the karaoke tests was quite affected by the fix from #483. Perhaps it's better to wait for #483 to be merged before remeasuring this.

@TheOneric TheOneric added this to the 0.17.0 milestone Sep 17, 2022
@TheOneric TheOneric modified the milestones: 0.17.0, 0.17.1 Oct 19, 2022