Increase tolerance for alternate architectures #17800

QuLogic · 2020-06-29T23:35:13Z

PR Summary

Most of the differences are just adding the other architectures alongside aarch64.

The other change is a blanket increase of tolerance on 32-bit systems. There are too many failures, and the RMS is so small, that I don't think there's any need to investigate much further.

This should pass tests on all Fedora-supported architectures (x86_64, i686, armv7hl, aarch64, ppc64le, and s390x), though we do not currently test SVG output.

PR Checklist

Has Pytest style unit tests
Code is Flake 8 compliant
[N/A] New features are documented, with examples if plot related
[N/A] Documentation is sphinx and numpydoc compliant
[N/A] Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
[N/A] Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

tacaswell · 2020-06-29T23:43:51Z

I hope that most of the differences are in small anti-aliasing artifacts so I would expect the svg tests (which do the rasterization locally) should be a bit better than the png tests?

lib/matplotlib/testing/decorators.py

timhoffm · 2020-07-01T13:53:33Z

For my understanding:

Is there an obvious reason that exactly aarch64, ppc64le and s390x need increased tolerances?
Did you check that one needs the same tolerance increment for all the architectures or did you just copy the value from aarch64 and it worked?

tacaswell · 2020-07-01T15:34:37Z

When we have dug into these issues in detail in the past they have either been true broken (because we were doing nasty things an the c-level that assumed byte order and I think we have fixed all of those) or issues with very small variations in float values.

In the past the differences have been single bit flips in a single channel in ~dozen pixels in the image. We are essentially binning full float precision down to 256 (or less) bins as the final step of the rendering (because we save 32bit RGBA images). In most cases changes at the limits of float precision don't really matter (because it all goes down to the same bin), but if we get unlucky a tiny change can flip which side of a boundary the value is on and increase / decrease the value in the in the final output by 1 (which then massively magnifies the change).

timhoffm · 2020-07-02T15:33:32Z

I was essentially wondering if it makes sense to pull out the architectures and not define the same machine lookup at every place individually: i.e. instead of

@image_comparison(['fancyarrow_dpi_cor_100dpi.png'], remove_text=True,
                  tol={'aarch64': 0.02, 'ppc64le': 0.02,
                       's390x': 0.02}.get(platform.machine(), 0),

use

                  tol=0.02 if platform.machine() in EXTRA_TOL_MACHINES else 0),

or

                  tol=_add_extra_tol(0.02),

with

def _add_extra_tol(extra_tol, default=0):
    if platform.machine() in ['aarch64', 'ppc64le', 's390x']:
        return default + extra_tol
    else:
        return default

For this to make sense, there should be some comment why the extra tolerance is added and I don't know how to describe it. Maybe it's just "machines that empirically need larger tolerances".

QuLogic · 2020-07-02T23:57:29Z

These are values based on actual results, though I did not check all the results directly as I have no access to all those systems. I did set the values to match aarch64, but only in the cases where the RMS difference was already the same. Note, there's at least one case where aarch64 has a special tolerance that the other ones here don't.

Looking a little closer, I think it's actually almost all != x86_64, except that the 32-bit systems are already covered by the additional 0.06, so we could flip the check the other way (0 if x86_64, something else otherwise).

tacaswell · 2020-07-03T02:00:28Z

I'm a not clear if @timhoffm is advocating for changing the default threshold on all tests on non-x86_64 or just inverting the logic on the handlful that need it.

I may have over-learned worrying about non-0 tolerances (we used to have a blanket tolerance and then had text tests rendering the wrong glyphs but still come in under the threshold), but I think there is value in only tweaking the tests that need it (presumably because we for what ever reason we get unlucky on them) rather than on all tests.

timhoffm · 2020-07-04T23:21:56Z

I'm a not clear if @timhoffm is advocating for changing the default threshold on all tests on non-x86_64 or just inverting the logic on the handlful that need it.

That's not exactly my point. I was just seeing a lot

                  tol={'aarch64': val, 'ppc64le': val,
                       's390x': val}.get(platform.machine(), 0),

where the same val (per case) was always applied to the same 3 platforms. This seems to be a common pattern that could be factored out. The functionality can stay exactly as it is now, the only question is if we want to factor it out.

tacaswell · 2020-07-05T19:36:26Z

Sorry, I mis-understood you @timhoffm .

Around 900 tests fail on 32-bit systems, but with rather small RMS.

These are mostly the same as the aarch64 tolerances, and also fail the same way on 32-bit systems (though this is hidden by the previous commit), so just increase tolerance everywhere but x86_64.

QuLogic · 2020-07-09T08:15:04Z

I flipped these all to tol=0 if platform.machine() == 'x86_64' else 'something else'.

timhoffm · 2020-07-09T11:28:01Z

Let's wait and give @tacaswell a final say.

…ectures

…800-on-v3.3.x Backport PR #17800 on branch v3.3.x (Increase tolerance for alternate architectures)

QuLogic added the topic: testing label Jun 29, 2020

QuLogic added this to the v3.3.0 milestone Jun 29, 2020

tacaswell approved these changes Jun 29, 2020

View reviewed changes

timhoffm reviewed Jun 30, 2020

View reviewed changes

lib/matplotlib/testing/decorators.py Show resolved Hide resolved

QuLogic force-pushed the altarch branch from e671117 to c2c0e34 Compare June 30, 2020 22:50

QuLogic added 3 commits July 9, 2020 04:06

Increase default tolerance slightly for 32-bit systems.

e7dfc22

Around 900 tests fail on 32-bit systems, but with rather small RMS.

Update aarch64 tolerances.

ed8237f

Increase tolerance for ppc64le and s390x.

7deb36d

These are mostly the same as the aarch64 tolerances, and also fail the same way on 32-bit systems (though this is hidden by the previous commit), so just increase tolerance everywhere but x86_64.

QuLogic force-pushed the altarch branch from c2c0e34 to 7deb36d Compare July 9, 2020 08:14

timhoffm approved these changes Jul 9, 2020

View reviewed changes

tacaswell merged commit 9e519eb into matplotlib:master Jul 10, 2020

meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Jul 10, 2020

Backport PR matplotlib#17800: Increase tolerance for alternate archit…

d422bd8

…ectures

meeseeksmachine mentioned this pull request Jul 10, 2020

Backport PR #17800 on branch v3.3.x (Increase tolerance for alternate architectures) #17872

Merged

QuLogic deleted the altarch branch July 10, 2020 03:45

QuLogic added a commit that referenced this pull request Jul 10, 2020

Merge pull request #17872 from meeseeksmachine/auto-backport-of-pr-17…

c1d3210

…800-on-v3.3.x Backport PR #17800 on branch v3.3.x (Increase tolerance for alternate architectures)

QuLogic mentioned this pull request Jun 3, 2021

Image comparison errors on Linux 32-bit #6604

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase tolerance for alternate architectures #17800

Increase tolerance for alternate architectures #17800

QuLogic commented Jun 29, 2020

tacaswell commented Jun 29, 2020

timhoffm commented Jul 1, 2020 •

edited

tacaswell commented Jul 1, 2020

timhoffm commented Jul 2, 2020

QuLogic commented Jul 2, 2020

tacaswell commented Jul 3, 2020

timhoffm commented Jul 4, 2020

tacaswell commented Jul 5, 2020

QuLogic commented Jul 9, 2020

timhoffm commented Jul 9, 2020

Increase tolerance for alternate architectures #17800

Increase tolerance for alternate architectures #17800

Conversation

QuLogic commented Jun 29, 2020

PR Summary

PR Checklist

tacaswell commented Jun 29, 2020

timhoffm commented Jul 1, 2020 • edited

tacaswell commented Jul 1, 2020

timhoffm commented Jul 2, 2020

QuLogic commented Jul 2, 2020

tacaswell commented Jul 3, 2020

timhoffm commented Jul 4, 2020

tacaswell commented Jul 5, 2020

QuLogic commented Jul 9, 2020

timhoffm commented Jul 9, 2020

timhoffm commented Jul 1, 2020 •

edited