autotest results are different on 64 and 32 bit platforms #4643

Closed
tbonfort opened this Issue Apr 30, 2013 · 14 comments


@tbonfort
Member

... making it difficult to run the tests without failure on different platforms. I suspect some of these differences are due to different rounding methods.

This is a placeholder ticket to track commits related to harmonizing output w.r.t. platforms.
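As a rough illustration of the suspected cause (this is not code from MapServer, just a hedged sketch of how two common nearest-integer strategies disagree), values that land near a .5 boundary can round to different integers depending on the convention and on tiny platform-dependent floating point noise:

```python
# Illustration only: two nearest-integer conventions that disagree on ties.
def round_half_away(x):
    """Classic '(int)(x + 0.5)'-style rounding (half away from zero)."""
    return int(x + 0.5) if x >= 0 else int(x - 0.5)

for v in (2.5, -2.5, 2.4999999999999996):
    # Python's round() uses round-half-to-even, the same default tie-breaking
    # as C's lrint() under the usual round-to-nearest mode.
    print(v, round_half_away(v), round(v))
```

A one-unit difference like this in a pixel coordinate is enough to make two otherwise identical renderings diff as non-matching.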

@tbonfort tbonfort was assigned Apr 30, 2013
@tbonfort
Member

Use lrint for MS_NINT where available; fixes for vertical bar chart heights are in 03eff67.

@tbonfort tbonfort added a commit to tbonfort/mapserver that referenced this issue Apr 30, 2013
@tbonfort tbonfort don't round points when going from geo to pixel (#4643) 288dd5c
@tbonfort tbonfort added a commit to tbonfort/msautotest that referenced this issue Apr 30, 2013
@tbonfort tbonfort modify tests for mapserver/mapserver#4643 8c6ce84
@tbonfort tbonfort added a commit that referenced this issue May 6, 2013
@tbonfort tbonfort restore CI test suite on travis-ci (#4643)
don't run the php test suite, the pear dependency is too
slow for travis-ci builds.
d9760bb
@tbonfort tbonfort added a commit to mapserver/msautotest_DEPRECATED that referenced this issue May 6, 2013
@tbonfort tbonfort update for 64bit 12.04 on travis fff3a03
@tbonfort
Member
tbonfort commented May 6, 2013

Autotests updated to use expected results from a 64-bit Ubuntu 12.04 build (which matches the travis-ci setup).
The differences between 32-bit and 64-bit builds are not something we can fully act on, but as 64-bit is the norm nowadays it is more reasonable to switch to 64-bit expected results.

@tbonfort tbonfort closed this May 6, 2013
@szekerest
Member

What about the differences between Windows/Linux platforms? Currently the expected results are mostly generated on Linux, which causes some tests to fail permanently on Windows.

@tbonfort
Member
tbonfort commented May 6, 2013

The tests already differ from one compiler to another, one architecture to another, one freetype version to another, etc. If you have a solution for that, and if you want to add Windows to the equation, please do.

@szekerest
Member

Adding the expected results from Windows would probably break tests for the other platforms. The only solution I can imagine is to store the results side by side for each compiler/architecture, etc. That would probably require restructuring the autotest suite, which would need some consensus I think.

@rouault
Contributor
rouault commented May 6, 2013

This is an interesting issue that we encounter from time to time in the GDAL autotest suite. IMHO maintaining a set of files for each test and compiler/architecture will be a maintenance nightmare, since whenever you need to change a test you will have to retest on all platforms to get the new expected result... And how would you characterize a platform: CPU architecture (Intel 32, Intel 64, ARM...), compiler version, compiler flags, ... A lot of combinations!
The theoretical solution would be to make comparisons less strict:

  • for images, have a perceptualdiff version that uses a greater tolerance value (I have seen failed tests in the msautotest where perceptualdiff complained, but my eyes didn't notice any difference),
  • for text files, avoid strict text comparison (for example you could preprocess the text files, look for numeric values in them, and round them to an appropriate precision before proceeding to the comparison); a minimal sketch of that idea follows below
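A possible sketch of that preprocessing, assuming plain-text expected/result files and an arbitrary 4-decimal precision (the function names are made up for illustration, they are not part of msautotest):

```python
import re

NUM_RE = re.compile(r'-?\d+\.\d+')

def normalize_numbers(text, decimals=4):
    # Round every floating point literal to a fixed precision so that
    # platform-dependent noise in the last digits does not fail the diff.
    return NUM_RE.sub(lambda m: "%.*f" % (decimals, float(m.group(0))), text)

def fuzzy_equal(expected_path, result_path):
    with open(expected_path) as e, open(result_path) as r:
        return normalize_numbers(e.read()) == normalize_numbers(r.read())
```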
@szekerest
Member

I think not all combinations need to be maintained. For example, across MSVC compiler versions most results look pretty much the same, but many of them differ visually from the Linux (gcc?) variant. We could probably just establish a mechanism to provide alternatives to a single file (like misc/expected/quantized.png and misc/expected/quantized.png.2) and consider the test successful if at least one of them matches the result.

@tbonfort
Member
tbonfort commented May 6, 2013

+1 on not adding compiler/platform/lib-version specific tests, as their maintenance is a nightmare. An unmaintained test suite is a test suite that is not run regularly because it will fail one day, and we then find ourselves in the situation we were in a year ago, with 50% of the tests failing, some with a valid reason and some without.

As for perceptualdiff, there is already an option to run the test suite with it; it is just not activated on the CI builds. In addition to slowing down the test suite run, it is in my opinion both too strict and too relaxed:

  • Depending on the platform, rounding may go either up or down, which results in a one-pixel discrepancy between the expected and actual result. In my experience perceptualdiff will (rightfully) mark these images as non-matching, i.e. it is too strict for this use case.
  • On the other hand, a one-pixel difference that is not related to a platform change should show up as a failure.
@tbonfort
Member
tbonfort commented May 6, 2013

@szekerest (our messages crossed): the problem with your proposal is that only you can/will update the windows autotest expected results.

@szekerest
Member

@tbonfort Yes, but that would be much better than the current situation, which makes msautotest fairly unusable on Windows. I don't think the results change frequently enough to make this maintenance difficult.
BTW: I also use perceptualdiff at http://www.gisinternals.com/sdk/ but that doesn't really help to solve this problem. (See 'show all results' on the msautotest pages to see where perceptualdiff could help.)

@tbonfort
Member
tbonfort commented May 6, 2013

@szekerest I don't think it is that difficult to add a platform-specific lookup to the python test suite (i.e. look for an MS_PLATFORM environment variable, and then compare against expected/$file.MS_PLATFORM if that one exists). As long as you create and maintain it, and it does not break the automated CI tests, I would say you are good to go...
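A minimal sketch of that lookup, assuming the test runner already computes the default expected path (MS_PLATFORM is the hypothetical variable named above, not an existing setting):

```python
import os

def expected_file_for(path):
    # If MS_PLATFORM is set (e.g. "win64") and a platform-specific expected
    # file exists, prefer it; otherwise fall back to the default file.
    platform = os.environ.get("MS_PLATFORM")
    if platform:
        candidate = "%s.%s" % (path, platform)
        if os.path.exists(candidate):
            return candidate
    return path
```

With MS_PLATFORM=win64, expected_file_for("expected/quantized.png") would pick expected/quantized.png.win64 when that file is present.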

@szekerest
Member

@tbonfort Using specific environment settings to make the tests pass doesn't seem like the right solution. If I provide a test result on Windows I'd expect other Windows users to benefit from it automatically, without any extra setup. I'd somewhat prefer to just provide alternatives to a test result without binding them to a specific platform/architecture. For example, if a test result is the same for all MSVC compilers I'd only need to add one file in addition to the Linux variant. But if we get different results per architecture (Win32/Win64) I'd need to add 2 additional files to the expected results.
Does that make sense?

@tbonfort
Member
tbonfort commented May 6, 2013

Just to make sure we're talking about the same solution: you'd have expected/$file, expected/$file.1, expected/$file.2, and if expected/$file does not match, we loop through $file.1, $file.2?

I'd be +0 on that. I don't think it's difficult to add that to the python files.
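A sketch of that loop, assuming a compare(expected, result) callable already exists for a single pair of files (the helper names here are hypothetical):

```python
import os

def matches_any_expected(expected, result, compare):
    # Try expected/$file first, then the numbered alternatives
    # expected/$file.1, expected/$file.2, ... until one matches.
    candidates = [expected]
    i = 1
    while os.path.exists("%s.%d" % (expected, i)):
        candidates.append("%s.%d" % (expected, i))
        i += 1
    return any(compare(c, result) for c in candidates)
```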

@szekerest
Member

I think we'll need to take this to the -dev list to see how the others feel about this, or whether they have a better solution.
