autotest results are different on 64 and 32 bit platforms #4643

Closed
tbonfort opened this Issue Apr 30, 2013 · 14 comments

tbonfort (Member) commented Apr 30, 2013

... making it difficult to run the tests without failure on different platforms. I suspect some of these differences are due to different rounding methods.

This is a placeholder ticket to track commits related to harmonizing output across platforms.

ghost assigned tbonfort Apr 30, 2013


tbonfort (Member) commented Apr 30, 2013

lrint is now used for MS_NINT where available; fixes for vertical bar chart heights are in 03eff67.
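
For illustration, a minimal Python sketch of how two common nearest-integer strategies can disagree on half-way values — the kind of rounding difference this ticket is about (MS_NINT itself is a C macro, so this is not MapServer code):

```python
# Illustrative only: reproduces in Python the divergence between
# "round half away from zero" (a typical x + 0.5 truncation macro) and
# "round half to even" (what lrint does under the default FP rounding mode).

def nint_half_away(x):
    # truncate after adding/subtracting 0.5
    return int(x + 0.5) if x >= 0 else int(x - 0.5)

def nint_half_even(x):
    # Python's round() rounds exact halves to the nearest even integer
    return round(x)

for v in (2.5, 3.5, -2.5):
    print(v, nint_half_away(v), nint_half_even(v))
# 2.5 -> 3 vs 2, 3.5 -> 4 vs 4, -2.5 -> -3 vs -2
```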

tbonfort added a commit to tbonfort/mapserver that referenced this issue Apr 30, 2013

tbonfort added a commit to tbonfort/msautotest that referenced this issue Apr 30, 2013

tbonfort added a commit that referenced this issue May 6, 2013

restore CI test suite on travis-ci (#4643)
don't run the php test suite; the pear dependency is too slow for travis-ci builds.

tbonfort added a commit to mapserver/msautotest_DEPRECATED that referenced this issue May 6, 2013


tbonfort (Member) commented May 6, 2013

autotests updated to use expected results from a 64bit ubuntu 12.04 build (which matches the travis-ci setup).
The differences between 32 and 64 bit builds are not something we can fully act on, but as 64 bit is the norm nowadays it's more reasonable to switch to 64bit expected results.

tbonfort closed this May 6, 2013


szekerest (Member) commented May 6, 2013

What about the differences between the Windows and Linux platforms? Currently the expected results are mostly generated on Linux, which causes some tests to fail permanently on Windows.


tbonfort (Member) commented May 6, 2013

The tests already differ from one compiler to another, one arch to another, one freetype version to another, etc. If you have a solution for that, and if you want to add Windows to the equation, please do.


szekerest (Member) commented May 6, 2013

Adding the expected results from Windows would probably break the tests for the other platforms. The only solution I can imagine is to store the results side by side for each compiler/architecture etc. That would probably require restructuring the autotest suite, which would need some consensus I think.


rouault (Contributor) commented May 6, 2013

This is an interesting issue that we encounter from time to time in the GDAL autotest suite. IMHO maintaining a set of files for each test and compiler/architecture will be a maintenance nightmare, since whenever you need to change a test you will have to retest on all platforms to get the new expected results... And how would you characterize a platform: CPU architecture (Intel 32, Intel 64, ARM...), compiler version, compiler flags, ... A lot of combinations!
The theoretical solution would be to make the comparisons less strict:

  • for images, have a perceptualdiff version that uses a greater tolerance value (I have seen failed tests in msautotest where perceptualdiff complained, but my eyes didn't notice any difference),
  • for text files, avoid doing strict text comparison: for example you could preprocess the text files, look for numeric values in them, and round them to an appropriate precision before proceeding to the comparison (see the sketch after this list).
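
A minimal sketch of that preprocessing idea, assuming a plain Python helper (the function names and regex are illustrative, not part of msautotest):

```python
import re

_FLOAT = re.compile(r"-?\d+\.\d+")

def normalize_numbers(text, ndigits=2):
    """Round every floating-point literal in the text to a fixed precision."""
    return _FLOAT.sub(lambda m: "%.*f" % (ndigits, float(m.group(0))), text)

def fuzzy_text_equal(expected_path, result_path, ndigits=2):
    """Compare two text files after rounding their numeric values."""
    with open(expected_path) as exp, open(result_path) as res:
        return normalize_numbers(exp.read(), ndigits) == normalize_numbers(res.read(), ndigits)
```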

szekerest (Member) commented May 6, 2013

I think not all combinations have to be maintained. For example, with the various MSVC compiler versions most results look pretty much the same, but many of them are visually different from the Linux (gcc?) variant. We could probably just establish a mechanism to provide alternatives for a single file (like misc/expected/quantized.png and misc/expected/quantized.png.2) and consider the test successful if at least one of them matches the result.


tbonfort (Member) commented May 6, 2013

+1 on not adding compiler/platform/lib-version specific tests, as their maintenance is a nightmare. An unmaintained test suite is a test suite that is not run regularly, as it will fail one day, and we then find ourselves in the situation we were in a year ago, with 50% of the tests failing, some for a valid reason and some without.

As for perceptualdiff, there is an option to run the test suite with it, it is just not activated on the CI builds. In addition to slowing down the testsuite run, it is both too strict and too relaxed in my opinion:

  • Depending on the platform, rounding will occur either up or down, which results in a one-pixel discrepancy between the expected and actual result. In my experience perceptualdiff will (rightfully) mark these images as non-matching, i.e. it is too strict for this use-case.
  • On the other hand, a one-pixel difference that is not related to a platform change should show up as a failure.

tbonfort (Member) commented May 6, 2013

@szekerest (our messages crossed): the problem with your proposal is that only you can/will update the Windows autotest expected results.


szekerest (Member) commented May 6, 2013

@tbonfort Yes, but that would be much better than the current situation, which makes msautotest fairly unusable on Windows. I think the results don't change that frequently, so the maintenance wouldn't be too difficult.
BTW: I also use perceptualdiff at http://www.gisinternals.com/sdk/ but that doesn't really help to solve this problem. (See 'show all results' on the msautotest pages to see where perceptualdiff could help.)


tbonfort (Member) commented May 6, 2013

@szekerest I don't think it is that difficult to add a platform-specific test to the python testsuite (i.e. look for an MS_PLATFORM environment variable, and then compare against expected/$file.MS_PLATFORM if that one exists). As long as you are creating and maintaining it, and it does not break the automated CI tests, I would say you are good to go...
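
A minimal sketch of that lookup, assuming a helper in the python testsuite (MS_PLATFORM and the suffix scheme are proposals in this thread, not existing msautotest behaviour):

```python
import os

def expected_file(path):
    """Return expected/$file.$MS_PLATFORM if MS_PLATFORM is set and that
    file exists, otherwise fall back to the default expected/$file."""
    platform = os.environ.get("MS_PLATFORM")
    if platform:
        candidate = "%s.%s" % (path, platform)
        if os.path.exists(candidate):
            return candidate
    return path
```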


szekerest (Member) commented May 6, 2013

@tbonfort Using specific environment settings to make the tests run successfully doesn't seem like the right solution. If I provide a test result on Windows, I'd expect other Windows users to pick it up automatically, without any extra setup. I'd somewhat prefer to just provide alternatives to a test result without binding them to a specific platform/architecture. For example, if a test result is the same for all MSVC compilers, I'd only have to add one file in addition to the Linux variant; but if we get different results per architecture (Win32/Win64), I'd have to add two additional files to the expected results.
Does that make sense?


tbonfort (Member) commented May 6, 2013

Just to make sure we're talking about the same solution: you'd have expected/$file, expected/$file.1, expected/$file.2, and if expected/$file does not match, we loop through $file.1, $file.2?

I'd be +0 on that. I don't think it's difficult to add that to the python files.
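
A minimal sketch of that fallback loop (the helper name and the compare callback are illustrative; the .1/.2 suffix scheme follows the proposal above):

```python
import glob

def matches_any_expected(result_path, expected_path, compare):
    """Try the default expected file first, then any numbered alternative
    (expected/$file.1, expected/$file.2, ...); the test passes if one matches.
    `compare` is whatever file-comparison function the test already uses."""
    candidates = [expected_path] + sorted(glob.glob(expected_path + ".[0-9]*"))
    return any(compare(result_path, candidate) for candidate in candidates)
```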


szekerest (Member) commented May 6, 2013

I think we'll need to take this to the -dev list to find out whether the others like this approach or have a better solution.
