stackplot_test_baseline has different results on 32-bit and 64-bit platforms #1726

Merged
merged 1 commit into from

4 participants

Michael Droettboom Damon McDougall Benjamin Root Julian Taylor
Michael Droettboom
Owner

I haven't quite gotten down to the bottom of this one yet, but this is the cause of the current failures on Travis.

On a 64-bit Linux, I get the following for stackplot_test_baseline (and I suspect this is correct):

stackplot_test_baseline-expected

And on 32-bit:

stackplot_test_baseline

The problem appears to be somewhere in the calculation of the data limits. On 64-bit, xmin is exactly 0; on 32-bit it is a very small negative number, which causes the locator to add an extra step on the left.

Damon McDougall
Collaborator

I feel like we've had an issue exactly like this in the past with bounding boxes. A cheap-and-dirty hack would be to update the [xy]lim in the test so that no bad rounding happens in the 32-bit case.

Michael Droettboom
Owner

That's not a bad idea for a quick fix to get the tests passing -- but I'd still like to get to the bottom of this, of course.

Michael Droettboom
Owner

Okay -- I'm posting this mostly out of desperation to see if anyone else has any ideas.

The difference between these platforms comes down to the affine transformation. At one point, the collection is mapped through a transformation to determine the bounds of the data. While the inputs to _path.cpp::affine_transform are bit-for-bit exactly the same, the outputs are not. It is all double arithmetic throughout, so I am surprised there is a platform difference (one might expect that from long doubles, which differ between these platforms, but not from doubles).

I've boiled it down to a simple C program:

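#include <stdio.h>

int main(void) {
  /* Values boiled down from the affine_transform case described above. */
  double b = 0.0;
  double x = 80.0;
  double d = 0x1.5555555555555p-9;
  double y = 48.0;
  double f = -0x1.ffffffffffffdp-4;

  /* Same form as the second output coordinate: b * x + d * y + f. */
  double t = b * x + d * y + f;

  printf("Output %g %a\n", t, t);
  return 0;
}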

In the Travis environment (which is a recent i686 Ubuntu with gcc 4.6), I get:

Output 3.46945e-17 0x1.4p-55

On my personal machine, F18 x86_64 with gcc 4.7, I get:

Output 4.16334e-17 0x1.8p-55

It is this small difference that is causing the limit detection to fall over on this test.
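The two outputs can be accounted for by hand. Since b is 0, only d * y + f matters. The exact product d * y is 2^-3 - 2^-57, which needs 54 significand bits, so as a double it rounds (ties-to-even) to 2^-3; adding f = -(2^-3 - 3*2^-56) then gives 3*2^-56 = 0x1.8p-55. If the product is instead held in a wider register, nothing is lost and the sum is 5*2^-57 = 0x1.4p-55. The following is a minimal sketch, not code from this PR, that forces the first behavior by storing every intermediate in a volatile double; the names affine_y_rounded and print_rounded_example are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: evaluates b*x + d*y + f with each intermediate
   forced through a 64-bit double store, so no excess precision survives. */
double affine_y_rounded(double b, double x, double d, double y, double f)
{
    volatile double t0 = b * x;     /* rounded to double when stored */
    volatile double t1 = d * y;     /* rounded to double when stored */
    volatile double sum = t0 + t1;
    volatile double t = sum + f;
    return t;
}

/* Prints the result for the inputs of the test program above. */
void print_rounded_example(void)
{
    double t = affine_y_rounded(0.0, 80.0, 0x1.5555555555555p-9, 48.0,
                                -0x1.ffffffffffffdp-4);
    printf("Output %g %a\n", t, t);
}
```

Calling print_rounded_example() should report the same value as the x86_64 run regardless of how the FPU evaluates expressions, at the cost of extra stores in the loop.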

We have some ways to solve this: 1) find out why there is this inconsistency and work around it if possible; 2) make the locators ignore very small crossings of the boundaries -- this of course needs to be some dynamically determined fraction of the overall limits. (That is, if the true range is (-1e-45, 100), it should just round to (0, 100).)

In the long run, we're probably going to have to do something like 2) in any event to be more robust against floating-point error anyway. Any other thoughts?
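Option 2 might look roughly like the sketch below: treat a limit as zero when its magnitude is a negligible fraction of the overall span. This is an illustration in C, not matplotlib's locator code; the function name snap_near_zero and the rel_tol parameter are hypothetical, and it only handles crossings of zero, not of arbitrary tick boundaries.

```c
/* Hypothetical helper: returns `limit` snapped to 0.0 when its magnitude
   is at most rel_tol times the span |hi - lo|; otherwise returns it as is. */
double snap_near_zero(double limit, double lo, double hi, double rel_tol)
{
    double span = hi - lo;
    double mag = limit;

    if (span < 0.0)
        span = -span;   /* absolute value of the span */
    if (mag < 0.0)
        mag = -mag;     /* absolute value of the limit */

    if (mag <= rel_tol * span)
        return 0.0;
    return limit;
}
```

With a tolerance such as 1e-10, a range of (-1e-45, 100) would come out as (0, 100), while a genuine lower limit like -0.5 would be left alone. Choosing the tolerance is the hard part and is left open here.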

Benjamin Root
Collaborator
Michael Droettboom
Owner

Alright. This was a fun one.

This article was helpful in decoding some of the quirks of x86 floating-point. It may actually be the case that to trigger this you have to be both x86 (not x86-64) and non-SSE (which is true for the Travis VMs).

http://www.yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html

A fix is attached, which (hopefully) should pass on Travis.

Michael Droettboom
Owner

The Travis tests are better than master (only one failure now), so I'm ready to count this as a success ;)

Michael Droettboom mdboom merged commit a4e4c68 into from
Julian Taylor

the code kind of looks like it's relevant for performance; would it make sense to only use the unoptimized version on the platform where it's needed, via ifdefs?
Also doesn't the line right below the fixed one have the same problem?

maybe one could also use long double instead of double, those still have decent performance on i386

Michael Droettboom
Owner

@juliantaylor: This fix isn't really about optimization -- it's about consistent floating-point results across platforms. Can you clarify what you mean by "Also doesn't the line right below the fixed one have the same problem?"

Using long double instead of double would actually exacerbate the problem, since they are of different sizes on i386 and x86_64.

Julian Taylor

the else clause below the fixed one has the same expression with lots of temporaries which are not spilled to the stack explicitly, so it may have the same inconsistency problem.

long double has the same format on i386 and amd64: 80 bits, the width of the x87 FPU stack registers (gcc just pads it in memory for alignment). Using it means you don't lose the excess precision when a variable spills to the stack, which makes results more consistent (but you lose vectorization on amd64).
It may not exist on some non-x86 platforms, but this issue is x86-specific.

a drastic alternative would be to ditch support for non-SSE x86 CPUs and change the FPU mode to SSE globally (-mfpmath=sse with gcc), but it would make all float calculations in matplotlib more consistent.
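Whether a given build keeps excess precision can be checked from C99's FLT_EVAL_METHOD macro in <float.h>. A small sketch follows (the function names describe_eval_method and print_eval_method are made up here): a value of 0 means double expressions are evaluated as double, which is what SSE math normally gives, while 2 means intermediates are held as long double, the usual result for x87 math.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: maps a FLT_EVAL_METHOD value to a description. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "each operation evaluated in its own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all operations evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Prints the evaluation method this translation unit was compiled with. */
void print_eval_method(void)
{
    printf("FLT_EVAL_METHOD = %d (%s)\n", (int)FLT_EVAL_METHOD,
           describe_eval_method((int)FLT_EVAL_METHOD));
}
```

Building the same file with and without -mfpmath=sse on a 32-bit x86 target is one way to see which mode a given set of flags selects.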

Michael Droettboom mdboom deleted the branch
Showing with 12 additions and 2 deletions.
  1. +12 −2 src/_path.cpp
@@ -1159,14 +1159,24 @@ _path_module::affine_transform(const Py::Tuple& args)
             size_t stride1 = PyArray_STRIDE(vertices, 1);
             double x;
             double y;
+            volatile double t0;
+            volatile double t1;
+            volatile double t;
             for (size_t i = 0; i < n; ++i)
             {
                 x = *(double*)(vertex_in);
                 y = *(double*)(vertex_in + stride1);
-                *vertex_out++ = a * x + c * y + e;
-                *vertex_out++ = b * x + d * y + f;
+                t0 = a * x;
+                t1 = c * y;
+                t = t0 + t1 + e;
+                *(vertex_out++) = t;
+
+                t0 = b * x;
+                t1 = d * y;
+                t = t0 + t1 + f;
+                *(vertex_out++) = t;
                 vertex_in += stride0;
             }