interpolate.splder() failure on Fedora #2911

rgommers · 2013-09-22T13:56:53Z

From @opoplawski: With Fedora Rawhide (but not Fedora 19) I'm seeing:

ERROR: test_fitpack.TestSplder.test_kink
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/builddir/build/BUILDROOT/scipy-0.13.0-0.1.b1.fc21.x86_64/usr/lib64/python2.7/site-packages/scipy/interpolate/tests/test_fitpack.py", line 326, in test_kink
    splder(spl2, 2)  # Should work
  File "/builddir/build/BUILDROOT/scipy-0.13.0-0.1.b1.fc21.x86_64/usr/lib64/python2.7/site-packages/scipy/interpolate/fitpack.py", line 1186, in splder
    "and is not differentiable %d times") % n)
ValueError: The spline has internal repeated knots and is not differentiable 2 times

The same failure occurs with Python 3.
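The error is raised while splder computes derivative coefficients. The sketch is a from-scratch NumPy version of the textbook B-spline derivative recurrence, not scipy's fitpack.py (its coefficient padding and error handling may differ), and differentiate_bspline is a hypothetical name. It shows how a zero knot spacing, that is, an interior knot repeated too often for the requested derivative order, turns into this ValueError.

```python
import numpy as np


def differentiate_bspline(t, c, k, n=1):
    """Return (t, c, k) for the n-th derivative of a B-spline.

    Assumes unpadded coefficients: len(c) == len(t) - k - 1.
    Each step applies c'_i = k * (c_{i+1} - c_i) / (t_{i+k+1} - t_{i+1})
    and drops the first and last knot. A zero spacing in the denominator
    means the spline is not smooth enough there; numpy is told to raise
    instead of silently producing inf/nan.
    """
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    if n > k:
        raise ValueError("order of derivative must be <= k")
    try:
        with np.errstate(divide="raise", invalid="raise"):
            for _ in range(n):
                dt = t[k + 1:-1] - t[1:-k - 1]  # knot spacings t_{i+k+1} - t_{i+1}
                c = (c[1:] - c[:-1]) * k / dt
                t = t[1:-1]
                k -= 1
    except FloatingPointError:
        raise ValueError(
            "The spline has internal repeated knots "
            "and is not differentiable %d times" % n) from None
    return t, c, k


# Cubic (k=3) with the interior knot 0.5 repeated twice.
t_ok = np.array([0, 0, 0, 0, 0.5, 0.5, 1, 1, 1, 1.0])
t2, c2, k2 = differentiate_bspline(t_ok, np.arange(6.0), 3, n=2)
print(k2, c2.tolist())  # 1 [0.0, -12.0, 12.0, 0.0]
```

In this toy cubic the interior knot has multiplicity 2, so two derivatives succeed while a third hits a zero spacing. A knot repeated four times, like the run of 5.08130999e-01 in the failing output later in this thread, already breaks the first derivative.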


rgommers · 2013-09-22T13:57:35Z

@opoplawski what's different about Rawhide? Compiler versions?

rgommers · 2013-09-22T13:58:03Z

Failure reported against 0.13.0b1

pv · 2013-09-25T20:49:22Z

One possibility is that the spline used in the test already contains a duplicate knot: it comes from a FITPACK fit, which may be sensitive to rounding error.

So it would be useful to

np.savez('dump.npz', t=self.spl[0], c=self.spl[1], k=self.spl[2])

in the test.
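To act on this outside the test suite, the dump can be wrapped in a pair of helpers together with a quick check for repeated interior knots. dump_spline, load_spline and repeated_interior_knots are hypothetical names. Treating the first and last k + 1 knots as boundary knots is an assumption that matches the usual FITPACK knot layout, though it could hide a problem confined to the ends of the knot vector.

```python
import io

import numpy as np


def dump_spline(fileobj, t, c, k):
    """Save a (t, c, k) spline using the same keys as the np.savez call above."""
    np.savez(fileobj, t=t, c=c, k=k)


def load_spline(fileobj):
    """Load a spline saved by dump_spline and return it as a (t, c, k) tuple."""
    with np.load(fileobj) as data:
        return data["t"], data["c"], int(data["k"])


def repeated_interior_knots(t, k):
    """Map each interior knot value that occurs more than once to its count.

    The first and last k + 1 knots are treated as boundary knots and ignored.
    """
    interior = np.asarray(t, dtype=float)[k + 1:len(t) - k - 1]
    values, counts = np.unique(interior, return_counts=True)
    return {float(v): int(m) for v, m in zip(values, counts) if m > 1}


# Round-trip a toy cubic spline through an in-memory .npz file.
buf = io.BytesIO()
dump_spline(buf, np.array([0, 0, 0, 0, 0.5, 0.5, 1, 1, 1, 1.0]), np.arange(6.0), 3)
buf.seek(0)
t, c, k = load_spline(buf)
print(repeated_interior_knots(t, k))  # {0.5: 2}
```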

opoplawski · 2013-09-26T04:04:39Z

Actually, it looks more like a 32-bit issue: I see it on 32-bit Fedora 19 and later.

rgommers · 2013-09-26T18:34:27Z

I don't see it with any Python version on 32-bit Ubuntu 13.04.

pv · 2013-09-26T19:28:25Z

Thanks, can be reproduced in Fedora 19 32-bit VM.

...except that it's intermittent and doesn't occur every time. Maybe memory alignment affects rounding error in FITPACK?

EDIT: this was mistaken; I'm not able to reproduce this issue.

pv · 2013-09-26T20:46:33Z

@opoplawski: can you apply http://gist.github.com/anonymous/6720219 on top of the maintenance/0.13.x branch (f4d8447) and post the resulting npz file somewhere? It seems I can't reproduce this on i386 Fedora 19 after all.

opoplawski · 2013-09-27T04:28:34Z

I still see it with current maintenance/0.13.x and that patch applied. The npz files are at http://www.cora.nwra.com/~orion/npz.tar.gz

pv · 2013-09-28T10:57:39Z

@opoplawski: sorry, I meant the dump-new-bad.npz file that the patch generates in the current directory. I'll try to switch to Fedora Rawhide to see if I get it to reproduce again...

pv · 2013-09-28T16:37:20Z

I'm not able to reproduce this on Fedora rawhide/i386:

atlas-3.10.1-1.fc21
numpy-1.8.0-0.5.b2.fc21
gcc-4.8.1-10.fc21
lapack-3.4.2-3.fc20

and in site.cfg

[atlas]
atlas_libs = satlas
library_dirs = /usr/lib/atlas

Running python runtests.py passes without errors.

However, when using tatlas, one of the linalg QR tests fails with the assertion "!pthread_create(&thr->thrH, &attr, rout, arg) failed, line 111 of file /builddir/build/BUILD/ATLAS/i386_base/..//src/threads/ATL_thread_start.c". This seems to be an ATLAS bug; it doesn't occur with satlas or with OpenBLAS. If you got the error in this report while using tatlas, try another BLAS library.

Otherwise, I'd need a more detailed description of the steps and environment in which this bug can be reproduced.

pv · 2013-10-06T17:50:44Z

Not reproducible via mock --rebuild scipy-0.13.0-0.3.b1.fc21.src.rpm in fedora-rawhide-i386 either (running in an i386 VirtualBox VM).

juliantaylor · 2013-10-06T22:45:56Z

Besides Debian unstable, I can reproduce it on Ubuntu 13.10 i386 but not on 13.04.
It might be due to gcc 4.8.

juliantaylor · 2013-10-06T23:23:24Z

Yes: compiled with g++-4.7, it also works on 13.10.

pv · 2013-10-06T23:34:18Z

And on those platforms you can reproduce it in a VM? I'll take a spin with ubuntu, but I still don't understand why I can't reproduce it on Fedora images.

juliantaylor · 2013-10-06T23:40:04Z

I can reproduce it in standard pbuilder i386 chroots running on an Ubuntu 13.04 amd64 kernel. On Debian-based systems (install ubuntu-dev-tools):

pbuilder-dist saucy i386 create
pbuilder-dist saucy i386 login

juliantaylor · 2013-10-06T23:52:16Z

One reason it might not happen on Fedora: Debian/Ubuntu somehow change the default CFLAGS, so you end up using -O2 instead of the scipy default -O3.

opoplawski · 2013-10-16T15:03:19Z

I've shifted to using serial atlas, but still see this:

http://koji.fedoraproject.org/koji/getfile?taskID=6064130&name=build.log

This is with numpy 1.8.0rc2 and scipy 0.13.0rc1.

opoplawski · 2013-10-16T15:07:21Z

Also, it appears on x86_64 and armv7hl as well.

pv · 2013-10-16T16:23:06Z

Can either of you apply the patch I linked above and post the dump-new-bad.npz file it generates? As noted, I wasn't able to reproduce this in a Fedora mock i386 environment. I can't make any progress without being able to reproduce this, or without help from someone who can.

pv · 2013-10-16T16:50:12Z

Can't reproduce in i386 pbuild on Ubuntu 13.10 amd64 either: https://dl.dropboxusercontent.com/u/5453551/last_operation.log

What is different in your setups? The build environment is probably almost identical, so I'm a bit at a loss on where to look...

juliantaylor · 2013-10-17T19:55:45Z

I mailed you the file.
I can reproduce it on my Ubuntu 13.04 amd64 Phenom X2 machine running 13.10 i386 in a chroot.
But I can't reproduce it on an Intel Core 2 Duo running amd64 13.10 with a 13.10 i386 chroot in the same state.

The only differences I see are the hardware (Intel vs. AMD) and the kernel (3.11 vs. 3.8).
Is scipy doing any machine-specific optimizations?

pv · 2013-10-17T20:58:50Z

Thanks. I tried it before on two machines and failed to reproduce it: Intel(R) Xeon(R) CPU E5430 on Linux 2.6.32, and Intel(R) Core(TM) i7-3770K on Linux 3.11.0, with different gcc versions too (4.7.2 and 4.8.1). This seems to point towards some Intel vs. AMD difference.

Looking at the file you sent, it looks like a bug in the FITPACK insert subroutine. Namely, in the failing case we got

spl2[0] == array([  0.00000000e+00,   ...   4.89078109e-01,
         5.00000000e-01,   5.00000000e-01,   5.08130999e-01,
         5.08130999e-01,   5.08130999e-01,   5.08130999e-01,
         ...
         5.08130999e-01])

whereas in the good case we have

spl2[0] == array([  0.00000000e+00,   ...,   4.89078109e-01,
         5.00000000e-01,   5.00000000e-01,   5.08130999e-01,
         5.27672398e-01,   5.47708490e-01,   5.68245458e-01,
         5.89289487e-01,   6.10846760e-01,   6.32923460e-01,
         6.55525771e-01,   6.78659877e-01,   7.02331962e-01,
         7.26548208e-01,   7.51314801e-01,   7.76637923e-01,
         8.02523758e-01,   8.28978490e-01,   8.56008303e-01,
         8.83619379e-01,   9.11817904e-01,   9.40610059e-01,
         1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
         1.00000000e+00])

The input spline spl differs only by rounding error. Running

import numpy as np
from scipy.interpolate import insert

spl = np.load('dump-new-bad.npz')['tck']
spl2 = insert(0.5, spl, m=2)

doesn't reproduce the strange results here, so the problem is most likely in the insert routine. The splder routine itself is probably fine, as it should be: it's straightforward pure-Python code.

Strangely enough, the code in question does not have anything special on our side. It's some ye olde Fortran code wrapped with C.

pv · 2013-10-17T21:08:32Z

I wouldn't rule out bugs in the Fortran code, though; it's patched FITPACK code, and some of the patches may be buggy.

pv · 2013-10-23T13:28:55Z

Note that further debugging of this issue is in practice impossible without gdb-enabled access to a machine on which the issue can be reproduced. I do not have access to hardware/VMs where this can be reproduced.

jgehrcke · 2014-02-18T20:32:22Z

I might be able to help. This is the only test that fails in my build of scipy 0.13.3:

Traceback (most recent call last):
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy/interpolate/tests/test_fitpack.py", line 329, in test_kink
    splder(spl2, 2)  # Should work
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy/interpolate/fitpack.py", line 1186, in splder
    "and is not differentiable %d times") % n)
ValueError: The spline has internal repeated knots and is not differentiable 2 times

Python 2.7.6, built with GCC 4.1.2 on CentOS 5.8
numpy 1.8.0, built with Intel MKL, icc, ifort 12.1.3
scipy 0.13.3, built with Intel MKL, icc, ifort 12.1.3

Some machine information:
Linux 2.6.18, x86_64, Intel Xeon X5670

If this is of interest, I would need some instructions on how to proceed with debugging this issue.

pv · 2014-02-18T21:07:36Z

Write file script.gdb:

define nstep
    set $foo = $arg0
    while ($foo)
        info locals
        p tt(i+1)
        p cc(i+1)
        step
        set $foo = $foo - 1
    end
end

set breakpoint pending on
break fpinst
run
bt
nstep 6500
quit

Run

gdb --batch --command=script.gdb --args python runtests.py -g -t scipy/interpolate/tests/test_fitpack.py:TestSplder.test_kink > dump.txt 2>&1

Here is a "good" trace for comparison: https://gist.github.com/pv/9080048

Compare that trace with the one you get, and determine why the result is different on your platform (the 1e-310 floating-point numbers can be ignored, as the gdb script also prints uninitialized variables). Are the inputs the same? Are the outputs the same? Is there a bug in the Fortran code? The scipy/interpolate/fitpack/fpinst.f routine is luckily not very long.

It's also possible that the Fortran compiler miscompiles the file. Check by compiling it with different optimization levels (use the FOPT environment variable). If there's a difference, check the generated assembler output.

Try to reduce the problem to a pure-fortran test case, so that it is easier to debug.

pv · 2014-05-20T11:20:29Z

I just noticed that I can reproduce this issue on my new laptop (Intel Core i7, gcc 4.8.2-19ubuntu1). It does not occur at gfortran optimization level -O2 but does occur at -O3, which suggests a compiler bug.

The miscompiled file is scipy/interpolate/fitpack/fpinst.f: when only this file is compiled with -O2, the test succeeds; with -O3, it fails.

I don't have time to look into this in depth now, but the optimized trees are here: good O2, bad O3.

pv · 2014-05-20T13:07:44Z

This seems to be an argument aliasing issue: at -O3, gfortran appears to assume that the routine's arguments do not alias each other, and that assumption is violated here:
https://github.com/scipy/scipy/blob/master/scipy/interpolate/src/__fitpack.h#L831

Using different buffers for different args makes the issue go away.
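The aliasing hazard can be illustrated with a toy NumPy example. This is not the fpinst.f routine and does not model what gfortran actually emits at -O3; insert_value is a hypothetical helper. It only shows the general point: when an output argument shares memory with an input, the result depends on the order in which elements are read and written. Fortran lets the compiler assume that a modified dummy argument does not overlap another argument, so an optimizer may reorder or vectorize such loops and still be correct for every call that obeys that rule.

```python
import numpy as np


def insert_value(src, dst, n, pos, x, forward):
    """Write src[:n] with x inserted at index pos into dst[:n + 1].

    Toy model only (hypothetical helper, not the fpinst.f algorithm).
    The tail src[pos:n] is shifted right one element at a time, either
    front-to-back (forward=True) or back-to-front (forward=False).
    """
    for i in range(pos):
        dst[i] = src[i]
    order = range(pos, n) if forward else range(n - 1, pos - 1, -1)
    for i in order:
        # If dst is src, this write may clobber an element not yet read.
        dst[i + 1] = src[i]
    dst[pos] = x
    return dst


knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 0.0])  # last slot is spare room

# Separate input and output buffers: loop order is irrelevant.
separate = insert_value(knots.copy(), np.zeros(6), 5, 2, 0.4, forward=True)

# Aliased buffers: only the back-to-front order gives the intended result.
aliased_back = knots.copy()
insert_value(aliased_back, aliased_back, 5, 2, 0.4, forward=False)
aliased_fwd = knots.copy()
insert_value(aliased_fwd, aliased_fwd, 5, 2, 0.4, forward=True)

print(separate.tolist())     # [0.0, 0.25, 0.4, 0.5, 0.75, 1.0]
print(aliased_back.tolist())  # [0.0, 0.25, 0.4, 0.5, 0.75, 1.0]
print(aliased_fwd.tolist())   # [0.0, 0.25, 0.4, 0.5, 0.5, 0.5]
```

The forward, aliased case smears one value over the tail of the array, a pattern reminiscent of the repeated 5.08130999e-01 knots in the failing spl2. Passing a separate output buffer removes the order dependence, which is the approach described here.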

juliantaylor · 2014-05-20T16:32:46Z

This is not a compiler bug: the Fortran language (at least before Fortran 95) does not allow this kind of aliasing.

pv · 2014-05-20T17:19:05Z

@juliantaylor: I agree, I found the aliasing issue only later. The fix should be relatively simple.


pv · 2014-05-20T17:50:15Z

Fix in gh-3673


pv · 2014-08-14T12:27:29Z

@Dapid: please double-check by removing existing scipy installations and doing a clean reinstall (git clean -fdx; rm -rf build).

pv · 2014-08-14T12:31:50Z

Also double-check that you are on the current master and not on an older version.

Dapid · 2014-08-14T12:35:36Z

@pv it seems it was caused by some leftovers, it is now correct. Thanks!

xgh45 · 2017-08-03T11:22:20Z

@rgommers
Hello, I have the problem "ValueError: The spline has internal repeated knots and is not differentiable 2 times". Did you solve it?

juliantaylor mentioned this issue Sep 24, 2013

test_fitpack.TestSplder.test_kink fails on i386 #2929

Closed

pv mentioned this issue Oct 23, 2013

test test_fitpack.TestSplder.test_kink fails #3013

Closed

pv mentioned this issue May 20, 2014

Inserting a knot in a spline #3672

Closed

pv added a commit to pv/scipy-work that referenced this issue May 20, 2014

BUG: interpolate/fitpack: arguments to fortran routines may not be aliased. Fixes scipygh-2911

c8648f1

pv mentioned this issue May 20, 2014

BUG: interpolate/fitpack: arguments to fortran routines may not be aliased #3673

Merged

rgommers added this to the 0.15.0 milestone May 20, 2014

pv closed this as completed in #3673 Jun 15, 2014

jennystone pushed a commit to jennystone/scipy that referenced this issue Jun 24, 2014

BUG: interpolate/fitpack: arguments to fortran routines may not be aliased. Fixes scipygh-2911

35c0f51

jennystone pushed a commit to jennystone/scipy that referenced this issue Jun 25, 2014

BUG: interpolate/fitpack: arguments to fortran routines may not be aliased. Fixes scipygh-2911

90122cd
