2738 fix breaks scipy test_decomp.py in 1.7.0rc1 #2939

Closed
juliantaylor opened this Issue Jan 22, 2013 · 9 comments

Contributor

juliantaylor commented Jan 22, 2013

Edit: the issue is actually caused by hash randomization exposing a bug in scipy; see below.

The fix for gh-2738 (commit e208de6) in numpy 1.7.0rc1 breaks a test in both scipy 0.11 and the current scipy git head under Python 3.3.
This was previously reported to scipy in http://mail.scipy.org/pipermail/scipy-dev/2012-September/017995.html but I did not find any response.
Bisecting numpy led to the above commit.

The crash can be reproduced on Ubuntu 13.04 with Python 3.3.
It occurs with both the debug and the regular Python 3.3 builds.
It manifests as random segfaults, glibc memory-corruption aborts, and hangs when you execute scipy/linalg/tests/test_decomp.py.
Reverting the commit from the rc1 tag fixes the crash.
numpy git head does not seem to be affected.

Owner

njsmith commented Jan 22, 2013

Strange, since that commit looks correct... Are you able to debug any further? Does reverting that commit fix the valgrind issues or just the crash? (I'm concerned that the commit in question might not actually be relevant, except that it somehow perturbs the memory allocation pattern and so affects the outcome of an unrelated bug.) Is there a simple recipe to reproduce?

Owner

charris commented Jan 22, 2013

I can confirm this with Python 3.3 on Fedora 18. The failing test is:

def test_zhbevx(self):
    """Compare zhbevx eigenvalues and eigenvectors
       with the result of linalg.eig."""
    N,N = shape(self.herm_mat)
    ## Caution: arguments 0.0,0.0,range?
    w, evec, num, ifail, info = zhbevx(self.bandmat_herm, 0.0, 0.0, 1, N,
                                       compute_v=1, range=2)
    evec_ = evec[:,argsort(w)]
    assert_array_almost_equal(sort(w), self.w_herm_lin)
    assert_array_almost_equal(abs(evec_), abs(self.evec_herm_lin))

And sometimes I get the error message

ERROR: Compare zhbevx eigenvalues and eigenvectors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/charris/Workspace/scipy.git/scipy/linalg/tests/test_decomp.py", line 421, in test_zhbevx
    evec_ = evec[:,argsort(w)]
IndexError: index 1 is out of bounds for axis 1 with size 1

So there is some memory corruption here. How this is related to the commit is unclear, but it looks like it occurs in the zhbevx call.
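For readers unfamiliar with the test, the comparison it performs can be sketched in plain NumPy. This is a hedged stand-in, not the scipy test: `np.linalg.eigh` plays the role of zhbevx, the banded Hermitian matrix is generated randomly, and the helper `compare_hermitian_eigensolvers` and its parameters are hypothetical. Eigenvector moduli are compared because each eigenvector is only determined up to a complex phase. In the IndexError above, `argsort(w)` contained index 1 while `evec` had a single column, so the two zhbevx outputs disagreed in size.

```python
import numpy as np


def compare_hermitian_eigensolvers(n=10, bandwidth=2, seed=0):
    """Check a Hermitian eigensolver against the general solver np.linalg.eig.

    Hypothetical helper mirroring the structure of test_zhbevx, with
    np.linalg.eigh standing in for the banded LAPACK routine zhbevx.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    herm = a + a.conj().T                                 # Hermitian by construction
    herm = np.triu(np.tril(herm, bandwidth), -bandwidth)  # keep only the band

    # Reference: general eigensolver; eigenvalues are real up to rounding error.
    w_ref, v_ref = np.linalg.eig(herm)
    order = np.argsort(w_ref.real)
    w_ref, v_ref = w_ref.real[order], v_ref[:, order]

    # Solver under test: reorder eigenvector columns by ascending eigenvalue,
    # as test_zhbevx does with evec[:, argsort(w)].
    w, v = np.linalg.eigh(herm)
    v_sorted = v[:, np.argsort(w)]

    np.testing.assert_array_almost_equal(np.sort(w), w_ref)
    # Eigenvectors are unique only up to a unit complex factor, so compare moduli.
    np.testing.assert_array_almost_equal(np.abs(v_sorted), np.abs(v_ref))


compare_hermitian_eigensolvers()
```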

Owner

charris commented Jan 22, 2013

Commenting out all the Py_DECREF(typecode) calls gives

ERROR: Compare zhbevx eigenvalues and eigenvectors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/charris/Workspace/scipy.git/scipy/linalg/tests/test_decomp.py", line 420, in test_zhbevx
    compute_v=1, range=2)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (10,0,)

So there appear to be two errors, one in the multiarray error return path and another in zhbevx.

Owner

pv commented Jan 22, 2013

This occurs only on Python 3.3, so it's not clear what the issue with zhbevx would be (as far as I can see, there are no errors in the f2py wrapper declaration). Maybe Python 3.3 somehow subtly breaks f2py; we would need to compare the f2py output between 3.2 and 3.3.

Contributor

juliantaylor commented Jan 22, 2013

Sorry, the commit I mentioned was a red herring: reverting it does not fix the issue, it just changes the failure from a crash to the ValueError charris posted.

Contributor

juliantaylor commented Jan 22, 2013

The problem seems to be related to this pointless hash randomization.
Building scipy with PYTHONHASHSEED=0 exported produces a very different _flapackmodule.c, and the test then seems to pass.
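The effect of PYTHONHASHSEED can be observed without scipy. The sketch below starts fresh interpreters with different seeds and reports the iteration order of the same set of strings; with PYTHONHASHSEED=0 (hash randomization disabled) the order is reproducible from run to run. The helper `set_order_under_seed` and the example identifiers are hypothetical. On Python 3.7 and later, dicts preserve insertion order, so set iteration order is the more visible symptom today.

```python
import os
import subprocess
import sys


def set_order_under_seed(seed, names):
    """Return the iteration order of set(names) in a fresh interpreter
    started with PYTHONHASHSEED=seed (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "import sys; print(' '.join(set(sys.argv[1:])))"
    result = subprocess.run(
        [sys.executable, "-c", code, *names],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


names = ["ab", "ldab", "n", "kd", "w", "z", "work", "iwork"]
for seed in (0, 0, 1, 2):
    print(seed, set_order_under_seed(seed, names))
```

The two seed-0 runs print the same order; other seeds typically print different orders, which is how a build step that iterates over such a collection can emit different code on each build.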

Owner

pv commented Jan 22, 2013

The *hbevx f2py declaration in Scipy is missing a number of depend() directives; adding them seems to fix it. Python 3.3 indeed returns dict items in a different order than earlier versions (which is good, since it helps find bugs like this), so the generated code uses uninitialized variables -> boom.

So, a scipy bug. http://projects.scipy.org/scipy/ticket/1819
As a consequence, Scipy 0.11 will not run on Python 3.3. Fixes will be in 0.12.0...
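A toy model can illustrate the explanation above. It is not f2py's real code generator: the variable names loosely follow zhbevx's arguments (AB, LDAB, N, W, Z), but the dependency table, `emit_order`, and `uninitialized_reads` are hypothetical. Initializers are emitted in an order that respects only the *declared* dependencies, with ties broken by a key order standing in for hash-dependent dict iteration. When depend() information is missing, some key orders make the generated code read a variable before it is set; with complete declarations, every key order is safe.

```python
from itertools import permutations

# Hypothetical: the variables each generated initializer actually reads.
ACTUAL_READS = {
    "ab": set(),         # input band matrix
    "n": {"ab"},         # matrix order, taken from ab's shape
    "ldab": {"ab"},      # leading dimension, taken from ab's shape
    "w": {"n"},          # eigenvalue array sized by n
    "z": {"n"},          # eigenvector array sized by n
}

# Same table, but with the depend() information for w and z left out.
DECLARED_BUGGY = dict(ACTUAL_READS, w=set(), z=set())


def emit_order(declared, key_order):
    """Topologically order initializers using only declared dependencies.

    Among variables whose declared dependencies are already emitted, the
    first one in key_order wins; key_order plays the role of dict
    iteration order, which varied between builds under hash randomization.
    """
    done, order, pending = set(), [], list(key_order)
    while pending:
        for name in pending:
            if declared[name] <= done:
                order.append(name)
                done.add(name)
                pending.remove(name)
                break  # restart the scan after each emitted variable
        else:
            raise ValueError("cyclic dependency declarations")
    return order


def uninitialized_reads(order):
    """List (variable, dependency) pairs read before being initialized."""
    initialized, bad = set(), []
    for name in order:
        bad.extend((name, dep) for dep in sorted(ACTUAL_READS[name] - initialized))
        initialized.add(name)
    return bad


lucky = ["ab", "n", "ldab", "w", "z"]
unlucky = ["z", "w", "ldab", "n", "ab"]
print(uninitialized_reads(emit_order(DECLARED_BUGGY, lucky)))    # []
print(uninitialized_reads(emit_order(DECLARED_BUGGY, unlucky)))  # [('z', 'n'), ('w', 'n')]
print(all(not uninitialized_reads(emit_order(ACTUAL_READS, p))
          for p in permutations(ACTUAL_READS)))                  # True
```

The same buggy declarations work or fail depending only on iteration order, which matches the observation that the failure depended on how scipy was built.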

pv closed this Jan 22, 2013

Member

seberg commented Jan 22, 2013

Maybe it is nothing, but how did e208de6 fix anything? It looks good (and certainly cannot hurt), but on closer inspection it seems to me that the reference is only there because it is borrowed from r (which stole it). When r is NULL, the function returns right away...

Member

seberg commented Jan 22, 2013

Ah OK, never mind... As noted in the original report for that fix, r may sometimes replace the typecode with a new one, and then it no longer holds a reference to the original...
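The point about stolen references can be illustrated with a toy reference-counting model in Python. Everything here is hypothetical: `ToyObject`, `steal_and_maybe_replace`, and `caller` only imitate the CPython convention in which a callee "steals" (takes ownership of) a reference passed to it; they are not NumPy's code. If the callee keeps the object, the caller can still use it as a borrowed reference; if the callee swaps in a replacement and drops the stolen reference, the caller's pointer dangles unless it took a reference of its own first.

```python
class ToyObject:
    """Stand-in for a CPython object with a manual reference count."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1      # the creator owns one reference
        self.freed = False


def incref(obj):
    obj.refcnt += 1


def decref(obj):
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True     # a real object would be deallocated here


def steal_and_maybe_replace(descr, replace):
    """Take ownership of one reference to descr (a "steal").

    The returned result keeps that reference, unless it substitutes a new
    descriptor, in which case the stolen reference is released.
    """
    if replace:
        new = ToyObject(descr.name + "-new")
        decref(descr)
        return {"descr": new}
    return {"descr": descr}


def caller(replace, incref_before_call):
    """Return True if typecode is still alive right after the stealing call."""
    typecode = ToyObject("typecode")   # caller owns one reference
    if incref_before_call:
        incref(typecode)               # keep a reference of our own
    result = steal_and_maybe_replace(typecode, replace)  # result owns its descr
    alive = not typecode.freed         # e.g. typecode used again in an error path
    if incref_before_call:
        decref(typecode)               # release our own reference when done
    return alive


print(caller(replace=False, incref_before_call=False))  # True: borrowed from result
print(caller(replace=True, incref_before_call=False))   # False: use after free
print(caller(replace=True, incref_before_call=True))    # True
```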
