# BF: nan entries cause segfault #690

Merged
merged 4 commits into from Aug 1, 2015

## Conversation

Projects
None yet
3 participants
Contributor

### omarocegueda commented Jul 29, 2015

 This fixes the segmentation fault caused when attempting to interpolate an image at nan entries.

### omarocegueda reviewed Jul 29, 2015

 @@ -144,6 +144,8 @@ def set_affine(self, affine): self.affine_inv = None return try: if np.isnan(np.sum(affine)):

#### omarocegueda Jul 29, 2015

Contributor

Using sum is faster than, e.g., min for finding nan's, according to this:
http://stackoverflow.com/questions/6736590/fast-check-for-nan-in-numpy

#### matthew-brett Jul 30, 2015

Member

Surely these two lines should go above the try block? I think the np.sum trick is to avoid large temporary arrays, for me a simple np.any(np.isnan(x)) is faster still, and I think it's easier to read.

### omarocegueda reviewed Jul 29, 2015

 assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid) assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid) assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid) # Exceptions from warp_2d_nn assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid_nan)

#### omarocegueda Jul 29, 2015

Contributor

Without this fix, this assert will reproduce the segfault.

### omarocegueda reviewed Jul 29, 2015

 @@ -17,6 +17,8 @@ def is_valid_affine(double[:, :] M, int dim): return True if M.shape[0] < dim or M.shape[1] < dim + 1: return False if np.isnan(np.sum(M)):

#### omarocegueda Jul 29, 2015

Contributor

We are already calling this check from all the exposed cython functions that use affines, so we can check for nans here.

#### matthew-brett Jul 30, 2015

Member

Ditto here for np.any(np.isnan(M))

Member

### arokem commented Jul 29, 2015

 Does this actually address all of the errors mentioned in #654? In particular, one of the things reported there was an error in test_imaffine:test_mi_gradient (for example, see https://travis-ci.org/nipy/dipy/jobs/73233253#L2162), which is not a segfault. Is this addressed through these changes?
Contributor

### omarocegueda commented Jul 29, 2015

 I don't know if this will solve the other bug, but the segmentation fault occurs in several buildbots: http://nipy.bic.berkeley.edu/builders/dipy-bdist32-33/builds/138/steps/shell_8/logs/stdio http://nipy.bic.berkeley.edu/builders/dipy-py2.6-32/builds/557/steps/shell_6/logs/stdio http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.7/builds/310/steps/shell_6/logs/stdio http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.8/builds/378/steps/shell_6/logs/stdio The other bug appears in this buildbot: http://nipy.bic.berkeley.edu/builders/dipy-py3.4/builds/192/steps/shell_6/logs/stdio @matthew-brett, can I have access to that machine too? Do you guys think it will be better to add both fixes to this PR?
Member

### arokem commented Jul 29, 2015

 OK - just wanted to make sure that I have the full picture. On Wed, Jul 29, 2015 at 4:38 PM, Omar Ocegueda notifications@github.com wrote: I don't know if this will solve the other bug, but the segmentation fault occurs in several buildbots: http://nipy.bic.berkeley.edu/builders/dipy-bdist32-33/builds/138/steps/shell_8/logs/stdio http://nipy.bic.berkeley.edu/builders/dipy-py2.6-32/builds/557/steps/shell_6/logs/stdio http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.7/builds/310/steps/shell_6/logs/stdio http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.8/builds/378/steps/shell_6/logs/stdio Yes - I expect this PR will solve these issues. Have you tried a a try_branch with any of these? The other bug appears in this buildbot: http://nipy.bic.berkeley.edu/builders/dipy-py3.4/builds/192/steps/shell_6/logs/stdio @matthew-brett https://github.com/matthew-brett, can I have access to that machine too? Do you guys think it will be better to add both fixes to this PR? Do you have some sense of what is causing it? I don't think it matters much if this one also gets resolved here: If you think this resolves the segfaults, I am ready to merge this one, and you can take the other fix on another PR, but if you prefer to keep going here, that's fine too. Your call. — Reply to this email directly or view it on GitHub #690 (comment).
Contributor

### omarocegueda commented Jul 30, 2015

 I didn't run a try_branch but I tested the fix manually on the buildbot Matthew gave access to (It didn't brake after the fix). I'm not sure about the other bug, it may be as simple as a precission issue (e.g. the result was something like 0.9989, very close to the assertion value but still failing). I need to reproduce the bug and find out why it fails there and not in other platforms.
Member

### matthew-brett commented Jul 30, 2015

 This is just option 3?
Member

### matthew-brett commented Jul 30, 2015

 Omar - I'll check with the owner of that buildbot machine and the OSX machine - will get back to you.
Member

### matthew-brett commented Jul 30, 2015

 Now I think I see that this is option 2 and 3.
Contributor

### omarocegueda commented Jul 31, 2015

 Thanks @matthew-brett!, yes this is option 2 and 3. I just reproduced the bug on the buildbot. The root cause is that iteration over dictionary keys is no longer deterministic in Python 3: http://stackoverflow.com/questions/14956313/dictionary-ordering-non-deterministic-in-python3 This explains the "intermitent" behavior. The assertion was failing because the inner product between numeric and analytical gradients was about 0.994. I think it is safe to reduce the threshold to 0.99. For the non-deterministic behavior, I guess the way to go is to replace the dictionary with a list. What do you think?
Member

### matthew-brett commented Jul 31, 2015

 Where is the iteration-over-dictionary code?
Contributor

### omarocegueda commented Jul 31, 2015

 Here is the dictionary: https://github.com/nipy/dipy/blob/master/dipy/align/tests/test_imaffine.py#L25 and here is the loop that is failing: https://github.com/nipy/dipy/blob/master/dipy/align/tests/test_imaffine.py#L268
Member

### matthew-brett commented Jul 31, 2015

 How about for ttype in sorted(factors) to make it deterministic?
Contributor

### omarocegueda commented Jul 31, 2015

 sure! that's fine too. I'll do the fix.
Contributor

### omarocegueda commented Jul 31, 2015

 Alright!, this now addresses both failures (checked directly on the failing buildbots)

### matthew-brett reviewed Jul 31, 2015

 @@ -172,7 +172,7 @@ def test_align_origins_3d(): def test_affreg_all_transforms(): # Test affine registration using all transforms with typical settings for ttype in factors.keys(): for ttype in sorted(factors):

#### matthew-brett Jul 31, 2015

Member

Comment to explain why the factor keys must be sorted (in order to preserve relationship of random numbers to dict key / values)? Ditto for other instances.

#### omarocegueda Aug 1, 2015

Contributor

Comments added. Actually only one of the tests needed the sort, but I think it is ok to still sort the keys in the other three places, just in case we extend the tests in the future.

Member

### matthew-brett commented Aug 1, 2015

 There seems to be a new merge commit here - 33451ca - introducing lots of changes not relevant to this PR?

Contributor

### omarocegueda commented Aug 1, 2015

 Right, now I see what happened... fortunately there is cherry-pick! Thanks!

### matthew-brett reviewed Aug 1, 2015

 @@ -143,6 +143,8 @@ def set_affine(self, affine): if self.affine is None: self.affine_inv = None

#### matthew-brett Aug 1, 2015

Member

Sorry to do small suggestions not relevant to this PR - but how about putting this line after self.affine = affine so that self.affine_inv is always defined (as None or an affine) even if the inverse raises an error. Otherwise it's set to whatever it was before, which could be confusing. Probably also worth noting that the method sets self.affine_inv in the docstring too.

#### matthew-brett Aug 1, 2015

Member

Actually, don't worry, I'll do a PR for that.

Member
Member

### matthew-brett commented Aug 1, 2015

 Great - thanks for fixing - merging now.

### matthew-brett added a commit that referenced this pull request Aug 1, 2015

Merge pull request #690 from omarocegueda/fix_nans
MRG: fix bugs in affine registration

NaN entries in affines causing segfault on some platforms.

Relaxing similarity threshold for random number tests.

### matthew-brett merged commit 4943433 into nipy:master Aug 1, 2015 1 check passed

#### 1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

Closed