New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: nan entries cause segfault #690

Merged
merged 4 commits into from Aug 1, 2015

Conversation

Projects
None yet
3 participants
@omarocegueda
Contributor

omarocegueda commented Jul 29, 2015

This fixes the segmentation fault caused when attempting to interpolate an image at nan entries.

@@ -144,6 +144,8 @@ def set_affine(self, affine):
self.affine_inv = None
return
try:
if np.isnan(np.sum(affine)):

This comment has been minimized.

@omarocegueda

omarocegueda Jul 29, 2015

Contributor

Using sum is faster than, e.g., min for finding nan's, according to this:
http://stackoverflow.com/questions/6736590/fast-check-for-nan-in-numpy

This comment has been minimized.

@matthew-brett

matthew-brett Jul 30, 2015

Member

Surely these two lines should go above the try block? I think the np.sum trick is to avoid large temporary arrays, for me a simple np.any(np.isnan(x)) is faster still, and I think it's easier to read.

assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid)
assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid)
assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid)
# Exceptions from warp_2d_nn
assert_raises(ValueError, vfu.transform_2d_affine, circle, shape, invalid_nan)

This comment has been minimized.

@omarocegueda

omarocegueda Jul 29, 2015

Contributor

Without this fix, this assert will reproduce the segfault.

@@ -17,6 +17,8 @@ def is_valid_affine(double[:, :] M, int dim):
return True
if M.shape[0] < dim or M.shape[1] < dim + 1:
return False
if np.isnan(np.sum(M)):

This comment has been minimized.

@omarocegueda

omarocegueda Jul 29, 2015

Contributor

We are already calling this check from all the exposed cython functions that use affines, so we can check for nans here.

This comment has been minimized.

@matthew-brett

matthew-brett Jul 30, 2015

Member

Ditto here for np.any(np.isnan(M))

@arokem

This comment has been minimized.

Member

arokem commented Jul 29, 2015

Does this actually address all of the errors mentioned in #654?

In particular, one of the things reported there was an error in test_imaffine:test_mi_gradient (for example, see https://travis-ci.org/nipy/dipy/jobs/73233253#L2162), which is not a segfault. Is this addressed through these changes?

@omarocegueda

This comment has been minimized.

@arokem

This comment has been minimized.

Member

arokem commented Jul 29, 2015

OK - just wanted to make sure that I have the full picture.
On Wed, Jul 29, 2015 at 4:38 PM, Omar Ocegueda notifications@github.com
wrote:

I don't know if this will solve the other bug, but the segmentation fault
occurs in several buildbots:

http://nipy.bic.berkeley.edu/builders/dipy-bdist32-33/builds/138/steps/shell_8/logs/stdio

http://nipy.bic.berkeley.edu/builders/dipy-py2.6-32/builds/557/steps/shell_6/logs/stdio

http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.7/builds/310/steps/shell_6/logs/stdio

http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.8/builds/378/steps/shell_6/logs/stdio

Yes - I expect this PR will solve these issues. Have you tried a a
try_branch with any of these?

The other bug appears in this buildbot:

http://nipy.bic.berkeley.edu/builders/dipy-py3.4/builds/192/steps/shell_6/logs/stdio
@matthew-brett https://github.com/matthew-brett, can I have access to
that machine too?
Do you guys think it will be better to add both fixes to this PR?

Do you have some sense of what is causing it? I don't think it matters much
if this one also gets resolved here: If you think this resolves the
segfaults, I am ready to merge this one, and you can take the other fix on
another PR, but if you prefer to keep going here, that's fine too. Your
call.


Reply to this email directly or view it on GitHub
#690 (comment).

@omarocegueda

This comment has been minimized.

Contributor

omarocegueda commented Jul 30, 2015

I didn't run a try_branch but I tested the fix manually on the buildbot Matthew gave access to (It didn't brake after the fix).

I'm not sure about the other bug, it may be as simple as a precission issue (e.g. the result was something like 0.9989, very close to the assertion value but still failing). I need to reproduce the bug and find out why it fails there and not in other platforms.

@matthew-brett

This comment has been minimized.

Member

matthew-brett commented Jul 30, 2015

This is just option 3?

@matthew-brett

This comment has been minimized.

Member

matthew-brett commented Jul 30, 2015

Omar - I'll check with the owner of that buildbot machine and the OSX machine - will get back to you.

@matthew-brett

This comment has been minimized.

Member

matthew-brett commented Jul 30, 2015

Now I think I see that this is option 2 and 3.

@omarocegueda

This comment has been minimized.

Contributor

omarocegueda commented Jul 31, 2015

Thanks @matthew-brett!, yes this is option 2 and 3. I just reproduced the bug on the buildbot. The root cause is that iteration over dictionary keys is no longer deterministic in Python 3:
http://stackoverflow.com/questions/14956313/dictionary-ordering-non-deterministic-in-python3

This explains the "intermitent" behavior. The assertion was failing because the inner product between numeric and analytical gradients was about 0.994. I think it is safe to reduce the threshold to 0.99. For the non-deterministic behavior, I guess the way to go is to replace the dictionary with a list. What do you think?

@matthew-brett

This comment has been minimized.

Member

matthew-brett commented Jul 31, 2015

Where is the iteration-over-dictionary code?

@omarocegueda

This comment has been minimized.

@matthew-brett

This comment has been minimized.

Member

matthew-brett commented Jul 31, 2015

How about for ttype in sorted(factors) to make it deterministic?

@omarocegueda

This comment has been minimized.

Contributor

omarocegueda commented Jul 31, 2015

sure! that's fine too. I'll do the fix.

@omarocegueda

This comment has been minimized.

Contributor

omarocegueda commented Jul 31, 2015

Alright!, this now addresses both failures (checked directly on the failing buildbots)

@@ -172,7 +172,7 @@ def test_align_origins_3d():
def test_affreg_all_transforms():
# Test affine registration using all transforms with typical settings
for ttype in factors.keys():
for ttype in sorted(factors):

This comment has been minimized.

@matthew-brett

matthew-brett Jul 31, 2015

Member

Comment to explain why the factor keys must be sorted (in order to preserve relationship of random numbers to dict key / values)? Ditto for other instances.

This comment has been minimized.

@omarocegueda

omarocegueda Aug 1, 2015

Contributor

Comments added. Actually only one of the tests needed the sort, but I think it is ok to still sort the keys in the other three places, just in case we extend the tests in the future.

@matthew-brett

This comment has been minimized.

Member

matthew-brett commented Aug 1, 2015

There seems to be a new merge commit here - 33451ca - introducing lots of changes not relevant to this PR?

@omarocegueda omarocegueda force-pushed the omarocegueda:fix_nans branch from c4c5850 to 8bf1318 Aug 1, 2015

@omarocegueda

This comment has been minimized.

Contributor

omarocegueda commented Aug 1, 2015

Right, now I see what happened... fortunately there is cherry-pick!
Thanks!

@@ -143,6 +143,8 @@ def set_affine(self, affine):
if self.affine is None:
self.affine_inv = None

This comment has been minimized.

@matthew-brett

matthew-brett Aug 1, 2015

Member

Sorry to do small suggestions not relevant to this PR - but how about putting this line after self.affine = affine so that self.affine_inv is always defined (as None or an affine) even if the inverse raises an error. Otherwise it's set to whatever it was before, which could be confusing. Probably also worth noting that the method sets self.affine_inv in the docstring too.

This comment has been minimized.

@matthew-brett

matthew-brett Aug 1, 2015

Member

Actually, don't worry, I'll do a PR for that.

This comment has been minimized.

@matthew-brett
@matthew-brett

This comment has been minimized.

Member

matthew-brett commented Aug 1, 2015

Great - thanks for fixing - merging now.

matthew-brett added a commit that referenced this pull request Aug 1, 2015

Merge pull request #690 from omarocegueda/fix_nans
MRG: fix bugs in affine registration

NaN entries in affines causing segfault on some platforms.

Relaxing similarity threshold for random number tests.

@matthew-brett matthew-brett merged commit 4943433 into nipy:master Aug 1, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@matthew-brett matthew-brett referenced this pull request Aug 1, 2015

Closed

WIP: affine map tests #693

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment