ENH: avoid calling log() on zero-valued elements in anisotropic_power #1086

Closed
wants to merge 5 commits into
base: master
from

Conversation

@jchoude
Contributor

jchoude commented Jun 22, 2016

In reconst.shm.anisotropic_power, there is a log() normalization to bring values back into a reasonable range.

However, for some voxels (for example, those outside of the brain), the pre-normalized AP values are already 0. In this case, the log() call generates a numpy warning that can be misleading for end users.

In this patch, we simply apply the log only to elements that are > 0. AP values are guaranteed to be >= 0, since each is a sum of absolute values.

Also, if the non_negative kwarg is set to False, those 0-valued voxels get "normalized" to -inf, which is not a very useful value and can create other issues if those maps are post-processed further.

Finally, a few typos were fixed in the docstring of this function.
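The patch idea can be sketched as follows (a minimal illustration, not the actual dipy code; `log_normalize` is a hypothetical stand-in for the relevant lines inside `anisotropic_power`):

```python
import numpy as np

def log_normalize(ap, norm_factor=0.00001):
    """Hypothetical stand-in for the log-normalization step in
    reconst.shm.anisotropic_power, illustrating the patch."""
    log_ap = np.zeros_like(ap)
    # ap is a sum of absolute values, so it is never negative;
    # masking out the zeros avoids the log(0) RuntimeWarning and -inf.
    nonzero = ap > 0
    log_ap[nonzero] = np.log(ap[nonzero]) - np.log(norm_factor)
    return log_ap

ap = np.array([0.0, 0.5, 2.0])  # 0.0 models a voxel outside the brain
result = log_normalize(ap)      # no warning; the zero voxel stays 0 instead of -inf
```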

@coveralls

coveralls commented Jun 22, 2016

Coverage Status

Coverage decreased (-0.4%) to 82.441% when pulling 851cb6a on jchoude:ENH_avoid_log_zero into 82573c7 on nipy:master.

@codecov-io

codecov-io commented Jun 22, 2016

Current coverage is 80.35% (diff: 100%)

Merging #1086 into master will decrease coverage by 0.46%

@@             master      #1086   diff @@
==========================================
  Files           200        200          
  Lines         22985      22986     +1   
  Methods           0          0          
  Messages          0          0          
  Branches       2309       2309          
==========================================
- Hits          18576      18470   -106   
- Misses         3933       4036   +103   
- Partials        476        480     +4   

Powered by Codecov. Last update 82573c7...777dfa2

  The power, $n$ is usually set to $n=2$.
- The final AP image is then shifted by -log(normal_factor), to be strictly non-negative. Remaining values < 0 are discarded (set to 0), per default,
+ The final AP image is then shifted by -log(norm_factor), to be strictly non-negative. Remaining values < 0 are discarded (set to 0), per default,
  and this option is controlled throug the `non_negative` key word argument.

@arokem

arokem Jun 22, 2016

Member

While we're already here, would you mind wrapping this to 80 characters?

- # Shift the map to be mostly non-negative:
- log_ap = np.log(ap) - np.log(norm_factor)
+ # Shift the map to be mostly non-negative,
+ # only applying the log operation to non-zero elements

@arokem

arokem Jun 22, 2016

Member

On "non-negative": there might still be negative values here, right?

@jchoude

jchoude Jun 23, 2016

Contributor

Shouldn't be. ap is defined as the sum of absolute values of the SH coefficients. Since we have an abs in the sum, it should never be negative.

While we are there, do we want a check on the value of the power argument, or do we allow anything?

@arokem

arokem Jun 23, 2016

Member

What about ap values between 0 and 1? These would result in a negative log(ap).

@jchoude

jchoude Jun 23, 2016

Contributor

Negative log(ap) is not really a problem, for two reasons:

  1. The -log(norm_factor) term will shift most of them back to positive values, as long as norm_factor stays reasonable. For example, the default value is 0.00001, for which -log(norm_factor) = +11.5129, shifting all values up by that amount.
  2. If the non_negative arg is set to True (which is the default), an additional check later in the function handles all remaining negative values.
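A quick numeric check of that shift (plain numpy arithmetic, not dipy code):

```python
import numpy as np

norm_factor = 0.00001          # the default value mentioned above
shift = -np.log(norm_factor)   # the constant added to every log(ap)
print(round(shift, 4))         # 11.5129
# Even a fractional ap value such as 0.5 ends up positive after the shift:
print(np.log(0.5) + shift > 0)  # True
```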

@arokem

arokem Jun 23, 2016

Member

Yes - I agree with everything you say: I just think that the comment is inaccurate, considering the code below.

OTOH, that's not crucial. Apart from the line-wrapping (which is really my bad...), I think this is ready to go.

@jchoude

jchoude Jun 23, 2016

Contributor

I pushed the line-wrapping fix, unless you're talking about some other line...

Also, maybe I misunderstood... What part of the comment is inaccurate? That we apply the log only to non-zero voxels? Sorry, I'm just not able to pinpoint which part is inaccurate.

@arokem

arokem Jun 23, 2016

Member

Yes - somehow the changes I was looking at didn't have your PEP8 commit. Thanks for doing that.

Non-zero would be:

log_ap[ap != 0] = np.log(ap[ap != 0]) - np.log(norm_factor)

As it is, the code selects for non-negative. But really - this is not crucial - as you pointed out above.

@jchoude

jchoude Jun 23, 2016

Contributor

Ah! I get it. Well, it is true that the condition is non-zero. It is impossible for ap values to be negative, and we only want to avoid 0s, since those are the ones that throw the log warning.

I agree that with small ap values between 0 and 1, the log will yield a negative value. As I said, it will be handled by the rest of the statement and later code.

I'll push a quick fix to the comment.

# to avoid getting numpy warnings on log(0).
# Also avoids getting voxels with -inf when non_negative=False.
log_ap = np.zeros_like(ap)
log_ap[ap > 0] = np.log(ap[ap > 0]) - np.log(norm_factor)

@MarcCote

MarcCote Jun 22, 2016

Contributor

This is more for my intellectual curiosity: does anyone know whether there is any substantial speedup from caching this boolean index (i.e. ap > 0), or will Python optimize it anyway?

@arokem

arokem Jun 22, 2016

Member

I don't know whether Python has some optimization in place, but there seems to be only a small advantage to caching:
https://gist.github.com/arokem/fba319fedf2b541240504f56f97af45e
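A rough benchmark along the lines of the linked gist (the array shape, threshold, and function names here are arbitrary stand-ins; absolute timings vary by machine):

```python
import timeit
import numpy as np

ap = np.random.rand(256, 256, 60)
ap[ap < 0.3] = 0  # zero out some voxels, as outside-brain voxels would be

def uncached():
    out = np.zeros_like(ap)
    out[ap > 0] = np.log(ap[ap > 0])  # the boolean mask is evaluated twice
    return out

def cached():
    mask = ap > 0                     # the boolean mask is evaluated once
    out = np.zeros_like(ap)
    out[mask] = np.log(ap[mask])
    return out

print(timeit.timeit(uncached, number=10))
print(timeit.timeit(cached, number=10))
```

Python does not optimize the repeated `ap > 0` away; numpy evaluates it each time. The comparison is just cheap relative to the fancy-indexed assignment, which would explain the small advantage observed.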

@jchoude

jchoude Jun 23, 2016

Contributor

I ran some quick tests to see whether it was worth it, and I also didn't see a significant improvement. However, I'm no expert on the internals of numpy or Python, so there might be some caching magic behind the scenes, or an optimized accessor.

@MarcCote

MarcCote Jun 23, 2016

Contributor

Ok, good to know, thanks.

@coveralls

coveralls commented Jun 23, 2016

Coverage Status

Coverage decreased (-0.4%) to 82.441% when pulling 079d01f on jchoude:ENH_avoid_log_zero into 82573c7 on nipy:master.

@arokem

Member

arokem commented Jun 23, 2016

Hmm. Travis is not happy: https://travis-ci.org/nipy/dipy/jobs/139731117#L2267

This looks suspiciously similar to #1074, but that might just be a red herring. Seems to be happening only with older numpy.

@jchoude

Contributor

jchoude commented Jun 23, 2016

Yes, that's weird for Travis. I ran the test manually and didn't see this issue. I'm not sure how to check which version of numpy is used... I see that, at the beginning of the log, it says the build depends on numpy 1.7.1. Does that mean that's the version being used?

@coveralls

coveralls commented Jun 23, 2016

Coverage Status

Coverage decreased (-0.4%) to 82.441% when pulling 777dfa2 on jchoude:ENH_avoid_log_zero into 82573c7 on nipy:master.

@arokem

Member

arokem commented Jun 23, 2016

Yes, numpy 1.7.1

As you can see here: https://travis-ci.org/nipy/dipy/builds/139731111, only that machine fails. It is the machine set up to test against the minimal versions of the dependencies that we support. It looks to me like the 0-d array arises because we are dereferencing the array in the test:

https://github.com/nipy/dipy/blob/master/dipy/reconst/tests/test_shm.py#L440

@arokem

Member

arokem commented Sep 29, 2016

Any more thoughts here? Looks like we're still stuck on that test failure.

@arokem

Member

arokem commented Oct 21, 2016

Yo @jchoude : have you had a chance to look at this one again?

@jchoude

Contributor

jchoude commented Oct 26, 2016

@arokem I hadn't until now. I pushed a potential fix, waiting on travis to do its magic.

Then, I'll rebase.

@coveralls

coveralls commented Oct 26, 2016

Coverage Status

Coverage decreased (-0.4%) to 82.479% when pulling 6d9916a on jchoude:ENH_avoid_log_zero into 82573c7 on nipy:master.

@jchoude

Contributor

jchoude commented Oct 26, 2016

@arokem so, after more investigation, even using np.asarray() in the test doesn't work for this. The thing is that we want to test whether the function works in the single-voxel case, and therefore send a 1D array to the anisotropic_power() function. In there, we derive the shape of the AP array, and then index into it.

The issue is that, at least with numpy 1.10.1, indexing with code like ap[ap > 0] works even if ap is a single scalar. There is probably an internal numpy cast to an array. With numpy 1.7, this doesn't work, hence the error that we see.

In this case, what would be your preferred course of action?

  • One basic option, though probably a non-starter, is to support only numpy 1.9.2 and up, since the issue is fixed in 1.9.2.
  • Add a special code path in anisotropic_power to behave correctly when given a single scalar value.

What would you prefer? Any other idea?
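The second option could look something like this (a sketch only; `masked_log` is a hypothetical helper, and `np.atleast_1d` is one possible way to make the mask assignment safe on 0-d input across old numpy versions):

```python
import numpy as np

def masked_log(ap, norm_factor=0.00001):
    # Hypothetical sketch of the special-cased code path discussed above.
    # np.atleast_1d promotes a 0-d array / scalar to 1-d, so the boolean
    # mask assignment below works regardless of the input's dimensionality.
    ap = np.atleast_1d(np.asarray(ap, dtype=float))
    log_ap = np.zeros_like(ap)
    log_ap[ap > 0] = np.log(ap[ap > 0]) - np.log(norm_factor)
    return log_ap

masked_log(0.0)  # single-voxel case: returns array([0.]) with no warning
```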

@arokem

Member

arokem commented Dec 16, 2016

Hi @jchoude : apologies for the long gap here. I'd prefer to special-case it in the code. It's a bit of a weird edge-case, but since we're testing it, I think that's probably fine.

@arokem arokem referenced this pull request Mar 23, 2017

Merged

ENH: avoid log zero #1198

@jchoude

Contributor

jchoude commented Mar 29, 2017

Closing since it was superseded by pull request #1198

@jchoude jchoude closed this Mar 29, 2017
