Handle the limit of p = 0 in p log2 p #3

Merged
merged 3 commits into raphaelvallat:master on Jun 17, 2021

Conversation

jftsang
Contributor

@jftsang jftsang commented Jun 14, 2021

This patch defines a helper function, _xlog2x(x), that calculates
x * log2(x) but handles the case x == 0 by returning 0 rather than nan.
This is needed if the power spectrum has any component that is exactly
zero: in particular, if the f = 0 component is zero.
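
For illustration, here is a minimal sketch of a helper matching this description. It is an assumption about the shape of the code, not necessarily what was merged, and the array `p` is made-up example data.

```python
import numpy as np


@np.vectorize
def _xlog2x(x):
    """Return x * log2(x), defining the x == 0 case as 0."""
    # Return a float (0.0, not 0): np.vectorize infers the output dtype
    # from the first element it evaluates.
    return 0.0 if x == 0 else x * np.log2(x)


# Hypothetical normalized power spectrum whose f = 0 component is zero.
p = np.array([0.0, 0.5, 0.5])

# Naive evaluation: 0 * log2(0) = 0 * -inf = nan in the first entry.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = p * np.log2(p)

fixed = _xlog2x(p)          # first entry is 0.0 instead of nan
entropy = -fixed.sum()      # 1.0 bit for this example
```

With the naive expression, the single `nan` propagates through the sum and makes the entropy `nan` as well.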

@raphaelvallat raphaelvallat self-requested a review June 15, 2021 21:40
@raphaelvallat raphaelvallat self-assigned this Jun 15, 2021
@raphaelvallat
Owner

Hi @jftsang,

Thanks for the PR! A few questions:

  1. Do you have any scientific reference (or any kind of documentation) for why this should be the preferred behavior?

  2. Should this modification only affect the spectral entropy function?

  3. Could you explain what the @np.vectorize decorator is for?

Thanks,
Raphael

@jftsang
Contributor Author

jftsang commented Jun 16, 2021

Hi @raphaelvallat,

  1. The limit of x log x as x tends to 0 is 0; it follows from L'Hôpital's rule applied to log x / (1/x), whose derivative ratio (1/x) / (-1/x^2) = -x tends to 0. Try https://math.stackexchange.com/questions/470952/limit-of-x-log-x-as-x-tends-to-0, and see this illustration: https://www.wolframalpha.com/input/?i=limit+of+x*log%28x%29+as+x+-%3E+0. I'll try to find a proper academic reference when I get home.
  2. I don't know much about the other entropies, but I think this result applies wherever a p log p term appears, so that a zero-probability term contributes zero rather than making the entropy undefined.
  3. The @np.vectorize decorator lets you apply the function to a NumPy array rather than a single number. We need this because of the if x == 0 test: without the decorator, passing an array raises 'the truth value of an array with more than one element is ambiguous'.
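
To make point 3 concrete, the following sketch shows the error and how wrapping the function avoids it. The names `xlog2x_scalar` and `xlog2x_vec` are hypothetical and chosen only for this example.

```python
import numpy as np


def xlog2x_scalar(x):
    # Plain Python branch: only meaningful for one number at a time.
    if x == 0:
        return 0.0
    return x * np.log2(x)


p = np.array([0.0, 0.25, 0.5])

# `p == 0` is a boolean array, and `if` cannot reduce it to a single
# True/False, so NumPy raises ValueError.
try:
    xlog2x_scalar(p)
    raised = False
except ValueError:
    raised = True

# np.vectorize calls the scalar function once per element.
xlog2x_vec = np.vectorize(xlog2x_scalar)
result = xlog2x_vec(p)
```

Note that the NumPy documentation describes `np.vectorize` as a convenience that is essentially a Python-level loop, so it does not bring the speed of native array operations.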

Cheers,
Joanna

@raphaelvallat
Owner

This is all perfect, thanks! One last thing before I merge: can you add your changes to the docs/changelog.rst file (with a link to the current PR and, if desired, your GitHub username)? You'll need to start a new version of antropy, i.e. v0.1.5.

Cheers,
Raphael

@jftsang
Contributor Author

jftsang commented Jun 17, 2021

Done! I've also added a couple of unit tests.

~JMFT

@raphaelvallat
Owner

Merging now, thanks again for the PR!

@raphaelvallat raphaelvallat merged commit 4ba03dc into raphaelvallat:master Jun 17, 2021
@jftsang
Contributor Author

jftsang commented Jun 24, 2021

Having tested this on a very large file, I have just realised that this _xlog2x function is significantly slower, I suspect because of the conditional test. I shall experiment with using np.nan_to_num instead, which I expect will be much faster. Sorry for the inconvenience!

jftsang added a commit to jftsang/antropy that referenced this pull request Jun 24, 2021
Follow up to raphaelvallat#3

Using np.nan_to_num is advantageous because it makes use of numpy's
vectorization, instead of 'if x == 0', which applies the test pointwise.
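
A rough sketch of what an `np.nan_to_num` variant could look like follows. This is an assumption based on the commit message above, not the code from that commit, and `xlog2x_nan_to_num` is a hypothetical name.

```python
import numpy as np


def xlog2x_nan_to_num(x):
    """Compute x * log2(x) on a whole array, mapping nan results to 0."""
    x = np.asarray(x, dtype=float)
    # 0 * log2(0) evaluates to 0 * -inf = nan; silence the warnings and
    # let nan_to_num replace the nan with its default value of 0.0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.nan_to_num(x * np.log2(x))


values = xlog2x_nan_to_num([0.0, 0.5, 1.0])
```

One caveat of this approach: negative inputs also produce `nan` from `np.log2`, so they would be silently mapped to 0 rather than flagged.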
jftsang added a commit to jftsang/antropy that referenced this pull request Jun 24, 2021
Follow up to raphaelvallat#3.

Using `np.where` is advantageous because it makes use of numpy's
vectorization, instead of `if x == 0`, which applies the test pointwise.
Using `@jit(nopython=True)` is also advantageous.
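
A sketch of the `np.where` approach, again as an assumption rather than the committed code, with `xlog2x_where` as a hypothetical name. The `@jit(nopython=True)` decorator mentioned above is left out so that the example depends only on NumPy.

```python
import numpy as np


def xlog2x_where(x):
    """Compute x * log2(x) elementwise, returning 0 where x == 0."""
    x = np.asarray(x, dtype=float)
    # np.where evaluates both branches for every element, so the
    # nan-producing 0 * log2(0) is still computed and then discarded;
    # errstate suppresses the resulting warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(x == 0, 0.0, x * np.log2(x))


values = xlog2x_where([0.0, 0.25, 0.5])
```

Unlike the `np.nan_to_num` variant, this only substitutes 0 at entries that are exactly zero, so a `nan` arising from any other input remains visible.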