Rename information theory functions? #3421

Open
jseabold opened this Issue Mar 2, 2014 · 12 comments

Contributor

jseabold commented Mar 2, 2014

I don't have a concrete proposal here, but this was brought up in #3213 so I thought I'd get the ball rolling. Right now entropy computes both the entropy and (via its optional q argument) the KL divergence, which might not be expected. I'd be in favor of deprecating the q argument of entropy and instead having, for example, entropy, cross_entropy, kl_divergence, etc. functions. It should be fairly straightforward if we agree on a refactor/renaming. Thoughts?
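
For reference, a minimal sketch of the behaviour under discussion, assuming the current scipy.stats.entropy signature:

```python
from scipy.stats import entropy

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# One name, two quantities -- the behaviour under discussion:
print(entropy(p))     # Shannon entropy H(p)
print(entropy(p, q))  # with q given: KL divergence D(p || q), not H(p)
```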

Owner

rgommers commented Mar 2, 2014

That is a concrete proposal :) kl_divergence makes sense. cross_entropy would be new, right (if trivial to implement)?

Would be good to define what's in scope and out of scope here. One could also add joint and conditional entropy, and a bunch more entropy / information measures. I don't think we necessarily want one function for each metric; the stats namespace is pretty crowded already.

Contributor

jseabold commented Mar 2, 2014

Well, it's concrete in that it exists, but not that I'm set on it.

Yes, cross entropy would be new (and trivial to implement). Thinking about the comments in #3213 again, I'm actually not sure I agree that I'd expect q to give cross entropy instead of relative entropy, since you need relative entropy (KL divergence) to compute cross entropy.
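
For illustration, cross entropy decomposes as H(p, q) = H(p) + D_KL(p || q), so a hypothetical cross_entropy (not an existing scipy function) could be a thin wrapper over the pieces that already exist:

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.25, 0.5])

# cross entropy H(p, q) = H(p) + D_KL(p || q)
cross_entropy = entropy(p) + entropy(p, q)

# sanity check against the direct definition -sum(p * log(q))
print(cross_entropy, -np.sum(p * np.log(q)))  # both ~1.2130
```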

I agree about the namespace, though I don't have a good solution. Maybe the easiest thing is to do nothing... Flat is better than nested, but I like things in their place. I don't know whether this might grow into its own sub-package. Given the lack of commits in this direction and the existence of other packages, maybe not. Though the other packages tend to be domain-specific, last I looked. So... maybe stats.info_theory.<TAB>? Probably not a good solution.

Contributor

jseabold commented Mar 2, 2014

I started this long ago when I was first studying these things. It would probably need more work on sanity and correctness checking, but it's the kind of basic toolkit that might be nice:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/infotheo.py

Owner

rgommers commented Mar 2, 2014

stats.info_theory doesn't look too good. I'd say that if there are 4-7 things we want in scipy and the rest is left for statsmodels, pyEntropy (https://github.com/robince/pyentropy) or other packages, then just adding them to the stats namespace should be OK. All it needs is a comprehensive proposal of what should go into scipy, based on how widely applicable each metric is; adding functions one by one doesn't work so well.

Member

ev-br commented Mar 2, 2014

Can this be turned into a doc issue? Leave the workhorse entropy (fix it if needed), and write an info_theory section in the tutorial / example notebook with one-liners in it.

Contributor

jseabold commented Mar 2, 2014

I think that's reasonable. There still might be some room for more functions, but we can cross that bridge when we come to it.

Contributor

argriffing commented Mar 2, 2014

If we are voting, I'd vote for adding a few of http://dcp.stanford.edu/functions, specifically entr, kl_div, and rel_entr, as cryptically named scipy.special ufuncs, and then adding the reductions entropy and kl_divergence on top of these as stats functions.

The xlogy ufunc covers most special cases, but it doesn't know that x - log(x) should be +inf (rather than nan or -inf) when x is +inf, which is needed in some definitions of KL divergence. Also, the functions can be extended beyond their usual domains in various ways, some of which are more useful than others. For example, some extensions keep the functions convex, whereas others do not guarantee this property.

One problem is that there is no consensus about the meanings of some function names. Do relative entropy and K-L divergence really mean the same thing? Also, what should we do about un-normalized distributions: automatically normalize them, return an error, or handle them in some other way? For asymmetric divergences between finite distributions, should one of the distributions be optional, and if it is not provided, should it be assumed to be the uniform distribution? Should asymmetric divergences be automatically symmetrized, and if so, should spatial.distance-like condensed distances be returned? How should broadcasting work anyway?
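
On the normalization question specifically, the current scipy.stats.entropy normalizes its inputs rather than raising; a quick check (assuming recent behaviour):

```python
from scipy.stats import entropy

# counts and probabilities give the same answer because the inputs
# are normalized internally rather than rejected:
print(entropy([2, 2]))      # log(2) ~ 0.6931
print(entropy([0.5, 0.5]))  # log(2) ~ 0.6931
```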

Owner

pv commented Sep 3, 2014

entr, kl_div and rel_entr were added as cryptically named scipy.special functions.
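
A short usage sketch of the new ufuncs and the kind of reductions proposed above (definitions as in the scipy.special docs):

```python
import numpy as np
from scipy.special import entr, rel_entr, kl_div

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.25, 0.5])

# elementwise ufuncs:
#   entr(x)        = -x*log(x)
#   rel_entr(x, y) =  x*log(x/y)
#   kl_div(x, y)   =  x*log(x/y) - x + y
H = entr(p).sum()         # Shannon entropy of p (nats)
D = rel_entr(p, q).sum()  # KL divergence D(p || q)
print(H, D)

# for normalized p and q the extra -x + y terms sum to zero,
# so kl_div gives the same divergence:
print(kl_div(p, q).sum())
```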

Contributor

argriffing commented Sep 3, 2014

Great! They may need some help with corner cases so that they still have useful properties (e.g. convexity) on more generous domains. For example, entr(-1) currently returns nan but should possibly be extended to return -inf. I hope the values currently returned for inputs outside the usual domain are considered provisional rather than frozen for backwards compatibility.

Contributor

argriffing commented Sep 3, 2014

I found a reference http://cvxr.com/cvx/doc/dcp.html for the most convenient extension of these functions.

Following standard practice in convex analysis, convex functions are interpreted as +∞ when the argument is outside the domain of the function, and concave functions are interpreted as −∞ when the argument is outside its domain. In other words, convex and concave functions in CVX are interpreted as their extended-valued extensions.
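
A minimal sketch of what that extended-valued convention would mean for the (concave) entropy function; entr_extended is a hypothetical name used for illustration, not scipy.special.entr itself:

```python
import numpy as np

def entr_extended(x):
    """Elementwise -x*log(x), extended to -inf outside the domain x >= 0,
    following the convex-analysis convention quoted above."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, -np.inf)       # outside the domain: -inf
    out[x == 0] = 0.0                    # boundary value (limit of -x*log(x))
    pos = x > 0
    out[pos] = -x[pos] * np.log(x[pos])  # interior of the domain
    return out

print(entr_extended([-1.0, 0.0, 0.5]))   # [-inf  0.  0.3466]
```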

Contributor

argriffing commented Sep 3, 2014

More clearly, I'd hope to improve the extensions of these functions so that the special return values (-inf, +inf, nan) have useful semantics. I think they can be arranged so that, if you compose functions according to convex programming rules, nan implies that the rules were violated, -inf for a correctly composed function means that the constraints of a concave function are violated, and +inf means that the constraints of a convex function are violated. For these particular special functions I think these extensions would be more natural than others, for example an extension to the complex plane.

I'm sure the author of the information theory functions had his reasons, but I find the behaviour of these functions very unexpected and strange.

  • Entropies and relative entropies should return a number in bits or nats, not an array (see the snippet after the references below).
  • Relative entropy and Kullback-Leibler divergence refer to the same mathematical expression.

Compare e.g.:
  • Cover & Thomas, Elements of Information Theory, section 2.3
  • www.renyi.hu/~csiszar/Publications/Information_Theory_and_Statistics:_A_Tutorial.pdf (see the abstract and the later definition)
  • http://mathworld.wolfram.com/RelativeEntropy.html
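
A quick check of the related behaviour, assuming a recent scipy.stats.entropy: the stats-level reduction does return a single number, and the base argument switches between nats and bits (the elementwise scipy.special functions return arrays by design):

```python
from scipy.stats import entropy

p = [0.5, 0.25, 0.25]
print(entropy(p))          # a scalar in nats (~1.0397)
print(entropy(p, base=2))  # the same entropy in bits (1.5)
```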
