Add exponentially modified gaussian distribution (scipy.stats.expongauss) #4533
Conversation
Looks good to me, if the framework handles the testing, I guess.
Ah hah! I was wondering how to do that. Will add. Loc is indeed mu. Unfortunately, there doesn't seem to be a combination of sigma and lambda that can act as scale.
I don't think this is being tested at all.
The two shape parameters for `expongauss` (``lam`` and ``s``) must
be set explicitly.
.. versionadded:: 0.16.0
I'm pretty sure this needs a blank line above .. versionadded::
@ev-br -- adding to _distr_params: wow, that's well documented... will add -- as well as a note to test_continuous_basic that you should add parameters there.
@@ -139,6 +139,7 @@ Brandon Liu for stats.combine_pvalues.
 Clark Fitzgerald for namedtuple outputs in scipy.stats.
 Florian Wilhelm for usage of RandomState in scipy.stats distributions.
 Robert T. McGibbon for Levinson-Durbin Toeplitz solver.
+Alex Conley for the Exponentiall Modified Gaussian distribution.
Typo: "Exponentiall" should be "Exponentially".
@aconley Yeah, it's not too well documented :-(. Basically, there is a loop over the distributions in `test_continuous_basic`; the parameters with which to test are pulled from `_distr_params`. The reason the parameters are kept separate is that they are also used to construct the default docstring example :-). If you have an idea of how to document this, please do :-)
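As a sketch of the pattern being described (hypothetical code, loosely mirroring the spirit of `scipy/stats/_distr_params.py` and the generic loop in `test_continuous_basic`; the shape entry for the new distribution is illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical sketch: each distribution registers a (name, shape-args)
# entry in a module-level list, and one generic loop tests them all.
distcont = [
    ("norm", ()),
    ("expon", ()),
    ("exponnorm", (1.5,)),  # the new distribution, one shape parameter K
]

def basic_checks(name, shapes):
    dist = getattr(stats, name)
    x = dist.rvs(*shapes, size=20, random_state=1234)
    pdf = dist.pdf(x, *shapes)
    cdf = dist.cdf(x, *shapes)
    # generic sanity checks applied uniformly to every registered entry
    return bool(np.all(pdf >= 0) and np.all((cdf >= 0) & (cdf <= 1)))

all_ok = all(basic_checks(n, s) for n, s in distcont)
```

The actual default parameters live in `_distr_params`; this just shows why one list entry is all a new distribution needs to join the generic tests.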
I'd say LGTM once this is added to the main test loop over the distributions and those tests are green. If there are any additional tests, please add them to the test suite as well.
adding explicit
also explicit
extra task: it would be good to add new distributions to the list in the tutorial, although IIRC we haven't added previous additions.
Should all hopefully be addressed. Turning on the unit tests revealed some overflow behavior, which is now fixed. However, it does still print an overflow warning, even though the result is fine. Is it worth using `numpy.seterr` to suppress the overflow warning just inside the pdf function? Or is that unwanted/unneeded overhead?
Please, no seterr in library code. (My personal opinion; different ones exist.) Have you tried 1) moving computations to log space, or 2) using `scipy.special.erfcx`?
Alas, erfcx doesn't help. Or, rather, it fixes the problem for large negative x but introduces a new problem for large positive x (since erfc(-1000) is 2, but erfcx(-1000) overflows).
Oh, and the log approach causes a different set of warnings (divide by zero), although it again produces valid results.
maybe you need to separate the lower tail and the upper tail by branching the calculations, based on your comment on erfcx versus erfc. I didn't check the details: is there some symmetry in the distribution between small and large values that could be exploited?
Not that I can see. Splitting the lower/upper tail would work -- but it's actually even simpler if I go that route: just check for the argument to the exponential being less than, say, -700 and return zero if it is (since you're going to get that anyway!). I was trying to find a way to avoid that ugliness, but maybe numpy.vectorize is the only way...
don't use numpy.vectorize; you already have broadcast arguments, so it should be just a conditional assignment. In other cases, like 0 * log(0), we switched to helper functions. @ev-br and others are more up to date on how this is handled now.
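The tail-branching plus conditional-assignment idea can be sketched for the standardized case (mu=0, sigma=1, rate `lam`). This is an illustration of the technique being discussed, not the code merged in the PR; it uses the identity `erfc(z) = exp(-z**2) * erfcx(z)` in the left tail and the plain `erfc` form in the right tail, so neither branch overflows:

```python
import numpy as np
from scipy.special import erfc, erfcx

def emg_pdf(x, lam=1.0):
    # Standardized EMG (mu=0, sigma=1) with rate lam; tail-branched to
    # avoid overflow.  Illustrative sketch, not the merged implementation.
    x = np.asarray(x, dtype=float)
    z = (lam - x) / np.sqrt(2.0)
    out = np.empty_like(x)
    lo = z >= 0.0  # left tail and bulk: erfcx(z) <= 1, exp(-x**2/2) is safe
    out[lo] = 0.5 * lam * np.exp(-0.5 * x[lo] ** 2) * erfcx(z[lo])
    hi = ~lo       # right tail: erfc(z) -> 2, exponent is safely negative
    out[hi] = 0.5 * lam * np.exp(0.5 * lam**2 - lam * x[hi]) * erfc(z[hi])
    return out
```

For mu=0, sigma=1 this agrees with the distribution scipy eventually shipped (`exponnorm` with shape `K = 1/lam`, assuming its documented standardization), and it stays finite for arguments like x = ±1000 where the naive formula overflows.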
IIRC, Evgeni added a `_lazywhere` helper for exactly this.
Re lazywhere --- I only found myself copy-pasting a pre-existing piece of code, and refactored it into a function :-). The code itself was there before me, so is it you Josef, or Per, or Travis?
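The idea behind the helper can be shown in a minimal sketch (this is not scipy's actual `_lazywhere`, just an illustration of the pattern: apply the function only where the condition holds, so the "bad" inputs are never touched and no spurious warnings are emitted):

```python
import numpy as np

def lazywhere(cond, arrays, f, fillvalue):
    # Minimal sketch of the _lazywhere idea (hypothetical, simplified):
    # evaluate f only on the entries where cond is True, fill the rest.
    cond = np.asarray(cond)
    out = np.full(cond.shape, fillvalue, dtype=float)
    picked = tuple(np.extract(cond, np.broadcast_to(a, cond.shape))
                   for a in arrays)
    np.place(out, cond, f(*picked))
    return out
```

For example, `lazywhere(x > 0, (x,), np.log, -np.inf)` evaluates the log only on the positive entries, so no divide-by-zero warning fires.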
The test failure for the "full" test suite seems real: the 4th moment check fails.

Re far tails and warnings: we already filter out warnings in the test suite, and we can filter out this one too --- in the test suite, not the library code. We already do calculations in log space in several cases; there was a discussion about this a while ago (a PR from @WarrenWeckesser IIRC). We also have several cases where calculations fail in the "wrong" tail, so here we might just live with it. Meanwhile, I wonder if we could/should select the tails at the level of the generic code.
Here: #3510
Not that I want to be a pain in everybody's neck, but I do have a sort of general comment. This PR defines a two-parameter distribution with shape parameters `lam` and `s`. Unless I'm wrong somewhere, this distribution can indeed be written in a standardized form with a single shape parameter. The relation to the form from this PR would then be (with …)
Here: https://dl.dropboxusercontent.com/u/18720218/expon_modified_normal.pdf

I think it's better to standardize this distribution (and only then deal with numerical issues at the tails, if any are left). Thoughts?
Wouldn't it be a bit cleaner to define `loc = mu + lam * s**2`? Then the distribution is

    xi / (2*scale) * exp(-(xi*x + xi**2/2)) * erfc(-x / sqrt(2))

That has the advantage that the 'canonical' values `s = 1`, `lam = 1` map onto the default `loc`/`scale`. The default `loc=0` is a little weird though.
OK, even better (?): `loc = mu`. Then the canonical version (sum of a centered Gaussian with unit variance and an exponential with scale 1) has `loc=0`, `scale=1`, `shape=1`.
+1 for the last parametrization.
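For what it's worth, this last parametrization is the one scipy ended up with (the distribution was merged as `exponnorm` with shape `K`): the standardized variable is a standard normal plus an exponential with scale `K`, so the mean is `K` and the variance is `1 + K**2`. A quick numerical check, assuming that standardization:

```python
import numpy as np
from scipy.stats import exponnorm

# N(0, 1) + Exponential(scale=K): mean = 0 + K, variance = 1 + K**2
K = 2.0
mean, var = exponnorm.stats(K, moments="mv")
```

With `K = 2` this should give mean 2 and variance 5.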
# Avoid overflows; setting exp(exparg) to the max float works
# all right here
expval = _lazywhere_single(exparg < _maxexparg, (exparg),
                           exp, _maxfloat)
`_lazywhere` should work with a single argument: just feed it a single-element tuple. See, for example, the `invgamma.stats` method.
Yes, silly me, I forgot the trailing `,` in `(exparg,)`.
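The gotcha in miniature: parentheses alone don't create a tuple, so `(exparg)` passes the bare value rather than the one-element tuple the helper's `arrays` argument expects; only the trailing comma makes it a tuple.

```python
x = 3.0
a = (x)   # just a parenthesized float; parentheses alone don't make a tuple
b = (x,)  # the trailing comma makes a one-element tuple
```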
I made one more round of comments, mostly rather minor. Two issues remaining:
If push comes to shove, you may just mark it as a known failure. One other thing which might be helpful is to spell out in the docstring the transformation from the 'wikipedia' definition --- and/or other definitions used in the literature --- to the standardized one here.
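A sketch of that transformation (hedged: the function names are illustrative, and the mapping `K = 1/(sigma*lam)`, `loc = mu`, `scale = sigma` is the one implied by the standardization discussed above):

```python
import numpy as np
from scipy.special import erfc

def wiki_to_standard(mu, sigma, lam):
    # Wikipedia parameters (mu, sigma, lam) -> (K, loc, scale) of the
    # single-shape standardized form.  Illustrative helper, not scipy API.
    return 1.0 / (sigma * lam), mu, sigma

def wiki_pdf(x, mu, sigma, lam):
    # the EMG pdf exactly as written on the Wikipedia page
    arg = (mu + lam * sigma**2 - x) / (np.sqrt(2.0) * sigma)
    return (0.5 * lam
            * np.exp(0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * x))
            * erfc(arg))
```

Evaluating both forms on a grid and comparing is a direct way to validate the docstring's stated mapping.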
To this point: I haven't thought about how this would be implemented, but it would be good to start helper methods to convert the parameterization to something more "standard" in some cases. It would be just a distribution specific method without any generic support, I guess. |
@josef-pkt Would a docstring example with an explicit formula not be sufficient? |
It's sufficient for now, but I think eventually we should support it automatically, with code. |
Let's take the discussion of parametrization to #4538 to avoid side-tracking this PR. |
You'll like this: it is calculating the excess kurtosis using the formula from Wikipedia. The problem is that the excess-kurtosis formula on Wikipedia is wrong -- it should have a 3 in front of it, not a 2 (and it simplifies much further with that change). This also fixes the test.

So, the unit tests in scipy have uncovered a Wikipedia mistake. I need to find a reference for this somewhere so I can fix Wikipedia -- they probably won't like my explanation that you can just derive this from the characteristic function.

Edit: fixed on Wikipedia; their original source had the 3, so it was apparently a transcription error.
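The correction can be cross-checked numerically. In the standardized single-shape form the excess kurtosis simplifies to `6*K**4 / (1 + K**2)**2` (my own simplification from the central moments of the normal and exponential parts -- a derivation to verify, not a quote from the PR):

```python
import numpy as np
from scipy.stats import exponnorm

K = 2.0
# 4th central moment of N(0,1) + Exp(scale=K) is 3 + 6*K**2 + 9*K**4,
# variance is 1 + K**2; excess kurtosis = mu4/mu2**2 - 3 collapses to:
gamma2_formula = 6.0 * K**4 / (1.0 + K**2) ** 2
# ... and the same quantity from the implementation
gamma2_scipy = exponnorm.stats(K, moments="k")
```

Agreement between the closed form and `exponnorm.stats` is exactly the kind of check the generic moment test performs.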
Oh, great :-). Two last nitpicks:
In any case, LGTM. |
Trivial tweaks in the docs.
…nnorm)

* Added to release notes for 0.16.0.
* Example PDFs and CDFs checked against the wikipedia page (http://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution); random samples tested against mean and variance; histograms compared for various parameters. A few simple unit tests added against non-loc/scale/shape-parameterization statistics.
* RVS checked against PDFs for a range of parameters.
Ok, done. |
It is kind of impressive to me that the scipy stats tests detected the excess kurtosis error on Wikipedia. |
I think I'll merge it soon-ish unless there are further comments.
No more comments, merging. Thanks @aconley ! |