Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

special: handle the edge case lambda=0 in pdtr, pdtrc and pdtrik #3155

Merged
merged 3 commits into from
Feb 19, 2014

Conversation

WarrenWeckesser
Copy link
Member

This PR makes a few changes so that special.pdtr, special.pdtrc and special.pdtrik give the correct limiting value when the Poisson parameter lambda is 0.

In the case of pdtrik, the function now immediately returns 0 for lambda < 0.01 and p < 0.975. This is a lot more than the simple edge case lambda = 0, but it is based on the following figure showing the results of computing pdtrik from master on a grid of values:

pdtrik_plot

The gray dots are where the result is 0. The red dots are where pdtrik returns 1e+100; the "correct" value is 0 at those points.

The script to generate the plot is in a gist: https://gist.github.com/WarrenWeckesser/7995139

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling bf715ac on WarrenWeckesser:special-poisson-edge-cases into 34ae412 on scipy:master.

@WarrenWeckesser
Copy link
Member Author

I haven't been able to reproduce the failure that is occurring in travis with python 2.7. With the changes in the PR, cephes.pdtrik([[0], [0.25], [0.95]], [0, 1e-20, 1e-6]) should return all zeros, but on travis, it returns all nans. Anyone have a suggestion for how to track this down?

@rgommers
Copy link
Member

I can reproduce this on 32-bit linux, will have a look.

IF ((xlam .LT. 1.0D-2) .AND. (p .LT. 0.975D0)) THEN
C For sufficiently small xlam and p, the result is 0.0.
s = 0.0D0
stats = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be status = 0, that should fix the failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I must have looked at that line dozens of times and never noticed the typo.

Adding IMPLICIT NONE would have caught it. Apparently IMPLICIT NONE is not standard Fortran 77, but we're already using it in special/specfun/specfun.f. Any objections to adding it to this subroutine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No opinion, I don't really speak Fortran.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless implicit none breaks in some of our more esoteric fortan 77 setups, please please please add it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally am more wary of Fortran code without implicit none --- standard or not, as long as it actually compiles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgommers
Copy link
Member

Still strange that the failure depends on Python version (or even the way I call it; within IPython it doesn't fail). So something else may still be wrong.

@WarrenWeckesser
Copy link
Member Author

@rgommers wrote:

Still strange that the failure depends on Python version (or even the way I call it; within IPython it doesn't fail). So something else may still be wrong.

Because of the typo, the variable was never assigned in the Fortran code. The variable was originally declared on the stack in the C wrapper, so it was uninitialized. If the value happened to be negative, 3 or 4, the wrapper function returned nan. So it is not surprising that it would fail in random ways. (Of course, I may be overconfident here--the tests on Travis for the updated code haven't finished yet. :)

@WarrenWeckesser
Copy link
Member Author

We already use "implicit none" quite a bit (search for "implicit none"), so apparently it is not a problem, even though it is not standard Fortran 77. I added it to the cdfpoi subroutine in the fixed version of the PR.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling f2d32d0 on WarrenWeckesser:special-poisson-edge-cases into 34ae412 on scipy:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 6e61391 on WarrenWeckesser:special-poisson-edge-cases into 34ae412 on scipy:master.

@WarrenWeckesser
Copy link
Member Author

All the tests pass now. Yesterday the Python 3.3 build had a timeout error.

@WarrenWeckesser
Copy link
Member Author

@pv: As the resident special specialist, could you take a look at this when you get a chance?

@ev-br
Copy link
Member

ev-br commented Feb 13, 2014

A weird comment very very late: do we actually need to have pdtrik returning a float? The only use of it I can find seems to be in https://github.com/scipy/scipy/blob/master/scipy/stats/_discrete_distns.py#L446
where it's being cast to a whole number anyway.

@josef-pkt
Copy link
Member

do we actually need to have pdtrik returning a float?
I know that the Poisson log-likelihood is popular also for estimating non-negative continuous data.
But I don't see what a continuous extension of pdtrik should mean.

In some cases it's useful to have a continuous extension for calculations, even if we convert to integers at the end. Maybe it would be easy to generate Poisson random numbers with a continuous pdtrik return.

pv added a commit that referenced this pull request Feb 19, 2014
BUG: special: handle the edge case lambda=0 in pdtr, pdtrc and pdtrik
@pv pv merged commit 32cd96d into scipy:master Feb 19, 2014
@pv
Copy link
Member

pv commented Feb 19, 2014

LGTM. Some checking shows that pdtr is a bit buggy for large lambda, however (underflows):

In [75]: p = np.r_[0, np.logspace(-300, 0, 2000)]
In [76]: m = np.r_[0, np.logspace(-300, 5, 2000)]
In [77]: v = (pdtr(np.ceil(pdtrik(p[:,None], m)), m) < p[:,None])
In [78]: np.where(v)
(array([  1,   1,   1, ..., 927, 929, 929]),
 array([1987, 1988, 1989, ..., 1995, 1997, 1998]))
In [80]: k = 123; pdtr(np.ceil(pdtrik(p[i[k]], m[j[k]])), m[j[k]]), p[i[k]], m[j[k]], np.ceil(pdtrik(p[i[k]], m[j[k]]))
(0.0, 1.5870854134187277e-299, 17263.090976752374, 9771.0)

@WarrenWeckesser WarrenWeckesser deleted the special-poisson-edge-cases branch June 4, 2014 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants