Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Feldman Cousins algorithm #367

Merged
merged 31 commits into from Oct 21, 2015
Merged

Add Feldman Cousins algorithm #367

merged 31 commits into from Oct 21, 2015

Conversation

@dlennarz
Copy link
Contributor

@dlennarz dlennarz commented Oct 11, 2015

Adding Feldman Cousins to gammapy.

Status: Done working on it, unless there are new comments / requests.

There was a previous pull request (#311), but I messed up that branch when trying to merge changes from the master. That's why I deleted that branch and recreated it.

@cdeil cdeil added the feature label Oct 11, 2015
@cdeil cdeil added this to the 0.5 milestone Oct 11, 2015
@cdeil cdeil self-assigned this Oct 11, 2015
Dirk Lennarz added 4 commits Oct 11, 2015
Dirk Lennarz
…bers from the FC paper. Extend coverage, while trying to cut down on the computation time. Rename functions to be more logical. Impove function documentation. Fix bugs in previously untested classes.
Dirk Lennarz
@dlennarz
Copy link
Contributor Author

@dlennarz dlennarz commented Oct 14, 2015

I'm done working on it, unless there are new comments / requests.

@cdeil cdeil modified the milestones: 0.4, 0.5 Oct 15, 2015
@cdeil
Copy link
Member

@cdeil cdeil commented Oct 15, 2015

@dlennarz – Would you mind making your code PEP8 compliant?
E.g. for gammapy/stats/feldman_cousins.py the pep8 tool points out this.

Also, if you have the patience to go though and remove the leading f from variable names everywhere, that's a remnant from C++ class members (now we're at Python, no classes), that would be great.

For example this:

fBackground  = 3.5
fNBinsX      = 100
fCL          = 0.90

should be this

background = 3.5
n_bins_x = 100
cl = 0.90

If you don't have time for such nitpicky stuff, just let me know and I'll just do it after merging this PR.

@@ -0,0 +1,20 @@
"""Demonstrate the artefact that can arise if

This comment has been minimized.

@cdeil

cdeil Oct 15, 2015
Member

I'm not sure it's really better, but maybe put the results in a table to get a nicer printout?

"""
Demonstrate the artifact that can arise if
fc_fix_upper_and_lower_limit is not used.
"""
import numpy as np
from astropy.table import Table
from gammapy.stats import fc_find_acceptance_region_poisson

background = 3.5
cl = 0.90
x_bins = np.arange(0, 100)

table = Table()
table['mu'] = [0.745, 0.750, 0.755, 1.030, 1.035, 1.040, 1.045, 1.050, 1.055, 1.060, 1.065]
table['x_min'] = 0.0
table['x_max'] = 0.0

for row in table:
    x_min, x_max = fc_find_acceptance_region_poisson(row['mu'], background, x_bins, cl)
    row['x_min'] = x_min
    row['x_max'] = x_max

table.pprint()

gives the output:

  mu  x_min x_max
----- ----- -----
0.745   0.0   8.0
 0.75   1.0   8.0
0.755   1.0   8.0
 1.03   1.0   8.0
1.035   0.0   8.0
 1.04   0.0   8.0
1.045   0.0   8.0
 1.05   0.0   8.0
1.055   0.0   8.0
 1.06   1.0   9.0
1.065   1.0   9.0

This comment has been minimized.

@dlennarz

dlennarz Oct 15, 2015
Author Contributor

I like it. Changed the example.

@cdeil
Copy link
Member

@cdeil cdeil commented Oct 15, 2015

One more comment on documentation, and then I'm done with the review here.

It would be nice if you could add an example at the end of the function docstrings.
(see here as an example what I mean and here for the Numpy docstring syntax to do it.
It might also be useful to link from the docstrings to the high-level docs page you've written for Feldman-Cousins. Then it's all cross-linked and users can go back and forth between high-level docs, API docs and looking at the code if they like.

For the high-level docs page you've written, I also think it would be great to give at least one or two code examples directly on the page, with output. And for people like me that don't have the time to go read the FC paper, it would also be helpful to give some more absolute beginner info than "The first plot reproduces Fig. 7 from [Feldman1998](Poisson process with background 3.0).", i.e. something like: "Assume you have an observation corresponding to a Poisson process and observed 7 events and estimated a background of 3.0. Then using the FC method you can compute a confidence interval of (XXX, YYY) on the excess.".
I.e. my point is: it would be great if gamma-ray astronomers get some description and an example how to compute FC limits for the two or three common cases directly, without having to go to the FC paper to understand what's going on.

[1,8]. A lot of the fast algorithms that do not compute the full confidence belt
will come to the conclusion that the 90% confidence interval is [0, 0.745] and
thus the upper limit when zero is measured should be 0.745 (one example is
TFeldmanCousins that comes with `ROOT`, but is has the additional bug of making

This comment has been minimized.

@cdeil

cdeil Oct 15, 2015
Member

Put TFeldmanCousins and ROOT in double backticks.
(for single backticks Sphinx tries to generate links)

This comment has been minimized.

@dlennarz

dlennarz Oct 16, 2015
Author Contributor

Fixed

@@ -79,6 +79,14 @@ with the Li \& Ma formula:
TODO: More examples.

This comment has been minimized.

@cdeil

cdeil Oct 15, 2015
Member

Add one FC code example (just call one of the functions and show the output) on the top-level page?
(to make some "PR", i.e. make it easier for people to discover it / get started with it)

This comment has been minimized.

@dlennarz

dlennarz Oct 20, 2015
Author Contributor

Example added.

… of some functions. Simplify function names. Some functions now raise ValueErrors in case of bad input. Expand the tests to increase coverage. Adjust variable names (removing trailing f). Improved function documentation. Expand documentation with code examples. Add example to gammapy.stats index page. Progress bar for the two analytical examples. Nicer print out for fc_demonstrate_artefact.py example.
@dlennarz
Copy link
Contributor Author

@dlennarz dlennarz commented Oct 20, 2015

Done working on the comments / requests.

Coverage is 99% now and cannot be improved, because the two missing lines should never be called actually (it's still good to check for zero division).

I expanded the documentation with code examples and improved the function documentation (linking to the high-level docs page).

@cdeil
Copy link
Member

@cdeil cdeil commented Oct 21, 2015

@dlennarz – Thanks!!!

I'm not quite sure why this error occurs:
https://travis-ci.org/gammapy/gammapy/jobs/86494723#L1071
Can you please rebasing this branch on master (see git commands here) ... maybe this will resolve the issue.


# Pass too few x_bins to reach confidence level.
x_bins = np.linspace(-sigma, sigma, n_step, endpoint=True)
assert_raises(ValueError, fc_find_acceptance_region_gauss, 0, 1, x_bins, cl)

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

This currently fails in one of the travis-ci builds:
https://travis-ci.org/gammapy/gammapy/jobs/86494765#L1507

Instead of using numpy.testing.assert_raises, please use pytest.raises (see here):

with pytest.raises(ValueError):
    fc_find_acceptance_region_gauss(0, 1, x_bins, cl)

You can also assert things about the exception (see example), but I don't think that's needed here.

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

Okay, done.

paper."""
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

Unuses import from scipy import stats. Remove.

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

Removed.

paper."""
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

Remove unused import from scipy import stats

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

Removed.

LowerLimitAna = []

for mu in ProgressBar(mu_bins):
goodChoice = fc_find_acceptance_region_poisson(mu, background, x_bins, fCL)

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

fCL -> cl

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

Fixed.


Assume you measured 6 counts in a Poissonian counting experiment with an
expected background :math:`b = 3`. Here's how you compute the 90% upper limit
on the signal strength :math:`\\mu`:

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

Isn't this the use case covered by the fc_find_acceptance_regino_poisson convenience function?
If yes, I'd suggest using that simpler version on this top-level gammapy.stats docs page, and moving the (equivalent?) code that takes several calls to more general fc* functions to the FC docs page. Or keep it here in addition, but also show how to call the simpler fc_find_acceptance_region_poisson function here, please.

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

You are mistaken about what gammapy.stats.fc_find_acceptance_region_poisson does. For a given background and mu it returns the acceptance interval (x_min, x_max) such that the integrated probability is the confidence level.

The case covered here is you measured a certain x and now you are looking for the corresponding confidence interval (mu_1, mu_2).

There is a difference between acceptance and confidence interval. Normally the user is interested in the confidence interval.

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

Is it possible to add a "FC for dummies" functions for the common cases?
Something like this:

excess_lo, excess_hi = fc_excess_limits_poisson(n_on, mu_background, cl)
excess_lo, excess_hi = fc_excess_limits_poisson_on_off(n_on, n_off, alpha, cl)

I guess it's still a few lines to implement based on what's here now, and it's non-trivial to implement because some extra parameters like acceptance interval range and spacing has to be chosen so that it's reasonable efficient and accurate?

If you don't have time any more to add this, just let me know, and I'll try to code it up tonight.

Concerning the functions that compute acceptance intervals, it could be useful to add a sentence like:
"Acceptance intervals are used in the Feldman-Cousins computation of confidence intervals. Usually the confidence intervals are the result of interest."

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

The function fc_excess_limits_poisson basically combines:
fc_construct_acceptance_intervals_pdfs
fc_get_limits
fc_fix_limits
fc_find_limit

I think it's a bad idea to combine them into a function, because a) that function will be slow and b) we will have to choose a mu and x binning for the user. It would be much better to make this a class, so LowerLimit and UpperLimit aren't constantly recomputed when the function is called.

Things can always be improved.


def fc_find_limit(x_value, x_values, y_values):
r"""
Find the limit for a given x measurment

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

Typo: measurment -> measurement

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

Fixed.

This can be easily converted to upper and lower limits (`~gammapy.stats.fc_get_limits`),
which simply connect the outside 1s for different :math:`\\mu`s. An upper or lower limit
is obtained by drawing a vertical line at the measured value and calculating the
intersection (`~gammapy.stats.fc_find_limit`).

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

Here Sphinx gets confused and doesn't render the HTML output correctly (see here.

I think (but didn't verify) that the issue is the s you have directly following the math directive:

:math:`\\mu`s

Can you try putting this instead

:math:`\\mu` values

and then see if the link to

`~gammapy.stats.fc_find_limit`

resolves correctly?

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

It seems to have fixed the issues.

import logging
log = logging.getLogger(__name__)

__all__ = ['fc_find_acceptance_region_gauss',

This comment has been minimized.

@cdeil

cdeil Oct 21, 2015
Member

Feel free to ignore me here, but just as a question / suggestion: is it possible to make the function names shorter / simpler / more uniform without being cryptic?

One possible improvement would be to always use "interval" instead of sometimes "region".
FC, at least as implemented here and commonly used, only works for one parameter, so "interval" is a bit more clear, no?

And maybe fc_get_limits_poisson would be a bit better than fc_find_acceptance_interval_poisson, because it's shorter and it's a special case of the more general fc_get_limits function? (I didn't look closely if this is actually the case, if this is incorrect and you think the current function names are best, just ignore me here.)

This comment has been minimized.

@dlennarz

dlennarz Oct 21, 2015
Author Contributor

I replaced all the "acceptance_region" with "acceptance_interval" (which is also what is used in the FC paper.

I already explained this in an earlier comment: fc_find_acceptance_interval_poisson does not give you an confidence interval, but an acceptance_interval. So I think the function names are fine.

@cdeil
Copy link
Member

@cdeil cdeil commented Oct 21, 2015

@dlennarz Thanks for your patience with the review on this PR. Let me know when you've pushed the latest changes and I'll merge this.

@dlennarz
Copy link
Contributor Author

@dlennarz dlennarz commented Oct 21, 2015

Done working on the comments / requests.

@cdeil cdeil merged commit 48af6a4 into gammapy:master Oct 21, 2015
1 check failed
1 check failed
continuous-integration/travis-ci/pr The Travis CI build failed
Details
@cdeil
Copy link
Member

@cdeil cdeil commented Oct 21, 2015

Merged. Thanks!

I added you to the contributor list in e7190bf and let my editor do some auto code formatting fixes in c993789 .

(the failing tests on travis-ci are unrelated installation issues of h5py, that should be fixed soon)

@cdeil
Copy link
Member

@cdeil cdeil commented Oct 21, 2015

Actually I'm pretty sure I won't find time in the near future to work on the excess limit convenience functionality, so I've made a reminder here: #370

@dlennarz dlennarz deleted the dlennarz:fc branch Dec 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.