New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend documentation of quantities returned by estimators #3278
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3278 +/- ##
==========================================
+ Coverage 93.79% 93.81% +0.01%
==========================================
Files 144 144
Lines 17754 17754
==========================================
+ Hits 16653 16656 +3
+ Misses 1101 1098 -3
Continue to review full report at Codecov.
|
I think one should add a bit more information on the definition of the null hypothesis, but this varies between the estimators, so it's better documented on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adonath !
These changes look good to me.
Going through the full file, I think we can improve it a bit more:
- Second paragraph:
The core of any estimator algorithm is hypothesis testing: a reference model or counts excess is tested against a null hypothesis. From the best fit reference model a flux is derived and a corresponding \sqrt{\Delta TS} value from the difference in fit statistics to the null hypothesis, assuming one degree of freedom. In this case \sqrt{\Delta TS} represents an approximation of the "classical significance".
This seems to imply that the flux or sqrt(del_ts) is valid only for one degree of freedom, but we start with models, which, naively, can have more dof.
Rather, change to -
"The core of any estimator algorithm is hypothesis testing: a reference model or counts excess is tested against a null hypothesis. From the best fit reference model a flux is derived and a corresponding \sqrt{\Delta TS} value from the difference in fit statistics to the null hypothesis. Assuming one degree of freedom, \sqrt{\Delta TS} represents an approximation of the "classical significance".
- The tutorials subsection link to only the light curve notebooks. Add the other estimators as well?
docs/estimators/index.rst
Outdated
stat Fit statistics value of the best fit hypothesis | ||
stat_null Fit statistics value of the null hypothesis | ||
================= ================================================= | ||
npred Predicted counts of the best fit hypothesis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the GitHub rendering, these lines are not coming inside the table, but rather in a chaotic manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, indeed this was because of mixture of tabs and whitespace. Fixed.
================= ================================================= | ||
|
||
|
||
To compute the assymetric errors as well as upper limits one can | ||
specify the arguments ``n_sigma`` and ``n_sigma_ul``. The ``n_sigma`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain what n_sigma_ul
is as well?
e4fcdce
to
48b8fab
Compare
Thanks @AtreyeeS! Comments are addressed... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adonath
My main comment is that excess
is more explicit than signal. Or one could use excess_signal for more clarity, but it is a bit too long.
For instance the n_sig
definition in the statistics doc is confusing for many who think it stands for number of sigma.
docs/estimators/index.rst
Outdated
================= ================================================= | ||
npred Predicted counts of the best fit hypothesis | ||
npred_null Predicted counts of the null hypothesis | ||
npred_signal Predicted counts of the signal over `npred_bull`, equivalent to (`npred - npred_null`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think signal
is misleading. Because npred_null
can contain signal (in the sense of astrophysical photons). I still think excess
is better, since it is really what it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. My motivation was to stay in-line with what we define here: https://docs.gammapy.org/0.18.2/stats/index.html#notations but I now realise this requires some more thought. I just checked and npred
is also define here https://gamma-astro-data-formats.readthedocs.io/en/latest/spectra/flux_points/index.html#normalization-representation, where it means the predicted counts from the model component only (so npred - npred_null
in our definition...).
It's a bit similar with excess, because MapDataset.excess
is already defined as counts - background
and does not account for any additional source contributions. So that definition of excess
is somewhat variable, in the sense that it can either include additional source contributions or not. I wonder whether we can find a fully consistent definition. I'll think about in once more and make a new proposal...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could rename sig
-> excess
consistently here: https://docs.gammapy.org/0.18.2/stats/index.html#notations, to get rid of the ambiguity with n sigma. But this would require adapting the API of MapDataset
and CountsStatistic
once again (fine from my side...if not now, we will not do it for v1.0...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the inconsistency between "our" npred
(which means the total predicted counts...) and the definition in gadf cannot easily be resolved, it would require to either re-define in Gammapy, which I don't want to do, because MapDataset.npred()
is already and establish part of our API (.npred_signal()
possibly not that much...) or in gadf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about if we just stay in line with the MapDataset
and use npred
, npred_background
and npred_excess
, but clearly state, that for estimators by default a "residual convention" like npred_background = dataset.npred()
is used? In case there are no source models defined it is equivalent anyway. When looking at the map users will see whether sources are included in the background anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 2nd thought maybe npred
, npred_excess
and npred_null
is also fine...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to npred_excess
...
docs/estimators/index.rst
Outdated
value from the difference in fit statistics to the null hypothesis. | ||
Assuming one degree of freedom, :math:`\sqrt{\Delta TS}` represents an | ||
approximation (`Wilk's theorem <https://en.wikipedia.org/wiki/Wilks%27_theorem>`_) | ||
of the "classical significance". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it an approximation of the significance or its definition in our domain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I guess it's both and there is no either / or? Taken strictly it's an approximation according to Wilk's theorem and it's also the widely used definition in our domain...
:math:`\sqrt{\Delta TS}` represents an approximation of the | ||
"classical significance". | ||
value from the difference in fit statistics to the null hypothesis. | ||
Assuming one degree of freedom, :math:`\sqrt{\Delta TS}` represents an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one should mention explicitly that we use signed sqrt_ts
docs/estimators/index.rst
Outdated
norm Best fit norm with respect to the reference spectral model | ||
norm_err Symmetric error on the norm derived from the Hessian matrix | ||
ts Difference in fit statistics (`stat - stat_null` ) | ||
sqrt_ts Square root of ts, in case of one degree of freedom, corresponds to significance (Wilk's theorem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
squared root of ts multiplied by sign(excess)
docs/estimators/index.rst
Outdated
================= ================================================= | ||
npred Predicted counts of the best fit hypothesis | ||
npred_null Predicted counts of the null hypothesis | ||
npred_signal Predicted counts of the signal over `npred_null`, equivalent to (`npred - npred_null`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because signal
is ambiguous, I prefer npred_excess
with the following definition:
npred_excess = npred - npred_null
for forward folding estimators
And similarly:
excess = counts - npred_null
for backward folding estimators.
89538eb
to
280d13e
Compare
@registerrier I'll go ahead and merge this now., but we should have follow up discussion on the |
Description
This pull request adds a proposal for the missing quantities returned by
Estimator
classes. Here is the current proposal:counts
for backward foldingnpred_null
, equivalent to (npred - npred_null
) and correlatedexcess
for backward foldingI think the information is basically complete now. My proposed naming scheme would be
npred
uniformly and not introduce separate definitions likecounts_corr
andexcess_corr
for the case of backwards folding as done in theExcessMapEstimator
and instead just describe the equivalency in the table. I think in context of hypothesis testing this is reasonable.