Extend documentation of quantities returned by estimators #3278

adonath · 2021-03-19T11:45:29Z

Description

This pull request adds a proposal for the missing quantities returned by Estimator classes. Here is the current proposal:

| Quantity | Description |
| --- | --- |
| npred | Predicted counts of the best fit hypothesis, equivalent to correlated `counts` for backward folding |
| npred_null | Predicted counts of the null hypothesis |
| npred_signal | Predicted counts of the signal over `npred_null`, equivalent to (`npred - npred_null`) and correlated `excess` for backward folding |
| stat | Fit statistics value of the best fit hypothesis |
| stat_null | Fit statistics value of the null hypothesis |

I think the information is basically complete now. My proposal is to use the npred naming uniformly, rather than introducing separate definitions such as counts_corr and excess_corr for the backward folding case, as done in the ExcessMapEstimator, and instead just describe the equivalence in the table. I think this is reasonable in the context of hypothesis testing.

codecov · 2021-03-19T11:54:52Z

Codecov Report

Merging #3278 (e45e12f) into master (3f79b2c) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3278      +/-   ##
==========================================
+ Coverage   93.79%   93.81%   +0.01%     
==========================================
  Files         144      144              
  Lines       17754    17754              
==========================================
+ Hits        16653    16656       +3     
+ Misses       1101     1098       -3

Impacted Files	Coverage Δ
gammapy/modeling/iminuit.py	`95.65% <0.00%> (+3.26%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

adonath · 2021-03-19T12:47:06Z

I think one should add a bit more information on the definition of the null hypothesis, but this varies between the estimators, so it's better documented on the Estimator classes...

AtreyeeS

Thanks @adonath !
These changes look good to me.

Going through the full file, I think we can improve it a bit more:

Second paragraph:

The core of any estimator algorithm is hypothesis testing: a reference model or counts excess is tested against a null hypothesis. From the best fit reference model a flux is derived and a corresponding \sqrt{\Delta TS} value from the difference in fit statistics to the null hypothesis, assuming one degree of freedom. In this case \sqrt{\Delta TS} represents an approximation of the "classical significance".

This seems to imply that the flux or sqrt(del_ts) is valid only for one degree of freedom, but we start with models, which, naively, can have more dof.
Rather, change to:
"The core of any estimator algorithm is hypothesis testing: a reference model or counts excess is tested against a null hypothesis. From the best fit reference model a flux is derived and a corresponding \sqrt{\Delta TS} value from the difference in fit statistics to the null hypothesis. Assuming one degree of freedom, \sqrt{\Delta TS} represents an approximation of the "classical significance"."

The tutorials subsection links only to the light curve notebooks. Add the other estimators as well?

AtreyeeS · 2021-03-19T16:21:13Z

docs/estimators/index.rst

+stat			  Fit statistics value of the best fit hypothesis
+stat_null		  Fit statistics value of the null hypothesis
+================= =================================================
+npred		  	  Predicted counts of the best fit hypothesis


In the GitHub rendering, these lines do not appear inside the table; they are scattered around it.

Thanks, indeed this was caused by a mixture of tabs and spaces. Fixed.

AtreyeeS · 2021-03-19T16:22:50Z

docs/estimators/index.rst

 ================= =================================================

-
 To compute the assymetric errors as well as upper limits one can
 specify the arguments ``n_sigma`` and ``n_sigma_ul``. The ``n_sigma``


Maybe explain what n_sigma_ul is as well?
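
To make the question concrete, here is a minimal sketch of one common reading of the two arguments. It assumes that asymmetric errors are taken where the fit statistic rises by `n_sigma ** 2` above its minimum, and that the upper limit is the upper crossing for `n_sigma_ul ** 2`. The parabolic `stat_profile`, the helper `find_crossing`, and the result names `norm_errp`, `norm_errn` and `norm_ul` are illustrative assumptions, not the estimator implementation.

```python
def stat_profile(norm, norm_best=1.0, norm_err=0.2):
    """Toy parabolic fit statistic profile (hypothetical, not a Gammapy call)."""
    return ((norm - norm_best) / norm_err) ** 2


def find_crossing(profile, norm_best, n_sigma, direction, step=1.0, tol=1e-9):
    """Norm at which the profile rises by n_sigma**2 above its minimum.

    direction is +1 to search above the best fit and -1 to search below.
    """
    stat_min = profile(norm_best)
    delta_stat = n_sigma ** 2
    inside, outside = norm_best, norm_best + direction * step
    # Expand the bracket until the crossing is enclosed.
    while profile(outside) - stat_min < delta_stat:
        outside += direction * step
    # Bisect between the point inside and the point outside the interval.
    while abs(outside - inside) > tol:
        mid = 0.5 * (inside + outside)
        if profile(mid) - stat_min < delta_stat:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)


norm_best = 1.0
n_sigma = 1      # controls the asymmetric errors
n_sigma_ul = 2   # controls the upper limit

norm_errp = find_crossing(stat_profile, norm_best, n_sigma, direction=+1) - norm_best
norm_errn = norm_best - find_crossing(stat_profile, norm_best, n_sigma, direction=-1)
norm_ul = find_crossing(stat_profile, norm_best, n_sigma_ul, direction=+1)
```

For a parabolic profile the two errors coincide with the symmetric error; they differ only when the profile is asymmetric around the best fit.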

adonath · 2021-03-19T16:42:29Z

Thanks @AtreyeeS! Comments are addressed...

registerrier

Thanks @adonath

My main comment is that excess is more explicit than signal. Or one could use excess_signal for more clarity, but it is a bit too long.

For instance the n_sig definition in the statistics doc is confusing for many who think it stands for number of sigma.

registerrier · 2021-03-19T16:36:50Z

docs/estimators/index.rst

+================= =================================================
+npred		  	  Predicted counts of the best fit hypothesis
+npred_null        Predicted counts of the null hypothesis
+npred_signal      Predicted counts of the signal over `npred_bull`, equivalent to (`npred - npred_null`)


I think signal is misleading, because npred_null can contain signal (in the sense of astrophysical photons). I still think excess is better, since that is really what it is.

Yes, I agree. My motivation was to stay in line with what we define here: https://docs.gammapy.org/0.18.2/stats/index.html#notations but I now realise this requires some more thought. I just checked and npred is also defined here https://gamma-astro-data-formats.readthedocs.io/en/latest/spectra/flux_points/index.html#normalization-representation, where it means the predicted counts from the model component only (so npred - npred_null in our definition...).

It's a bit similar with excess, because MapDataset.excess is already defined as counts - background and does not account for any additional source contributions. So the definition of excess is somewhat variable, in the sense that it can either include additional source contributions or not. I wonder whether we can find a fully consistent definition. I'll think about it once more and make a new proposal...

I think we could rename sig -> excess consistently here: https://docs.gammapy.org/0.18.2/stats/index.html#notations, to get rid of the ambiguity with n sigma. But this would require adapting the API of MapDataset and CountsStatistic once again (fine from my side...if not now, we will not do it for v1.0...)

I think the inconsistency between "our" npred (which means the total predicted counts...) and the definition in gadf cannot easily be resolved. It would require a re-definition either in Gammapy, which I don't want to do, because MapDataset.npred() is already an established part of our API (.npred_signal() possibly not that much...), or in gadf.

What about if we just stay in line with the MapDataset and use npred, npred_background and npred_excess, but clearly state, that for estimators by default a "residual convention" like npred_background = dataset.npred() is used? In case there are no source models defined it is equivalent anyway. When looking at the map users will see whether sources are included in the background anyway...

On 2nd thought maybe npred, npred_excess and npred_null is also fine...

I renamed to npred_excess...

registerrier · 2021-03-19T16:49:05Z

docs/estimators/index.rst

+value from the difference in fit statistics to the null hypothesis.
+Assuming one degree of freedom, :math:`\sqrt{\Delta TS}` represents an
+approximation (`Wilk's theorem <https://en.wikipedia.org/wiki/Wilks%27_theorem>`_)
+of the "classical significance".


Is it an approximation of the significance or its definition in our domain?

Hm, I guess it's both and there is no either / or? Taken strictly it's an approximation according to Wilks' theorem and it's also the widely used definition in our domain...
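
As a quick numerical illustration of the one degree of freedom case, a minimal sketch using only the standard library: if TS follows a chi-square distribution with one degree of freedom, its tail probability equals the two-sided Gaussian tail at `sqrt(TS)` sigma. The function names are hypothetical helpers, not part of any library.

```python
import math


def p_value_one_dof(ts):
    """Tail probability of a chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(ts / 2.0))


def gaussian_two_sided_tail(n_sigma):
    """Probability of a Gaussian fluctuation beyond +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))


ts = 25.0
p_from_ts = p_value_one_dof(ts)
p_from_sigma = gaussian_two_sided_tail(math.sqrt(ts))
```

Both numbers agree for any non-negative `ts`, which is why `sqrt(TS)` is read as a significance in this case. With more free parameters the chi-square distribution changes and the simple identification no longer holds.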

registerrier · 2021-03-19T16:49:33Z

docs/estimators/index.rst

-:math:`\sqrt{\Delta TS}` represents an approximation of the
-"classical significance".
+value from the difference in fit statistics to the null hypothesis.
+Assuming one degree of freedom, :math:`\sqrt{\Delta TS}` represents an


Maybe one should mention explicitly that we use signed sqrt_ts

registerrier · 2021-03-19T16:49:54Z

docs/estimators/index.rst

+norm              Best fit norm with respect to the reference spectral model
+norm_err          Symmetric error on the norm derived from the Hessian matrix
+ts                Difference in fit statistics (`stat - stat_null` )
+sqrt_ts           Square root of ts, in case of one degree of freedom, corresponds to significance (Wilk's theorem)


Square root of ts multiplied by sign(excess)
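
A minimal sketch of that signed definition with NumPy, assuming `ts` is supplied as a non-negative value and `excess` carries the sign. The function name `signed_sqrt_ts` and the input values are hypothetical.

```python
import numpy as np


def signed_sqrt_ts(ts, excess):
    """Square root of ts, carrying the sign of the excess."""
    # Clip tiny negative values from numerical noise before taking the root.
    ts = np.clip(np.asarray(ts, dtype=float), 0.0, None)
    return np.sign(excess) * np.sqrt(ts)


values = signed_sqrt_ts(ts=[25.0, 4.0, 0.0], excess=[10.0, -3.0, 0.0])
```

A negative excess then shows up as a negative significance instead of being folded into the positive tail.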

registerrier · 2021-03-19T16:52:33Z

docs/estimators/index.rst

+================= =================================================
+npred             Predicted counts of the best fit hypothesis
+npred_null        Predicted counts of the null hypothesis
+npred_signal      Predicted counts of the signal over `npred_null`, equivalent to (`npred - npred_null`)


Because signal is ambiguous, I prefer npred_excess with the following definition:

npred_excess = npred - npred_null for forward folding estimators

And similarly:
excess = counts - npred_null for backward folding estimators.
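
The two definitions can be sketched with plain arrays. The per-bin numbers below are made up for illustration and do not come from any dataset.

```python
import numpy as np

# Hypothetical per-bin values, for illustration only.
counts = np.array([12.0, 7.0, 3.0])       # measured counts
npred = np.array([11.5, 6.0, 3.5])        # prediction of the best fit hypothesis
npred_null = np.array([8.0, 6.5, 3.0])    # prediction of the null hypothesis

# Forward folding: excess predicted by the best fit model over the null.
npred_excess = npred - npred_null

# Backward folding: measured excess over the null prediction.
excess = counts - npred_null
```

Either quantity can be negative in a bin, and npred_null may itself include source contributions, which is the ambiguity the word signal would hide.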

adonath · 2021-03-24T14:18:56Z

@registerrier I'll go ahead and merge this now, but we should have a follow-up discussion on the n_sig and npred_signal() names...

adonath requested review from AtreyeeS, luca-giunti, registerrier and QRemy March 19, 2021 11:45

adonath self-assigned this Mar 19, 2021

adonath added the docs label Mar 19, 2021

adonath added this to In progress in gammapy.estimators via automation Mar 19, 2021

adonath added this to the v0.19 milestone Mar 19, 2021

adonath changed the title ~~Extend documentation of quantities return by Estimators~~ Extend documentation of quantities returned by estimators Mar 19, 2021

AtreyeeS previously approved these changes Mar 19, 2021


adonath dismissed AtreyeeS’s stale review via 48b8fab March 19, 2021 16:41

adonath force-pushed the estimator_quantities branch from e4fcdce to 48b8fab Compare March 19, 2021 16:41

registerrier reviewed Mar 19, 2021


adonath added 5 commits March 24, 2021 10:40

Add npred quantities to estimator docs

1f385af

Add proposal for missing estimator quantities

2115138

Fix table indentation

a0da62a

Add nmore links to notebooks

ebc6990

Include sign in TS definition

280d13e

adonath force-pushed the estimator_quantities branch from 89538eb to 280d13e Compare March 24, 2021 09:46

adonath added 2 commits March 24, 2021 11:07

Improve estimator docs

83c84fb

Fix table format

e45e12f

adonath merged commit 569215f into gammapy:master Mar 24, 2021

gammapy.estimators automation moved this from In progress to Done Mar 24, 2021

This was referenced Mar 24, 2021

Cleanup FluxMaps a little bit #3275

Merged

Use FluxMaps in TSMapEstimator #3285

Merged
