Clarify what kinds of arrays are allowed for IRFs #145

maxnoe · 2020-03-27T15:59:16Z

Currently the standard asks for the storage of two arrays of lower and upper bin edges.

In python:

# 50 bins between 1 GeV and 1 PeV
bin_edges = np.logspace(-3, 3, 51)
energ_lo = bin_edges[:-1]
energ_hi = bin_edges[1:]

same goes for theta

I would propose to change this for the next iteration of the standard to change this to just storing a
single array with bin edges.

The text was updated successfully, but these errors were encountered:

tibaldo · 2020-03-27T16:09:50Z

This seems dangerous to me. Are we guaranteed that the bins will always be contiguous? Is there only one unambiguous way in your proposed format to determine the edges of a bin?
Also, this is inconsistent with well-established conventions, e.g. the EBOUNDS extension in OGIP (link).
What are the needs that motivate your request exactly?

TarekHC · 2020-03-27T16:22:21Z

Yes, overlapping bins require the two arrays. This should not be changed.

cboisson · 2020-03-27T16:24:26Z

Yes, this should not be changed. Le 27/03/2020 à 17:22, Tarek Hassan a écrit :

Yes, overlapping bins require the two arrays. This should not be changed. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADY2AKTFEWHTC3HTITVHSZ3RJTHE5ANCNFSM4LVDOZQQ>.

-- Catherine Boisson UMR 8102, Laboratoire Univers et THeories (LUTH) Observatoire de Paris-Meudon, F-92195 Meudon Cedex Direct Line: Tel. +33 1 45 07 74 36

maxnoe · 2020-03-28T10:56:03Z

Ok, I see that it's probably better to be kept as is, however I think we should then add some clarifications to the standard on what is allowed or expected and what not.

Are we guaranteed that the bins will always be contiguous?

We want to enable non-contiguous e.g. effective areas? What are the use cases where this is not the case?
How would the science tools deal with non-contiguous effective areas @adonath @jknodlseder ?

Yes, overlapping bins require the two arrays. This should not be changed.

Again, I'd like to know some use cases here. I think overlapping bins might be usefull for interpolation or getting closer to something like "differential effective area" than "total effective area in bins of energy".

It would be good to collect references for this and make clear what is allowed in the standard and what is not.

In my opinion, we should only add well defined things to these standards. So if we want to give the possibility for non-contiguous effective areas or overlapping bins, this should be mentioned in the respective documents.

A standard that basically allows storing anything as long as it has the correct names and shape is not very helpfull if the semantics of what is stored are not clear.

jknodlseder · 2020-03-28T22:35:09Z

Currently the ctools / GammaLib simply take the bin centres and interpolate between them bi-linearly (for 2D arrays). What's wrong with storing both bin edges for bins (whatever the bins are)? Memory should not be an issue, and this allows to know exactly how the bins are defined. And then everything is allowed (non contiguous and overlapping bins).

…

Le 28 mars 2020 à 11:56, Maximilian Nöthe ***@***.***> a écrit : Ok, I see that it's probably better to be kept as is, however I think we should then add some clarifications to the standard on what is allowed or expected and what not. Are we guaranteed that the bins will always be contiguous? We want to enable non-contiguous e.g. effective areas? What are the use cases where this is not the case? How would the science tools deal with non-contiguous effective areas @adonath <https://github.com/adonath> @jknodlseder <https://github.com/jknodlseder> ? Yes, overlapping bins require the two arrays. This should not be changed. Again, I'd like to know some use cases here. I think overlapping bins might be usefull for interpolation or getting closer to something like "differential effective area" than "total effective area in bins of energy". It would be good to collect references for this and make clear what is allowed in the standard and what is not. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAW2QV7SM4IJZKKAZGFYJL3RJXJU7ANCNFSM4LVDOZQQ>.

maxnoe · 2020-03-30T07:30:42Z

Ok, I assume that also answers what my next question would have been: why it was decided to use binary table extensions with a single row instead of image hdus for the irfs.

jknodlseder · 2020-03-30T09:41:26Z

Binary tables have proven flexible, effective and extensible for Fermi-LAT (the format is identical to the Fermi-LAT format)

…

Le 30 mars 2020 à 09:30, Maximilian Nöthe ***@***.***> a écrit : Ok, I assume that also answers what my next question would have been: why it was decided to use binary table extensions instead of image hdus for the irfs. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAW2QV6WCPI4ZZTBTQKC5PLRKBDDBANCNFSM4LVDOZQQ>.

kosack · 2020-03-30T09:51:40Z

why it was decided to use binary table extensions with a single row instead of image hdus for the irfs.

I think also it's a question of binning - often you might have non-uniform binning in an IRF (even if right now we do not do so I think), for example to deal with lack of statistics at high energies. The IMAGE extension can't deal with that.

registerrier · 2020-03-30T13:09:36Z

Indeed science tools tend to take bin center from the bin edges and perform some type of interpolation.
Yet, if no assumption is to be made on the binning (e.g. varying bin sizes or missing bins), there is no unambiguous determination of what is the bin center (linear, log, sqrt, other?). Depending on bin sizes, this will lead to significant differences in estimated values.

No IRF is differential in true energy/position. Therefore, when storing an IRF, one could consider that specifying one energy/position and the corresponding area/energy dispersion is less ambiguous. In particular, if MC simulations made to extract the IRFs were to be done at a single energy and position.

Standards such as OGIP/Caldb, consider that energy bins have to be small and that linear interpolation is the default:
https://heasarc.gsfc.nasa.gov/docs/heasarc/caldb/docs/memos/cal_gen_92_003/cal_gen_92_003.html#tth_sEc7

There is no unambiguous interpolation scheme. Should one assume the response is flat inside a bin, linearly varying between two bin centers, log-linearly varying, log-log etc. We might need some specifications here as well.

maxnoe · 2020-03-30T13:29:34Z

@registerrier yes, this is exactly what I feared. That currently this standard is too under-specified to be useful on its own.

So at least adding ENERGY and THETA for the central point that should be used for interpolation would make sense, right? bin edges (low, high) could be optional then.

lmohrmann · 2020-03-30T21:06:20Z

@maxnoe I think most commenters preferred to keep the low and high bin edges.
@registerrier, you would propose to add some information on the default interpolation scheme to be used for any of the axes?

tibaldo · 2020-03-31T06:22:57Z

I think Régis has a valid point here. I am under the impression that volume is not an issue here. Would it be sensible and useful to add an extra column with a (recommended) reference energy for each bin, but also leaving the information of the energy boundaries for completeness?

maxnoe · 2020-03-31T06:32:05Z

Yes, this is what I proposed in my last comment, an additional ENERGREF or ENERGY or how we want to call it.

Btw, FITS column names are not bound to the uppercase and eight character limitatio of header keywords.

Is the another technical reason we limit ourselves to eight characters and uppercase?

adonath · 2020-03-31T10:37:07Z

In addition to what @registerrier commented:

Given the way we compute the IRFs, we should definitely keep the bounds information of the IRF bins. For me the only "strictly correct" way to interpret the information stored in the MC IRFs, is that the value represents the expected counts in a given parameter interval, which is an integrated quantity (even if divided by the bin width...). To keep the full information we should provide the integration interval.

In Gammapy we have two ways of interpolating the information stored in the IRFs:

Bilinear (sometime log-log or semi-log) interpolation of the mean values of the IRFs in the parameter bin, assuming either the linear or log center of the bin.
Quadratic interpolation of the cumulative distribution. This is an alternative method I have tried, that is closer to the "strictly correct" interpretation of the information I mentioned above. This methods avoids the need to define a "reference" value (or bin center...). It assumes that the value stored in the IRF corresponds to the integral in the given interval and assigns this value to the upper edge of the bin (assuming a linear distribution within the bin). This method also avoids the problem of extrapolation of the IRFs, because the cumulative distribution is 0 if evaluated outside the lowest bin edge and equivalent the total number of counts if evaluated outside the uppermost bin edge. An illustration of the method is here: https://github.com/adonath/notebooks-public/blob/master/histogram_resampling.ipynb

I think defining a reference value is only meaningful, if we provide an assumption on how the events are distributed within the bin in addition (in fact once this is defined the reference value can be easily computed from this...). In Gammapy we have a meta interp keyword for our MapAxis object which defines the scaling ("log", "linear", "sqrt") of the axis. This is then used to compute bin centers, logarithmic bin centers etc. and used to choose the interpolation method (e.g. linear, log-log, or semi-log) as well. I think providing this information on the "scaling" of the IRF axis in the meta data might be useful. On the other hand, in most cases there is a very plausible choice anyway (but I guess)

In Gammapy we already make assumptions of this kind e.g. background is interpolated in log-log in energy and linearly in spatial axes. The main advantage is a better precision when working with coarser bins. As long as the IRFs are binned fine enough (whatever this means) it should not matter anyway.

@maxnoe I think so far there is no use-case for non-contiguous effective areas (or IRFs in general) and I'm not sure either, whether it will ever arise. To me the argument for the separate min and max bounds columns is simply that if the data is stored in a table, the column length is the equivalent to the column length of the data. If there was a column with edges, it would be of length N + 1 and you would need a fill value for the data...

maxnoe · 2020-03-31T11:46:58Z

@adonath actually, the IRFs are stored in a single rows in the tables.

The columns don't need to have the same shape, e.g. it would be perfectly valid to store:

Energy (51, )
Theta (11, )
Aeff (50, 10)

adonath · 2020-03-31T12:48:48Z

@maxnoe Yes, you are right. I forgot the data is stored row-wise. For some reason I had in mind we store it as a flattened ND array in a table column.

maxnoe changed the title ~~Remove duplication of bin edges~~ Clarify what kinds of arrays are allowed for IRFs Mar 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify what kinds of arrays are allowed for IRFs #145

Clarify what kinds of arrays are allowed for IRFs #145

maxnoe commented Mar 27, 2020

tibaldo commented Mar 27, 2020

TarekHC commented Mar 27, 2020

cboisson commented Mar 27, 2020 via email

maxnoe commented Mar 28, 2020 •

edited

Loading

jknodlseder commented Mar 28, 2020 via email

maxnoe commented Mar 30, 2020 •

edited

Loading

jknodlseder commented Mar 30, 2020 via email

kosack commented Mar 30, 2020

registerrier commented Mar 30, 2020

maxnoe commented Mar 30, 2020 •

edited

Loading

lmohrmann commented Mar 30, 2020

tibaldo commented Mar 31, 2020

maxnoe commented Mar 31, 2020

adonath commented Mar 31, 2020

maxnoe commented Mar 31, 2020

adonath commented Mar 31, 2020

Clarify what kinds of arrays are allowed for IRFs #145

Clarify what kinds of arrays are allowed for IRFs #145

Comments

maxnoe commented Mar 27, 2020

tibaldo commented Mar 27, 2020

TarekHC commented Mar 27, 2020

cboisson commented Mar 27, 2020 via email

maxnoe commented Mar 28, 2020 • edited Loading

jknodlseder commented Mar 28, 2020 via email

maxnoe commented Mar 30, 2020 • edited Loading

jknodlseder commented Mar 30, 2020 via email

kosack commented Mar 30, 2020

registerrier commented Mar 30, 2020

maxnoe commented Mar 30, 2020 • edited Loading

lmohrmann commented Mar 30, 2020

tibaldo commented Mar 31, 2020

maxnoe commented Mar 31, 2020

adonath commented Mar 31, 2020

maxnoe commented Mar 31, 2020

adonath commented Mar 31, 2020

maxnoe commented Mar 28, 2020 •

edited

Loading

maxnoe commented Mar 30, 2020 •

edited

Loading

maxnoe commented Mar 30, 2020 •

edited

Loading