New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve energy handling #290

Merged
merged 19 commits into from Aug 18, 2015

Conversation

Projects
None yet
3 participants
@joleroi
Contributor

joleroi commented Jun 24, 2015

Improve and document energy handling in Gammapy.

  • rewrite everything to properly subclass Quantity
  • fits output with header and everything (should be "just" typing)
  • Decide where data classes e.g. Counts Spectrum should go -> moved to separate PR
  • Remove old stuff and replace with the new classes
  • Make Documentation properly

This will be continued in #331

def __init__(self, counts, energy):
#does not work (why?) -> Kind of crucial

This comment has been minimized.

@cdeil

cdeil Jun 24, 2015

Member

Remove isinstance check here?
If we do it here, we should probably do it in 1000 other places or better yet, go to a statically typed language like C++, no?

This comment has been minimized.

@joleroi

joleroi Jun 24, 2015

Contributor

You mean like we do for Quantities? :)

This comment has been minimized.

@cdeil

cdeil Jun 24, 2015

Member

Now sure what you mean.
There are a bunch of places where we do input validation, that's true:
https://github.com/gammapy/gammapy/search?utf8=%E2%9C%93&q=isinstance

I'm also not completely against it ... if you want to add it I'll still merge ... I tried to write up a loose guideline here a while back:
https://gammapy.readthedocs.org/en/latest/development/index.html#what-checks-and-conversions-should-i-do-for-inputs

Recently I saw this:
http://astropy.readthedocs.org/en/latest/api/astropy.units.quantity_input.html
Haven't tried it, but could be useful to get some input validation with little boilerplate code?

This comment has been minimized.

@joleroi

joleroi Jun 24, 2015

Contributor

The Quantity verification looks useful. I guess there's no point in merging this, until we agreed if it is a good idea or not

@cdeil

This comment has been minimized.

Member

cdeil commented Jun 24, 2015

I always find it much easier to understand / review / discuss new classes / functions when I see usage examples.
It should be easy to use classes to implement the common things users (or other parts of the library) do.

So before I leave long comments here which probably aren't very well thought out, could you please add a section
"energy handling in Gammapy" before the Reference / API section here?
https://github.com/gammapy/gammapy/blob/master/docs/spectrum/index.rst#referenceapi

Similar to what's here:
https://gammapy.readthedocs.org/en/latest/time/index.html#time-handling-in-gammapy

Just for reference ... you're probably aware ... there are (at least) these two use cases:

  • An array or energy values. E.g. the Fermi-LAT diffuse flux is given at certain energies and those are stored in a ENERGY FITS table extension (see e.g. SpectralCube.to_fits). E.g. in Gammalib this is represented by GEnergy.
  • An array of energy bands (a.k.a. bins / bounds). This is usually stored in EBOUNDS tables, e.g. for Fermi-LAT counts cubes and I think also for PHA files. In Gammalib this is represented by a separate class GEbounds.

Single energies and energy bands can probably be represented by the same class, so this list of use cases is extensive?

Without thinking much about it I'd say it might be easier to have two classes.
Alternatively, if you want to have one class representing both, you need to add a flag defining which it is.

I think Sherpa doesn't have classes to represent this, they just have arrays:
https://github.com/sherpa/sherpa/blob/0daa0d2cf4f126e043c0f9832642e4ec2e7347f1/sherpa/astro/data.py#L240
The equivalent for us would be to use Quantity objects and not invent our own classes, just helper functions.

OK, so after thinking about it a bit while typing this, I think that the best solution is to have separate Energies and EnergyBounds classes.
The alternatives (single class or no extra class) aren't bad, but long-term we'll do the same little computations (like computing bin centers or bin widths) over and over and so adding classes with convenience methods adds extra values.

@kingj90 – Does this help or should I make a more concrete API suggestion (method names / inputs / outputs)?

@joleroi

This comment has been minimized.

Contributor

joleroi commented Jun 24, 2015

Hi,

thanks for your answer. I don't see why two separate classes for energy values and energy bounds should be easier to handle. For example: the ARF wrapper class takes energy_lo and energy_hi arrays as input. But if you want to plot effective area vs. energy it would be natural to plot the effarea values at the bin centers. So internally you have to do a conversion.
In the same szenario a user might have an array of energy bin centers + an array of effective areas, so in order to use the ARF wrapper class, he or she needs to convert his bin centers to bin edges first.
The suggestest energybins class, has functions to be initiaciated with bin centers OR bin edges and functions to convert between the two, so a Flag is not really necessary. The user has to create one object related to energy once, and then never has to think about energy again.

@cdeil I can add an example section to make my point clearer, although the example counts spectrum class should illustrate my points more or less.

@cdeil

This comment has been minimized.

Member

cdeil commented Jun 24, 2015

I think if you don't have two classes or the flag one of the use cases will come out messy.
My current understanding is that you consider the energy_bounds more fundamental ... so I think the energy use case will not come out nicely. Can you write down how a user would load an ENERGY extension (e.g. from the Fermi diffuse model cube) and store and use it? Do you make up and store EBOUNDS using log center method in the read method and then go back to the energy array in the write method? Or do you store the ENERGY array as-is in the class?

Note that not energy binning is not always equal log space.
The Fermi tools support arbitrary energy binning:
http://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtbin.txt
And we will need that too, e.g. it might make sense to have a larger bin at higher energies (e.g. in Fermi 10 - 30 - 100 - 1000 GeV) and we will have flux point computation methods that stick the spectral points not at the log bin center:
https://gammapy.readthedocs.org/en/latest/tutorials/flux_point/index.html

@cdeil cdeil referenced this pull request Jun 24, 2015

Closed

Cube bg model container class: CubeBackgroundModel #292

5 of 8 tasks complete
@joleroi

This comment has been minimized.

Contributor

joleroi commented Jul 9, 2015

Hi,
sorry for the late reply. I see your point. I will try to come up with a solution that gets by with only one class, since I really don't like the idea of having two classes. Maybe I can find a nice solution. Right now, I would say calculating the bin edges in the read method is a good starting point. Btw, I am aware that user defined binning is crucial ;)

@cdeil

This comment has been minimized.

Member

cdeil commented Jul 11, 2015

@kingj90 – Thanks for taking this on. It would be good if we can find a nice solution for this in the coming weeks ... we're starting to duplicate energy binning / axis computations all over gammapy (e.g. currently @mapazarr in his work on the background model cube).

@mapazarr mapazarr referenced this pull request Jul 12, 2015

Merged

Add cube background model class #299

8 of 8 tasks complete
@joleroi

This comment has been minimized.

Contributor

joleroi commented Jul 12, 2015

So this is my new suggestion how to to the energy handling. I put two examples that should illustrate how this class is used.

TODO

  • Extend Fits I/O
  • Add tests
  • Rewrite section "energy handling in gammapy"
  • Add more examples

I guess there's no discussion, that classes handling the energy binning are easier to handle than just passing around arrays and doing conversion etc in each class individually. From the start I suggested to have only one class handling bin centers and edges. There where some valid objections by @cdeil but I hope I adressed most of them in this new version of my suggestion for energy handling

@joleroi joleroi force-pushed the joleroi:energyhandling branch from 3c38935 to 56b5469 Jul 19, 2015

@joleroi

This comment has been minimized.

Contributor

joleroi commented Jul 19, 2015

In case there's no message when updating comments, I put new stuff (see comment above)

@cdeil

This comment has been minimized.

Member

cdeil commented Jul 19, 2015

@kingj90 – Thanks for the update!
When updating a comment, no notifications are sent, so yes, you have to leave a new comment to continue the conversation.

I read through this PR and thought about this for a bit, and I'm ~ 80% sure that the current single-class approach is no good. Sorry to be blunt, I know you've put a work into this PR. But this will be at the core of many other classes and functionality in Gammapy, and I think it's important to get it right

So I'll try to advocate an Energy and EnergyBounds two-class approach once more.

From this PR it's hard to see / discuss if the single class is a good idea, because the classes implemented here aren't used yet (e.g. in classes that represent counts or effective area or exposure or flux 1D vectors or 3D cubes or IRF classes) and the usage examples you show in the high-level docs are just how to instantiate the objects. In the end the important thing is to make the code where energies are used as clear as possible.

Currently one can write:

energy = EnergyBinning([1, 3, 10] * u.TeV)

and it's not clear if energy_binning represents energy values (e.g. for flux points or the energies in a diffuse emission SpectralCube or energy bin edges (e.g. for a CountsSpectrum).

Contrast this with

energy = Energy([1, 3, 10] * u.TeV)
energy_bounds = EnergyBounds([1, 3, 10] * u.TeV)

there it's clear from the object type what the object represents.
When you look through the methods you currently have in EnergyBinning you'll see that most of them have and if statement or a try-except-statement, to act like an Energy or EnergyBounds object.
I think both implementation and usage will be simplified by using separate classes.

Now I don't claim to have a clear picture of how the two-class approach should be implemented yet.

  • Should Energy and EnergyBounds sub-class Quantity or have the quantity as a data member? (without thinking about it much I think sub-classing makes sense because of the "is a" relationship).
  • Should EnergyBounds sub-class Energy to avoid re-implementing some common functionality, or should both sub-class an abstract EnergyBase class to share code? (here naively I think keeping them separate is OK, and if there's methods that are more than 3 lines that both implement, but those in private helper functions)
  • Should the energy to pixel transformations from LogEnergyAxis be added to Energy only, or is it useful to also have it on EnergyBounds?

One more reason why I'm not 100% sure that separate classes are the correct approach is that e.g. in the ARF format as implemented in EffectiveAreaTable, the energy axis info is stored in lo and hi bin edges, i.e. should be represented by an EnergyBounds data member.
But effective area should be thought of as a function of energy, not as something like counts with values in bins. So to use the energy info one has to take the log bin centers and use those as grid points for interpolation, i.e. the bin edges are never used.

So the question is ... do we want to be 100% clear everywhere in our code if we are dealing with an Energy or EnergyBounds object, or do we like to have the ambiguity for cases like this? (note that in this case one could simply compute energy = energy_bounds.log_center_energy in the constructor and then have that as an extra data member.)

What should be the functionality of this class / these classes? Not much ... they are container objects with convenience methods, mostly dealing with energy value -- bin index computations and for EnergyBounds additionally computing bin edge, center, width. Exposing those properties only sometimes on an EnergyBinning class or raising YouAreUsingItInconsistently errors is not nice, no?

So @kingj90, what do you think?
Are you willing to try out the two-class approach (and maybe try sub-classing quantity and integrating the existing LogEnergyAxis functionality that's used in SpectralCube)?
If you still strongly prefer the single-class approach ... can you explain some more why you think it's better?

>>> centers = np.sqrt(binning[:-1]*binning[1:])
>>> e_axis = EnergyBinning(center, 'bin_centers')
The main advantage of the EnergyBinning class with respect to pure numpy arrays or two separate classes handling both use cases is that the transition from energy bin edges to bin centers has to be only done once in constructor. All other class do not need to deal with the energy binning conversions anymore.

This comment has been minimized.

@cdeil

cdeil Jul 19, 2015

Member

I don't understand this reasoning why a single class is better than two classes.
Computing the energy bin center array from the energy bounds array is one line:

energy_center = energy_bounds.log_energy_center

This can happen constructors where useful. It can also be re-computed on the fly if you prefer to only store energy_bounds but then want energy_center in other places because it's very fast ... just call np.diff.
We can even cache this if you want, i.e. an EnergyBounds object would compute it's energy_center array once and then have it as a data member in it's cache (or as a normal class member if you prefer).

class EnergyBinning(object):
"""Class to handle energy dimension for e.g. counts spectra, ARF tables

This comment has been minimized.

@cdeil

cdeil Jul 19, 2015

Member

Always start docstrings with a short single line ending with a dot.
This is then what gets displayed in overview lists of functions / classes:
https://gammapy.readthedocs.org/en/latest/irf/index.html#classes

Parameters
----------
bin_edges : `~astropy.units.Quantity`
Energy bin edges

This comment has been minimized.

@cdeil

cdeil Jul 19, 2015

Member

This text description for a paramter should be indented by four more spaces (everywhere)

return EnergyBinning.from_edges(bin_edges, 'log')
@staticmethod
def from_fits(hdulist):

This comment has been minimized.

@cdeil

cdeil Jul 19, 2015

Member

I prefer separate methods for the separate formats.
It means the caller has to be more explicit, but that's a good thing.

if self._bin_centers is not None:
return self._bin_centers
else:
raise ValueError("Bin centers not defined for this energy axis")

This comment has been minimized.

@cdeil

cdeil Jul 19, 2015

Member

This is a bad API, I think (too complicated): properties that are sometimes there and sometimes not.

@joleroi

This comment has been minimized.

Contributor

joleroi commented Jul 19, 2015

Hi,
thanks for your answer. I give up :) I think it's more important to have a working solution ASAP, than loose too much time arguing about this one class vs. two classes issue. Plus, I am more and more convinced that the arguments for two classes prevail, whether it's because you have more of them or you can present them in a more convincing manner, I don't know. In the end any working solution is fine for me.
My starting point was the ARF wrapper class, where as you admitted the one class solution has advantages but with a call to energy_bounds.log_centers() or something like that, the two class solution should also be fine.

Before starting to reimplement stuff now:

  • I don't know what subclasses are, but I guess it's like C++ inheritance. Why do you think it makes sense to inherit from Quantity? Which functionality of Quantity do we want in the EnergyBounds/Energy class? I could think of the to("unit"), but it don't know if thats really needed.
  • I really don't' like the idea of EnergyBound inheriting from Energy, if we start like this we end up having something like root, where TH2 inherits from TGraphPainter (I am exaggerating)
  • I don't know about the LogEnergyAxis - naively I would say it goes to Energy, but I really don't know, but I guess this can be a second step

I don't know if I will find time tonight, but I will try to come up with something soon

@cdeil

This comment has been minimized.

Member

cdeil commented Jul 19, 2015

Yes, to first order Python sub-classes are like C++ sub-classes.
As an example: an Angle is a Quantity with some extra functionality.
If you look at the GEnergy class it's mostly unit-handling. The Astropy Quantity class does this already and we definitely don't want to re-implement unit handling, so my first attempt at this would be to sub-class Quantity:

class Energy(Quantity):
    ....

Sometimes getting the constructor right can be difficult, especially for a quite complicated class like Quantity, which is a sub-class of numpy.ndarray, but it should be possible.

OK to start with separate Energy and EnergyBounds. Should we see that all or most of the Energy functionality is duplicated in EnergyBounds we can consider subclassing EnergyBounds(Energy).

For the LogEnergyAxis functionality (basically interpolation) I'm not sure ... it could go in Energy or maybe even remain a separate class that has an Energy data member or sub-classes Energy.
I don't have time to look at how this is used in SpectralCube now ... probably OK to leave this as a second step.

Just to be clear, the Energy class should replace EnergyBinCenters and the EnergyBounds class EnergyBinEdges.
Use the Github full-text search to see if / where these are used and replace those cases (if any) with your new classes.

One more thing: I'm pretty sure these classes will remain in gammapy.spectrum. I think anything that has to do with energies and fluxes can go there, it fits very well. I don't want to start a separate gammapy.energy (like I did for time) and it should definitely not go in gammapy.datasets. With gammapy.data and gammapy.utils I'm not super-happy, maybe these will be re-structured. So bottom line: please just put this in gammapy.spectrum for now, or propose another place if you think there's a better one.

@joleroi

This comment has been minimized.

Contributor

joleroi commented Jul 19, 2015

Alright, so sub-class Quantity it is. Yes, it should stay in gammapy.spectrum, I just thought it might be a good idea to start an energy.py.

@joleroi

This comment has been minimized.

Contributor

joleroi commented Jul 20, 2015

Hi, I did not have much time, but this is the version with 2 classes.
I don't yet see an advantage of having it EnergyBounds(Energy). The only function we could reuse is equal_log_spacing but this is 4 lines, all the FITS I/O has to be rewritten anyway, since we want very explicit functions. If you disagree, we can talk about it again.

@cdeil cdeil added this to the 0.3 milestone Jul 22, 2015

@cdeil cdeil self-assigned this Jul 22, 2015

@cdeil cdeil changed the title from WIP: Suggestion for energy handling to WIP: Improve energy handling Jul 22, 2015

@cdeil

This comment has been minimized.

Member

cdeil commented Jul 22, 2015

@kingj90 I've updated the title and moved the task list to the top and put the 0.3 milestone for this.
It's not critical, so if this isn't ready next week we'll just postpone.
I don't have much time today, but I'll have a quick look now.

Energy handling in Gammapy
==========================
.. warning::

This comment has been minimized.

@cdeil

cdeil Jul 22, 2015

Member

Remove this warning. It won't be more experimental than the rest of Gammapy when merged.

Basics
------
Most objects in Astronomy require an energy axis, e.g. counts spectra or effective area tables. In general, this axis can be defined in two ways.

This comment has been minimized.

@cdeil

cdeil Jul 22, 2015

Member

Also in RST files, please hard-wrap test to ~80 lines.

This is completely experimental
Basics

This comment has been minimized.

@cdeil

cdeil Jul 22, 2015

Member

Now that we have two classes, it might make more sense to structure this per class instead of per task (like construction, I/O, computations)?

cols = fits.ColDefs([col1])
hdu = fits.BinTableHDU.from_columns(cols)
hdu.name = 'ENERGIES'
hdu.header['TUNIT1'] = "{0}".format(self.unit)

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

I think you need hdu.header['TUNIT1'] = self.unit.to_string('fits') here, to be sure the unit gets formatted properly for a fits file.

"""
def __new__(cls, energy, unit=None, dtype=None, copy=True):

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

If you define an nbins property that returns self.size -1, you don't need the whole __new__ stanza.

return self
def __array_finalize__(self, obj):

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

And this won't be needed either.

"""
center = np.sqrt(self[:-1] * self[1:])
return Energy(center)

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

I guess you reconvert, since this would be an EnergyBounds instance? If so, quicker is

return center.view(Energy)
"""Upper energy bin edges
"""
upper = self[1:]
return EnergyBounds(upper)

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

Why the need to pass it through EnergyBounds? upper should already be anEnergyBounds` instance, no?

"""
lower = self[:-1]
return EnergyBounds(lower)

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

Same as above.

EnergyBounds
"""
bounds = super(EnergyBounds, EnergyBounds).equal_log_spacing(

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

If this is made into a proper classmethod, you can just do:

return super(EnergyBounds, cls).equal_log_spacing(emin, emax, nbins + 1, unit)
return bounds.view(EnergyBounds)
@staticmethod
def from_fits(hdu, unit=None):

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

Same comments as for Energy.from_fits; think that as a class method, you would not have to redefine it at all.

This comment has been minimized.

@joleroi

joleroi Aug 17, 2015

Contributor

@cdeil : Maybe we WANT to redefine it, right?

cols = fits.ColDefs([col1])
hdu = fits.BinTableHDU.from_columns(cols)
hdu.name = 'EBOUNDS'
hdu.header['TUNIT1'] = "{0}".format(self.unit)

This comment has been minimized.

@mhvk

mhvk Aug 14, 2015

See comment for Energy.to_fits

@mhvk

This comment has been minimized.

mhvk commented Aug 14, 2015

@cdel, @kingj90 - sorry for a slow look, but I'm on holidays right now and only occasionally check e-mail. As you'll see from my in-line comments, I think the Energy subclass can be simplified a bit, especially by using classmethod more, and letting more be done by Quantity.__new__. Hope this helps!

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 16, 2015

@mhvk – Thanks a lot for the Energy class review!!!

The reason there's staticmethod at the moment in a few places where classmethod should be used, as pointed out by you, is that I didn't know how / when to use classmethod until recently. This was fixed in Gammapy in #313, but this PR was started before. So @kingj90 – Please follow Marten's advice.

@@ -1 +1 @@
Subproject commit 90de2d8340d1eafaface009625056ca62f5a6c76
Subproject commit 161773fa72d916c498e0a2a513ecc24460244ac8

This comment has been minimized.

@cdeil

cdeil Aug 16, 2015

Member

This astropy_helpers Git submodule change shouldn't be in this PR.
The easiest solution is to make a new commit that re-sets it to 90de2d8 which is the version pointing to in Gammapy master.

This should work:

cd astropy_helpers
git fetch origin
git checkout 90de2d8
cd ..
git add astropy_helpers
git commit -m "Re-set astropy-helpers to 90de2d8"
"""Energy bin centers.
Stored as "ENERGIES" FITS table extensions.

This comment has been minimized.

@cdeil

cdeil Aug 16, 2015

Member

I'd put as docstring something a bit more generic, e.g.

"""Energy quantity scalar or array.

This is a `~astropy.units.Quantity` sub-class that adds convenience methods
to handle common tasks for energy bin center arrays, like FITS I/O or generating
equal-log-spaced grids of energies.

See :ref:`gammapy-energy-handling` for further information.

where the ref points to a label in front of the title in the high-level docs.

Parameters
----------
energy : `~numpy.array`, scalar, `~astropy.units.Quantity`
Energy

This comment has been minimized.

@cdeil

cdeil Aug 16, 2015

Member

Remove these empty lines between parameters to save some screen space and be consistent throughout Gammapy.

----------
hdu: `~astropy.io.fits.BinTableHDU`
``ENERGIES`` extensions.
unit : `~astropy.units.UnitBase`, str

This comment has been minimized.

@cdeil

cdeil Aug 16, 2015

Member

Add , None for the case where the unit is to be read from the FITS header.

unit : `~astropy.units.UnitBase`, str
Energy unit
Returns

This comment has been minimized.

@cdeil

cdeil Aug 16, 2015

Member

You can remove this returns section and put the return type in the first line.
See here

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 16, 2015

@kingj90 – Just FYI ... if you want to postpone e.g. LogEnergyAxis to future PRs or leave it to me to finish this up and move on to spectral analysis, that would be totally fine.

@joleroi

This comment has been minimized.

Contributor

joleroi commented Aug 17, 2015

Hi,
thank you both for your comments.
@cdeil

  • I have left a few inline comments where I think it might be a good idea to use staticmethod instead of classmethod. Basically everywhere where we read FITS files. I remember you saying that you like this stuff to be very explicit. Correct me when I'm wrong.
  • Yes, I would like to move to the spectral analysis.

@joleroi joleroi force-pushed the joleroi:energyhandling branch from faf141c to dde3e7c Aug 18, 2015

@joleroi

This comment has been minimized.

Contributor

joleroi commented Aug 18, 2015

ready to merge (take 2)

@cdeil cdeil changed the title from WIP: Improve energy handling to Improve energy handling Aug 18, 2015

cdeil added a commit that referenced this pull request Aug 18, 2015

@cdeil cdeil merged commit 50af5cc into gammapy:master Aug 18, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@cdeil cdeil referenced this pull request Aug 18, 2015

Closed

Improve energy handling – take 2 #331

0 of 2 tasks complete
@cdeil

This comment has been minimized.

Member

cdeil commented Aug 18, 2015

@kingj90 – Thanks!

I've moved the two remaining items from the task list at the top to #331.

@joleroi joleroi deleted the joleroi:energyhandling branch Sep 18, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment