New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add n-dim data base class for gammapy.irf #527

Merged
merged 12 commits into from May 25, 2016

Conversation

Projects
None yet
3 participants
@joleroi
Contributor

joleroi commented May 10, 2016

This PR introduces gammapy.utils.nddata
This module contains a base class for an NDData array. It is supposed to be subclassed by all classes in `gammapy.irf`` and maybe in other places.

An scratch EffectiveArea2D class is also part of this PR for the sake of illustration.

@joleroi joleroi added this to the 1.0 milestone May 10, 2016

@cdeil cdeil modified the milestones: 0.5, 1.0 May 10, 2016

@@ -0,0 +1,187 @@
import pytest

This comment has been minimized.

@cdeil

cdeil May 10, 2016

Member

Use this:

from astropy.tests.helper import pytest
import pytest
import random
import numpy as np
from astropy.units import Unit as u

This comment has been minimized.

@cdeil

cdeil May 10, 2016

Member

The official recommendation is

import astropy.units as u

so if you do

from astropy.units import Unit as u

that is confusing to Astropy coders.

In Gammapy we have often this:

from astropy.units import Quantity
spam = Quantity(42, 'spam')

Maybe use that here?

This class represents an ND Data Array. The data stored as numpy array
attribute. The data axis are separate classes and this class has them as
members. The axis order follows numpy convention for arrays, i.e. the axis
added last is at index 0. For an example see NDData_demo.ipynb.

This comment has been minimized.

@cdeil

cdeil May 10, 2016

Member

How do you want to handle docs for this?
Maybe for now copy the NDData_demo.ipynb to gammapy-extra and maintain it there while this class develops (as the option requiring least effort for now)?

class DataAxis(Quantity):
"""Data axis for unbinned values"""
def __new__(cls, energy, unit=None, dtype=None, copy=True, name=None):

This comment has been minimized.

@cdeil

cdeil May 10, 2016

Member

Rename energy to something more generic?
Actually I'm not sure what it is ... the docstring says "Data axis for unbinned values"!?

Probably an

Examples
-------

section with one or two examples would help more than a text description.

# Data in wrong axis order (have to reset data first)
hist._data = None
data = [random.expovariate(hist.axes[1][_].value + 1) for _ in range(10)]

This comment has been minimized.

@cdeil

cdeil May 10, 2016

Member

We use numpy.random everywhere instead of the stdlib random.
Is this here possible also for simplicity / consistency?

numpy.random.exponential is the same distribution as random.expovariate, no?
This also means you can have an array expression and not need a list comprehension, no?

@@ -0,0 +1,441 @@
"""Scratch for NDDataArray class"""

This comment has been minimized.

@cdeil

cdeil May 10, 2016

Member
  • Missing license and future import line. Copy them from any other file in Gammapy.
  • Missing __all__ and you should add an automodapi directive for this file below the one here so that these classes show up in the API docs.
  • Can you put a better module-level docstring?
return res
def plot_image(self, ax=None, plot_kwargs = {}, **kwargs):

This comment has been minimized.

@cdeil

cdeil May 10, 2016

Member

Coverage looks pretty good:
https://coveralls.io/builds/6118260/source?filename=gammapy%2Futils%2Fnddata.py

This is the main method that's not tested.
I'd suggest adding a test that exercises this method (without assertions for now).

This comment has been minimized.

@joleroi

joleroi May 11, 2016

Contributor

I thought about this a little bit more and I think it's best to remove plotting completely from the base class. All code go can reuse is a call to plt.imshow (and maybe axis labels), basically everything else will depend on a specific use case.

@cdeil cdeil self-assigned this May 10, 2016

@cdeil

This comment has been minimized.

Member

cdeil commented May 10, 2016

I've done a superficial review and left some comments inline.

Do you want to apply it to AEFF or some other IRF in this PR, or add some simple dummy IRF in the test or as an example? (I'm not sure what is best ... either way this will be a large PR, bit it'll still be OK to review.)

@joleroi

This comment has been minimized.

Contributor

joleroi commented May 12, 2016

Thanks for you comments. I added an minimal aeff example file. Next thing is to test if the NDDataArray FITS I/O works for the test datastore

cc @JouvinLea Since you're also working on IRF stuff maybe you want to have a look, too? If it comes to the worst you will have to use this class at some point 😉

@joleroi joleroi changed the title from WIP: ND data base class for gammapy.irf to ND data base class for gammapy.irf May 12, 2016

@JouvinLea

This comment has been minimized.

Contributor

JouvinLea commented May 12, 2016

Sorry I really didn't follow this PR... What is the purpose exactly?

@joleroi

This comment has been minimized.

Contributor

joleroi commented May 12, 2016

I'll answer via email

joleroi added some commits May 12, 2016

@joleroi

This comment has been minimized.

Contributor

joleroi commented May 19, 2016

@cdeil I added an example EffectiveArea2D class, that uses the NDData base class. Please review. I will continue with moving functionality from the old EffectiveAreaTable2D class in the next days.

class EffectiveArea2D(NDDataArray):
"""2D effective area table
Axes: ``THETA``, ``ENERG``

This comment has been minimized.

@joleroi

joleroi May 19, 2016

Contributor

The axis names are currently just the column names in the Fits file, we might want to map this to energy and offset?

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

Yes, probably a mechanism to map FITS axis names to something else would be nice, for cases where the FITS names are cryptic.

@joleroi joleroi force-pushed the joleroi:nddata branch from 93e1b13 to 572e993 May 19, 2016

@joleroi joleroi force-pushed the joleroi:nddata branch from 572e993 to 69101fc May 19, 2016

value.name = key
self.add_axis(value)
if data is not None:

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

There's a problem here and a solution here. 😄

Axes: ``THETA``, ``ENERG``
Parameteres

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

Typo: "Parameteres" -> "Parameters"

-----------
energy : `~astropy.units.Quantity`, `~gammapy.utils.BinnedDataAxis`
Energy axis (binned)
offset : `~astropy.units.Quantity`, `~gammapy.utils.DataAxis

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

Missing backtick at the end of the line.

Can you please check the HTML docs for the new classes in this PR locally?

def from_table(cls, t):
"""This is a read for the format specified at
http://gamma-astro-data-formats.readthedocs.io/en/latest/irfs/effective_area/index.html#aeff-2d-format
The ``EFFAREA_RECO`` column is discarded.

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

The general agreement for all our formats is that extra keywords and columns are allowed and should be ignored by science tools. So code that explicitly removes one extra column is bad.
Can't you make this work by just explicitly accessing the columns you need?

----------
vmin : `~astropy.units.Quantity`, float
Lowest value
emax : `~astropy.units.Quantity`, float

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

emax -> vmax

@@ -0,0 +1,295 @@
import numpy as np

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

Add the usual two lines of boilerplate to every file: license and future imports.

from ...datasets import gammapy_extra
from ...irf import EffectiveAreaTable, abramowski_effective_area
from ...utils.energy import EnergyBounds
from ...utils.testing import requires_dependency, requires_data, data_manager
@requires_data('gammapy-extra')
def test_EffectiveArea2D(tmpdir):
# These are just some quick and dirty tests they should be removed later

This comment has been minimized.

@cdeil

cdeil May 19, 2016

Member

Can you put real tests now?
This is where I would start the review.

It's not clear to me what is already implemented here and what isn't.

One thing that would be nice to have is a repr like here that gives an overview of the axes and make the basic plots work again, to show that this works:
https://github.com/gammapy/gammapy/blob/master/gammapy/scripts/data_show.py#L92

Another example of what a repr with the axis info could look like: https://github.com/gammapy/gammapy/blob/master/gammapy/cube/core.py#L538

I.e. implement the things that are useful for playing around and debugging this.

@cdeil

This comment has been minimized.

Member

cdeil commented May 19, 2016

I looked at this for 10 minutes, but I would need at least an hour to read through your implementation and make good suggestions. I can do this later if you want, but not this week.

Naively I would have expected some scheme where a given IRF is a subclass and has class-level attributes to declare which axes are present and maybe some other things.
This is how e.g. Django models work or Astropy models or Astropy coordinate frames.
See https://docs.djangoproject.com/en/1.9/topics/db/models/

The axes are class-specific and not instance-specific in your implementation, right?
Then probably the axis configuration declaration for a given IRF should probably be class-level attributes like in the Django model example I linked to, no?

(The alternative would be that there is a factory function to create specific IRFs or a single class that is configurable on construction.)

Does this comment make any sense? Or should I give a code example how I would have expected the AEFF class implementation to look like?

@cdeil

This comment has been minimized.

Member

cdeil commented May 19, 2016

This is an example of a single class that is configurable on construction:
http://xarray.pydata.org/en/stable/data-structures.html#creating-a-dataarray

For our IRFs I don't know if one configurable class or a sub-class zoo would work better.

@joleroi

This comment has been minimized.

Contributor

joleroi commented May 19, 2016

@cdeil thanks. you already helped a lot. Did I understand you correctly that you would have expected something like

class EffArea
    __init__(self, energy, offset, ...):
        self.energy = DataAxis(energy)
        self.offset = DataAxis(offset)

This is how I had it in mind originally. I changed to the 'weird' scheme because I have a from_table constructor in the base class (NDDataArray) that interferes with that (I already adds axes, so the subclasses cannot call this constructor). But as you say, the from_table methods should be more specific I guess (read a fixes set of columns, instead of trying to guess). I put it to the base class in order to be able to read/write any NDDataArray and not only the subclasses.

I will change this, and then add the plotting methods etc. to make the review easier, ok?

UPDATE: The NDDataArray has an axes attribute, that is use to set up the interpolator etc.This would still be present in the EffectiveArea2d. So the constructor would be more like

class EffArea
    __init__(self, energy, offset, ...):
         super.__init__()
        self.energy = self.axes[0]
        self.offset = self.axes[1]

@cdeil

This comment has been minimized.

Member

cdeil commented May 19, 2016

@joleroi - To be honest I don't know what I want / expect here.

The goal should be that the class is nice to use that that each individual IRF class is small, basically declarative which axes and binning are present, the boilerplate code should be in the base class.

Something like this:

In [3]: Disk2D.__init__??
Signature: Disk2D.__init__(self, amplitude=1, x_0=0, y_0=0, R_0=1, **kwargs)
Source:
            def __init__(self, *params, **kwargs):
                return super(cls, self).__init__(*params, **kwargs)

File:      ~/Library/Python/3.5/lib/python/site-packages/astropy-1.2.dev15393-py3.5-macosx-10.11-x86_64.egg/astropy/modeling/core.py
Type:      function

In [4]: Disk2D
Out[4]: 
<class 'astropy.modeling.functional_models.Disk2D'>
Name: Disk2D
Inputs: ('x', 'y')
Outputs: ('z',)
Fittable parameters: ('amplitude', 'x_0', 'y_0', 'R_0')

In [5]: from astropy.modeling.models import Disk2D

There the parameters (axes in your case) are class-level attributed (see here) and then some dark magic is generating __init__.

I now see that in Sherpa the parameters are constructed in __init__ (like you're constructing the axes in __init__ now:
https://github.com/sherpa/sherpa/blob/3dc0c33146ea5c93b0dcff07be804e25769d7642/sherpa/astro/models/__init__.py#L582

I'll try to think about this and read up on class-level attributes in Python over the weekend.

But yes ... adding tests or examples that show how your current class is used and that it works would be very helpful for review.

@joleroi

This comment has been minimized.

Contributor

joleroi commented May 20, 2016

@cdeil
I start to understand the 2 different schemes you propose. Right now I don't know how I should make the base class work with class level attributes. All the interpolation etc. work with calls to self.axes and loops. How could this be changed to use class level attributes like self.offset? I will take 10 min. to try to understand this 😄

Otherwise I will continue with the 'construction on initialization' scheme. If we encounter severe problems later, we have to find a way to switch to the other scheme.

@joleroi

This comment has been minimized.

Contributor

joleroi commented May 23, 2016

I worked in this with @adonath and we think the subclassing API is sufficiently elegant for now (not as fancy as astropy.models but it does the job, if someone has time it can always be improved). Axes are class level attributes defined in each subclass now.
https://github.com/gammapy/gammapy/pull/527/files#diff-cfb4e8b67c434385b17b6cd254d4c5b0R42
The order has to be defined in a separate attribute axis_names (astropy magically gets the order from the order the class level attributes are written in)

The drawback is that NDData cannot be instanciated anymore (since it does not have any axes defined).

@joleroi joleroi force-pushed the joleroi:nddata branch from f5df6bf to ffcc9c7 May 23, 2016

joleroi added some commits May 24, 2016

@joleroi joleroi force-pushed the joleroi:nddata branch from f735fae to 92fc4b7 May 25, 2016

@joleroi

This comment has been minimized.

Contributor

joleroi commented May 25, 2016

@cdeil I will merge this now. The example class EffectiveArea2d covers all the functionality from the old class EffectiveAreaTable2d but is not used anywhere. Next I will add a 1D effective area class based on NDDataArray. You can have a look next week and then I will remove the old irf.effective_area_table module and replace it with the new classes, if nobody raises any concerns.

@joleroi joleroi merged commit 7ae7728 into gammapy:master May 25, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@joleroi joleroi deleted the joleroi:nddata branch May 25, 2016

@cdeil cdeil changed the title from ND data base class for gammapy.irf to Add n-dim data base class for gammapy.irf Jun 9, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment