New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve MapCoord interface #1318

Merged
merged 27 commits into from Feb 22, 2018

Conversation

Projects
None yet
2 participants
@woodmd
Member

woodmd commented Feb 19, 2018

This PR refactors the MapCoord class and improves the interface for interacting with map objects via MapCoord objects (e.g. with get_by_coords or set_by_coords). Specifically it addresses some of the ideas and points raised in #1251. Some additional iteration will probably be required to figure out how to integrate this into a unified scheme for coordinate handling in gammapy but I think these changes to MapCoord could be a good starting point for that development.

A short summary of some the changes in this PR:

  • Renamed MapCoords to MapCoord.
  • MapCoord objects support labeled axes and use an OrderedDict internally to store coordinate values. Using OrderedDictas the internal data structure means that we have the option to preserve the axis ordering such as when converting from a coordinate tuple. __getitem__ accepts either integer or string-based axis indexing (e.g. either coord[0] or coord['lon']). Attribute-based access is also supported (coord.lon).
  • Added support to by_coords methods for dict input in addition to tuple and MapCoord. Where the internal logic of the map/geometry classes depends on the axis ordering a MapGeom.coord_to_tuple method is used to extract a coordinate tuple from a MapCoord object with an axis ordering matched to the geometry. Note that when using dict or MapCoord input this requires a strict match between the axis name of the map and coordinate object (energy would not match reco_energy for instance). We might consider whether it might be better to match on the type property proposed in #1317.
  • by_coords methods now correctly handle celestial/galactic coordinate transformations when working with SkyCoord objects. For the tuple-based access pattern the SkyCoord replaces the longitude and latitude arrays. For dict-based access the SkyCoord should be passed with key skycoord. Here's an example:
from gammapy.maps import WcsND, MapAxis
from astropy.coordinates import SkyCoord

axes=[MapAxis.from_bounds(100.,10000.,3,name='energy')]
m = WcsNDMap.create(npix=10,skydir=(0.0,0.0), axes=axes)

skycoord, energy = SkyCoord(0.0,0.0,unit='deg'), 1000.
# Access by tuple
m.get_by_coords((skycoord,energy))
# Access by dict
m.get_by_coords(dict(skycoord=skycoord,energy=energy))
  • Drop interp option for get_by_coords. This was just dispatching to interp_by_coords so it seemed better just to have users call this method explicitly.

@cdeil cdeil self-assigned this Feb 20, 2018

@cdeil cdeil requested review from joleroi and registerrier Feb 20, 2018

@cdeil cdeil added this to the 0.7 milestone Feb 20, 2018

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 20, 2018

@woodmd - Thanks for this huge work.

I've started to look at this, but haven't gotten very far.

Just general / superficial comments:

  • there are no docs examples of MapCoord use? Can you add some either in the class docstring in an examples section, or in the high-level docs somewhere, and refer to that section from the MapCoord class docstring?

  • there are a few methods and if branches that are never executed by tests:

time python setup.py test -V -P maps --coverage

If you can put some more tests that show that the code currently works and establish behaviour, that would be very helpful for anyone continuing to develop with maps.

I'll have a closer look at this tomorrow morning.

@registerrier or @joleroi - do you have time to review this soon or should we merge and you try from master in the coming days / weeks?

@woodmd

This comment has been minimized.

Member

woodmd commented Feb 20, 2018

I've tried to improve both docs and tests a bit. As usual the docs are a WIP and could definitely be improved. The most valuable input as this point would be to know if this interface satisfies the use cases foreseen for maps in gammapy. Note that here I haven't really given much thought to how quantities should be handled -- I think this is one of the last missing pieces needed for a complete design. Along these lines we may want to consider having unit properties that live on the MapCoord object.

there are a few methods and if branches that are never executed by tests

These should be mostly covered now. The remaining coverage misses are mostly exceptions -- I'm not sure if it's worth the effort to add explicit tests for these.

woodmd and others added some commits Feb 21, 2018

coordsys='GAL', axes=[energy_axis])
m.set_by_coord( (skycoord, energy), [0.5, 1.5] )
m.get_by_coord( (skycoord, energy) )

This comment has been minimized.

@cdeil

cdeil Feb 21, 2018

Member

@woodmd - I've pushed some small fixes in f7e10bf .

This one maps docs example still errors out and I fail to see why!?

In [31]: %paste
   import numpy as np
   from astropy.coordinates import SkyCoord
   from gammapy.maps import Map, MapCoord, MapAxis

   lon = np.array([0.0,1.0])
   lat = np.array([1.0,2.0])
   energy = np.array([100., 1000.])
   energy_axis = MapAxis.from_bounds(100., 1E5, 12, interp='log', name='energy')
   
   skycoord = SkyCoord(lon, lat, unit='deg', frame='galactic')   
   m = Map.create(binsz=0.1, map_type='wcs', width=10.0,
                  coordsys='GAL', axes=[energy_axis])

   m.set_by_coord( (skycoord, energy), [0.5, 1.5] )
   m.get_by_coord( (skycoord, energy) )
## -- End pasted text --
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-31-07934e830df0> in <module>()
     13 
     14 m.set_by_coord( (skycoord, energy), [0.5, 1.5] )
---> 15 m.get_by_coord( (skycoord, energy) )

~/code/gammapy/gammapy/maps/base.py in get_by_coord(self, coords)
    485         vals = np.empty(coords.shape, dtype=self.data.dtype)
    486         coords = coords.apply_mask(msk)
--> 487         idx = self.geom.coord_to_idx(coords)
    488         vals[msk] = self.get_by_idx(idx)
    489         vals[~msk] = np.nan

~/code/gammapy/gammapy/maps/geom.py in coord_to_idx(self, coords, clip)
   1045         pix = self.coord_to_pix(coords)
   1046         return self.pix_to_idx(pix, clip=clip)
-> 1047 
   1048     @abc.abstractmethod
   1049     def pix_to_coord(self, pix):

~/code/gammapy/gammapy/maps/wcs.py in coord_to_pix(self, coords)
    486             return tuple([np.array([]) for i in range(coords.ndim)])
    487 
--> 488         c = self.coord_to_tuple(coords)
    489         # Variable Bin Size
    490         if self.axes and self.npix[0].size > 1:

~/code/gammapy/gammapy/maps/geom.py in coord_to_tuple(self, coord)
   1185             coord_tuple += [coord[ax.name]]
   1186 
-> 1187         return coord_tuple
   1188 
   1189     @abc.abstractmethod

~/code/gammapy/gammapy/maps/geom.py in __getitem__(self, key)
    629         if isinstance(key, six.string_types):
    630             return self._data[key]
--> 631         else:
    632             return list(self._data.values())[key]
    633 

KeyError: 'energy'

This comment has been minimized.

@woodmd

woodmd Feb 21, 2018

Member

This occurs due to a bug in apply_mask. I'm committing a fix now.

Maps support interpolation via the `~Map.get_by_coords` and
`~Map.get_by_pix` methods. Currently the following interpolation
Maps support interpolation via the `~Map.interp_by_coord` and
`~Map.interp_by_pix` methods. Currently the following interpolation

This comment has been minimized.

@cdeil

cdeil Feb 21, 2018

Member

This link to Map.interp_by_pix doesn't resolve from the docs.
@woodmd - What is the thing to do here?
Add an abstract method to the base class?

This comment has been minimized.

@woodmd

woodmd Feb 21, 2018

Member

Yes having a base class method probably makes the most sense and would be parallel to what we have for the other accessor methods. Adding this now.

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 21, 2018

@woodmd - I love that one can now interact with coordinates and axes by name!

I'd still like to play with this a little more and compare to
http://xarray.pydata.org/en/stable/indexing.html
(but from my side, could also merge now, and I play with it in master, so other PRs can build on this)

Yes, one important question is whether to use quantity or how to support it. Astropy went all-in on the user of Quantity / Angle objects (see also http://astropy-healpix.readthedocs.io/ as an example). I think this would be good. But I don't understand yet of what it would mean here exactly or how much work it would be to add quantity / angle support in the interface or even in the internal data structure (probably via a "unit" string" and ".quantity" attribute like we discussed for axes and data before).

I'm not sure the "axis0" names that are generated are a great idea. I guess they can't be avoided if we want to keep supporting the tuple interface? Personally I'd be prefer just always pass dict or MapCoord and never have such default names created. But I think @woodmd you really want to keep the tuple interface, right? Does xarray have the same problem? How do they solve it?

My last main thought is that, yes, more docs examples would be nice. But it's already very good now; I would say even more useful / important would be to have a repr for MapCoords, MapAxis and maybe even MapGeom. I only played with the examples for 5 min, but that was always what I wanted to see: what's in these objects (especially the coord / axis names).

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 21, 2018

One more thought: should we really expose coords as attributes, i.e. c.lon in addition to c['lon']?
I guess the motivation is have tab completion and the convenience of not typing / reading the [''] characters?
The problem with it is that now there's two ways to do the same thing, half of the callers will use one way, half the other. And there's the potential for name collisions / problems, because two logical namespaces are mixed: the coord names and the class attributes / methods.
I think e.g. pandas had the same thing where they exposed columns via df.column_name but regretted doing it.

Bottom line: I think I'm +1 to remove this feature for now unless @woodmd you're very sure it's a good idea, because it's easy to add later, but impossible to ever remove without braking all code using gammapy.maps.

@woodmd

This comment has been minimized.

Member

woodmd commented Feb 21, 2018

Yes, one important question is whether to use quantity or how to support it.

One thing to understand would be the performance implications. I've heard various claims about the overhead of quantities but never tested it myself. If we don't want to deal with quantities internally then one option would be to strip off the units in the constructor and save them in a separate units dict.

I'm not sure the "axis0" names that are generated are a great idea. I guess they can't be avoided if we want to keep supporting the tuple interface? Personally I'd be prefer just always pass dict or MapCoord and never have such default names created. But I think @woodmd you really want to keep the tuple interface, right? Does xarray have the same problem? How do they solve it?

I'd prefer to keep the tuple-based interface in place for now. Note that while coord methods support named arguments the pix and idx methods do not. IMO it's nice to have the same interface for all three argument types. The documentation would also need to be reworked quite a bit if we dropped coordinate tuples.

One more thought: should we really expose coords as attributes, i.e. c.lon in addition to c['lon']?
I guess the motivation is have tab completion and the convenience of not typing / reading the [''] characters?

This was purely for convenience but I admit that I haven't given much thought to larger implications of supporting this. I'd be fine with dropping for now. However I'd prefer to keep lon and lat attributes. These are hard-coded so shouldn't raise any issues with collisions.

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 22, 2018

@woodmd - I did a small docstring example fix in 6984de0 . How do you feel about using Python lists in examples instead of Numpy arrays? Should be fully supported, no? I prefer it because it is a bit simpler.

I see that .energy is still used in some high-level docs examples. I think that will no longer work now that __getattr__ is removed. I can try the examples once more and fix those.

One more naive question: how does use with dict or OrderedDict differ? Is it explained in the docs? Are the different tests for dict / OrderedDict useful?

@woodmd

This comment has been minimized.

Member

woodmd commented Feb 22, 2018

How do you feel about using Python lists in examples instead of Numpy arrays? Should be fully supported, no? I prefer it because it is a bit simpler.

Fine with me. Lists should be fully supported although since we don't explicitly include them in the tests there may be a few places where we we're missing a cast to numpy arrays.

I see that .energy is still used in some high-level docs examples. I think that will no longer work now that getattr is removed. I can try the examples once more and fix those.

Yes those should be removed. Were you going to take care of removing __getattr__?

One more naive question: how does use with dict or OrderedDict differ? Is it explained in the docs? Are the different tests for dict / OrderedDict useful?

With the present implementation there should be no difference between the two when used with maps since the axes will be matched by name. OrderedDict input gives you the option of setting both index and name of axes in the MapCoord object. At the moment there's no real use for this functionality in the context of maps but I thought there may eventually be other applications where you might want to control both.

if self.ndim != coord.ndim:
raise ValueError
if not coord._match_by_name:

This comment has been minimized.

@cdeil

cdeil Feb 22, 2018

Member

@woodmd - PyCharm complains here that you're accessing the private MapCoord._match_by_name from outside the class (here from MapGeom.coord_to_tuple). Seems like a valid complaint? What would you suggest here?

This comment has been minimized.

@woodmd

woodmd Feb 22, 2018

Member

Yes this was just laziness on my part. I'll make match_by_name into a public attribute.

@classmethod
def from_tuple(cls, coords, **kwargs):
def _from_tuple(cls, coords, coordsys=None, copy=False):

This comment has been minimized.

@cdeil

cdeil Feb 22, 2018

Member

copy is unused here. Should be passed along to the _from_lonlat calls, no?

This comment has been minimized.

@woodmd

woodmd Feb 22, 2018

Member

Yes copy should get passed through. Fixing now.

@cdeil

cdeil approved these changes Feb 22, 2018

Looks good to me. @woodmd - Merge now?

@woodmd

This comment has been minimized.

Member

woodmd commented Feb 22, 2018

Yes looks good to me as well. Merging now.

@woodmd woodmd merged commit a76262d into gammapy:master Feb 22, 2018

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/appveyor/pr AppVeyor build failed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment