New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slice_by_idx methods to gammapy.maps #1443

Merged
merged 8 commits into from Jun 28, 2018

Conversation

2 participants
@adonath
Member

adonath commented Jun 26, 2018

This PR adds .slice_by_idx() methods to the MapGeom and Map classes. I'm opening this PR early to get some feedback on the implementation and API, before I continue with .slice_by_coord(). Does the drop_axes keyword makes sense to you (this was introduced by @woodmd in Map.to_slice()) or shall we rather implement a Map.drop_axis() method? I introduced a hidden _copy_init_kwargs property to WcsGeom and HpxGeom to easily create a new WcsGeom and HpxGeom object with the same specification. Or shall we rather introduce a WcsGeom.copy() method?

@adonath adonath added the feature label Jun 26, 2018

@adonath adonath added this to the 0.8 milestone Jun 26, 2018

@adonath adonath self-assigned this Jun 26, 2018

@adonath adonath added this to To do in Map analysis via automation Jun 26, 2018

@adonath adonath requested review from cdeil and registerrier Jun 26, 2018

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jun 27, 2018

Member

@adonath - Here's what I think:

before I continue with .slice_by_coord()

Suggest to only do this later, in a follow-up PR, maybe. Slicing by coord requires rounding and is thus an extra can of worms because probably we need to offer options like "nearest", "inclusive", "left".

For now, I think we should leave it to the user to go to idx integers for indexing and slicing, and just show via a docs example how to obtain those integers from coords (e.g. in an example with an energy axis and assuming one wants to have the slice that fully contains a given (emin, emax).

I think adding slice_by_coord is not high priority at all.

Does the drop_axes keyword makes sense to you?

No.

I think instead we should support both int index and slice range in map.slice_by_idx, just like in Numpy where you use [] for both indexing and slicing:

>>> import numpy as np
>>> data = np.zeros((10, 20, 30))
>>> data.shape
(10, 20, 30)
# If you want slice 5 and lower-dimensional data, use int 5
>>> data[:,:,5].shape
(10, 20)
# If you want to keep the dimension, slice like this
>>> data[:,:,5:6].shape
(10, 20, 1)

You're calling np.squeeze in the current drop_axes implementation. Like I said before, I think that's error-prone and we should never do that. Can you please add a test case of maps that only have one bin in lon, lat or energy to make sure we allow that and don't squeeze out dimensions accidentally?

Now the question is how to support this in the API and implementation. For coord and index @woodmd started with tuples, following the example of Numpy, and then we added dict and MapCoord for named axes support similar to http://xarray.pydata.org/ and having something like a WCS on the extra axes. That's visible in the examples here:
http://docs.gammapy.org/dev/maps/index.html#interface-with-mapcoord-and-skycoord
I think there's agreement that moving in the direction of always going to MapCoord on method call input and using that internally is a good idea, no?

How about we try to do the same and add a MapSlice class which has only ~10 lines for now and looks something like this?

class MapSlice:
    def __init__(self, slice):
        self.slice = slice

    @classmethod
    def from_any(cls, slice):
        # only support dict for now,
        # maybe add tuple or MapCoord or spatial axes support later
        return cls(slice)

    def to_data(self):
        slice = [...]  # ignore spatial axes (not sure if ellipses is the way to do it)
        return (..., *self.slice.values())

class Map:
    def slice_by_idx(self, slices):
        map_slice = MapSlice.from_any(slices)
        geom = self.geom.slice_by_idx(map_slice.to_geom())
        data = self.data[map_slice.to_data()]

Given that both geom.slice_by_idx and map_slice.to_geom is under our control, we really only need one of the two.
The main thing would be map_slice.to_data which transforms the nD slice info into what Numpy expects.

There's a line missing that sorts slices to match the axis order. That could be done e.g. by passing self.geom.axes to MapSlice.from_any and doing it there, or just pass the names.

This would allow the following ways to slice, which is how I like to do it (very explicit and by name):

map.slice_by_idx({'energy': slice(3, 7), 'time': 42)

I think this might be a simple and extensible way to do it?

I introduced a hidden _copy_init_kwargs property to WcsGeom and HpxGeom to easily create a new WcsGeom and HpxGeom object with the same specification. Or shall we rather introduce a WcsGeom.copy() method?

Maybe add both? Having an explicit list of attrs that need to be dealt with e.g. when copying or saving or doing something else can be nice. And having a quick way to copy is also very nice. We'll call that ~ 10 times in the gammapy.maps code ourselves, so I don't think we will want to access _copy_init_kwargs 10 times, because it's more complex than just a method call (where you can add options concerning the data copy behaviour), no?

Suggest to move that discussion to a separate PR or issue, or continue it in the existing #513 though.

Member

cdeil commented Jun 27, 2018

@adonath - Here's what I think:

before I continue with .slice_by_coord()

Suggest to only do this later, in a follow-up PR, maybe. Slicing by coord requires rounding and is thus an extra can of worms because probably we need to offer options like "nearest", "inclusive", "left".

For now, I think we should leave it to the user to go to idx integers for indexing and slicing, and just show via a docs example how to obtain those integers from coords (e.g. in an example with an energy axis and assuming one wants to have the slice that fully contains a given (emin, emax).

I think adding slice_by_coord is not high priority at all.

Does the drop_axes keyword makes sense to you?

No.

I think instead we should support both int index and slice range in map.slice_by_idx, just like in Numpy where you use [] for both indexing and slicing:

>>> import numpy as np
>>> data = np.zeros((10, 20, 30))
>>> data.shape
(10, 20, 30)
# If you want slice 5 and lower-dimensional data, use int 5
>>> data[:,:,5].shape
(10, 20)
# If you want to keep the dimension, slice like this
>>> data[:,:,5:6].shape
(10, 20, 1)

You're calling np.squeeze in the current drop_axes implementation. Like I said before, I think that's error-prone and we should never do that. Can you please add a test case of maps that only have one bin in lon, lat or energy to make sure we allow that and don't squeeze out dimensions accidentally?

Now the question is how to support this in the API and implementation. For coord and index @woodmd started with tuples, following the example of Numpy, and then we added dict and MapCoord for named axes support similar to http://xarray.pydata.org/ and having something like a WCS on the extra axes. That's visible in the examples here:
http://docs.gammapy.org/dev/maps/index.html#interface-with-mapcoord-and-skycoord
I think there's agreement that moving in the direction of always going to MapCoord on method call input and using that internally is a good idea, no?

How about we try to do the same and add a MapSlice class which has only ~10 lines for now and looks something like this?

class MapSlice:
    def __init__(self, slice):
        self.slice = slice

    @classmethod
    def from_any(cls, slice):
        # only support dict for now,
        # maybe add tuple or MapCoord or spatial axes support later
        return cls(slice)

    def to_data(self):
        slice = [...]  # ignore spatial axes (not sure if ellipses is the way to do it)
        return (..., *self.slice.values())

class Map:
    def slice_by_idx(self, slices):
        map_slice = MapSlice.from_any(slices)
        geom = self.geom.slice_by_idx(map_slice.to_geom())
        data = self.data[map_slice.to_data()]

Given that both geom.slice_by_idx and map_slice.to_geom is under our control, we really only need one of the two.
The main thing would be map_slice.to_data which transforms the nD slice info into what Numpy expects.

There's a line missing that sorts slices to match the axis order. That could be done e.g. by passing self.geom.axes to MapSlice.from_any and doing it there, or just pass the names.

This would allow the following ways to slice, which is how I like to do it (very explicit and by name):

map.slice_by_idx({'energy': slice(3, 7), 'time': 42)

I think this might be a simple and extensible way to do it?

I introduced a hidden _copy_init_kwargs property to WcsGeom and HpxGeom to easily create a new WcsGeom and HpxGeom object with the same specification. Or shall we rather introduce a WcsGeom.copy() method?

Maybe add both? Having an explicit list of attrs that need to be dealt with e.g. when copying or saving or doing something else can be nice. And having a quick way to copy is also very nice. We'll call that ~ 10 times in the gammapy.maps code ourselves, so I don't think we will want to access _copy_init_kwargs 10 times, because it's more complex than just a method call (where you can add options concerning the data copy behaviour), no?

Suggest to move that discussion to a separate PR or issue, or continue it in the existing #513 though.

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jun 27, 2018

Member

Another comment: can you please add a few more explicit tests, e.g. of a 4D map with event_id = [7, 9, 42] and energy = [3, 5, 7, 9], maybe even with a single spatial bin?

One thing to assert on is the output shape when slicing, but also checking that the sliced map data is a view would be important, so that we don't accidentally make copies. I've never done this, but looking at https://stackoverflow.com/a/11524746/498873 I think this might do it:

m2 = m.slice(...)
assert m2.data.base is m.data
Member

cdeil commented Jun 27, 2018

Another comment: can you please add a few more explicit tests, e.g. of a 4D map with event_id = [7, 9, 42] and energy = [3, 5, 7, 9], maybe even with a single spatial bin?

One thing to assert on is the output shape when slicing, but also checking that the sliced map data is a view would be important, so that we don't accidentally make copies. I've never done this, but looking at https://stackoverflow.com/a/11524746/498873 I think this might do it:

m2 = m.slice(...)
assert m2.data.base is m.data
@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jun 27, 2018

Member

If you don't think a MapSlice is a good idea, because it's only used in one place, and doesn't have enough code to warrant splitting it out into a separate class, fine.

My main suggestion is to support this:

map.slice_by_idx({'energy': slice(3, 7), 'time': 42)

and to drop the drop_axes from the slice method on map and geom.

Member

cdeil commented Jun 27, 2018

If you don't think a MapSlice is a good idea, because it's only used in one place, and doesn't have enough code to warrant splitting it out into a separate class, fine.

My main suggestion is to support this:

map.slice_by_idx({'energy': slice(3, 7), 'time': 42)

and to drop the drop_axes from the slice method on map and geom.

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jun 28, 2018

Member

@adonath - I think you added a new "Indexing and Slicing" section in the docs, but there was already an empty one further down on that docs page:
http://docs.gammapy.org/dev/maps/index.html#slicing-methods
Not sure whether it fits better further up or down, but suggest to remove the empty one.

Otherwise, 👍 , this is great!

Member

cdeil commented Jun 28, 2018

@adonath - I think you added a new "Indexing and Slicing" section in the docs, but there was already an empty one further down on that docs page:
http://docs.gammapy.org/dev/maps/index.html#slicing-methods
Not sure whether it fits better further up or down, but suggest to remove the empty one.

Otherwise, 👍 , this is great!

@cdeil

You could give one example in the docs where you show how to compute an indx from a given energy coord. It's pretty common that people will want to do that, and it's possible now with ~ 2 lines of computing the idx first.

Show outdated Hide outdated docs/maps/index.rst
@@ -499,6 +499,32 @@ def upsample(self, factor, order=0, preserve_counts=True):
"""
pass
def slice_by_idx(self, slices, copy=False):
"""Slice sub map from map object.

This comment has been minimized.

@cdeil

cdeil Jun 28, 2018

Member

Add reference to the high-level docs section with the examples?

@cdeil

cdeil Jun 28, 2018

Member

Add reference to the high-level docs section with the examples?

This comment has been minimized.

@adonath

adonath Jun 28, 2018

Member

Done

@adonath
@adonath

This comment has been minimized.

Show comment
Hide comment
@adonath

adonath Jun 28, 2018

Member

@cdeil Thanks for your comments! For now I decided against the MapSlice object, but we should keep the idea in the back of our head, once we introduce the .slice_by_coord() method. But still the argument remains, that it will be only used in one place...

I'll go ahead an merge, the failing test is unrelated.

Member

adonath commented Jun 28, 2018

@cdeil Thanks for your comments! For now I decided against the MapSlice object, but we should keep the idea in the back of our head, once we introduce the .slice_by_coord() method. But still the argument remains, that it will be only used in one place...

I'll go ahead an merge, the failing test is unrelated.

@adonath adonath merged commit de9f476 into gammapy:master Jun 28, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details

Map analysis automation moved this from To do to Done Jun 28, 2018

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jun 28, 2018

Member

🍴 Hooray! 🍴

Member

cdeil commented Jun 28, 2018

🍴 Hooray! 🍴

@cdeil cdeil changed the title from Add `.slice_by_idx()` methods to gammapy maps to Add slice_by_idx methods to gammapy.maps Aug 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment