New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observation table subset selection #295

Merged
merged 19 commits into from Aug 4, 2015

Conversation

Projects
None yet
2 participants
@mapazarr
Member

mapazarr commented Jul 1, 2015

Added methods for filtering observation tables to the ObservationTable class. The method allows for box or circle selections on any variable inside the observation table. In addition, zenith is interpreted as (90 deg - altitude). Inverse selection is also possible. The doc is uploaded here:
https://googledrive.com/host/0BzXHZQx0oCLBfmcyUDdxMU9sLVduMXM1QXk5RTNYU3dXYmthVG9OZ18wWE1uR3lYaEsxRU0/html_doc/docs/_build/html
This PR addresses issue #293.
TODO:

  • revise/implement methods for selecting regions in the sky.
  • implement inline 'findruns` command
  • use logging instead of print statements in 'findruns`
  • implement tests for 'findruns`

@cdeil cdeil added the feature label Jul 1, 2015

@cdeil cdeil added this to the 0.3 milestone Jul 1, 2015

@cdeil cdeil self-assigned this Jul 1, 2015

@mapazarr

This comment has been minimized.

Member

mapazarr commented Jul 1, 2015

I have seen the travis-ci error because of warnings in the sphinx-doc. I'll look into it tomorrow.

@mapazarr

This comment has been minimized.

Member

mapazarr commented Jul 1, 2015

Ok. I fixed the warnings. The travis-ci build looks fine.

def test_filter_observations():
# create random observation table
observatory_name='HESS'

This comment has been minimized.

@cdeil

cdeil Jul 2, 2015

Member

Please put observatory_name='HESS' and n_obs=10 as default keyword arguments for make_test_observation_table.
So that if the user doesn't look up and give any parameters, they still get a nice small table to play with.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

# test box selection in alt
variable = 'ALT'
value_min = Angle(Angle(70., 'degree').radian, 'radian')

This comment has been minimized.

@cdeil

cdeil Jul 2, 2015

Member

Use the to method ... it's a bit simpler and does the same thing (give you an Angle object back):

Angle(42, 'deg').to('radian')

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Fixed. I removed the conversion (not needed).

@cdeil

This comment has been minimized.

Member

cdeil commented Jul 2, 2015

The main additions in this PR are the ObservationTable.filter_observations_var_range and ObservationTable.filter_observations methods.
I have to say I don't like the current methods ... in this case it would have been better for us to discuss the API in detail before you start implementing something, now you'll mostly have to rewrite this. Sorry.

We can discuss in the hangout tonight, but here's my first thoughts / suggestions:

  • Stop working on this now ... finish #292 first. You've worked on it for two weeks, but it still needs significant work and I don't think you'll be more productive by working on two things simultaneously.
  • Include the implementation of the find_obs command line tool in this PR! Having a very concrete use case for the observation table selection methods you implement here will help us find a good API. The devil is in the detail ... how to go from command line arguments to Astropy time and angle and ... objects and how to feed those into the selection methods.
  • What I don't like about the current methods is that they are so complicated, handing various special cases with if statements. Having said that ... I don't have a 100% complete concrete API proposal ye ... we need to think and discuss a bit more.
  • For the sky box and cone selection, I think splitting this in separate methods like I did in the EventList class would be best ... that should probably be done via a mixin class or by calling an external function, to avoid re-implementing box and cone selection.

What's most unclear to me is how to best implement the min / max selection that's currently in filter_observations_var_range:

  • One option I think is not needed is the invert to select outside the (min, max) range ... that can simply be removed, or do you see a use case?
  • There's the generic way of just using numpy boolean indexing like t[(34 < t['ALT']) & (t['ALT'] < 50)] ... so in a sense we don't even need to implement it. For time and zenith it doesn't work, so for those you could add a special method like I did for EventList.select_energy? I think that would be an OK solution ... have the few common parameters people make selections on or that can't be computed trivially like time special-cased, and in the docs show an example that other cuts can be done with a mask computed with a one-line numpy expression.

Some other possibilities we could discuss a bit, but I think are probably too complex to hash out and implement in a short amount of time (this should take a few days at most, not weeks):

  • Implement something like the pandas DataFrame.query method, or just wrap it by calling to_pandas and from_pandas on the astropy.table.Table. I'm not sure how useful it is, because our columns have units which would be ignored (unless you want to implement your own parser and add it as an astropy.table.Table.query method).
  • In astroplan we're representing constraints as objects, like TimeWindow (see here). So here, we could have SkyConeSelection, SkyBoxSelection, TimeWindow, ... and then a user could pass lists of such selections to Observation_table.select. That would split out the computations you have now into manageable, re-usable pieces, and we could implement representations of such selections as dicts and lists, which we could then write / read in JSON format, i.e. we have a serialisation, which opens up the possibility to store the selections in a log file, or if someone implements a web app in the future where the user can select data and events, the Javascript frontend can send the selection in that format to the server, which applies it and sends back the list.

So bottom line ... I think those last two fancy selection methods would be nice, but we don't have the time and they are probably too complex for the task, and should implement the simple solution I described before.
@mapazarr – Let me know what you think and let's discuss in the hangout tonight.

@mapazarr

This comment has been minimized.

Member

mapazarr commented Jul 2, 2015

Ok I see your point with the complexity of the method. I still think that having a generic way of making cuts on any variable is nice, without the need of implementing a new method each time. In addition, if something changes in the way of treating variables, one has to update only one function, not many (with the risk of forgetting updating some).
In addition, I thought this was something you wanted to have, according to the following TODO in the code:
https://github.com/gammapy/gammapy/pull/295/files#diff-77e060364d7c4d7526826a9804fdf2b0L229

I think it still would be possible to use the code, making it a bit simpler, by splitting some cases. The complexity of the code is introduced by:

  1. The time variables, that are not well casted into quantity objects.
  2. The special case for the zenith angle.
  3. The 2D selections for sky regions. For this I created the sky_box or sky_circle keywords, in order to treat them separately.

To fix 1. one could do the same as for 3.: use a separate keyword time_box or time_circle and make an independent method.

To fix 2. we can either drop the case of the zenith angle (and the user should use altitude) or transform it at an earlier stage.

Another complexity in the case of time variables is the need to support 2 different formats:

  • Absolute time strings.
  • Relative times w.r.t. the time reference in the table header (MET).

You already mentioned a couple of weeks ago that we should support both. If using a separate case for time variables, this complexity would become clearer, since it would not be nested with the zenith angle special case.

And yet another complexity in the case of time variables: when making a box selection in time, you need to check 2 variables, not only 1:

  • TIME_START
  • TIME_STOP

The observation (start and end) should be within the selected time interval. One could drop this use case, and the user will have to decide if he wants to apply the cut to either TIME_START or TIME_STOP, or apply the selection twice. But I think it would be nice to have the flexibility of doing also the cut on both variables with only one call to the method. Again, using a separate case for time variables would help clarify this.

So I propose to split the code a bit more, but keeping the generality for allowing cuts in any variable. The splitting would be according to the variable type, rather than the variable name.

To finalise, the code works properly, as demonstrated by the many test cases implemented in the test file. One can show them as examples for the user.

@mapazarr

This comment has been minimized.

Member

mapazarr commented Jul 2, 2015

BTW, for the inline command
https://github.com/gammapy/gammapy/blob/master/gammapy/scripts/find_obs.py
one can make it work with the functions I wrote. Since it does nothing right now it shouldn't be hard to make it work. I can do it within this PR next week if you'd like to.

@mapazarr

This comment has been minimized.

Member

mapazarr commented Jul 2, 2015

For the invert method I see a use case that I'll be using: For building the bg models I'd like to exclude runs within the Galactic plane, or too close to a source. It's pretty simple to do so by taking a list of all observations and doing inverted selections on a box enclosing the galactic plane and on circles enclosing the sources of a catalogue (for instance TeVCAT).

@mapazarr mapazarr force-pushed the mapazarr:find_observations branch 2 times, most recently from 5312132 to c069d22 Jul 30, 2015

@mapazarr

This comment has been minimized.

Member

mapazarr commented Aug 1, 2015

Ok, I did a major revision of the whole filtering process. I tried to simplify it as much as I could, and document it well. Now there are 4 different kinds of selections:

  • sky circle and sky box, as regions in the sky.
  • time intervals
  • intervals on generic variables of the obs table that can be casted into a Quantity

I also started to implement the inline find-obs tool. It is not 100% functional but it works. You can test it with the following commands:

gammapy-find-obs -h
gammapy-find-obs dummy.fits
gammapy-find-obs dummy.fits out.dummy.fits –overwrite
gammapy-find-obs dummy.fits –x 130 –y -40 –r 50 –system ‘icrs’
gammapy-find-obs dummy.fits –x 0 –y 0 –r 50 –system ‘galactic’
gammapy-find-obs dummy.fits –x 225 –y -25 –dx 75 –dy 25 –system ‘icrs’
gammapy-find-obs dummy.fits –x -25 –y 0 –dx 75 –dy 25 –system ‘galactic’
gammapy-find-obs dummy.fits –t_start ‘2012-01-01T00:00:00’ –t_stop ‘2014-01-01T00:00:00’
gammapy-find-obs dummy.fits –par_name ‘OBS_ID’ –par_min 2 –par_max 6
gammapy-find-obs dummy.fits –par_name ‘ALT’ –par_min 60 –par_max 70

Please have a look at the docs, especially at:

@cdeil I still need to work on this, but please take some time to review the general structure of the code, to see if you like it, and best plan how to move on.

Please note, that I find the generic variable filtering method very useful for the construction of the observation groups for the binning of the observations for producing the cube bg models.

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 1, 2015

Wow, you've been busy. 😄

It'll take me a few hours to review this, play with this and think if this is the best API / implementation or if it can be improved somehow. I guess you'll continue with this on Monday and I can take my time?

If you do want to continue with this, my suggestion would be that you work on the gammapy-find-obs tool to make it easy for me to start using it, i.e.:

  • How can I quickly create the test obs table file you're using for all the unit tests? Maybe give a one-line python -c 'from gammapy.datasets import ... command or even make this an option gammapy-find-obs --dump-test-table.
  • Can you make it possible to read / write ECSV files? This will make it easier for me to quickly read the input / output files. And many users will prefer a text file format for observation lists I think.
  • Start implementing the examples you give in the docs and the GH comment above as tests. Probably it's enough to have one or max two observation tables that are used for all the tests and the same one for the docs?
  • Move some of the larger TODOs from the code into the task list at the top of this GH issue. This will make it easier for you / me to gauge if this takes a few more days or more than a week and whether we want to split some things off to future PRs.

One concern I have is that we want very similar selection methods for event lists and for the sky regions, for source catalogs, so the code for that should be shared somehow. Not sure what the solution for this is yet.

It works with `fits` files as input/output.
Examples:

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member

Make this a sub-section, so that it gets a URL one can link to from emails or other parts of the Sphinx docs.

Examples
-------------

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Please implement this suggestion (make Examples a RST sub-section so that it can be linked to)

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

$ gammapy-find-obs all_obs.fits alt_70_to_90_deg_obs.fits \
--par_name 'ALT' --par_min 70 --par_max 90
Using ``gammapy-find-obs`` is easy, you don't have to remember all

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member

Start the examples section with this, saying something like "gammapy-find-obs has many options and here we'll only show a few common examples. A full list of available options is available via `--help``.

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Please implement the suggestion from this in-line comment.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

at the command line or read the usage help (TODO: add link here).
.. note:: Noel Dawe's ``goodruns`` tool for `ATLAS <http://atlas.ch>`__ run selection

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member

Move this to the top, it doesn't belong in the examples section.
I'll have to have a look again whether this is useful info, probably we'll remove this reference, but for now, please keep it.

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Please implement the suggestion from this in-line comment.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

if not inverted:
mask = (value_min < value) & (value < value_max)
else:
mask = (value_min >= value) | (value >= value_max)

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member

Instead of a separate expression for the inverse mask, I think this is a bit easier:

mask = (value_min < value) & (value < value_max)

if inverted:
    mask = np.invert(mask)

Use the same pattern for the other functions where you have an inverted option.

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Please apply this in-line comment.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

There are 3 main kinds of selection criteria, according to the
value of the `type` keyword in the `selection` dictionary:
- sky regions (boxes or circles)

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member

I think you can de-dent this list by four spaces.

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Please apply this in-line comment.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

Allowed selection criteria are interpreted using the following
keywords in the `selection` dictionary:
- `type`: ``sky_box``, ``sky_circle``, ``'time_box``, ``par_box``

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member

I think you can de-dent this list by four spaces.

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Please apply this in-line comment.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

observation_table = observation_table.filter_observations(selection)
else:
if r != None:

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member
if r is not None:

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

(r != None), (system != None)])
if do_sky_circle_selection.all():
print("Applying sky circle selection") # TODO: use logging!!!

This comment has been minimized.

@cdeil

cdeil Aug 1, 2015

Member

As you say, please change all print statements to logging.info or logging.debug or whatever is appropriate statements and add a command line option to set the log level.

I like this pattern .. can you use this?
http://stackoverflow.com/a/28611921/498873

This is relatively quick to implement and makes it easier for me to play with this from the command line.

This comment has been minimized.

@mapazarr

mapazarr Aug 4, 2015

Member

I implemented this, but it's not working properly. The --log option is read correctly, but it won't show debug messages when requested.

This comment has been minimized.

@mapazarr

mapazarr Aug 4, 2015

Member

@cdeil Do you have time to have a look? Otherwise we can merge it and file an issue that it should be fixed.

This comment has been minimized.

@mapazarr

mapazarr Aug 4, 2015

Member

@cdeil Since this is not an urgent problem (the regular output displayed is ok) (and I don't think you'll have time to spend on this tomorrow), I'm merging this now, and I pointed to this problem in the issue #315.

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 1, 2015

Add changelog entry.

Once I've reviewed this for a bit, we should merge it in master and then ask others from HESS to try it out and give feedback in a new issue. This is the first thing most end-user will run, so it's worth the effort to make it as good as possible and add some bells and whistles and lots of docs examples.

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 2, 2015

@mapazarr – I'll cut the Gammapy 0.3 release tomorrow evening (CEST).
What's your preference here?

  • Merge this PR without much further code review and leave further improvements to the future?
  • Or work on this some more and it goes into 0.4?

As long as travis-ci passes so that the CI build isn't broken, I'm fine with merging this.
There's lots of code in Gammapy that will have to be improved in the coming months, it's OK IMO to do this iteratively.
Whatever you prefer.

@mapazarr

This comment has been minimized.

Member

mapazarr commented Aug 3, 2015

@cdeil Sorry, I didn't have much time over the WE.

I'm willing to merge it if I can first work for ~1h to make the findruns work with input fits files (and show a PRELIMINARY warning). Then I'd start working on it on a new branch/PR, in order to implement tests for findruns, the comments from this PR and polish the code a bit more.

So, can I still make a preliminary working version of findruns? or do you have to freeze the stable version already?

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 3, 2015

Can we have a quick Skype chat about this decision?

@mapazarr mapazarr force-pushed the mapazarr:find_observations branch from f04b42d to 78f37af Aug 3, 2015

Fixed problem with header in make_test_observation_table. Defining a …
…filter_observations method in ObservationTable.

@mapazarr mapazarr force-pushed the mapazarr:find_observations branch from 78f37af to b037fda Aug 3, 2015

@mapazarr

This comment has been minimized.

Member

mapazarr commented Aug 3, 2015

@cdeil OK, I added the mods, and the travis-ci passed. Can you merge it?

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 3, 2015

@mapazarr – I'll do one last review ... I promise I won't make suggestions that take long to implement.

CHANGES.rst Outdated
@@ -94,6 +94,7 @@ Pull requests
- Consistent random number handling and improve sample_sphere [#283] (Manuel Paz Arribas)
- Add Feldman Cousins algorithm [#311] (Dirk Lennarz)
- Simplified attribute docstrings [#301] (Manuel Paz Arribas)
- Filter observation tables [#295] (Manuel Paz Arribas)

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Rename changelog entry and PR title to "Observation table subset selection"

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

@mapazarr mapazarr changed the title from Filter observation tables to Select observations Aug 3, 2015

@mapazarr mapazarr changed the title from Select observations to Observation table subset selection Aug 3, 2015

TODO: describe me
selection : `~dict`
Dictionary with a few keywords for applying selection cuts.
TODO: add doc with allowed selection cuts!!! and link here!!!

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Is it possible to address this TODO quickly?
(if no, maybe making a new issue for yourself with a task list to keep track of such things would be helpful?)

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

Returns
-------
table : `~gammapy.obs.ObservationTable`
Observation table
Observation table with observatiosn passing the selection.

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Typo in observatiosn ... I'd suggest the slightly shorter "Observation table after selection"

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

@@ -85,3 +91,244 @@ def select_linspace_subset(self, num):
# Round down to nearest integer
indices = indices.astype('int')
return self[indices]
def select_obs_var_range(self, selection_variable,

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

I find select_obs_var_range is prettly long and cryptic. How about select_minmax or select_range?

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

Parameters
----------
selection_variable : string

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

string -> str

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

Parameters
----------
selection_variable : string
name of variable to apply a cut (it should exist on the table)

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

I always start these one-line descriptions with a capital letter, I think most others in Numpy / Scipy / Astropy do this as well. Please fix all cases you can quickly find (OK to make unrelated code changes in this case ... it's cleanup for the upcoming release).

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Ok. I updated a few functions. I'll keep this in mind for future PRs, and fix the docstrings off the files I work with correspondingly.

obs_table = obs_table[mask]
return obs_table
def select_obs_time_range(self, selection_variable,

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

I think _obs can be removed from the method name.
This is a method on an ObservationTable object, so no need to repeat obs in the method name, no?

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

raise ValueError("Invalid type: {}".format(type))
if type == 'sky_box':
lon_range_eff = (selection['lon'][0] - selection['border'], selection['lon'][1] + selection['border'])

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Break line after comma.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

def test_select_parameter_box():
# create random observation table
observatory_name='HESS'

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

The observatory_name is irrelevant for this test, so we should not have to give it.
Does this work? (if no, can you make this work?)

 obs_table = make_test_observation_table(n_obs=10)

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

def test_select_time_box():
# create random observation table with very close (in time)
# observations (and times in absolute times)
observatory_name='HESS'

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Same here and again below ... obseratory_name should not be specified in this test.

This comment has been minimized.

@mapazarr

mapazarr Aug 3, 2015

Member

Done.

@cdeil

This comment has been minimized.

Member

cdeil commented Aug 3, 2015

I left a bunch of comments that I think can be addressed quickly.

One comment that might be a bit controversial is that I'd prefer you use time_range instead of separate min / max arguments. The main advantage is to reduce the number or arguments, i.e. simplify the API a bit and make the docstrings a bit shorter.

Note that both Sunpy has a TimeRange and astroplan is introducing one (currently names TimeWindow, will be named TimeRange by tomorrow).

@mapazarr – If you don't want to do this change to time_range here, could you make a new issue and put it in a task list there so that we don't forget about this?

(`docs <http://ndawe.github.io/goodruns/>`__, `code <https://github.com/ndawe/goodruns>`__)
is a nice example for a run selection tool.
Examples:

This comment has been minimized.

@cdeil

cdeil Aug 3, 2015

Member

Remove colon :

This comment has been minimized.

@mapazarr

mapazarr Aug 4, 2015

Member

Done.

@mapazarr

This comment has been minimized.

Member

mapazarr commented Aug 4, 2015

Ok, I changed the functions to accept ranges instead of separate min, max values.

Working on select observations. Improved inline command: make it func…
…tional, add tests, improve docs. Plus a bunch of small changes/fixes in related code.

@mapazarr mapazarr referenced this pull request Aug 4, 2015

Closed

Improve observation selection tool #315

0 of 7 tasks complete
@mapazarr

This comment has been minimized.

Member

mapazarr commented Aug 4, 2015

Merging this now, but a few things remain for a future PR: see issue #315.

mapazarr added a commit that referenced this pull request Aug 4, 2015

Merge pull request #295 from mapazarr/find_observations
Observation table subset selection

@mapazarr mapazarr merged commit 874832d into gammapy:master Aug 4, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment