Observation table subset selection #295

mapazarr · 2015-07-01T07:00:42Z

Added methods for filtering observation tables to the ObservationTable class. The method allows for box or circle selections on any variable inside the observation table. In addition, zenith is interpreted as (90 deg - altitude). Inverse selection is also possible. The doc is uploaded here:
https://googledrive.com/host/0BzXHZQx0oCLBfmcyUDdxMU9sLVduMXM1QXk5RTNYU3dXYmthVG9OZ18wWE1uR3lYaEsxRU0/html_doc/docs/_build/html
This PR addresses issue #293.
TODO:

revise/implement methods for selecting regions in the sky.
implement inline 'findruns` command
use logging instead of print statements in 'findruns`
implement tests for 'findruns`

mapazarr · 2015-07-01T07:58:59Z

I have seen the travis-ci error because of warnings in the sphinx-doc. I'll look into it tomorrow.

mapazarr · 2015-07-01T19:58:49Z

Ok. I fixed the warnings. The travis-ci build looks fine.

cdeil · 2015-07-02T16:18:14Z

gammapy/obs/tests/test_observation.py

+
+def test_filter_observations():
+    # create random observation table
+    observatory_name='HESS'


Please put observatory_name='HESS' and n_obs=10 as default keyword arguments for make_test_observation_table.
So that if the user doesn't look up and give any parameters, they still get a nice small table to play with.

cdeil · 2015-07-02T17:43:39Z

The main additions in this PR are the ObservationTable.filter_observations_var_range and ObservationTable.filter_observations methods.
I have to say I don't like the current methods ... in this case it would have been better for us to discuss the API in detail before you start implementing something, now you'll mostly have to rewrite this. Sorry.

We can discuss in the hangout tonight, but here's my first thoughts / suggestions:

Stop working on this now ... finish Cube bg model container class: CubeBackgroundModel #292 first. You've worked on it for two weeks, but it still needs significant work and I don't think you'll be more productive by working on two things simultaneously.
Include the implementation of the find_obs command line tool in this PR! Having a very concrete use case for the observation table selection methods you implement here will help us find a good API. The devil is in the detail ... how to go from command line arguments to Astropy time and angle and ... objects and how to feed those into the selection methods.
What I don't like about the current methods is that they are so complicated, handing various special cases with if statements. Having said that ... I don't have a 100% complete concrete API proposal ye ... we need to think and discuss a bit more.
For the sky box and cone selection, I think splitting this in separate methods like I did in the EventList class would be best ... that should probably be done via a mixin class or by calling an external function, to avoid re-implementing box and cone selection.

What's most unclear to me is how to best implement the min / max selection that's currently in filter_observations_var_range:

One option I think is not needed is the invert to select outside the (min, max) range ... that can simply be removed, or do you see a use case?
There's the generic way of just using numpy boolean indexing like t[(34 < t['ALT']) & (t['ALT'] < 50)] ... so in a sense we don't even need to implement it. For time and zenith it doesn't work, so for those you could add a special method like I did for EventList.select_energy? I think that would be an OK solution ... have the few common parameters people make selections on or that can't be computed trivially like time special-cased, and in the docs show an example that other cuts can be done with a mask computed with a one-line numpy expression.

Some other possibilities we could discuss a bit, but I think are probably too complex to hash out and implement in a short amount of time (this should take a few days at most, not weeks):

Implement something like the pandas DataFrame.query method, or just wrap it by calling to_pandas and from_pandas on the astropy.table.Table. I'm not sure how useful it is, because our columns have units which would be ignored (unless you want to implement your own parser and add it as an astropy.table.Table.query method).
In astroplan we're representing constraints as objects, like TimeWindow (see here). So here, we could have SkyConeSelection, SkyBoxSelection, TimeWindow, ... and then a user could pass lists of such selections to Observation_table.select. That would split out the computations you have now into manageable, re-usable pieces, and we could implement representations of such selections as dicts and lists, which we could then write / read in JSON format, i.e. we have a serialisation, which opens up the possibility to store the selections in a log file, or if someone implements a web app in the future where the user can select data and events, the Javascript frontend can send the selection in that format to the server, which applies it and sends back the list.

So bottom line ... I think those last two fancy selection methods would be nice, but we don't have the time and they are probably too complex for the task, and should implement the simple solution I described before.
@mapazarr – Let me know what you think and let's discuss in the hangout tonight.

mapazarr · 2015-07-02T19:18:51Z

Ok I see your point with the complexity of the method. I still think that having a generic way of making cuts on any variable is nice, without the need of implementing a new method each time. In addition, if something changes in the way of treating variables, one has to update only one function, not many (with the risk of forgetting updating some).
In addition, I thought this was something you wanted to have, according to the following TODO in the code:
https://github.com/gammapy/gammapy/pull/295/files#diff-77e060364d7c4d7526826a9804fdf2b0L229

I think it still would be possible to use the code, making it a bit simpler, by splitting some cases. The complexity of the code is introduced by:

The time variables, that are not well casted into quantity objects.
The special case for the zenith angle.
The 2D selections for sky regions. For this I created the sky_box or sky_circle keywords, in order to treat them separately.

To fix 1. one could do the same as for 3.: use a separate keyword time_box or time_circle and make an independent method.

To fix 2. we can either drop the case of the zenith angle (and the user should use altitude) or transform it at an earlier stage.

Another complexity in the case of time variables is the need to support 2 different formats:

Absolute time strings.
Relative times w.r.t. the time reference in the table header (MET).

You already mentioned a couple of weeks ago that we should support both. If using a separate case for time variables, this complexity would become clearer, since it would not be nested with the zenith angle special case.

And yet another complexity in the case of time variables: when making a box selection in time, you need to check 2 variables, not only 1:

TIME_START
TIME_STOP

The observation (start and end) should be within the selected time interval. One could drop this use case, and the user will have to decide if he wants to apply the cut to either TIME_START or TIME_STOP, or apply the selection twice. But I think it would be nice to have the flexibility of doing also the cut on both variables with only one call to the method. Again, using a separate case for time variables would help clarify this.

So I propose to split the code a bit more, but keeping the generality for allowing cuts in any variable. The splitting would be according to the variable type, rather than the variable name.

To finalise, the code works properly, as demonstrated by the many test cases implemented in the test file. One can show them as examples for the user.

mapazarr · 2015-07-02T19:28:58Z

BTW, for the inline command
https://github.com/gammapy/gammapy/blob/master/gammapy/scripts/find_obs.py
one can make it work with the functions I wrote. Since it does nothing right now it shouldn't be hard to make it work. I can do it within this PR next week if you'd like to.

mapazarr · 2015-07-02T19:37:13Z

For the invert method I see a use case that I'll be using: For building the bg models I'd like to exclude runs within the Galactic plane, or too close to a source. It's pretty simple to do so by taking a list of all observations and doing inverted selections on a box enclosing the galactic plane and on circles enclosing the sources of a catalogue (for instance TeVCAT).

mapazarr · 2015-08-01T08:40:08Z

Ok, I did a major revision of the whole filtering process. I tried to simplify it as much as I could, and document it well. Now there are 4 different kinds of selections:

sky circle and sky box, as regions in the sky.
time intervals
intervals on generic variables of the obs table that can be casted into a Quantity

I also started to implement the inline find-obs tool. It is not 100% functional but it works. You can test it with the following commands:

gammapy-find-obs -h
gammapy-find-obs dummy.fits
gammapy-find-obs dummy.fits out.dummy.fits –overwrite
gammapy-find-obs dummy.fits –x 130 –y -40 –r 50 –system ‘icrs’
gammapy-find-obs dummy.fits –x 0 –y 0 –r 50 –system ‘galactic’
gammapy-find-obs dummy.fits –x 225 –y -25 –dx 75 –dy 25 –system ‘icrs’
gammapy-find-obs dummy.fits –x -25 –y 0 –dx 75 –dy 25 –system ‘galactic’
gammapy-find-obs dummy.fits –t_start ‘2012-01-01T00:00:00’ –t_stop ‘2014-01-01T00:00:00’
gammapy-find-obs dummy.fits –par_name ‘OBS_ID’ –par_min 2 –par_max 6
gammapy-find-obs dummy.fits –par_name ‘ALT’ –par_min 60 –par_max 70

Please have a look at the docs, especially at:

@cdeil I still need to work on this, but please take some time to review the general structure of the code, to see if you like it, and best plan how to move on.

Please note, that I find the generic variable filtering method very useful for the construction of the observation groups for the binning of the observations for producing the cube bg models.

cdeil · 2015-08-01T09:29:16Z

Wow, you've been busy. 😄

It'll take me a few hours to review this, play with this and think if this is the best API / implementation or if it can be improved somehow. I guess you'll continue with this on Monday and I can take my time?

If you do want to continue with this, my suggestion would be that you work on the gammapy-find-obs tool to make it easy for me to start using it, i.e.:

How can I quickly create the test obs table file you're using for all the unit tests? Maybe give a one-line python -c 'from gammapy.datasets import ... command or even make this an option gammapy-find-obs --dump-test-table.
Can you make it possible to read / write ECSV files? This will make it easier for me to quickly read the input / output files. And many users will prefer a text file format for observation lists I think.
Start implementing the examples you give in the docs and the GH comment above as tests. Probably it's enough to have one or max two observation tables that are used for all the tests and the same one for the docs?
Move some of the larger TODOs from the code into the task list at the top of this GH issue. This will make it easier for you / me to gauge if this takes a few more days or more than a week and whether we want to split some things off to future PRs.

One concern I have is that we want very similar selection methods for event lists and for the sky regions, for source catalogs, so the code for that should be shared somehow. Not sure what the solution for this is yet.

cdeil · 2015-08-01T09:32:56Z

docs/obs/find_observations.rst

+
+It works with `fits` files as input/output.
+
+Examples:


Make this a sub-section, so that it gets a URL one can link to from emails or other parts of the Sphinx docs.

Examples -------------

Please implement this suggestion (make Examples a RST sub-section so that it can be linked to)

cdeil · 2015-08-01T10:04:21Z

Add changelog entry.

Once I've reviewed this for a bit, we should merge it in master and then ask others from HESS to try it out and give feedback in a new issue. This is the first thing most end-user will run, so it's worth the effort to make it as good as possible and add some bells and whistles and lots of docs examples.

cdeil · 2015-08-02T22:17:14Z

@mapazarr – I'll cut the Gammapy 0.3 release tomorrow evening (CEST).
What's your preference here?

Merge this PR without much further code review and leave further improvements to the future?
Or work on this some more and it goes into 0.4?

As long as travis-ci passes so that the CI build isn't broken, I'm fine with merging this.
There's lots of code in Gammapy that will have to be improved in the coming months, it's OK IMO to do this iteratively.
Whatever you prefer.

mapazarr · 2015-08-03T18:22:04Z

@cdeil Sorry, I didn't have much time over the WE.

I'm willing to merge it if I can first work for ~1h to make the findruns work with input fits files (and show a PRELIMINARY warning). Then I'd start working on it on a new branch/PR, in order to implement tests for findruns, the comments from this PR and polish the code a bit more.

So, can I still make a preliminary working version of findruns? or do you have to freeze the stable version already?

cdeil · 2015-08-03T18:39:33Z

Can we have a quick Skype chat about this decision?

…filter_observations method in ObservationTable.

…-> select.

mapazarr · 2015-08-03T19:53:12Z

@cdeil OK, I added the mods, and the travis-ci passed. Can you merge it?

cdeil · 2015-08-03T19:55:12Z

@mapazarr – I'll do one last review ... I promise I won't make suggestions that take long to implement.

cdeil · 2015-08-03T19:55:36Z

CHANGES.rst

@@ -94,6 +94,7 @@ Pull requests
 - Consistent random number handling and improve sample_sphere [#283] (Manuel Paz Arribas)
 - Add Feldman Cousins algorithm [#311] (Dirk Lennarz)
 - Simplified attribute docstrings [#301] (Manuel Paz Arribas)
+- Filter observation tables [#295] (Manuel Paz Arribas)


Rename changelog entry and PR title to "Observation table subset selection"

cdeil · 2015-08-03T20:02:28Z

gammapy/obs/datastore.py

-            TODO: describe me
+        selection : `~dict`
+            Dictionary with a few keywords for applying selection cuts.
+            TODO: add doc with allowed selection cuts!!! and link here!!!


Is it possible to address this TODO quickly?
(if no, maybe making a new issue for yourself with a task list to keep track of such things would be helpful?)

cdeil · 2015-08-03T20:41:34Z

I left a bunch of comments that I think can be addressed quickly.

One comment that might be a bit controversial is that I'd prefer you use time_range instead of separate min / max arguments. The main advantage is to reduce the number or arguments, i.e. simplify the API a bit and make the docstrings a bit shorter.

Note that both Sunpy has a TimeRange and astroplan is introducing one (currently names TimeWindow, will be named TimeRange by tomorrow).

@mapazarr – If you don't want to do this change to time_range here, could you make a new issue and put it in a task list there so that we don't forget about this?

cdeil · 2015-08-03T22:11:34Z

docs/obs/find_observations.rst

+      (`docs <http://ndawe.github.io/goodruns/>`__, `code <https://github.com/ndawe/goodruns>`__)
+      is a nice example for a run selection tool.
+
+Examples:


Remove colon :

mapazarr · 2015-08-04T05:10:07Z

Ok, I changed the functions to accept ranges instead of separate min, max values.

…tional, add tests, improve docs. Plus a bunch of small changes/fixes in related code.

mapazarr · 2015-08-04T07:34:25Z

Merging this now, but a few things remain for a future PR: see issue #315.

Observation table subset selection

mapazarr mentioned this pull request Jul 1, 2015

AttributeError from ObservationTable mask subset selection #293

Closed

cdeil added the feature label Jul 1, 2015

cdeil added this to the 0.3 milestone Jul 1, 2015

cdeil self-assigned this Jul 1, 2015

cdeil reviewed Jul 2, 2015
View reviewed changes

mapazarr force-pushed the find_observations branch 2 times, most recently from 5312132 to c069d22 Compare August 1, 2015 08:08

cdeil reviewed Aug 1, 2015
View reviewed changes

mapazarr force-pushed the find_observations branch from f04b42d to 78f37af Compare August 3, 2015 19:27

Fixed problem with header in make_test_observation_table. Defining a …

ff80935

…filter_observations method in ObservationTable.

mapazarr added 2 commits August 3, 2015 12:36

Added entry to CHANGE log.

697b4f9

Producing a PRELIMINARY working version of find-obs. Renaming filter …

b037fda

…-> select.

mapazarr force-pushed the find_observations branch from 78f37af to b037fda Compare August 3, 2015 19:36

cdeil reviewed Aug 3, 2015
View reviewed changes

mapazarr changed the title ~~Filter observation tables~~ Select observations Aug 3, 2015

Updated PR name in CHANGES log.

1611117

mapazarr changed the title ~~Select observations~~ Observation table subset selection Aug 3, 2015

Updated PR name in CHANGES log once more.

5755f3e

cdeil reviewed Aug 3, 2015
View reviewed changes

Small changes and fixes for code related to observation selection.

fa9cf95

cdeil reviewed Aug 3, 2015
View reviewed changes

Using logging in find_obs.

9e1072e

Working on select observations. Improved inline command: make it func…

1827d16

…tional, add tests, improve docs. Plus a bunch of small changes/fixes in related code.

mapazarr mentioned this pull request Aug 4, 2015

Improve observation selection tool #315

Closed

7 tasks

mapazarr added a commit that referenced this pull request Aug 4, 2015

Merge pull request #295 from mapazarr/find_observations

874832d

Observation table subset selection

mapazarr merged commit 874832d into gammapy:master Aug 4, 2015

mapazarr mentioned this pull request Aug 4, 2015

Docs of ObservationTable misses methods. #316

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Observation table subset selection #295

Observation table subset selection #295

mapazarr commented Jul 1, 2015

mapazarr commented Jul 1, 2015

mapazarr commented Jul 1, 2015

cdeil Jul 2, 2015

mapazarr Aug 3, 2015

cdeil commented Jul 2, 2015

mapazarr commented Jul 2, 2015

mapazarr commented Jul 2, 2015

mapazarr commented Jul 2, 2015

mapazarr commented Aug 1, 2015

cdeil commented Aug 1, 2015

cdeil Aug 1, 2015

cdeil Aug 3, 2015

mapazarr Aug 3, 2015

cdeil commented Aug 1, 2015

cdeil commented Aug 2, 2015

mapazarr commented Aug 3, 2015

cdeil commented Aug 3, 2015

mapazarr commented Aug 3, 2015

cdeil commented Aug 3, 2015

cdeil Aug 3, 2015

mapazarr Aug 3, 2015

cdeil Aug 3, 2015

mapazarr Aug 3, 2015

cdeil commented Aug 3, 2015

cdeil Aug 3, 2015

mapazarr Aug 4, 2015

mapazarr commented Aug 4, 2015

mapazarr commented Aug 4, 2015

Observation table subset selection #295

Observation table subset selection #295

Conversation

mapazarr commented Jul 1, 2015

mapazarr commented Jul 1, 2015

mapazarr commented Jul 1, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cdeil commented Jul 2, 2015

mapazarr commented Jul 2, 2015

mapazarr commented Jul 2, 2015

mapazarr commented Jul 2, 2015

mapazarr commented Aug 1, 2015

cdeil commented Aug 1, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cdeil commented Aug 1, 2015

cdeil commented Aug 2, 2015

mapazarr commented Aug 3, 2015

cdeil commented Aug 3, 2015

mapazarr commented Aug 3, 2015

cdeil commented Aug 3, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cdeil commented Aug 3, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mapazarr commented Aug 4, 2015

mapazarr commented Aug 4, 2015