Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observation table subset selection #295

Merged
merged 19 commits into from Aug 4, 2015

Conversation

mapazarr
Copy link
Member

@mapazarr mapazarr commented Jul 1, 2015

Added methods for filtering observation tables to the ObservationTable class. The method allows for box or circle selections on any variable inside the observation table. In addition, zenith is interpreted as (90 deg - altitude). Inverse selection is also possible. The doc is uploaded here:
https://googledrive.com/host/0BzXHZQx0oCLBfmcyUDdxMU9sLVduMXM1QXk5RTNYU3dXYmthVG9OZ18wWE1uR3lYaEsxRU0/html_doc/docs/_build/html
This PR addresses issue #293.
TODO:

  • revise/implement methods for selecting regions in the sky.
  • implement inline 'findruns` command
  • use logging instead of print statements in 'findruns`
  • implement tests for 'findruns`

@cdeil cdeil added the feature label Jul 1, 2015
@cdeil cdeil added this to the 0.3 milestone Jul 1, 2015
@cdeil cdeil self-assigned this Jul 1, 2015
@mapazarr
Copy link
Member Author

mapazarr commented Jul 1, 2015

I have seen the travis-ci error because of warnings in the sphinx-doc. I'll look into it tomorrow.

@mapazarr
Copy link
Member Author

mapazarr commented Jul 1, 2015

Ok. I fixed the warnings. The travis-ci build looks fine.


def test_filter_observations():
# create random observation table
observatory_name='HESS'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put observatory_name='HESS' and n_obs=10 as default keyword arguments for make_test_observation_table.
So that if the user doesn't look up and give any parameters, they still get a nice small table to play with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cdeil
Copy link
Contributor

cdeil commented Jul 2, 2015

The main additions in this PR are the ObservationTable.filter_observations_var_range and ObservationTable.filter_observations methods.
I have to say I don't like the current methods ... in this case it would have been better for us to discuss the API in detail before you start implementing something, now you'll mostly have to rewrite this. Sorry.

We can discuss in the hangout tonight, but here's my first thoughts / suggestions:

  • Stop working on this now ... finish Cube bg model container class: CubeBackgroundModel #292 first. You've worked on it for two weeks, but it still needs significant work and I don't think you'll be more productive by working on two things simultaneously.
  • Include the implementation of the find_obs command line tool in this PR! Having a very concrete use case for the observation table selection methods you implement here will help us find a good API. The devil is in the detail ... how to go from command line arguments to Astropy time and angle and ... objects and how to feed those into the selection methods.
  • What I don't like about the current methods is that they are so complicated, handing various special cases with if statements. Having said that ... I don't have a 100% complete concrete API proposal ye ... we need to think and discuss a bit more.
  • For the sky box and cone selection, I think splitting this in separate methods like I did in the EventList class would be best ... that should probably be done via a mixin class or by calling an external function, to avoid re-implementing box and cone selection.

What's most unclear to me is how to best implement the min / max selection that's currently in filter_observations_var_range:

  • One option I think is not needed is the invert to select outside the (min, max) range ... that can simply be removed, or do you see a use case?
  • There's the generic way of just using numpy boolean indexing like t[(34 < t['ALT']) & (t['ALT'] < 50)] ... so in a sense we don't even need to implement it. For time and zenith it doesn't work, so for those you could add a special method like I did for EventList.select_energy? I think that would be an OK solution ... have the few common parameters people make selections on or that can't be computed trivially like time special-cased, and in the docs show an example that other cuts can be done with a mask computed with a one-line numpy expression.

Some other possibilities we could discuss a bit, but I think are probably too complex to hash out and implement in a short amount of time (this should take a few days at most, not weeks):

  • Implement something like the pandas DataFrame.query method, or just wrap it by calling to_pandas and from_pandas on the astropy.table.Table. I'm not sure how useful it is, because our columns have units which would be ignored (unless you want to implement your own parser and add it as an astropy.table.Table.query method).
  • In astroplan we're representing constraints as objects, like TimeWindow (see here). So here, we could have SkyConeSelection, SkyBoxSelection, TimeWindow, ... and then a user could pass lists of such selections to Observation_table.select. That would split out the computations you have now into manageable, re-usable pieces, and we could implement representations of such selections as dicts and lists, which we could then write / read in JSON format, i.e. we have a serialisation, which opens up the possibility to store the selections in a log file, or if someone implements a web app in the future where the user can select data and events, the Javascript frontend can send the selection in that format to the server, which applies it and sends back the list.

So bottom line ... I think those last two fancy selection methods would be nice, but we don't have the time and they are probably too complex for the task, and should implement the simple solution I described before.
@mapazarr – Let me know what you think and let's discuss in the hangout tonight.

@mapazarr
Copy link
Member Author

mapazarr commented Jul 2, 2015

Ok I see your point with the complexity of the method. I still think that having a generic way of making cuts on any variable is nice, without the need of implementing a new method each time. In addition, if something changes in the way of treating variables, one has to update only one function, not many (with the risk of forgetting updating some).
In addition, I thought this was something you wanted to have, according to the following TODO in the code:
https://github.com/gammapy/gammapy/pull/295/files#diff-77e060364d7c4d7526826a9804fdf2b0L229

I think it still would be possible to use the code, making it a bit simpler, by splitting some cases. The complexity of the code is introduced by:

  1. The time variables, that are not well casted into quantity objects.
  2. The special case for the zenith angle.
  3. The 2D selections for sky regions. For this I created the sky_box or sky_circle keywords, in order to treat them separately.

To fix 1. one could do the same as for 3.: use a separate keyword time_box or time_circle and make an independent method.

To fix 2. we can either drop the case of the zenith angle (and the user should use altitude) or transform it at an earlier stage.

Another complexity in the case of time variables is the need to support 2 different formats:

  • Absolute time strings.
  • Relative times w.r.t. the time reference in the table header (MET).

You already mentioned a couple of weeks ago that we should support both. If using a separate case for time variables, this complexity would become clearer, since it would not be nested with the zenith angle special case.

And yet another complexity in the case of time variables: when making a box selection in time, you need to check 2 variables, not only 1:

  • TIME_START
  • TIME_STOP

The observation (start and end) should be within the selected time interval. One could drop this use case, and the user will have to decide if he wants to apply the cut to either TIME_START or TIME_STOP, or apply the selection twice. But I think it would be nice to have the flexibility of doing also the cut on both variables with only one call to the method. Again, using a separate case for time variables would help clarify this.

So I propose to split the code a bit more, but keeping the generality for allowing cuts in any variable. The splitting would be according to the variable type, rather than the variable name.

To finalise, the code works properly, as demonstrated by the many test cases implemented in the test file. One can show them as examples for the user.

@mapazarr
Copy link
Member Author

mapazarr commented Jul 2, 2015

BTW, for the inline command
https://github.com/gammapy/gammapy/blob/master/gammapy/scripts/find_obs.py
one can make it work with the functions I wrote. Since it does nothing right now it shouldn't be hard to make it work. I can do it within this PR next week if you'd like to.

@mapazarr
Copy link
Member Author

mapazarr commented Jul 2, 2015

For the invert method I see a use case that I'll be using: For building the bg models I'd like to exclude runs within the Galactic plane, or too close to a source. It's pretty simple to do so by taking a list of all observations and doing inverted selections on a box enclosing the galactic plane and on circles enclosing the sources of a catalogue (for instance TeVCAT).

@mapazarr mapazarr force-pushed the find_observations branch 2 times, most recently from 5312132 to c069d22 Compare August 1, 2015 08:08
@mapazarr
Copy link
Member Author

mapazarr commented Aug 1, 2015

Ok, I did a major revision of the whole filtering process. I tried to simplify it as much as I could, and document it well. Now there are 4 different kinds of selections:

  • sky circle and sky box, as regions in the sky.
  • time intervals
  • intervals on generic variables of the obs table that can be casted into a Quantity

I also started to implement the inline find-obs tool. It is not 100% functional but it works. You can test it with the following commands:

gammapy-find-obs -h
gammapy-find-obs dummy.fits
gammapy-find-obs dummy.fits out.dummy.fits –overwrite
gammapy-find-obs dummy.fits –x 130 –y -40 –r 50 –system ‘icrs’
gammapy-find-obs dummy.fits –x 0 –y 0 –r 50 –system ‘galactic’
gammapy-find-obs dummy.fits –x 225 –y -25 –dx 75 –dy 25 –system ‘icrs’
gammapy-find-obs dummy.fits –x -25 –y 0 –dx 75 –dy 25 –system ‘galactic’
gammapy-find-obs dummy.fits –t_start ‘2012-01-01T00:00:00’ –t_stop ‘2014-01-01T00:00:00’
gammapy-find-obs dummy.fits –par_name ‘OBS_ID’ –par_min 2 –par_max 6
gammapy-find-obs dummy.fits –par_name ‘ALT’ –par_min 60 –par_max 70

Please have a look at the docs, especially at:

@cdeil I still need to work on this, but please take some time to review the general structure of the code, to see if you like it, and best plan how to move on.

Please note, that I find the generic variable filtering method very useful for the construction of the observation groups for the binning of the observations for producing the cube bg models.

@cdeil
Copy link
Contributor

cdeil commented Aug 1, 2015

Wow, you've been busy. 😄

It'll take me a few hours to review this, play with this and think if this is the best API / implementation or if it can be improved somehow. I guess you'll continue with this on Monday and I can take my time?

If you do want to continue with this, my suggestion would be that you work on the gammapy-find-obs tool to make it easy for me to start using it, i.e.:

  • How can I quickly create the test obs table file you're using for all the unit tests? Maybe give a one-line python -c 'from gammapy.datasets import ... command or even make this an option gammapy-find-obs --dump-test-table.
  • Can you make it possible to read / write ECSV files? This will make it easier for me to quickly read the input / output files. And many users will prefer a text file format for observation lists I think.
  • Start implementing the examples you give in the docs and the GH comment above as tests. Probably it's enough to have one or max two observation tables that are used for all the tests and the same one for the docs?
  • Move some of the larger TODOs from the code into the task list at the top of this GH issue. This will make it easier for you / me to gauge if this takes a few more days or more than a week and whether we want to split some things off to future PRs.

One concern I have is that we want very similar selection methods for event lists and for the sky regions, for source catalogs, so the code for that should be shared somehow. Not sure what the solution for this is yet.


It works with `fits` files as input/output.

Examples:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a sub-section, so that it gets a URL one can link to from emails or other parts of the Sphinx docs.

Examples
-------------

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please implement this suggestion (make Examples a RST sub-section so that it can be linked to)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cdeil
Copy link
Contributor

cdeil commented Aug 1, 2015

Add changelog entry.

Once I've reviewed this for a bit, we should merge it in master and then ask others from HESS to try it out and give feedback in a new issue. This is the first thing most end-user will run, so it's worth the effort to make it as good as possible and add some bells and whistles and lots of docs examples.

@cdeil
Copy link
Contributor

cdeil commented Aug 2, 2015

@mapazarr – I'll cut the Gammapy 0.3 release tomorrow evening (CEST).
What's your preference here?

  • Merge this PR without much further code review and leave further improvements to the future?
  • Or work on this some more and it goes into 0.4?

As long as travis-ci passes so that the CI build isn't broken, I'm fine with merging this.
There's lots of code in Gammapy that will have to be improved in the coming months, it's OK IMO to do this iteratively.
Whatever you prefer.

@mapazarr
Copy link
Member Author

mapazarr commented Aug 3, 2015

@cdeil Sorry, I didn't have much time over the WE.

I'm willing to merge it if I can first work for ~1h to make the findruns work with input fits files (and show a PRELIMINARY warning). Then I'd start working on it on a new branch/PR, in order to implement tests for findruns, the comments from this PR and polish the code a bit more.

So, can I still make a preliminary working version of findruns? or do you have to freeze the stable version already?

@cdeil
Copy link
Contributor

cdeil commented Aug 3, 2015

Can we have a quick Skype chat about this decision?

…filter_observations method in ObservationTable.
@mapazarr
Copy link
Member Author

mapazarr commented Aug 3, 2015

@cdeil OK, I added the mods, and the travis-ci passed. Can you merge it?

@cdeil
Copy link
Contributor

cdeil commented Aug 3, 2015

@mapazarr – I'll do one last review ... I promise I won't make suggestions that take long to implement.

@@ -94,6 +94,7 @@ Pull requests
- Consistent random number handling and improve sample_sphere [#283] (Manuel Paz Arribas)
- Add Feldman Cousins algorithm [#311] (Dirk Lennarz)
- Simplified attribute docstrings [#301] (Manuel Paz Arribas)
- Filter observation tables [#295] (Manuel Paz Arribas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename changelog entry and PR title to "Observation table subset selection"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mapazarr mapazarr changed the title Filter observation tables Select observations Aug 3, 2015
@mapazarr mapazarr changed the title Select observations Observation table subset selection Aug 3, 2015
TODO: describe me
selection : `~dict`
Dictionary with a few keywords for applying selection cuts.
TODO: add doc with allowed selection cuts!!! and link here!!!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to address this TODO quickly?
(if no, maybe making a new issue for yourself with a task list to keep track of such things would be helpful?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cdeil
Copy link
Contributor

cdeil commented Aug 3, 2015

I left a bunch of comments that I think can be addressed quickly.

One comment that might be a bit controversial is that I'd prefer you use time_range instead of separate min / max arguments. The main advantage is to reduce the number or arguments, i.e. simplify the API a bit and make the docstrings a bit shorter.

Note that both Sunpy has a TimeRange and astroplan is introducing one (currently names TimeWindow, will be named TimeRange by tomorrow).

@mapazarr – If you don't want to do this change to time_range here, could you make a new issue and put it in a task list there so that we don't forget about this?

(`docs <http://ndawe.github.io/goodruns/>`__, `code <https://github.com/ndawe/goodruns>`__)
is a nice example for a run selection tool.

Examples:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove colon :

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mapazarr
Copy link
Member Author

mapazarr commented Aug 4, 2015

Ok, I changed the functions to accept ranges instead of separate min, max values.

…tional, add tests, improve docs. Plus a bunch of small changes/fixes in related code.
@mapazarr mapazarr mentioned this pull request Aug 4, 2015
7 tasks
@mapazarr
Copy link
Member Author

mapazarr commented Aug 4, 2015

Merging this now, but a few things remain for a future PR: see issue #315.

mapazarr added a commit that referenced this pull request Aug 4, 2015
Observation table subset selection
@mapazarr mapazarr merged commit 874832d into gammapy:master Aug 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants