Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GTI union #2277

Merged
merged 8 commits into from Jul 19, 2019
Merged

Implement GTI union #2277

merged 8 commits into from Jul 19, 2019

Conversation

@registerrier
Copy link
Contributor

@registerrier registerrier commented Jul 10, 2019

In order to prepare for light curve analysis with datasets, see PIG 11 #1451 , we need to add timing information on Dataset. To be able to deal with stacking, GTI union is necessary. This PR provides a GTI.union(gti) method that performs union of the time intervals in the two GTI.

@registerrier registerrier added this to the 0.13 milestone Jul 10, 2019
@registerrier registerrier added this to To do in gammapy.time via automation Jul 10, 2019
@registerrier registerrier requested review from cdeil and adonath Jul 10, 2019
@cdeil
Copy link
Member

@cdeil cdeil commented Jul 10, 2019

Do we really need this kind of union?

I think all we need is vstack, and we would try to put a convenience method based on the underlying astropy.table.vstack, like they did here?
https://docs.astropy.org/en/stable/timeseries/analysis.html#combining-time-series

Either way, this is tricky to implement and to get right, but using Astropy Table and Time as much as possible will help.

If some kind of GTI interval merging is really needed, I would suggest to implement this as a separate method, so that this combined vstack and merge would be two chained method calls, but each can be called separately. (but I don't see the use case yet, i.e. where overlapping time intervals would occur in Gammapy)

@registerrier - Thoughts?

@registerrier
Copy link
Contributor Author

@registerrier registerrier commented Jul 10, 2019

There are two different thing here.
One is whether astropy.timeseries might be better to store the GTI than a simple Table. This is likely.
The other is whether GTI combination can be done by simply vstack tables, and I think not, hence the proposed code.
We have to add GTI to Dataset, the latter can be stacked so we need a way to properly deal with it. The case of overlapping time intervals will happen e.g. stacking different event types etc.

@cdeil
Copy link
Member

@cdeil cdeil commented Jul 10, 2019

The case of overlapping time intervals will happen e.g. stacking different event types etc.

Not sure if this will be needed, naively I would think no: we shouldn't combine time-related selection or stacking with event type selections or stacking or it'll be too complicated.

Suggest to only implement a vstack helper method for now that's using Table and Time methods, and to only talk about and add a separate method for merging intervals when needed.

Note that the stack itself is already complicated, you have to deal with the possibly different ref times. It has to be done in a generic way in a helper function, which is then called from GTI, Events and LightCurve methods if those provide conveniences for that. I thought we had this already implemented somewhere (events?) but I'm not sure. We can also re-dicuss whether we change our design and use Time columns in Table, especially if requiring very modern Astropy (see #2218), but that's for later.
If you need this now, suggest to put a simpler helper function that doesn't merge intervals.

@adonath
Copy link
Member

@adonath adonath commented Jul 10, 2019

I mostly agree with @cdeil, that I think it's a feature that we don't need urgently and the simple vstack on gti.table will do for the use-cases we want to support for v1.0. Implementation-wise I would also prefer a two step procedure, where you first stack the tables and take care of units and reference times and then merge the overlapping intervals if needed after.

@registerrier
Copy link
Contributor Author

@registerrier registerrier commented Jul 11, 2019

No, there shouldn't be any vstack method, if GTIs contains a method like time_sum. This is a perfect recipe for serious invisible errors.

@adonath
Copy link
Member

@adonath adonath commented Jul 11, 2019

@registerrier Just to clarify: my comment was on the actual implementation, not the functionality as such. So instead of the Python loop on the individual intervals, stack the table and then merge the intervals, maybe ideally with some kind of "labelled mask" or group table. What is not clear to me yet, is whether the functionality is meant to be only used for stacking of two datasets, or whether you want to identify on a basis of multiple GTI tables which datasets to stack. In the latter case you need some kind of "labelled" mask or a back-reference in the GTI table to the dataset.

@cdeil cdeil mentioned this pull request Jul 15, 2019
@cdeil
Copy link
Member

@cdeil cdeil commented Jul 18, 2019

Could you please put this and test it separately?

In [1]: def stack(self, other): 
   ...:     start = (other.time_start - self.time_ref).sec 
   ...:     end = (other.time_stop - self.time_ref).sec 
   ...:     table = Table(rows={"START": start, "END": end}) 
   ...:     return astropy.table.vstack([self.table, table]) 

You'll have to add this:

def time_ref(self):

And then the union can be simpler, a mark and sweep left to right, after sorting by START, accumulating in a list and making a new table at the end.

Also suggest a copy as convenience.

@cdeil cdeil self-assigned this Jul 18, 2019
@cdeil cdeil removed the request for review from adonath Jul 18, 2019
@cdeil cdeil removed their request for review Jul 19, 2019
@cdeil cdeil assigned registerrier and unassigned cdeil Jul 19, 2019
@cdeil
Copy link
Member

@cdeil cdeil commented Jul 19, 2019

@registerrier - I've simplified the GTI.union method in c590281 by removing the stack from within union, i.e. separating those two operations, and by using Row objects directly without introducing a nested TimeInterval class.

Having stack and union separate allows for simpler testing. In
c99fb81 I just set up one simple test case for union, and one for stack. I think those are the minimal tests we can write that covers what the methods do?

I've made one small change and noted it in the docstring that intervals that touch get merged.
Since the use cases for this method aren't clearly identified yet, it's not clear to me which behaviour we want, but I think you want to use it for stacking, and there having fewer intervals in the output is maybe preferred a tiny bit, that's why I put <= for now.

@registerrier - Please review.

@cdeil
Copy link
Member

@cdeil cdeil commented Jul 19, 2019

There's a fail on Python 3.5: https://dev.azure.com/gammapy/gammapy/_build/results?buildId=1368&view=logs&jobId=54b4a527-ead6-5d78-3f02-5346815e0ca2&taskId=69766c11-54dd-5fe4-8e37-8d474da710d9&lineStart=149&lineEnd=167&colStart=1&colEnd=1

It looks like START and STOP column values are interchanged there. I don't understand why.
Guessing it has to do with something about Row that I don't understand, here's an attempt to not use it in the union algorithm: 88d9801

It's still always the same as what you wrote @registerrier - the only difference really of what I did is that stack and union are separate.

@cdeil
Copy link
Member

@cdeil cdeil commented Jul 19, 2019

@cdeil cdeil merged commit c4676db into gammapy:master Jul 19, 2019
7 of 9 checks passed
gammapy.time automation moved this from To do to Done Jul 19, 2019
@adonath adonath changed the title Add GTI union Implement GTI union Jul 25, 2019
@registerrier registerrier deleted the gti_union branch Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
gammapy.time
  
Done
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants