# IACT DL3 data handling in Gammapy

The current IACT DL3 data handling in Gammapy, and the analyses built on it, have the following issues and limitations:

- No support for 1D or 3D analysis in time intervals, only full runs
- No support for event types
- Poor support for pulsar PHASE selection (only for 1D, in background estimator)
- No support for user-defined event selections (e.g. MC_ID)
- Creating a data selection (e.g. Crab) and making a copy is cumbersome (requires writing code to make new index files)
- One observation class is a proxy to on-disk data, another holds everything in memory. Mixing the two is not supported (e.g. for simulated events).

## Use cases

- Do 1D or 3D analysis for part of a run (say 1 minute)
- Do 1D or 3D analysis for multiple runs (say 1 night or week)
- Make GPS survey maps for CTA (process 10 GB of events data and 3000 runs)

## What others have

With Fermi-LAT, you always have gtselect and gtmktime at the start:
https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data_preparation.html
Later tools (bin, fit) partly rely on "data sub space" DSS header keys for processing.

In ctools, every analysis starts with an "observation definition file" and a `ctselect` run:
http://cta.irap.omp.eu/ctools/users/tutorials/1dc/first_select_obs.html


## Proposal

The new scheme we propose is this:

- The central class is `Observations`. This is the object passed to analysis classes like `MapMaker` or `SpectrumExtraction`.
- `Observations` holds a list of `Observation` objects
- `Observation` is a proxy object that uses a "store" object to access the data (events, IRFs, ...)
- The "store" object can be a `DataStore`, `XMLStore` or `MemoryStore` object. The first two hold the data on disk, the last holds it in memory. All these "store" objects should have a similar API.
- This scheme is already followed by the `DataStoreObservation`/`DataStore` objects; with small changes we should be able to generalize it. The new `Observation` object will therefore be very similar to the current `DataStoreObservation`.
- The `Observation` class holds filters that are applied on-the-fly when the data is accessed
- The filtering is done by a dedicated class (`ObservationFilter`), which will be a data member of the `Observation` class
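
The "similar API" requirement for the store objects could be expressed with a structural interface. The following is a minimal sketch, not the Gammapy implementation: the `Store` protocol, its `load(obs_id, kind)` signature, and the toy classes `InMemoryStore`, `LazyStore` and `ObservationProxy` are all hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Protocol, Tuple


class Store(Protocol):
    """Hypothetical common interface shared by all store classes."""

    def load(self, obs_id: int, kind: str) -> Any:
        """Return the data product `kind` (e.g. 'events', 'gti') for `obs_id`."""
        ...


class InMemoryStore:
    """Toy stand-in for an in-memory store: data lives in a dict."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[int, str], Any] = {}

    def add(self, obs_id: int, kind: str, data: Any) -> None:
        self._data[(obs_id, kind)] = data

    def load(self, obs_id: int, kind: str) -> Any:
        return self._data[(obs_id, kind)]


class LazyStore:
    """Toy stand-in for an on-disk store: data is produced only on request."""

    def __init__(self, reader) -> None:
        self._reader = reader  # callable (obs_id, kind) -> data, e.g. a file reader
        self.n_reads = 0

    def load(self, obs_id: int, kind: str) -> Any:
        self.n_reads += 1
        return self._reader(obs_id, kind)


class ObservationProxy:
    """Minimal proxy: knows its obs_id and delegates data access to a store."""

    def __init__(self, obs_id: int, store: Store) -> None:
        self.obs_id = obs_id
        self.store = store

    @property
    def events(self) -> Any:
        return self.store.load(self.obs_id, "events")


memory = InMemoryStore()
memory.add(1, "events", [10.0, 20.0, 30.0])
lazy = LazyStore(lambda obs_id, kind: [float(obs_id)] * 3)

obs_a = ObservationProxy(1, memory)
obs_b = ObservationProxy(2, lazy)
```

Because the proxy only relies on `load`, analysis code would not need to know whether an observation is backed by disk or by memory, which is what would allow mixing both kinds in one container.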


Regarding GTI handling:

- The sum over the GTIs gives the on time for the observation
- This means a time filter on the GTIs automatically adjusts the on time accordingly

- We leave the current `DataStore`, `HDUIndexTable` and `ObservationTable` classes alone. They are one way to serialise observation definition information that we still support, but moving forward we will develop something better (YAML).
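
The GTI handling described above can be illustrated with plain numbers. This is a sketch under simplifying assumptions: GTIs are `(start, stop)` tuples of floats in seconds rather than a GTI table with `astropy.time.Time` columns, and `filter_gti` and `on_time` are hypothetical helper functions, not existing Gammapy API.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, stop) in seconds; plain floats for illustration


def filter_gti(gti: List[Interval], t_min: float, t_max: float) -> List[Interval]:
    """Clip every good time interval to [t_min, t_max] and drop empty ones."""
    clipped = []
    for start, stop in gti:
        new_start, new_stop = max(start, t_min), min(stop, t_max)
        if new_stop > new_start:
            clipped.append((new_start, new_stop))
    return clipped


def on_time(gti: List[Interval]) -> float:
    """On time is the sum of the GTI durations."""
    return sum(stop - start for start, stop in gti)


gti = [(0.0, 600.0), (700.0, 1800.0)]
selected = filter_gti(gti, t_min=300.0, t_max=1000.0)

print(on_time(gti))       # 1700.0
print(on_time(selected))  # 600.0
```

The on time never has to be stored or updated separately: it follows from whatever GTIs remain after the time filter.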

See also this proposal:
https://gist.github.com/registerrier/010d023e97aaa87b851c15bf736408ad

Outline of the class attributes and methods:

In [1]:
class Observations:
    """
    Container of observations.
    
    Replaces the current `gammapy.data.ObservationList`.
    
    Parameters
    ----------
    obs_list : list
        Python list of `Observation` objects
    """
    def __init__(self, obs_list):
        self.obs_list = obs_list
    
    def __str__(self):
        """Create summary"""
    
    @classmethod
    def from_obs_def_file(cls, filename):
        """Read from observation definition file.
        
        This is the format currently used for ctools:
        http://cta.irap.omp.eu/ctools/users/tutorials/1dc/first_select_obs.html
        """
        io = XMLStore.read(filename)
        return io.to_obs_list()

    @classmethod
    def from_index_files(cls, path):
        """Create from observation and HDU index files.
        
        This is the format defined here and used in Gammapy so far:
        https://gamma-astro-data-formats.readthedocs.io/en/latest/data_storage/index.html
        """
        ds = DataStore.from_dir(path)
        return ds.obs_list()
    
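    def get_obs(self, obs_id):
        """Return the observation with the given `obs_id`."""
        for obs in self.obs_list:
            if obs.obs_id == obs_id:
                return obs
        raise IndexError('No observation with obs_id {}'.format(obs_id))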
    
    def stack(self):
        """Stack all observations together.
        
        TODO: is this possible?
        """

    def select_time(self, time_interval):
        """Select subset of observations by time.
        
        - Creates a new `Observations` container of `Observation` objects with the time filter set
        - Observations without events in the time interval are dropped
        """
        obs_list = []
        for obs in self.obs_list:
            obs = obs.select_time(time_interval)
            if len(obs.events.table):
                obs_list.append(obs)
        return Observations(obs_list)
    
    def write_to_disk(self, folder):
        """
        TODO: not sure how to tackle this.
        It probably should apply the filters. Can we choose the format?

        Should return a new `Observations` object.
        """
        raise NotImplementedError

    
class Observation:
    """
    Proxy class to access one of the "store" classes (DataStore, XMLStore, MemoryStore)
    - One observation can access one events table, GTI table and IRFs.

    TODO: Should filters be a dedicated class or just a dictionary?
    """
    def __init__(self, obs_id, store=None, obs_filter=None):
        self.obs_id = obs_id
        self.store = store or MemoryStore(obs_id)
        self.obs_filter = obs_filter or ObservationFilter()

    def select_time(self, time_interval):
        """
        Return a new observation with the time filter updated.

        time_interval : (astropy.time.Time, astropy.time.Time)
            start and stop time of the time interval
        """
        # TODO: `copy()` is not defined yet. A plain `copy.copy()` would share
        # `obs_filter` with the original, so the filter has to be copied as well.
        obs = self.copy()
        # TODO: support several time_intervals
        obs.obs_filter.time = {'min': time_interval[0], 'max': time_interval[1]}
        return obs
    
    def load_and_filter(self, data_str):
        """
        Load the data and apply the filter.
        """
        data = self.store.load(data_str)
        return self.obs_filter.apply_on(data)
    
    @property
    def events(self):
        return self.load_and_filter('events')
    
    @property
    def gti(self):
        return self.load_and_filter('gti')
    
    @property
    def time_duration(self):
        return self.gti.time_sum()
        

class ObservationFilter(object):
    """
    Data selection specification for one observation.
    
    For now, we use simple dicts to represent the selections,
    e.g. `energy = {'min': '1 TeV'}`
    or `spatial = {'fov_max': '3 deg'}`
    in the future this might become a series of classes with serialisation,
    so that it can be stored in log and output files.
    """
    def __init__(self, energy=None, spatial=None, time=None, event_type=None, phase=None):
        self.energy = energy
        self.spatial = spatial
        self.time = time
        self.event_type = event_type
        self.phase = phase
        
    def filter_events(self, events):
        """
        Return a new event list with the filters applied.
        """
        # Only the time filter is applied so far
        if self.time is None:
            return events
        return events.select_time((self.time['min'], self.time['max']))

    def filter_gti(self, gti):
        """
        Return a new GTI table with the filters applied.
        """
        raise NotImplementedError  # TODO
    
    def apply_on(self, data):
        """Dispatch to the filter method matching the type of `data`."""
        if isinstance(data, EventList):
            return self.filter_events(data)
        elif isinstance(data, GTI):
            return self.filter_gti(data)
        else:
            raise TypeError('Cannot apply filter on {}'.format(type(data)))
    
class XMLStore:
    """
    One of the "store" classes that implements the XML observation definition list format:
    http://cta.irap.omp.eu/ctools/users/tutorials/1dc/first_select_obs.html
    """
    
class MemoryStore:
    """
    One of the "store" classes that holds everything in memory (useful for event simulations in the future).
    Should probably be very similar to the current ObservationCTA class
    """

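
The question in `Observation.select_time` of whether `copy.copy()` is enough can be checked with a small experiment. The sketch below uses toy classes (`TimeFilter` and `Obs` are hypothetical stand-ins, not the proposed classes) and assumes the filter is a mutable object held as an attribute, as in the outline above. A shallow copy of the proxy shares the filter with the original, so setting the time selection on the copy also changes the original. Copying the filter as well avoids this while the store stays shared.

```python
import copy


class TimeFilter:
    """Toy filter holding only a time selection (hypothetical, not the proposed class)."""

    def __init__(self, time=None):
        self.time = time


class Obs:
    """Toy observation proxy: a store reference plus a filter."""

    def __init__(self, obs_id, store, obs_filter=None):
        self.obs_id = obs_id
        self.store = store
        self.obs_filter = obs_filter or TimeFilter()

    def select_time_shallow(self, time_interval):
        # copy.copy() alone: the new proxy shares the *same* filter object.
        obs = copy.copy(self)
        obs.obs_filter.time = {"min": time_interval[0], "max": time_interval[1]}
        return obs

    def select_time(self, time_interval):
        # Shallow-copy the proxy (store stays shared), then give it its own filter.
        obs = copy.copy(self)
        obs.obs_filter = copy.copy(self.obs_filter)
        obs.obs_filter.time = {"min": time_interval[0], "max": time_interval[1]}
        return obs


store = {"events": [1.0, 2.0, 3.0]}  # stands in for a store holding large data

original = Obs(1, store)
selected = original.select_time((0.0, 2.0))

leaky = Obs(2, store)
leaky_selected = leaky.select_time_shallow((0.0, 2.0))

print(original.obs_filter.time)  # None: the original is untouched
print(leaky.obs_filter.time)     # {'min': 0.0, 'max': 2.0}: the selection leaked back
```

A `copy.deepcopy()` would also isolate the filter, but it would duplicate the store, which is undesirable when the store holds the data in memory.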

## Use from analysis classes

Let's look at a few cases of how the new observation classes would be used from the `MapMaker`, `SpectrumExtraction` and `LightCurveMaker`.

In [2]:
# Extract some data for a given energy, spatial and time selection
from astropy.time import Time
from gammapy.data import Observations

# This reads only the index files, not EVENT or IRF HDUs (TODO: what about GTI?)
# Contains `Observation` proxy objects
observations = Observations.from_index_files('$CTADATA/index/gps/')

# This would be equivalent and give the same results
observations = Observations.from_obs_def_file('$CTADATA/obs/obs_gps_baseline.xml')

# selection = ObservationDataSelection(
#     energy = {'min': '1 TeV'},
#     spatial = {'fov_max': '3 deg'},
#     time = {'min': '2018-10-02', 'max': '2018-10-10'},
# )
# Make a new `Observations` container holding new `Observation` proxy objects,
# with the requested selections / transformations applied.
# Option A: simply store `selection` on the `observation` proxy objects
# Option B: use `selection['time']` to create new Observation objects?
# observations = observations.select(selection)

time_interval = (Time('2018-10-02'), Time('2018-10-10'))
observations = observations.select_time(time_interval)

# observations.get_obs(42).events

observations.write_to_disk('myfolder')



A preliminary light curve maker should be fairly easy to construct with the new filter functionality of the Observation objects.

In [None]:
class Analysis3D:
    """Run MapMaker and MapFit"""

class LightCurveMaker3D:
    def __init__(self, observations, time_intervals):
        self.observations = observations
        self.time_intervals = time_intervals

    def run(self):
        for time_interval in self.time_intervals:
            observations = self.observations.select_time(time_interval)


            # Analysis gets passed list of observations
            # Doesn't have to concern itself with GTI selections
            analysis = Analysis3D(observations)
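
The loop structure of such a light curve maker can be tried out without any Gammapy classes. The following is a minimal sketch under strong simplifications: event times are plain floats, counting events stands in for the full 3D analysis, and the rate divides by the bin width rather than by the on time from filtered GTIs. The functions `select_time` and `light_curve` are hypothetical helpers for illustration only.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def select_time(event_times: Sequence[float], interval: Interval) -> List[float]:
    """Keep events with start <= t < stop (half-open, so bins do not double count)."""
    start, stop = interval
    return [t for t in event_times if start <= t < stop]


def light_curve(event_times, time_intervals):
    """Return one (interval, counts, rate) tuple per time interval."""
    points = []
    for interval in time_intervals:
        selected = select_time(event_times, interval)
        duration = interval[1] - interval[0]
        points.append((interval, len(selected), len(selected) / duration))
    return points


event_times = [1.0, 2.5, 3.0, 7.0, 9.5, 10.0]
time_intervals = [(0.0, 5.0), (5.0, 10.0)]
curve = light_curve(event_times, time_intervals)

for interval, counts, rate in curve:
    print(interval, counts, rate)
```

The per-bin step only sees already-selected data, which mirrors the idea that the analysis class does not have to concern itself with GTI selections.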