# IACT DL3 data handling in Gammapy

The current IACT DL3 data handling in Gammapy, and all analyses that rely on it, have the following issues and limitations:

- No support for 1D or 3D analysis in time intervals, only full runs
- No support for event types
- Poor support for pulsar PHASE selection (only for 1D, in the background estimator)
- No support for user-defined event selections (e.g. MC_ID)
- Creating a data selection (e.g. Crab) and making a copy is cumbersome (you need to write code to make new index files)
- There is one observation class that is a proxy to on-disk data and another that is completely in-memory. Mixing the two is not supported (e.g. to simulate events).

## Use cases

- Do 1D or 3D analysis for part of a run (say 1 minute)
- Do 1D or 3D analysis for multiple runs (say 1 night or week)
- Make GPS survey maps for CTA (process 10 GB of event data and 3000 runs)

## What others have

With Fermi-LAT, you always have gtselect and gtmktime at the start:
https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data_preparation.html
Later tools (bin, fit) partly rely on "data sub space" DSS header keys for processing.

In ctools, every analysis starts with an "observation definition file" and with running ctselect:
http://cta.irap.omp.eu/ctools/users/tutorials/1dc/first_select_obs.html


## Proposal

The new scheme we propose is this:

- The central class is `Observations`; it is the object passed to analysis classes like `MapMaker` or `SpectrumExtraction`.
- `Observations` holds a list of `Observation` objects.
- We leave the current `DataStore`, `HDUIndexTable` and `ObservationTable` classes alone.
  They are one way to serialise observation definition information that we still support,
  but moving forward we will develop something better (YAML).

Outline of the class attributes and methods:

In [1]:

class Observations:
    """
    Container of observations.
    
    Replaces the current `gammapy.data.ObservationList`.
    
    Parameters
    ----------
    obs_list : list
        Python list of `Observation` objects
    """
    def __init__(self, obs_list):
        self.obs_list = obs_list
    
    def __str__(self):
        """Create summary"""
    
    @classmethod
    def from_obs_def_file(cls, filename):
        """Read from observation definition file.
        
        This is the format currently used for ctools:
        http://cta.irap.omp.eu/ctools/users/tutorials/1dc/first_select_obs.html
        """
        io = ObservationsXMLIO.read(filename)
        # Wrap the list so that the classmethod returns an `Observations` object
        return cls(io.to_obs_list())

    @classmethod
    def from_index_files(cls, path):
        """Create from observation and HDU index files.
        
        This is the format defined here and used in Gammapy so far:
        https://gamma-astro-data-formats.readthedocs.io/en/latest/data_storage/index.html
        """
        ds = DataStore.from_dir(path)
        # Wrap the list so that the classmethod returns an `Observations` object
        return cls(ds.obs_list())
    
    def get_obs(self, obs_id):
        """Return the observation with the given `obs_id`."""
        for obs in self.obs_list:
            if obs.obs_id == obs_id:
                return obs
        raise IndexError(f"No observation with obs_id = {obs_id}")
    
    def stack(self):
        """Stack all observations together.
        
        TODO: is this possible?
        """

    def select_time(self, time_interval):
        """Select subset of observations by time.

        Creates a new `Observations` of `ObservationCTAProxy` objects.
        """
        obs_list = []
        for obs in self.obs_list:
            selected = obs.select_time(time_interval)
            # TODO: maybe this filter isn't needed here
            if selected.has_data_in_selected_time():
                obs_list.append(selected)
        # Return a new container, as the docstring promises, not a bare list
        return self.__class__(obs_list)
    
    
class Observation:
    """
    Base class, container for one observation.

    Current `ObservationCTA` is renamed to this `Observation`

    - One observation has one events table, GTI table and IRFs.
    - Current `DataStoreObservation` is renamed `ObservationCTAProxy`
      and is decoupled from the `DataStore`; it can also be initialised
      from XML observation definition lists.

    TODO: does this hold an `ObservationDataSelection` object?
    """
    def select_time(self, time_interval):
        obs = self.copy()
        obs.selection['time'] = merge_time_selection(self.selection['time'], time_interval)
        return obs
    
    @property
    def events(self):
        events = self.load('events')
        events.apply_selection(self.selection)
        return events
        

class ObservationDataSelection:
    """
    Data selection specification for one observation.
    
    For now, we use simple dicts to represent the selections,
    e.g. `energy = {'min': '1 TeV'}`
    or `spatial = {'fov_max': '3 deg'}`.
    In the future this might become a series of classes with serialisation,
    so that it can be stored in log and output files.
    """
    def __init__(self, energy=None, spatial=None, time=None, event_type=None, phase=None):
        self.energy = energy
        self.spatial = spatial
        self.time = time
        self.event_type = event_type
        self.phase = phase
    
    
class ObservationsXMLIO:
    """
    Helper class to implement the XML observation definition list format:
    http://cta.irap.omp.eu/ctools/users/tutorials/1dc/first_select_obs.html
    """
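
The time selection outlined above can be tried out with a small self-contained sketch. It is only an illustration under stated assumptions: the classes below are toy stand-ins for the proposed ones, times are plain floats in seconds instead of time objects, and `merge_time_selection` is assumed to intersect the existing selection with the new interval (the outline does not define it).

```python
import copy


def merge_time_selection(current, new):
    """Intersect two time selections given as {'min': ..., 'max': ...} dicts.

    Assumption: merging means intersection. `None` means "no selection yet".
    """
    if current is None:
        return dict(new)
    return {
        "min": max(current["min"], new["min"]),
        "max": min(current["max"], new["max"]),
    }


class Observation:
    """Toy stand-in: one observation with a time range and a selection dict."""

    def __init__(self, obs_id, tstart, tstop):
        self.obs_id = obs_id
        self.tstart = tstart
        self.tstop = tstop
        self.selection = {"time": None}

    def select_time(self, time_interval):
        # Return a modified copy; the original observation is left untouched.
        obs = copy.deepcopy(self)
        obs.selection["time"] = merge_time_selection(
            self.selection["time"], time_interval
        )
        return obs

    def has_data_in_selected_time(self):
        # True if the observation time range overlaps the selected interval.
        sel = self.selection["time"]
        if sel is None:
            return True
        start = max(self.tstart, sel["min"])
        stop = min(self.tstop, sel["max"])
        return start < stop


class Observations:
    """Toy stand-in for the proposed container."""

    def __init__(self, obs_list):
        self.obs_list = list(obs_list)

    def select_time(self, time_interval):
        selected = [obs.select_time(time_interval) for obs in self.obs_list]
        return Observations(
            [obs for obs in selected if obs.has_data_in_selected_time()]
        )


observations = Observations([
    Observation(obs_id=1, tstart=0, tstop=1800),
    Observation(obs_id=2, tstart=2000, tstop=3800),
    Observation(obs_id=3, tstart=4000, tstop=5800),
])
subset = observations.select_time({"min": 1000, "max": 2500})
selected_ids = [obs.obs_id for obs in subset.obs_list]
```

Because each `select_time` call returns copies, selections can be chained and the original container stays unchanged. Whether the real classes should copy or share the underlying data is left open here.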


## Use from analysis classes

Let's look at a few cases of how the new observation classes will be used from `MapMaker`, `SpectrumExtraction` and `LightCurveMaker`.

In [2]:
# Extract some data for a given energy, spatial and time selection
from gammapy.data import Observations

# This reads only the index files, not EVENT or IRF HDUs (TODO: what about GTI?)
# Contains `ObservationCTAProxy` objects
observations = Observations.from_index_files('$CTADATA/index/gps/')

# This would be equivalent and give the same results
observations = Observations.from_obs_def_file('$CTADATA/obs/obs_gps_baseline.xml')

# selection = ObservationDataSelection(
#     energy = {'min': '1 TeV'},
#     spatial = {'fov_max': '3 deg'},
#     time = {'min': '2018-10-02', 'max': '2018-10-10'},
# )
# Make a new `Observations` container holding new `ObservationCTAProxy` objects,
# with the requested selections / transformations applied.
# Option A: simply store `selection` on the `observation` proxy objects
# Option B: use `selection['time']` to create new Observation objects?
# observations = observations.select(selection)

time_interval = {'min': '2018-10-02', 'max': '2018-10-10'}
observations = observations.select_time(time_interval)

# observations.get_obs(42).events

observations.write_data('myfolder', copy_everything=True)


ImportError: cannot import name 'Observations'
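
To make the commented-out `ObservationDataSelection` idea more concrete, here is a minimal sketch of how such a selection could be applied to events. Everything in it is an assumption for illustration: events are plain dicts, the helpers `in_range` and `apply_selection` are hypothetical, and values are bare floats with fixed units (TeV, deg, s) rather than quantity strings like `'1 TeV'`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservationDataSelection:
    """Selection for one observation; each field is a dict or None."""

    energy: Optional[dict] = None   # e.g. {"min": 1.0}, in TeV (assumed unit)
    spatial: Optional[dict] = None  # e.g. {"fov_max": 3.0}, in deg (assumed unit)
    time: Optional[dict] = None     # e.g. {"min": 0.0, "max": 60.0}, in s


def in_range(value, limits):
    """Check optional 'min' (inclusive) and 'max' (exclusive) limits."""
    if limits is None:
        return True
    if "min" in limits and value < limits["min"]:
        return False
    if "max" in limits and value >= limits["max"]:
        return False
    return True


def apply_selection(events, selection):
    """Return the events that pass every cut in the selection."""
    fov_max = None
    if selection.spatial is not None:
        fov_max = selection.spatial.get("fov_max")

    selected = []
    for event in events:
        if not in_range(event["energy"], selection.energy):
            continue
        if not in_range(event["time"], selection.time):
            continue
        if fov_max is not None and event["offset"] > fov_max:
            continue
        selected.append(event)
    return selected


events = [
    {"event_id": 1, "energy": 0.5, "time": 10.0, "offset": 1.0},
    {"event_id": 2, "energy": 2.0, "time": 20.0, "offset": 0.5},
    {"event_id": 3, "energy": 5.0, "time": 70.0, "offset": 0.5},
    {"event_id": 4, "energy": 3.0, "time": 30.0, "offset": 3.5},
]
selection = ObservationDataSelection(
    energy={"min": 1.0},
    spatial={"fov_max": 3.0},
    time={"min": 0.0, "max": 60.0},
)
kept_ids = [event["event_id"] for event in apply_selection(events, selection)]
```

This corresponds to "Option A" above: the selection is stored separately and only applied when the events are actually requested.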

In [3]:
def gammapy_gtselect(observations, selection):
    """Like gtselect, powered by gammapy.data."""
    for obs in observations.select(selection):
        obs.write(f'data_{obs.obs_id}.fits')

In [None]:
class Analysis3D:
    """Run MapMaker and MapFit"""

class LightCurveMaker3D:
    def __init__(self, observations, time_intervals):
        self.observations = observations
        self.time_intervals = time_intervals

    def run(self):
        for time_interval in self.time_intervals:
            observations = self.observations.select_time(time_interval)

            # The analysis gets passed the selected observations
            # and doesn't have to concern itself with GTI selections
            analysis = Analysis3D(observations)
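
The loop over time intervals can be illustrated with a small stand-alone sketch. It does not use the proposed classes. The helpers `make_time_intervals` and `count_events` are hypothetical, times are plain floats in seconds, and simple event counting stands in for the real per-interval analysis.

```python
def make_time_intervals(tmin, tmax, width):
    """Split [tmin, tmax) into consecutive intervals of at most `width`."""
    intervals = []
    start = tmin
    while start < tmax:
        stop = min(start + width, tmax)
        intervals.append({"min": start, "max": stop})
        start = stop
    return intervals


def count_events(event_times, time_interval):
    """Count events with min <= time < max (placeholder for a real analysis)."""
    return sum(
        time_interval["min"] <= t < time_interval["max"] for t in event_times
    )


# Event arrival times in seconds since the start of a run (made-up values)
event_times = [5.0, 30.0, 61.0, 62.0, 119.0, 130.0]

# One-minute bins over a 150 s stretch; the last bin is shorter
time_intervals = make_time_intervals(0.0, 150.0, 60.0)
counts = [count_events(event_times, ti) for ti in time_intervals]
```

The same `{'min': ..., 'max': ...}` dicts could be passed as `time_intervals` to a class like `LightCurveMaker3D`, which covers the "part of a run (say 1 minute)" use case.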