# Datasets - Reduced data, IRFs, models 

## Introduction

`gammapy.datasets` is a crucial part of the gammapy API. Datasets constitute `DL4` data: binned counts, IRFs, models and the associated likelihoods. `Datasets` form the end product of the `makers` stage (see the [makers notebook](makers.ipynb)) and are passed on to the `Fit` or estimator classes for modelling and fitting.

There are three types of `Dataset` available:
- `MapDataset`: Binned counts and IRFs on a `WcsGeom` spatial geometry and an energy axis; used for 3D analysis; supports the `cash` statistic
- `SpectrumDataset`: Binned counts and IRFs on a `RegionGeom` and an energy axis; used for 1D spectral analysis; supports the `cash` statistic
- `FluxPointsDataset`: Directly supports fitting of pre-computed flux points; no IRF convolution is performed during fitting; supports the `chi2` statistic
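
To make the `cash` statistic mentioned above concrete, here is a minimal pure-Python sketch of the Cash statistic for Poisson counts, `2 * (npred - counts * ln(npred))` summed over bins. The function name and the toy numbers are illustrative assumptions, and this is not gammapy's own implementation, which also handles edge cases such as non-positive predictions.

```python
import math


def cash(counts, npred):
    """Sum of the Cash statistic, 2 * (npred - counts * ln(npred)), over bins."""
    total = 0.0
    for n, mu in zip(counts, npred):
        # npred must be strictly positive for the logarithm to be defined
        total += 2.0 * (mu - n * math.log(mu))
    return total


counts = [3, 0, 5]            # observed counts per bin (toy values)
npred_good = [3.0, 0.5, 5.0]  # prediction close to the data
npred_bad = [1.0, 4.0, 1.0]   # prediction far from the data

# A better-matching prediction yields a lower fit statistic
print(cash(counts, npred_good) < cash(counts, npred_bad))
```

Only differences in the statistic between models are meaningful for fitting; its absolute value depends on the data.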

`MapDataset` and `SpectrumDataset` each have an on-off variant, `MapDatasetOnOff` and `SpectrumDatasetOnOff` respectively, which use the `wstat` statistic and should be used when the background is estimated from real off counts. These additionally store the `counts_off`, `acceptance` and `acceptance_off` data.
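
The role of the extra on-off quantities can be sketched in a few lines of plain Python: the off counts are scaled by `alpha = acceptance / acceptance_off` to estimate the background in the on region. The helper name and the toy numbers below are assumptions for illustration only; this shows the bookkeeping, not the `wstat` likelihood itself.

```python
def background_from_off(counts_off, acceptance, acceptance_off):
    """Estimate the on-region background from off counts, bin by bin."""
    # alpha is the ratio of on to off acceptance in each bin
    alpha = [a / a_off for a, a_off in zip(acceptance, acceptance_off)]
    background = [al * n_off for al, n_off in zip(alpha, counts_off)]
    return alpha, background


counts = [12, 7, 3]               # on counts per energy bin (toy values)
counts_off = [40, 20, 5]          # off counts per energy bin (toy values)
acceptance = [1.0, 1.0, 1.0]      # relative on acceptance
acceptance_off = [5.0, 5.0, 5.0]  # relative off acceptance (5 off regions)

alpha, background = background_from_off(counts_off, acceptance, acceptance_off)
excess = [n - b for n, b in zip(counts, background)]
print(alpha, background, excess)
```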


## Setup

In [None]:
import numpy as np
import astropy.units as u
from regions import CircleSkyRegion
from astropy.coordinates import SkyCoord
from gammapy.datasets import (
    MapDataset,
    SpectrumDataset,
    Datasets,
    FluxPointsDataset,
)
from gammapy.data import DataStore
from gammapy.maps import WcsGeom, RegionGeom, MapAxes, MapAxis, Map
from gammapy.modeling.models import SkyModel, PowerLawSpectralModel
from gammapy.estimators import FluxPoints

## MapDataset

The counts, exposure, background, masks, and IRF maps are bundled together in a data structure named `MapDataset`. The `counts` and `background` maps are binned in reconstructed energy and must have the same geometry, while the IRF maps can have a different spatial geometry (coarser binning and larger extent) and a different spectral range (binned in true energy). It is usually recommended that the true energy axis cover a larger range and be more finely sampled than the reconstructed energy axis.

An empty `MapDataset` can be instantiated from any `WcsGeom` object. The binning of each IRF axis can be configured individually; otherwise, internal defaults are used.

In [None]:
energy_axis = MapAxis.from_energy_bounds(
    1, 10, nbin=11, name="energy", unit="TeV"
)

geom = WcsGeom.create(
    skydir=(83.63, 22.01),
    axes=[energy_axis],
    width=5 * u.deg,
    binsz=0.05 * u.deg,
    frame="icrs",
)

energy_axis_true = MapAxis.from_energy_bounds(
    0.1, 100, nbin=11, name="energy_true", unit="TeV", per_decade=True
)

rad_axis = MapAxis.from_bounds(0, 5, nbin=50, unit="deg", name="rad")

dataset_empty = MapDataset.create(
    geom=geom,
    energy_axis_true=energy_axis_true,
    rad_axis=rad_axis,
    binsz_irf=0.1,
)
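
To see what the two energy axes in the cell above amount to, here is a small stand-alone sketch that computes logarithmically spaced bin edges in plain Python. It assumes that `per_decade=True` means "`nbin` bins per decade of energy, rounded up to a whole number of bins"; the helper `log_energy_edges` is hypothetical and is not part of gammapy.

```python
import math


def log_energy_edges(e_min, e_max, nbin, per_decade=False):
    """Return logarithmically spaced bin edges between e_min and e_max."""
    log_min, log_max = math.log10(e_min), math.log10(e_max)
    if per_decade:
        # nbin is interpreted as bins per decade; round up to a whole number
        # of bins (the small tolerance guards against floating-point noise)
        nbin = math.ceil((log_max - log_min) * nbin - 1e-9)
    step = (log_max - log_min) / nbin
    return [10 ** (log_min + i * step) for i in range(nbin + 1)]


reco_edges = log_energy_edges(1, 10, nbin=11)                       # TeV
true_edges = log_energy_edges(0.1, 100, nbin=11, per_decade=True)   # TeV

print(len(reco_edges) - 1)  # number of reconstructed-energy bins
print(len(true_edges) - 1)  # number of true-energy bins
```

With these settings the true energy axis spans three decades around the single decade of the reconstructed axis, which matches the recommendation that the true energy range be wider.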

To see the geometry of each map, we can use:

In [None]:
dataset_empty.geoms

To see how `dataset_empty` is used in the data reduction process, please see the [makers notebook](makers.ipynb).

### Reading and writing datasets

Datasets can be read from and saved to disk using the `read` and `write` methods, e.g.:

In [None]:
dataset = MapDataset.read(
    "$GAMMAPY_DATA/cta-1dc-gc/cta-1dc-gc.fits.gz", name="test"
)

In [None]:
# To access an individual component of a dataset, e.g. the background map, simply use
dataset.background

`dataset.background` contains the background map computed from the IRF.
To see the model-corrected background, use `dataset.npred_background()`.
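
As a rough illustration of what "model-corrected" can mean, the sketch below rescales an IRF background with a power-law norm correction, `norm * (E / reference) ** (-tilt)`. The parametrisation, the helper name and the numbers are assumptions made for this example; the correction actually applied depends on the background model attached to the dataset.

```python
def corrected_background(background, energies, norm=1.0, tilt=0.0, reference=1.0):
    """Apply an assumed power-law norm correction to a background spectrum."""
    return [
        b * norm * (e / reference) ** (-tilt)
        for b, e in zip(background, energies)
    ]


background = [100.0, 40.0, 10.0]  # IRF background counts per bin (toy values)
energies = [0.5, 1.0, 2.0]        # bin centres in TeV (toy values)

# With norm=1 and tilt=0 the background is returned unchanged
print(corrected_background(background, energies))
# A fitted norm of 1.1 scales every bin up by 10 percent
print(corrected_background(background, energies, norm=1.1))
```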

To explore the contents of a `Dataset`, you can simply print it:

In [None]:
print(dataset)

In [None]:
# For a quick summary, use
dataset.info_dict()

### Using masks

There are two masks that can be set on a `Dataset`, `mask_safe` and `mask_fit`. 

- The `mask_safe` is computed during the data reduction process according to the specified selection cuts, and should not be changed by the user.
- During modelling and fitting, the user might want to additionally ignore some parts of a reduced dataset, e.g. to restrict the fit to a specific energy range or to ignore parts of the region of interest. This should be done by applying the `mask_fit`. For details on applying masks, please refer to [Masks-for-fitting](mask_maps.ipynb#Masks-for-fitting:-mask_fit).

Both the `mask_fit` and `mask_safe` must have the same `geom` as the `counts` and `background` maps.
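
How the two masks combine can be sketched with a single energy axis in plain Python: a bin enters the likelihood only if both masks keep it. The helper `energy_mask` below is a toy stand-in, assuming a bin is kept only when it lies entirely inside the requested range, and the bin edges and mask values are made up.

```python
def energy_mask(edges, e_min, e_max):
    """Keep bins whose lower and upper edges both lie within [e_min, e_max]."""
    return [lo >= e_min and hi <= e_max for lo, hi in zip(edges[:-1], edges[1:])]


edges = [0.1, 0.3, 1.0, 3.0, 10.0]       # energy bin edges in TeV (toy values)
mask_safe = [False, True, True, True]    # e.g. first bin is below the safe threshold
mask_fit = energy_mask(edges, 0.3, 3.0)  # user-chosen fit range

# The mask used in the likelihood is the logical AND of both masks
mask = [safe and fit for safe, fit in zip(mask_safe, mask_fit)]

# Energy range actually used: outer edges of the bins that survive
kept = [i for i, keep in enumerate(mask) if keep]
energy_range = (edges[kept[0]], edges[kept[-1] + 1])
print(mask, energy_range)
```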

In [None]:
# eg: to see the safe data range
dataset.mask_safe.plot_interactive(add_cbar=True);

In [None]:
# To apply a fit mask in energy and space
region = CircleSkyRegion(
    SkyCoord(2.1, 1.5, unit="deg", frame="galactic"), 0.7 * u.deg
)
mask_space = dataset.geoms["geom"].region_mask([region], inside=False)
mask_energy = dataset.geoms["geom"].energy_mask(0.6 * u.TeV, 4 * u.TeV)
mask = mask_space & mask_energy  # standard binary operations allowed on masks
dataset.mask_fit = mask
dataset.mask_fit.plot_grid();

To see the allowed energy ranges, you can use:
- `dataset.energy_range_safe`: energy range allowed by the `mask_safe`
- `dataset.energy_range_fit`: energy range allowed by the `mask_fit`
- `dataset.energy_range`: the final energy range used in the likelihood computation

Each of these returns two maps, holding the `min` and `max` energy values at each spatial pixel.

In [None]:
dataset.energy_range

In [None]:
dataset.energy_range[0].plot(add_cbar=True)

## SpectrumDataset

`SpectrumDataset` inherits from `MapDataset`, is specially adapted for 1D spectral analysis, and uses a `RegionGeom` instead of a `WcsGeom`.
A `MapDataset` can be converted to a `SpectrumDataset` by summing the `counts` and `background` inside the `on_region`; the result can then be used for classical spectral analysis. Containment correction is feasible only for circular regions.
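
The summation over the `on_region` can be illustrated with a toy counts cube in plain Python. This sketch assumes a flat-sky approximation and a circular region; the function `counts_spectrum`, the pixel grid and the count values are all hypothetical, and the sketch ignores IRFs and containment correction.

```python
import math


def counts_spectrum(counts, pixel_centers, center, radius):
    """Sum counts[e][p] over the pixels p whose centre lies inside a circle."""
    inside = [
        math.hypot(x - center[0], y - center[1]) <= radius
        for x, y in pixel_centers
    ]
    return [sum(c for c, keep in zip(row, inside) if keep) for row in counts]


# 3 x 3 pixel grid with 0.5 deg spacing, centred on (0, 0)
pixel_centers = [(x, y) for y in (-0.5, 0.0, 0.5) for x in (-0.5, 0.0, 0.5)]
counts = [
    [1] * 9,  # energy bin 0: one count in every pixel
    [2] * 9,  # energy bin 1: two counts in every pixel
]

# A 0.6 deg circle keeps the central pixel and its four direct neighbours
spectrum = counts_spectrum(counts, pixel_centers, center=(0.0, 0.0), radius=0.6)
print(spectrum)
```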

In [None]:
on_region = CircleSkyRegion(
    SkyCoord(0, 0, unit="deg", frame="galactic"), 0.5 * u.deg
)
spectrum_dataset = dataset.to_spectrum_dataset(
    on_region, containment_correction=True
)

In [None]:
spectrum_dataset.peek()

A `MapDataset` can also be integrated over the `on_region` to create a `MapDataset` with a `RegionGeom`. Complex regions can be handled and since the full IRFs are used, containment correction is not required. 

In [None]:
reg_dataset = dataset.to_region_map_dataset(on_region, name="RegionMapDS")
print(reg_dataset)

## FluxPointsDataset

`FluxPointsDataset` is a `Dataset` container for precomputed flux points, which can then be used in fitting.
A `FluxPointsDataset` cannot be read from disk directly; instead, read the `FluxPoints` and combine them with a `SkyModel`. Similarly, `FluxPointsDataset.write` saves only the `data` component to disk.

In [None]:
flux_points = FluxPoints.read(
    "$GAMMAPY_DATA/tests/spectrum/flux_points/diff_flux_points.fits"
)
model = SkyModel(spectral_model=PowerLawSpectralModel())
fp_dataset = FluxPointsDataset(data=flux_points, models=model)

The masks on a `FluxPointsDataset` are `np.array` objects and the data is a `FluxPoints` object. By default, the `mask_safe` masks the upper limit points.
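
The effect of masking upper limits on a `chi2` fit can be sketched without gammapy. The example below evaluates a power law, `amplitude * (E / reference) ** (-index)`, and sums `((data - model) / error) ** 2` over the points that are not upper limits. The function names and flux values are illustrative assumptions, not the library's code.

```python
def power_law(energy, amplitude=1e-12, index=2.0, reference=1.0):
    """Differential flux of a power law at the given energy (TeV)."""
    return amplitude * (energy / reference) ** (-index)


def chi2_masked(energy, dnde, dnde_err, is_ul, model):
    """Return the default safe mask and the chi2 summed over non-UL points."""
    mask_safe = [not ul for ul in is_ul]  # upper limits are excluded
    total = 0.0
    for e, y, err, keep in zip(energy, dnde, dnde_err, mask_safe):
        if keep:
            total += ((y - model(e)) / err) ** 2
    return mask_safe, total


energy = [1.0, 2.0, 4.0]                 # TeV
dnde = [1.1e-12, 2.5e-13, 5e-13]         # last entry is an upper limit
dnde_err = [1e-13, 5e-14, float("nan")]  # no error is defined for the limit
is_ul = [False, False, True]

mask_safe, stat = chi2_masked(energy, dnde, dnde_err, is_ul, power_law)
print(mask_safe, stat)
```

Because the upper limit is masked, its undefined error never enters the sum.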

In [None]:
fp_dataset.mask_safe

In [None]:
fp_dataset.data_shape()  # number of FluxPoints

In [None]:
fp_dataset.stat_type  # uses chi2 statistics

## Datasets

`Datasets` is a collection of `Dataset` objects. They can be of the same type or of different types, e.g. a mix of `FluxPointsDataset`, `MapDataset` and `SpectrumDataset`.
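
Mixing dataset types works in a joint fit because each dataset contributes its own fit statistic and the contributions are summed. The toy sketch below illustrates that idea; `ToyDataset` and its values are hypothetical stand-ins and do not reproduce the gammapy classes.

```python
class ToyDataset:
    """Hypothetical stand-in holding a name, a statistic type and per-bin values."""

    def __init__(self, name, stat_type, stat_values):
        self.name = name
        self.stat_type = stat_type
        self.stat_values = stat_values

    def stat_sum(self):
        # Total fit statistic of this dataset alone
        return sum(self.stat_values)


def joint_stat_sum(datasets):
    """Joint fit statistic: the sum over all datasets in the collection."""
    return sum(d.stat_sum() for d in datasets)


datasets = [
    ToyDataset("map", "cash", [10.0, 12.5]),
    ToyDataset("spectrum", "wstat", [3.0, 1.5]),
    ToyDataset("flux-points", "chi2", [0.5, 2.5]),
]
print(joint_stat_sum(datasets))
```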

- For an example of using different types of datasets, please see the [joint fitting tutorial](../3D/analysis_mwl.ipynb)
- To see how multiple models interact with multiple datasets, see [model management](model_management.ipynb)