# 5. Analysis of two sources in the same field of view - 3D analysis
Author: Marcel Strzys (strzys@icrr.u-tokyo.ac.jp)

## 5.1. Context
We will repeat the same analysis as in the previous notebook, but using a three-dimensional analysis. This type of analysis uses the spatial information of the events and is particularly useful if we want to model different sources in the same field of view at once (not performing separate data reductions, as we did in the previous notebook). As we take into account the spatial distribution of the events, we can use this analysis to study the morphology of the sources.

In [None]:
# - basic imports (numpy, astropy, regions, matplotlib)
import logging
import warnings
from astropy.coordinates import SkyCoord
import astropy.units as u
from matplotlib import pyplot as plt
from matplotlib.offsetbox import AnchoredText
import numpy as np
from regions import PointSkyRegion, CircleSkyRegion
from scipy.stats import norm

# - Gammapy's imports
from gammapy.data import DataStore
from gammapy.datasets import Datasets
from gammapy.datasets import MapDataset

from gammapy.estimators import ExcessMapEstimator
from gammapy.estimators import FluxPointsEstimator, FluxPoints
from gammapy.estimators import TSMapEstimator

from gammapy.makers import FoVBackgroundMaker
from gammapy.makers import MapDatasetMaker
from gammapy.makers import SafeMaskMaker

from gammapy.irf import FoVAlignment, Background3D
from gammapy.maps import MapAxis
from gammapy.maps import WcsGeom

from gammapy.modeling import Fit
from gammapy.modeling.models import (
    PowerLawSpectralModel,
    FoVBackgroundModel,
    Models,
    PiecewiseNormSpectralModel,
    PointSpatialModel,
    SkyModel,
)

from plot_utils import plot_gammapy_sed

# - setting up logging and ignoring warnings
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

warnings.filterwarnings("ignore")

## 5.2. DL3 files for the 3D analysis

Let us now proceed with the data reduction. As we said, we are interested in preserving the spatial information of the events, which we lose in a one-dimensional analysis: once we have created our _on_ and _off_ regions, we no longer care about the position of the events inside them and only consider their distribution as a function of energy. In a 3D analysis we instead create a three-dimensional histogram of the events' spatial coordinates and energy, called a data _cube_. In the `FITS` standard, the spatial coordinates are represented by the so-called World Coordinate System (WCS), which we will also use to define the geometry of our data cube.
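
To make the idea of a data cube concrete, here is a minimal sketch that bins a toy event list into a three-dimensional histogram with plain NumPy. The event list, the flat coordinates and all variable names are hypothetical; Gammapy works on real WCS geometries, so this only illustrates the binning itself.

```python
import numpy as np

# Hypothetical toy event list: sky offsets in degrees and energies in GeV.
rng = np.random.default_rng(seed=0)
n_events = 1000
lon = rng.uniform(-1.0, 1.0, n_events)
lat = rng.uniform(-1.0, 1.0, n_events)
energy = 10 ** rng.uniform(1.0, 5.0, n_events)  # 10 GeV to 100 TeV

# Bin edges: logarithmic in energy, linear in the two spatial coordinates.
energy_edges = np.logspace(1.0, 5.0, 21)  # 20 energy bins
lon_edges = np.linspace(-1.0, 1.0, 101)  # 100 pixels of 0.02 deg
lat_edges = np.linspace(-1.0, 1.0, 101)

# The cube is indexed as (energy, lat, lon).
cube, _ = np.histogramdd(
    np.column_stack([energy, lat, lon]),
    bins=[energy_edges, lat_edges, lon_edges],
)

# Summing over the spatial axes recovers the 1D energy histogram.
counts_vs_energy = cube.sum(axis=(1, 2))
print(cube.shape, counts_vs_energy.shape)
```

Summing the cube over its two spatial axes gives back the one-dimensional energy histogram, which is the only information a 1D analysis keeps.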

In [None]:
# data store
data_store = DataStore.from_dir(
    "../acme_magic_odas_data/data/1ES1218+304/full_enclosure"
)
observations = data_store.get_observations(required_irf=["aeff", "edisp", "psf", "bkg"])

# quick bugfix for Background3D IRF alignment issue
# see https://github.com/gammapy/gammapy/issues/3510
# and https://github.com/gammapy/gammapy/pull/4667
for obs in observations:
    new_bkg = Background3D(
        axes=obs.bkg.axes,
        data=obs.bkg.data,
        unit=obs.bkg.unit,
        meta=obs.bkg.meta,
        fov_alignment=FoVAlignment.REVERSE_LON_RADEC,
    )
    # Assign the new background IRF to the observation
    obs.bkg = new_bkg

Let us take a look at the observations we loaded, especially at the IRF provided for this type of analysis (the argument of the `required_irf` parameter of the `DataStore.get_observations` method).

In [None]:
observations[0].peek()

As in the analysis of the previous notebook, we see that the IRFs are multi-offset. But we notice two new IRF components: the Point Spread Function (PSF) and the background. The PSF is the distribution function of the position estimator and is needed to model the spatial distribution of the events. The background is instead needed to model the spatial distribution of the background events. Remember that we now want to model the emission at each point of the field of view: we therefore need to know how the background is distributed over the entire field of view, and we cannot simply cut out a region to estimate it, as we did before. The background was pre-computed and attached to each of the DL3 files.

## 5.3. Data Reduction

We now create our data cube and fill it with the events of our observations.

In [None]:
# source coordinates
_1ES1215_coordinates = SkyCoord.from_name("1ES1215+303", frame="icrs")
_1ES1218_coordinates = SkyCoord.from_name("1ES1218+304", frame="icrs")

# energy axes
energy_min = 10 * u.GeV
energy_max = 1e5 * u.GeV
n_energy_est_bins = 20
n_energy_true_bins = 28

energy_axis = MapAxis.from_energy_bounds(
    energy_min,
    energy_max,
    n_energy_est_bins,
    per_decade=False,
    unit="GeV",
    name="energy",
)
energy_true_axis = MapAxis.from_energy_bounds(
    energy_min,
    energy_max,
    n_energy_true_bins,
    per_decade=False,
    unit="GeV",
    name="energy_true",
)

# spatial binning
width = (2.0, 2.0)
binsz = 0.02
npix = (int(width[0] / binsz), int(width[1] / binsz))

geom = WcsGeom.create(
    skydir=_1ES1215_coordinates,
    npix=npix,
    binsz=binsz,
    frame="icrs",
    proj="AIR",
    axes=[energy_axis],
)
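
As a quick cross-check of the binning chosen above, the following sketch recomputes the energy bin edges by hand. It assumes that the edges are logarithmically spaced between the two bounds, and all variable names are introduced only for this illustration.

```python
import math

energy_min_gev = 10.0
energy_max_gev = 1e5
n_bins = 20

# n_bins logarithmically spaced bins need n_bins + 1 edges.
log_min = math.log10(energy_min_gev)
log_max = math.log10(energy_max_gev)
step = (log_max - log_min) / n_bins
edges_gev = [10 ** (log_min + i * step) for i in range(n_bins + 1)]

bins_per_decade = n_bins / (log_max - log_min)
print(f"{bins_per_decade:.0f} bins per decade")
print([round(edge, 1) for edge in edges_gev[:6]])

# Number of pixels per side: round() guards against floating-point
# results slightly below an integer being truncated by int().
width_deg, binsz_deg = 2.0, 0.02
npix_side = round(width_deg / binsz_deg)
print(npix_side)
```

With the same bounds, the 28 bins of the true energy axis correspond to 7 bins per decade.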

As we aim to create a data cube, we will now use a `MapDataset`, instead of a `SpectrumDataset` as before.
The `MapDatasetMaker` will fill the counts cube and will also convert the IRF components from camera coordinates into sky coordinates.

In addition, we define a safe mask, which excludes all the regions beyond $2^{\circ}$ from the pointing direction of our telescopes, as the IRFs in that region are insufficiently sampled and thus not well determined.

In [None]:
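# pass the true energy axis explicitly, otherwise the binning of the
# reconstructed energy axis is reused for the IRF maps
empty_map_dataset = MapDataset.create(geom=geom, energy_axis_true=energy_true_axis)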
datasets = Datasets()
maker = MapDatasetMaker(selection=["counts", "background", "psf", "edisp", "exposure"])
maker_safe_mask = SafeMaskMaker(methods=["offset-max"], offset_max=2.0 * u.deg)

for obs in observations:
    dataset = maker.run(empty_map_dataset.copy(), obs)
    dataset = maker_safe_mask.run(dataset, obs)
    datasets.append(dataset)
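
The effect of the maximum-offset criterion can be sketched with plain NumPy. The example below uses a flat-sky approximation and a hypothetical pointing position, chosen far from the map centre so that part of the map is masked; it is only an illustration of the criterion, not the Gammapy implementation.

```python
import numpy as np

# Hypothetical pointing 1.5 deg away from the map centre along longitude.
pointing_lon, pointing_lat = 1.5, 0.0  # deg, relative to the map centre
offset_max = 2.0  # deg

# Pixel centres of a 2 deg x 2 deg map with 0.02 deg pixels.
centres = -1.0 + 0.02 * (np.arange(100) + 0.5)
lon, lat = np.meshgrid(centres, centres)

# Flat-sky approximation of the angular offset from the pointing direction.
offset = np.hypot(lon - pointing_lon, lat - pointing_lat)
mask_safe = offset < offset_max

print(f"{mask_safe.mean():.0%} of the pixels pass the offset cut")
```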

Let us take a look at the datasets we have created.

In [None]:
datasets[0].peek()

We can see the counts, background, excesses (counts - background), and exposure as a function of sky coordinates (all the energies are shown).

### 5.3.1. Adjusting the background
The background we embedded in the data should already be a reasonable estimate, but we can improve it by further adjusting the background model to each dataset.
This is done by the `FoVBackgroundMaker`, which fits the background model to the data, excluding regions around known sources.

We exclude the emission of the sources from the normalisation of the background, as it would otherwise bias the estimate. For both sources we choose an exclusion region with a radius of 0.3 deg, roughly matching the containment radius of a point source. This initial adjustment has no influence later on, when we fit the background again together with the sources to extract the spectrum, but it is meant to result in better sky maps for the initial check for a source detection.

For further details on background estimation you can check out these references: <br>
[Malyshev, Mohrmann (2023)](https://ui.adsabs.harvard.edu/abs/2023hxga.book..137M/abstract)   
[de Naurois (2021)](https://ui.adsabs.harvard.edu/abs/2021Univ....7..421D/abstract)   

In [None]:
excl_radius = 0.3 * u.deg

_1ES1215_circle = CircleSkyRegion(center=_1ES1215_coordinates, radius=excl_radius)
_1ES1218_circle = CircleSkyRegion(center=_1ES1218_coordinates, radius=excl_radius)
exclusion_regions = [_1ES1215_circle, _1ES1218_circle]

geom_image = geom.to_image()
exclusion_mask = ~geom_image.region_mask(exclusion_regions)

fov_background_maker = FoVBackgroundMaker(method="fit", exclusion_mask=exclusion_mask)

Now we loop over the datasets and adjust the background for each of them. For each dataset we check the factor by which the background needs to be scaled. If the factor is far from 1 (here we require it to lie between 0.5 and 1.5), the background model is not a good description of the data: either just the rate is very different or, more concerning, the shape does not match the data well. We exclude such runs. In general this should not affect more than 10% of the runs; otherwise one should revise the method used to create the background model (not part of this tutorial).

In [None]:
datasets_fitted_background = Datasets()

for dataset in datasets:
    dataset = fov_background_maker.run(dataset)
    fit_result = dataset.models.to_parameters_table()
    norm_fit = fit_result[fit_result["name"] == "norm"]["value"][0]
    if (norm_fit > 1.5) or (norm_fit < 0.5):
        logger.warning(
            f"Fit for dataset {dataset.name} exceeds recommended limits. The norm of the fit is {norm_fit}."
        )
        continue
    datasets_fitted_background.append(dataset)
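
To build some intuition for the normalisation factor checked above, consider the simplest case in which only a single overall norm is fitted. For a Poisson likelihood with predicted counts `norm * background`, the best-fit norm is the ratio of the summed counts to the summed model outside the exclusion regions. The sketch below illustrates this with simulated numbers; all values and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical background model prediction per pixel (counts), and
# observed counts drawn with a true normalisation of 1.2.
background = np.full((50, 50), 4.0)
counts = rng.poisson(1.2 * background)

# Exclusion mask: True where pixels may be used, False around a source.
yy, xx = np.mgrid[:50, :50]
exclusion_mask = np.hypot(xx - 25, yy - 25) > 8

# For a Poisson likelihood with prediction norm * background, the
# maximum-likelihood norm is the ratio of summed counts to summed model.
norm_hat = counts[exclusion_mask].sum() / background[exclusion_mask].sum()

# Same acceptance window as used in the loop above.
accepted = 0.5 <= norm_hat <= 1.5
print(f"norm = {norm_hat:.3f}, accepted = {accepted}")
```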

As we have already seen, there are two different analysis approaches: a stacked and a joint analysis. For the stacked analysis we combine the observations into a single dataset. For the joint analysis we keep the datasets separate and fit the models to them via a joint likelihood fit. The stacked method is faster, whereas the joint fit should be more precise as it better accounts for subtle differences between the runs. The stacked method is particularly helpful in a 3D analysis to reduce the computation time (we now have to predict counts not only in energy bins, but also in spatial bins).

In [None]:
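# name the stacked dataset explicitly, so that the background model defined
# later can refer to it via dataset_name="stacked"
dataset_stacked = datasets_fitted_background.stack_reduce(name="stacked")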
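
The difference between the two approaches can be sketched with the Cash statistic, $C = 2 \sum_i (\mu_i - n_i \ln \mu_i)$, where $n_i$ are the observed and $\mu_i$ the predicted counts. In a joint fit the statistics of the individual runs are added, whereas in a stacked fit counts and predictions are summed first. The numbers below are made up for illustration only.

```python
import math


def cash(counts, npred):
    """Cash statistic, 2 * sum(npred - counts * log(npred))."""
    return 2.0 * sum(mu - n * math.log(mu) for n, mu in zip(counts, npred))


# Hypothetical counts and predicted counts in three bins for two runs.
counts_run1, npred_run1 = [3, 7, 2], [2.5, 6.0, 2.2]
counts_run2, npred_run2 = [9, 4, 1], [8.0, 5.0, 1.5]

# Joint fit: each run keeps its own prediction, and the statistics add.
stat_joint = cash(counts_run1, npred_run1) + cash(counts_run2, npred_run2)

# Stacked fit: counts and predictions are summed bin by bin first.
counts_stacked = [a + b for a, b in zip(counts_run1, counts_run2)]
npred_stacked = [a + b for a, b in zip(npred_run1, npred_run2)]
stat_stacked = cash(counts_stacked, npred_stacked)

print(f"joint: {stat_joint:.2f}, stacked: {stat_stacked:.2f}")
```

The stacked statistic only sees the summed prediction, so run-to-run differences in the predictions are averaged out, while the number of bins to evaluate no longer grows with the number of runs.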

Now we can plot the counts, exposure, background, and excess (counts - background) maps. They give a first impression of how well the maps are populated and whether a potential source is visible.

The maps plotted by Gammapy are interactive, so you can select the scaling of the colour scale and the energy bin.

In [None]:
smoothing_radius = 0.1 * u.deg

dataset_stacked.counts.smooth(smoothing_radius).plot_interactive(add_cbar=True)

dataset_stacked.exposure.smooth(smoothing_radius).plot_interactive(add_cbar=True)

dataset_stacked.background.smooth(smoothing_radius).plot_interactive(add_cbar=True)

dataset_stacked.excess.smooth(smoothing_radius).plot_interactive(
    stretch="sqrt", add_cbar=True
)
plt.show()

We can see that the observations were centred closer to 1ES1218+304, as the counts, background and exposure maps show higher values there.
Indeed, we remember from the previous notebook that the wobble observations were performed around the coordinates of 1ES1218+304.
We can also see that the excess map already shows enhanced emission at the positions of 1ES1218+304 and 1ES1215+303.

## 5.4. Detection maps

Now we can quantify the excess in the map using significance estimators. Gammapy provides two different approaches: an `ExcessMapEstimator`, which estimates the source significance from the counts, similar to the 1D case, and a `TSMapEstimator`, which fits a given source model in each pixel of the map and determines the significance of a hypothetical source from the likelihood (this is the same approach used in the _Fermi_-LAT analysis).

In [None]:
# here a quick function to add the sources coordinates to the maps


def plot_sources_coords(ax, wcs):
    """Add sources positions to significance and excess maps"""
    source1_coords = PointSkyRegion(_1ES1215_coordinates)
    source1_coords.to_pixel(wcs).plot(
        ax=ax,
        color="forestgreen",
        marker="*",
        ls="",
        markersize=12,
        label="1ES1215+303",
    )

    source2_coords = PointSkyRegion(_1ES1218_coordinates)
    source2_coords.to_pixel(wcs).plot(
        ax=ax, color="dodgerblue", marker="*", ls="", markersize=12, label="1ES1218+304"
    )

    ax.legend()

In [None]:
# to maximise the significance for a point source,
# the correlation radius should be of the size of the PSF.
# for extended sources try sqrt(psf**2+extension**2), which
# assumes a Gaussian
correlation_radius = 0.07 * u.deg
estimator = ExcessMapEstimator(
    correlation_radius, selection_optional=[], correlate_off=True
)
lima_maps = estimator.run(dataset_stacked)

significance_map = lima_maps["sqrt_ts"]
excess_map = lima_maps["npred_excess"]

# We can plot the excess and significance maps
fig, (ax1, ax2) = plt.subplots(
    figsize=(11, 4), subplot_kw={"projection": lima_maps.geom.wcs}, ncols=2
)
ax1.set_title("Significance map")
significance_map.plot(ax=ax1, vmax=7, add_cbar=True)
plot_sources_coords(ax1, lima_maps.geom.wcs)

ax2.set_title("Excess map")
excess_map.plot(ax=ax2, add_cbar=True)
plot_sources_coords(ax2, excess_map.geom.wcs)

plt.show()
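
To illustrate how a significance can be derived from counts and a background prediction, here is a minimal sketch for a single correlation disc. It assumes that the background is known from a model (as in our `MapDataset`, where there are no off counts), so that the test statistic is the Cash-based $TS = 2\,[n \ln(n/\mu) - (n - \mu)]$. The function name and the numbers are hypothetical, and this is not necessarily the exact computation performed by the estimator.

```python
import math


def cash_significance(n_on, mu_bkg):
    """Signed sqrt(TS) for n_on counts over a known background mu_bkg."""
    if n_on == 0:
        # Limit of the expression below for n_on -> 0.
        ts = 2.0 * mu_bkg
    else:
        ts = 2.0 * (n_on * math.log(n_on / mu_bkg) - (n_on - mu_bkg))
    sign = 1.0 if n_on >= mu_bkg else -1.0
    return sign * math.sqrt(max(ts, 0.0))


# Hypothetical counts inside one correlation disc.
print(cash_significance(n_on=30, mu_bkg=15.0))
print(cash_significance(n_on=15, mu_bkg=15.0))
print(cash_significance(n_on=5, mu_bkg=15.0))
```

Negative values correspond to pixels with fewer counts than predicted by the background, which is why the significance distribution of background-only pixels is expected to be symmetric around zero.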

We can clearly see that the excesses coincide with the positions of the sources. To check whether the results are robust, it is worth checking the significance distribution of the pixels. The background-only pixels should, at least approximately, follow a Gaussian with mean 0 and sigma 1. If there are multiple sources in the FoV or your source extends over a significant fraction of the map, this assumption may not hold.

In [None]:
# all values of significance
significance_all = lima_maps["sqrt_ts"].data[np.isfinite(lima_maps["sqrt_ts"].data)]
# all values of significance outside exclusion regions, i.e. excluding the regions around the sources
significance_off = lima_maps["sqrt_ts"].data[
    np.logical_and(np.isfinite(lima_maps["sqrt_ts"].data), exclusion_mask.data)
]
bins = np.linspace(
    np.min(significance_all),
    np.max(significance_all),
    num=int((np.max(significance_all) - np.min(significance_all)) * 3),
)

fig, ax = plt.subplots()
ax.hist(
    significance_all,
    density=True,
    alpha=0.5,
    color="red",
    label="all bins",
    bins=bins,
)

ax.hist(
    significance_off,
    density=True,
    alpha=0.5,
    color="blue",
    label="background bins",
    bins=bins,
)

# Now, fit the off distribution with a Gaussian
mu, std = norm.fit(significance_off)
x = np.linspace(-8, 8, 50)
p = norm.pdf(x, mu, std)
ax.plot(x, p, lw=2, color="black")
ax.legend()
ax.set_xlabel("Significance")
ax.set_yscale("log")
ax.set_ylim(1e-5, 1)
xmin, xmax = np.min(significance_all), np.max(significance_all)
ax.set_xlim(xmin, 7)
text = r"$\mu$ = {:.2f}" f"\n" r"$\sigma$ = {:.2f}".format(mu, std)
box_prop = dict(boxstyle="Round", facecolor="white", alpha=0.5)
text_prop = dict(fontsize="x-large", bbox=box_prop)
txt = AnchoredText(text, loc=2, transform=ax.transAxes, prop=text_prop, frameon=False)
ax.add_artist(txt)

print(f"Fit results: mu = {mu:.2f}, std = {std:.2f}")
plt.show()

Now we are going to use the TS map estimator, for which we first need to define a model for our test source. As we are dealing with point sources, we choose a point source for the spatial model and, for simplicity, a power law for the spectrum. The closer your model choice is to reality, the better your results will be.

In [None]:
spatial_model = PointSpatialModel()
# we choose units consistent with the map units here...
spectral_model = PowerLawSpectralModel(amplitude="1e-22 cm-2 s-1 keV-1", index=2)
model = SkyModel(spatial_model=spatial_model, spectral_model=spectral_model)
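
For reference, the power law used for the test source has the form $\phi(E) = \phi_0 \, (E / E_0)^{-\Gamma}$. The sketch below evaluates it and its analytic integral with plain Python, assuming a reference energy $E_0 = 1$ TeV (which we did not set explicitly above); the helper functions are hypothetical and not part of Gammapy.

```python
def power_law(energy_tev, amplitude, index, reference_tev=1.0):
    """Differential flux amplitude * (E / E_ref) ** (-index)."""
    return amplitude * (energy_tev / reference_tev) ** (-index)


def power_law_integral(e_min_tev, e_max_tev, amplitude, index, reference_tev=1.0):
    """Analytic integral of the power law between two energies (index != 1)."""
    prefactor = amplitude * reference_tev / (1.0 - index)
    upper = (e_max_tev / reference_tev) ** (1.0 - index)
    lower = (e_min_tev / reference_tev) ** (1.0 - index)
    return prefactor * (upper - lower)


# 1e-22 cm-2 s-1 keV-1 expressed per TeV (1 TeV = 1e9 keV).
amplitude = 1e-22 * 1e9  # cm-2 s-1 TeV-1

flux = power_law_integral(0.08, 8.0, amplitude, index=2.0)
print(f"Integral flux between 80 GeV and 8 TeV: {flux:.3e} cm-2 s-1")
```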

In [None]:
estimator = TSMapEstimator(
    model,
    kernel_width="0.07 deg",
    energy_edges=[80, 8000] * u.GeV,
)
maps = estimator.run(dataset_stacked)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(
    ncols=3,
    figsize=(15, 3),
    subplot_kw={"projection": geom.wcs},
    gridspec_kw={"left": 0.1, "right": 0.98},
)

maps["sqrt_ts"].plot(ax=ax1, vmax=7, add_cbar=True)
ax1.set_title("Significance map")
maps["flux"].plot(ax=ax2, add_cbar=True, stretch="sqrt", vmin=0)
ax2.set_title("Flux map")
maps["niter"].plot(ax=ax3, add_cbar=True)
ax3.set_title("Iteration map")

This procedure also clearly detects the two sources. The advantage of this method is that it also provides a map in physically meaningful units, the flux map.

## 5.5. 3D spectrum fit

For the 3D fit we need to define a model for each source. Each model consists of two components, a spatial and a spectral one. The spatial component of the two blazars is simple, as they are point sources at the resolution of IACTs. Hence, we only need to define the position. For the spectral component we use power-law models, following the findings of the 1D analyses. We then combine the components into one source model for each source.

In [None]:
_1ES1215_spatial_model = PointSpatialModel(
    lon_0=_1ES1215_coordinates.ra, lat_0=_1ES1215_coordinates.dec, frame="icrs"
)

_1ES1218_spatial_model = PointSpatialModel(
    lon_0=_1ES1218_coordinates.ra, lat_0=_1ES1218_coordinates.dec, frame="icrs"
)

# let us copy the spectral models from those we obtained with the 1D analysis,
# so that we also take the EBL into account; we now add a real spatial model
_1ES1215_model_1d = Models.read("results/1ES1215+303/model_1ES1215+303.yaml")
# copy(name=...) sets the name of the copy without touching private attributes
_1ES1215_model_3d = _1ES1215_model_1d[0].copy(name="1ES1215+303")
_1ES1215_model_3d.spatial_model = _1ES1215_spatial_model

_1ES1218_model_1d = Models.read("results/1ES1218+304/model_1ES1218+304.yaml")
_1ES1218_model_3d = _1ES1218_model_1d[0].copy(name="1ES1218+304")
_1ES1218_model_3d.spatial_model = _1ES1218_spatial_model

# let us check the models now
print(_1ES1215_model_3d)
print(_1ES1218_model_3d)

We now see that the coordinates have also become parameters of the model.

To help the fit perform better, we can constrain the value of each parameter to a reasonable range, or freeze it altogether.
This delivers better results and makes the fit converge faster, but requires that these ranges are known a priori.

In [None]:
# we actually freeze the sources coordinates
_1ES1215_model_3d.parameters["lon_0"].frozen = True
_1ES1215_model_3d.parameters["lat_0"].frozen = True

_1ES1218_model_3d.parameters["lon_0"].frozen = True
_1ES1218_model_3d.parameters["lat_0"].frozen = True

We can now define the energy range in which we want to fit the model. The mask is defined on the reconstructed energy axis of the counts cube, and it does not have to coincide with the energy range over which we extracted the data and IRFs. As a rule of thumb, the true energy range can be a bit wider than the range in reconstructed energy. In addition, one could restrict the fit to a region around the sources, in which case only the sources are fitted.

Here we follow the alternative approach: we define only the energy mask and then fit the sources and the background together across the FoV.

In [None]:
energy_mask = dataset_stacked.counts.geom.energy_mask(
    energy_min=0.08 * u.TeV, energy_max=20 * u.TeV
)
dataset_stacked.mask_fit = energy_mask
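
It is instructive to work out which energy bins survive this mask. The sketch below assumes that a bin is kept only if it lies entirely inside the requested range, which is our understanding of how the energy mask behaves; the edges are recomputed by hand for the 20 logarithmic bins defined earlier.

```python
# Reconstructed-energy edges of the counts cube: 20 log-spaced bins
# between 10 GeV and 100 TeV, expressed in TeV.
n_bins = 20
edges_tev = [10 ** (-2.0 + 4.0 * i / n_bins) for i in range(n_bins + 1)]

fit_min_tev, fit_max_tev = 0.08, 20.0

# Assumed rule: keep a bin only if it lies entirely inside the fit range.
selected = [
    i
    for i in range(n_bins)
    if edges_tev[i] >= fit_min_tev and edges_tev[i + 1] <= fit_max_tev
]

first, last = selected[0], selected[-1]
print(f"{len(selected)} bins selected, indices {first} to {last}")
print(f"from {edges_tev[first]:.3f} TeV to {edges_tev[last + 1]:.2f} TeV")
```

Under this assumption the fit effectively starts at the first bin edge above 80 GeV and stops at the last bin edge below 20 TeV, so the fitted range is slightly narrower than the one requested.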

### 5.5.1. Adjusting the Background model

If we also fit the background, we need to define a model for it as well. If we do not fit the background, the background model is simply subtracted and the source is fitted to the excess map.

The spatial model is given by the background model. For the spectrum, we can either assume that the background is scaled correctly across the bins and just apply a single scaling factor across all energy bins or we allow a different normalisation factor for each energy bin. The former is faster, but the latter should be more precise.

In [None]:
energy_center = dataset_stacked.geoms["geom"].axes["energy"].center

spectral_model_bkg = PiecewiseNormSpectralModel(
    energy=energy_center,
    norms=np.ones(energy_center.shape),
)
bkg_model = FoVBackgroundModel(
    dataset_name="stacked", spectral_model=spectral_model_bkg
)
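
The idea of an energy-dependent normalisation can be sketched as follows: a norm is defined at a few node energies and interpolated in between, and the background prediction in each energy bin is multiplied by the interpolated value. The example assumes a linear interpolation of the logarithm of the norm in the logarithm of the energy; node positions, norms and background counts are hypothetical.

```python
import numpy as np

# Hypothetical node energies (TeV) and fitted norms at those nodes.
node_energy = np.array([0.1, 1.0, 10.0])
node_norm = np.array([1.10, 0.95, 1.30])


def piecewise_norm(energy):
    """Interpolate log(norm) linearly in log(energy) between the nodes."""
    log_norm = np.interp(np.log(energy), np.log(node_energy), np.log(node_norm))
    return np.exp(log_norm)


# Hypothetical background counts per energy bin before the correction.
bin_energy = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
background = np.array([500.0, 120.0, 30.0, 6.0, 1.0])
corrected = background * piecewise_norm(bin_energy)

print(np.round(piecewise_norm(bin_energy), 3))
print(np.round(corrected, 2))
```

A single scaling factor corresponds to the special case in which all nodes share the same norm.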

Now we define the final model to be fitted to the data. If you restricted the spatial fit region, it is better to use only the source model. If you use the entire FoV, the background can be fitted as well, although the background fit performed earlier should already have optimised it sufficiently in this case. For a complex region with several sources, we recommend fitting the background.

In [None]:
total_model_3d = Models([_1ES1218_model_3d, _1ES1215_model_3d, bkg_model])
dataset_stacked.models = total_model_3d

Finally we can run the fit using the Minuit minimizer. We can define the output level and the settings of the minimizer. We save the outcome of the fit, as well as the report from the minimizer, to separate objects.

In [None]:
fit = Fit(optimize_opts={"tol": 0.01, "strategy": 2, "print_level": 0})
result = fit.run(datasets=dataset_stacked)
minuit_result = result.optimize_result.minuit

print(result)
print(minuit_result)

Please always check whether the fit finished successfully. If a fit fails, please check whether the starting parameters of your input model are reasonable. Freezing or constraining some parameters can also improve the stability of the fit. For better readability, we can also print the final results for the source parameters in a separate table.

In [None]:
display(total_model_3d.to_parameters_table())

In [None]:
# compute flux points
energy_edges = energy_axis.edges[5:13]
flux_points_estimator_1ES1215 = FluxPointsEstimator(
    energy_edges=energy_edges, source="1ES1215+303", selection_optional="all"
)
flux_points_1ES1215_3d = flux_points_estimator_1ES1215.run(datasets=dataset_stacked)

energy_edges = energy_axis.edges[4:14]
flux_points_estimator_1ES1218 = FluxPointsEstimator(
    energy_edges=energy_edges, source="1ES1218+304", selection_optional="all"
)
flux_points_1ES1218_3d = flux_points_estimator_1ES1218.run(datasets=dataset_stacked)

### 5.5.2. Checking the fit results

To check the agreement between the fitted model and the data, we should inspect the residuals in the source region. If significant residuals remain, the model does not describe the data well. In our case we see that the differences between model and data are within 20%.

In [None]:
dataset_stacked.plot_residuals_spatial(method="diff/sqrt(model)", vmin=-0.5, vmax=0.5)
plt.show()
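
The `"diff/sqrt(model)"` method expresses the residuals in units of the expected Poisson fluctuation of the model. The sketch below computes this quantity for simulated counts drawn from the model itself, in which case the residuals should scatter around zero with a spread close to one; the maps and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical predicted counts per pixel and a Poisson realisation of them.
npred = np.full((40, 40), 25.0)
counts = rng.poisson(npred)

# "diff/sqrt(model)": residuals in units of the expected Poisson fluctuation.
residuals = (counts - npred) / np.sqrt(npred)

print(f"mean = {residuals.mean():.2f}, std = {residuals.std():.2f}")
```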

For each source region one can also check the agreement between data and model across the energy bins, to see in which energy bins the discrepancy is largest.

In [None]:
region = CircleSkyRegion(_1ES1215_model_3d.position, radius=0.15 * u.deg)
dataset_stacked.plot_residuals(
    kwargs_spatial=dict(method="diff/sqrt(model)", vmin=-1, vmax=1),
    kwargs_spectral=dict(region=region),
)
plt.show()

We can further study the spatial structure of the residuals in different energy bins, again using the excess map estimator of Gammapy. This shows us significant residuals over our best-fit model in the map.

In [None]:
estimator = ExcessMapEstimator(
    correlation_radius="0.05 deg",
    selection_optional=[],
    energy_edges=[0.1, 1, 10] * u.TeV,
)

result = estimator.run(dataset_stacked)
result["sqrt_ts"].plot_grid(
    figsize=(12, 4), cmap="coolwarm", add_cbar=True, vmin=-5, vmax=5, ncols=2
)
plt.show()

At low energies, we see a positive excess at the position of 1ES1218+304. This can happen for bright sources at low energies, as the Monte Carlo based PSF does not describe the tails in the real data well.

## 5.6. Comparison of 1D vs 3D results

In addition to the spectral fit, we can estimate flux points from the data.   
It is most interesting to compare the results of the 1D and 3D analyses. Even though we used different data reductions and different background estimation methods, they should agree within the uncertainties.

In [None]:
# load 1D flux points for comparison
flux_points_1ES1218_1d = FluxPoints.read(
    "results/1ES1218+304/flux_points_1ES1218+304.fits"
)
flux_points_1ES1215_1d = FluxPoints.read(
    "results/1ES1215+303/flux_points_1ES1215+303.fits"
)

In [None]:
fig, ax = plt.subplots()

plot_gammapy_sed(
    ax,
    _1ES1218_model_1d["1ES1218+304"].spectral_model,
    flux_points_1ES1218_1d,
    "crimson",
    "1ES1218+304, 1D analysis",
)
plot_gammapy_sed(
    ax,
    total_model_3d["1ES1218+304"].spectral_model,
    flux_points_1ES1218_3d,
    "rosybrown",
    "1ES1218+304, 3D analysis",
)

plot_gammapy_sed(
    ax,
    _1ES1215_model_1d["1ES1215+303"].spectral_model,
    flux_points_1ES1215_1d,
    "dodgerblue",
    "1ES1215+303, 1D analysis",
)
plot_gammapy_sed(
    ax,
    total_model_3d["1ES1215+303"].spectral_model,
    flux_points_1ES1215_3d,
    "lightblue",
    "1ES1215+303, 3D analysis",
)

ax.legend()
ax.set_xlim([60, 2e4])
plt.show()
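
One simple way to quantify the agreement between two flux points at the same energy is the difference in units of the combined uncertainty. The sketch below uses made-up numbers, not results of this analysis, and treats the two measurements as independent. This is only an approximation here, since both analyses use the same events.

```python
import math


def pull(flux_a, err_a, flux_b, err_b):
    """Difference of two measurements in units of their combined uncertainty."""
    return (flux_a - flux_b) / math.hypot(err_a, err_b)


# Made-up values for illustration only, not results of this analysis.
flux_1d, err_1d = 2.0e-11, 0.3e-11
flux_3d, err_3d = 2.4e-11, 0.4e-11

print(f"pull = {pull(flux_1d, err_1d, flux_3d, err_3d):.2f}")
```

Absolute values well below 2 to 3 indicate that the two points are compatible within their uncertainties.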