Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIG 9 - Event sampling #2136

Merged
merged 51 commits into from
Aug 30, 2019
Merged
Changes from all commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
09b38d9
Creates event sample PIG
Feb 4, 2019
48dffea
Further improve to the event sampler PIG
fabiopintore Feb 5, 2019
955e765
Further improve to the event sampler PIG
fabiopintore Feb 5, 2019
297a587
Further improve to the event sampler PIG
fabiopintore Feb 15, 2019
da79169
update of the PIG-009
fabiopintore Apr 1, 2019
16e2bff
update of the PIG-009
fabiopintore Apr 1, 2019
14e036f
update of the PIG-009
fabiopintore Apr 1, 2019
edc8466
update of the PIG-009
fabiopintore Apr 1, 2019
84856ad
update of the PIG-009
fabiopintore Apr 1, 2019
5cb248c
update of the PIG-009
fabiopintore Apr 1, 2019
ee5364d
Add API proposal and add line breaks
adonath Apr 5, 2019
8a09d35
update of the PIG-009
fabiopintore Apr 5, 2019
ceb383f
update of the PIG-009
fabiopintore Apr 8, 2019
3329951
Fix formatting for code cells.
adonath Apr 23, 2019
24a9e69
update of the PIG-009
fabiopintore Apr 24, 2019
a3bf937
Minor clean-up and polishing
adonath May 2, 2019
218e55c
update of the PIG-009
fabiopintore May 2, 2019
ff93d7e
Further clean-up and improvements to PIG-009
adonath May 2, 2019
79a216e
update of the PIG-009
fabiopintore May 9, 2019
6be200a
More information about the binned analysis in the PIG-009
fabiopintore Jun 6, 2019
1b5d122
Update pig-009.rst
fabiopintore Jun 14, 2019
1241fbb
Update pig-009.rst
fabiopintore Jun 19, 2019
de9c875
Improve event sampling PIG wording and description of details
adonath Jun 25, 2019
953c4bf
Improve list of proposed PRs
adonath Jun 25, 2019
97d3cd2
Improve PIG text
adonath Jun 26, 2019
e010b2b
Update pig-009.rst
fabiopintore Jul 16, 2019
6e24328
Update pig-009.rst
fabiopintore Jul 16, 2019
50091eb
Update pig-009.rst
fabiopintore Jul 17, 2019
f9467f8
Update pig-009.rst
fabiopintore Jul 17, 2019
680af55
Reshape and re-writing of the PIG.
fabiopintore Jul 17, 2019
6d311cb
Update pig-009.rst
fabiopintore Jul 17, 2019
95176ec
Update pig-009.rst
fabiopintore Jul 17, 2019
8de5f51
Update pig-009.rst
fabiopintore Jul 17, 2019
adc2ed9
Update pig-009.rst
fabiopintore Jul 17, 2019
d143ec8
Update pig-009.rst
fabiopintore Jul 17, 2019
f1c17a8
Update pig-009.rst
fabiopintore Jul 18, 2019
725bbbd
Update pig-009.rst
fabiopintore Jul 18, 2019
6ac4c5d
Update pig-009.rst
fabiopintore Jul 18, 2019
16e22b5
Update pig-009.rst
fabiopintore Jul 24, 2019
d4398cb
Axel comments implemented
fabiopintore Jul 24, 2019
4cb584d
Update pig-009.rst
fabiopintore Jul 24, 2019
ea1b942
Address some more comments
adonath Jul 27, 2019
bd5e7cc
Addressed Regis' comments
fabiopintore Jul 28, 2019
6e50dae
The comments from Lars and Christoph were included
fabiopintore Aug 6, 2019
930e6ad
Update pig-009.rst
fabiopintore Aug 6, 2019
582af75
Minor changes
fabiopintore Aug 6, 2019
6317106
Update pig-009.rst
fabiopintore Aug 6, 2019
d664b15
Update pig-009.rst
fabiopintore Aug 30, 2019
41a86dd
Update pig-009.rst
fabiopintore Aug 30, 2019
20fa772
Minor polishing of PIG 9
adonath Aug 30, 2019
895a74e
Change PIG 9 status to accepted
adonath Aug 30, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
247 changes: 247 additions & 0 deletions docs/development/pigs/pig-009.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,247 @@
.. include:: ../../references.txt

.. _pig-009:

**********************
PIG 9 - Event Sampling
**********************

* Author: Fabio Pintore, Andrea Giuliani, Axel Donath
* Created: May 03, 2019
* Accepted: August 30, 2019
* Status: accepted
* Discussion: `GH_2136`_

Abstract
========

An event sampler for gamma events is an important part of the science tools
of the future Cherenkov Telescope Array (CTA) observatory. It will allow users
to simulate observations of sources with different spectral, morphological and
temporal properties and predict the performance of CTA on the simulated events
e.g. to support observation proposals or study the sensitivity of future
observations. For this reason, we propose to implement a framework for event
simulation in Gammapy.

The proposed framework consists of individual building blocks, represented by
classes and methods, that can be chained together to achieve a full simulation of an event
list corresponding to a given observation. This includes the simulation of source
events, background events, effects of instrument response functions (IRF) and
arrival times. As underlying method for the actual event sampling we propose to
use inverse cumulative distribution function (CDF) sampling (inverseCDF_) with finely
binned discrete source and background. Temporal models will be also taken into account
and time will be sampled separately in a 1D analysis, assuming that the temporal
dependency of the input source models factorizes.


Sampling methods
====================
Inverse CDF sampling (inverseCDF_) is an established method to sample from discrete
probability mass functions. It is used by ``ASTRIsim`` (astrisim_), the event simulator
of the AGILE collaboration. However it is not the method of choice for other existing
event samplers such as the the Fermi-LAT Science Tools (gtobsim) and CTOOLS (ctobssim).
The latter uses a combination of analytical sampling for models, where a solution is
known (e.g. power-laws) and the rejection sampling method (rej_sampl_), where the sampling
has to be done numerically (see an example here gammalib_).
cdeil marked this conversation as resolved.
Show resolved Hide resolved

As rejection sampling can directly sample from continuous probability density
functions, it is expected to yield very precise results. However an enveloping
distribution is needed, which should be adapted to the target distribution
to be efficient (see also `rejection sampling in Python`_ for an example implementation),
otherwise a lot of computation time is spend in rejecting drawn samples.

For this reason we favour the inverse CDF sampling method, as a simple to implement
cdeil marked this conversation as resolved.
Show resolved Hide resolved
and general sampling method.
The precision of the inverse CDF sampling method can be controlled by the resolution
of the input probability mass function (PMF) and is in practice only limited by the
available memory. We will study the required bin-size of the PMFs to reach sufficient
precision. If we find the inverse CDF sampling method to be not precise enough, it is
still possible to achieve better precision adopting the rejection sampling. This will
not have a strong impact on the structure of the event-sampler.


Proposal
========

We propose to include in ``gammapy.cube`` an high level interface (HLI) class, labelled as
``MapDatasetEventSampler`` or ``MapDataset.sample`` method. This class handles the complete
event sampling process, including the corrections related to the IRF and source temporal
variability, for a given GTI / observation.

The dataset will be computed using the standard data reduction procedure of Gammapy, as
illustrated in the following example:


::

obs = Observation(pointing, gti, aeff, psf, edisp, expomap)
maker = MapDatasetMaker(geom, geom_irf, ...)
dataset = maker.run(obs)
model = SkyModels.read("model.yaml")
dataset.model = model
sampler = MapDatasetEventSampler(dataset)
events = sampler.sample()
events.write()


After data reduction, the Dataset object should contain all the needed information,
such as the pointing sky coordinates, the GTI, and the setup of all the models (spectra,
spatial morphology, temporal model) for any given source, and it is passed
as input parameter to the ``MapDatasetEventSampler``. It is important to note that the
``MapDataset`` object can store information for more than one source. Then, a ``.sample`` method
will draw the sampled events and will provide an output ``~astropy.table.Table`` object. The
latter will containt the reconstructed sky positions, energies, times, and an ``EVENT_ID`` and
``MC_ID``. ``EVENT_ID`` is a unique number or a string to identify the sampled event, while
``MC_ID`` is a unique ID (number or string) to identify the model component the event was sampled
from. The ``MapDatasetSampler`` should also fill the mandatory header information for event list
files decribed on `gamma-astro-data-formats`_.

MapDatasetEventSampler
----------------------

The general design of the ``sample`` method is as follows:

::

def sample(dataset, random_state)
"""Sample events from a ``MapDataset``"""

events_list = []

for evaluator in dataset.evaluators:
npred = evaluator.compute_npred()
n_events = random_state.poisson(npred.data.sum())
events = npred.sample(n_events, random_state)
time = LightCurveTableModel.sample(n_events=, lc=, random_state=)
events = hstack(events,time)
events_list.append(events)
event_list["MC_ID"] = evaluator.model.name

events_src = vstack(events_list)
events_src = dataset.psf.sample(events_src, random_state)
events_src = dataset.edisp.sample(events_src, random_state)

n_events_bkg = random_state.poisson(dataset.background_model.map.data.sum())
events_bkg = dataset.background_model.sample(n_events, random_state)

events_total = vstack([events_src, events_bkg])
events_total.meta = get_events_meta_data(dataset)
return EventList(events_total)


In more detail, ``sample`` starts a loop over the sources stored into the ``MapDataset``
model. Then, for each source, the ``src.compute_npred`` method will calculate the predicted
number of source counts ``npred``. In particular, it is important to note that
``npred = exposure * flux``, where ``exposure`` is defined as ``effective_area * exposure_time``.
``npred`` is therefore calculated irrespective of the energy dispersion and of PSF.
Then, ``npred`` will be the input of the ``npred.sample`` method. The latter uses a Poisson
distribution, with mean equal to the predicted counts, to estimate the random number of sampled
events.

We propose to add a ``Map.sample(n_events=, random_state=)`` method in ``~gammapy.maps.Map``
that will be the core of the sampling process. The ``sample`` is based on the
``~gammapy.utils.random.InverseCDFSampler`` class described in `GH_2229`_ . The output
will be an ``~astropy.table.Table`` with columns: ``RA_TRUE``, ``DEC_TRUE`` and ``ENERGY_TRUE`` .

Then, the time will be sampled independently using the temporal information stored into
the ``MapDataset`` model for each source of interest. This will be done through a
``.sample(n_events=, random_state=)`` method that we propose to add
to ``~gammapy.time.models.LightCurveTableModel`` and ``~gammapy.time.models.PhaseCurveTableModel``.
This method will take as input the GTIs (i.e. one Tstart and Tstop) in the ``MapDataset``
object. Also in this case the ``InverseCDFSampler``
class is the machine used to sample the time of the events. In the case the temporal
model is not provided, the time is uniformly sampled in the time range ``t_min`` and ``t_max``.
To define a light-curve per model component, the current ``SkyModel`` class will be
extended by a ``SkyModel(..., temporal_model=)``.

The IRF correction can now be applied to sampled events. We propose to add a
``.sample(events=)`` method in both ``~gammapy.cube.PSFMap`` and ``~gammapy.cube.EdispMap``.
The method interpolates the "correct" IRF at the position of a given event and
applies it. In more detail, the method calculates the psf and the energy dispersion at
the events true positions and true energies, which are given in input as an ``~astropy.table.Table`` object.
The IRFs are assumed to be constant and not time-dependent. The output will be an ``~astropy.table.Table``
with the new columns ``RA``, ``DEC`` and ``ENERGY``, which are the reconstructed event energies and
positions.

Finally, the times and the energies/coordinates of the events will be merged into a
single ``~astropy.table.Table`` with the columns:``RA``, ``DEC`` and ``ENERGY`` and ``TIME`` .

The ``MapDatasetEventSampler`` can be used to sample background events using the
``Map.sample(n_events=, random_state=)`` as well. The time of the events is sampled
assuming a constant event rate. Finally, the IRF corrections are not applied to background
sampled events.


Performance and precision evaluation
====================================
To evaluate the precision and performance of the described framework we propose
to implement a prototype for a simulation / fitting pipeline. Starting from a
selection of spatial, spectral and temporal models, data are simulated and fitted
multiple times to derive distributions and pull-distributions of the reconstructed
model parameters. This pipeline should also monitor the required cpu and memory usage.
This first prototype can be used to evaluate the optimal bin-size (with the best
compromise between performance and precision) for the simulations and to verify
the over-all correctness of the simulated data. This will be valid for a set of
input maps and IRFs. Later this prototype can be developed further into a full
simulation / fitting continuous integration system.


Alternatives / Outlook
======================
So far Gammapy only supports binned likelihood analysis and technically most use-
cases for the event sampling could be solved with binned simulations. A binned
simulation can be basically achieved by a call to ``numpy.random.poisson()`` based
on the predicted number of counts map. This is conceptionally simpler as well as
computationally more efficient than a sampling of event lists. In ``Gammapy`` a similar
dataset simulation is already implemented in ``Dataset.fake()``, although this has a
limited number of use cases than an event sampler.
However, to support the full data access and data reduction process for simulations,
event lists are required. In future Gammapy possibly also supports event based analysis methods
(unbinned likelihood, but also e.g. clustering algorithms), that also require event
lists. For this reason binned simulations cannot present a full equivalent
solution to event sampling.

The question of the API to simulate multiple observations from e.g. an ``ObservationTable``
or a list of ``GTIs`` as it is needed for simulating data for the CTA data challenge
is not addressed in this PIG. For the scope of this PIG, the fundamental class
``MapDatasetEventSampler`` to simulate events corresponding to a given observation
and/or single GTI is in place.

The proposed Event Sampler will not provide, for each event, the corresponding
``DETX`` and ``DETY`` position. These will be added in a future development of the
simulator.


Task list
=========
This is a proposal for a list of tasks to implement the proposed changes:

1. Implement the ``sample`` method in ``gammapy.cube.MapDataset`` and add tests.
2. Implement the ``sample`` method in ``gammapy.time.models.LightCurveTableModel`` and ``gammapy.time.models.PhaseCurveTableModel`` and add tests.
3. Implement the ``sample`` method in ``gammapy.cube.PSFMap`` and add tests.
4. Implement the ``sample`` method in ``gammapy.cube.EdispMap`` and add tests.
5. Introduce the ``MapDatasetEventSampler`` into ``gammapy.cube.`` and add tests.
6. Add tutorials for event simulations of different kinds of sources.


Decision
========

The PIG was discussed extensively in `GH_2136`_, the weekly Gammapy developer calls
and coding sprint in person. After the deadline for final review expired on August 20,
all remaining comments were addressed and the PIG was accepted on August 30, 2019.



.. _``rejection sampling in Python``: https://wiseodd.github.io/techblog/2015/10/21/rejection-sampling/
.. _Prototype: https://github.com/fabiopintore/notebooks-public/blob/master/gammapy-event-sampling/prototype.ipynb
.. _inverseCDF: https://en.wikipedia.org/wiki/Inverse_transform_sampling
.. _here: https://stackoverflow.com/questions/21100716/fast-arbitrary-distribution-random-sampling/21101584#21101584
.. _gammalib: http://cta.irap.omp.eu/gammalib/doxygen/GModelSpatialRadialGauss_8cpp_source.html#l00325
.. _spec: https://docs.gammapy.org/0.11/api/gammapy.spectrum.SpectrumSimulation.html
.. _three-D: https://docs.gammapy.org/0.11/notebooks/simulate_3d.html
.. _astrisim: http://www.iasf-milano.inaf.it/~giuliani/astrisim/
.. _gamma-astro-data-formats: https://gamma-astro-data-formats.readthedocs.io/en/latest/events/events.html#mandatory-header-keywords
.. _rej_sampl: https://en.wikipedia.org/wiki/Rejection_sampling
.. _GH_2136: https://github.com/gammapy/gammapy/pull/2136
.. _GH_2229: https://github.com/gammapy/gammapy/pull/2229