Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIG 9 - Event sampling #2136

Merged
merged 51 commits into from Aug 30, 2019

Conversation

@fabiopintore
Copy link
Contributor

commented May 9, 2019

This PR addresses #761, which is related to an event sampler for gammapy.

@adonath adonath self-assigned this May 9, 2019
@adonath adonath added the pig label May 9, 2019
@adonath adonath added this to the 0.13 milestone May 9, 2019
@cdeil cdeil requested review from adonath, registerrier and cdeil May 13, 2019
@cdeil cdeil changed the title Event sampling pig PIG 9 - Event sampling May 13, 2019
Copy link
Member

left a comment

@fabiopintore @adonath - Thanks!

Some first superficial comments inline.

The basic method proposed here is to sample from binned distributions using inverse CDF, i.e. draw from the npred histogram. That's fine, at least as a first step, Gammapy at the moment is a binned analysis framework and probably other methods aren't within reach at the moment.

But there should be a section (either at the start in an "introduction" or under "alternatives") mentioning that the existing gamma-ray codes that have event samplers do it differently. I don't know much about how the Fermi ST and ctools work, but I do remember that in Gammalib one had to write an mc method for every new model to support event sampling. You can see here an example with an analytical way to sample and here an example where no analytical sampling method exists, and a rejection sampling method is used.

Please start by adding a section with a bit of information, and then we'll try to gather information how others do event sampling - e.g. I'm sure there are HEP papers as well where they describe how they sample events, e.g. UNURAN was added in ROOT for a reason.

Most likely we will for now stick with the method you propose here. Other methods are much more work to implement (requiring a "sample" method on every model), and there's the technical difficulty that with Numpy really only inverse CDF sampling can be done efficiently, because e.g. for rejection sampling (I think) you'd need a Python for loop which is super slow. But these things might come in Gammapy in the future, e.g. with Numba it's possible to implement quite easily.

My point is: this PIG should contain a bit more information why we chose the binned distribution sampling method, and why for now we didn't implement the well-established solution in gamma-ray astronomy, i.e. what Fermi ST and ctools do.

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
@fabiopintore fabiopintore force-pushed the fabiopintore:event_sampling_pig branch from 5124d4d to d211293 Jun 6, 2019
@cdeil

This comment has been minimized.

Copy link
Member

commented Jun 7, 2019

@adonath - FYI: We discussed this in the Gammapy dev call for 5 min today.

My recommendation would be that you finish writing the PIG and post it for review with a 1 week deadline first, or present and discuss it in the dev call next week, and then after that initial feedback circulate it widely for review.

I don't understand what happened in #2204 . The PR description says InverseCDFSampler is added, but I don't think that's what happened in the end and was merged!? Also, looking at the task list in this PIG, it's full of "?", so it's not clear to me what the next PRs will be.

@fabiopintore

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2019

@adonath adonath force-pushed the fabiopintore:event_sampling_pig branch 3 times, most recently from 362c5d9 to 97f9ae8 Jun 25, 2019
Copy link
Member

left a comment

Thanks a lot @fabiopintore! This is the last round if comments from my side. After you've implemented those the PIG is ready to circulate from my side.

propose to implement a framework for event simulation in Gammapy.

The proposed framework consists of individual building blocks, represented by
classes, that can be chained together to achieve a full simulation of an event

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

"by classes" -> "by classes and methods"

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.

Inverse CDF sampling (inverseCDF_) is an established method to sample from discrete
probability mass functions. However it is not the method of choice for other existing
event samplers such as the the Fermi-LAT Science Tools (gtobsim) and CTOOLS (ctobssim).
The inverse CDF method is also successfully used by `ASTRIsim` (_astrisim), the event

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

Move this sentence after "...probability mass function." and shorten to "It is used by ASTRIsim..."

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.


We propose to include in `gammapy.cube` an high level interface (HLI) class, labelled as
`MapDatasetEventSampler`. This class handles the complete event sampling process,
including the corrections related to the IRF and source temporal variability.

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

Add "...for a given GTI / observation."

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.

uses a Poisson distribution, with mean equal to the predicted counts, to estimate the
random number of sampled events.

We propose to add a `sample` method in `~gamampy.cube.MapDataset` that will be the

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

I guess this should be Map.sample(n_events=, random_state=)...

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.

class described in `PR#2229`_ . The output will be an `~astropy.table.Table` with columns:
`TRUE_RA`, `TRUE_DEC` and `TRUE_ENERGIES`.

Then, the time will be sampled independently. This will be done through a `sample`

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

Maybe write LightCurveTableModel.sample(n_events=, random_state=), to give some information one the function signature...

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.

time of the events. In the case the temporal model is not provided, the time is uniformly
sampled in the time range Tstart-Tstop.

The IRF correction can now be applied to sampled events. We propose to add a `distribute`

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

Again add the signature of the function call: EdispMap.distribute(events=), where events is a Table object.

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.

The method interpolates the "correct" IRF at the position of a given event and
applies it. In more detail the class will calculate the psf and the energy dispersion
for each event true position and true energy. The IRFs are assumed to be constant
and not time-dependent. The output will be an `Astropy.Table` with reconstructed event

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

Note that .distribute() requires the true positions as well as energies. So it takes a table object and adds the RA, DEC and ENERGY columns.

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

I've updated the text.

Finally, the times and the energies/coordinates of the events will be merged into a
unique `~astropy.table.Table` with the columns:`RA`, `DEC` and `ENERGY and `TIME` .

The `MapDatasetEventSampler` can be used to sample background events as well, although

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

Give a little bit more information: background events are drawn using Map.sample(n_events=, random_state=) and a constant event rate. The IRF distribution is not applied.

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.


::

for src in dataset[source1, source2, ...sourceN]:

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

I think the code should rather be as following:

def sample(dataset, random_state)
    """Sample events from a `MapDataset`"""

    events_list = []
    
    for evaluator in dataset.evaluators:
        npred = evaluator.compute_npred()
        n_events = random_state.poisson(npred.data.sum())
        events = npred.sample(n_events, random_state)
        events_list.append(events)
    
    events_src = vstack(events_list)
    events_src = dataset.psf.distribute(events_src, random_state)
    events_src = dataset.edisp.distribute(events_src, random_state)
    
    n_events_bkg = random_state.poisson(dataset.background_model.map.data.sum())
    events_bkg = dataset.background_model.sample(n_events, random_state)
    
    events_total = vstack([events_src, events_bkg])
    events_total.meta = get_events_meta_data(dataset)
    return EventList(events_total)

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

okay, I've modified the code.

files decribed on `gamma-astro-data-formats`_. The `MapDatasetSampler` will be designed to work
alternatively for input flux or counts maps.

The basic code structure is the following:

This comment has been minimized.

Copy link
@adonath

adonath Jul 24, 2019

Member

With the code example above, this could go...

This comment has been minimized.

Copy link
@fabiopintore

fabiopintore Jul 24, 2019

Author Contributor

Done.

Copy link
Member

left a comment

@fabiopintore @adonath - Thank you for all your work. This should be ready to circulate very soon.

Please see my inline comments.


As a first step, the User should create a `MapDataset` object, as described below:

# 20 lines of code with the `MapDataset` object setup.

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

@fabiopintore - What's this about 20 lines of code? Just remove this comment? Or did you want to add something?

* Created: May 03, 2019
* Accepted:
* Status: Draft
* Discussion: `PR 2136`_

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

Please, for consistency, use "GH" instead of "PR", throughout this PIG.

Example: https://raw.githubusercontent.com/gammapy/gammapy/master/docs/development/pigs/pig-001.rst

We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset`
that will be the core of the sampling process. The `sample` is based on the
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

ENERGIES_TRUE -> ENERGY_TRUE


We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset`
that will be the core of the sampling process. The `sample` is based on the
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

Please here and everywhere when referencing classes, just write this:

``InverseCDFSampler``

The single-backticks and ~ are Sphinx-syntax to create links to API docs. But if the API changes in the coming years, we don't want to have to go back and update PIGs. So from PIGs we don't create these links.

MapDatasetEventSampler
----------------------

The general design of the `sample` method is as follows:

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

@adonath @fabiopintore - I thought you wanted to get MC_ID. This sample method doesn't loop over source components, you will not have SOURCE_ID or MC_ID.

So either show how it roughly works, or remove the code example?

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
events_list.append(events)
events_src = vstack(events_list)
events_src = dataset.psf.distribute(events_src, random_state)

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

Suggest to name this method also "sample", not "distribute".
It samples the EDISP or PSF distribution functions, it is a kind of sampling.
I just think if all sampling-related code has the same name it's easier to grok and find.

This comment has been minimized.

Copy link
@adonath

This comment has been minimized.

Copy link
@adonath
Copy link
Contributor

left a comment

Thanks @fabiopintore ! This looks good to me.
I have added a number of inline comments for typos and a few clarifications connected to:

  • how/when do you use exposure information?
  • how are the LightCurveTableModelgenerated times associated to the relevant simulated events?
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved

An event sampler of gamma events is of high importance in exploiting the potentiality
of the future Cherenkov Telescope Array (CTA). It will allows us to simulate sources
with different spectral and morphological properties adn e.g. investigating the best

This comment has been minimized.

Copy link
@registerrier

registerrier Jul 26, 2019

Contributor

and .e.g. to investigate

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
An event sampler for gamma events is an important part of the science tools
of the future Cherenkov Telescope Array (CTA) observatory. It will allow users
to simulate sources with different spectral, morphological and temporal properties
and predict the performance of CTA on the simulated data e.g. to support observation

This comment has been minimized.

Copy link
@registerrier

registerrier Jul 26, 2019

Contributor

on the simulated events

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.

Then, the time will be sampled independently. This will be done through a

This comment has been minimized.

Copy link
@registerrier

registerrier Jul 26, 2019

Contributor

Since the light curve model is connected to a specific source in the sky model, you might say a word on how this will be done in practice?


The `MapDatasetEventSampler` can be used to sample background events using the
`Map.sample(n_events=, random_state=)` as well. The time of the events is sampled
assuming a costant event rate. Finally, the IRF corrections are not applied to sampled

This comment has been minimized.

Copy link
@registerrier

registerrier Jul 26, 2019

Contributor

constant

====================================
To evaluate the precision and performance of the described framework we propose
to implement a prototype for a simulation / fitting pipeline. Starting from a
selection of spatial, spectral and temporal models, data is simulated and fitted

This comment has been minimized.

Copy link
@registerrier

registerrier Jul 26, 2019

Contributor

data are simulated

selection of spatial, spectral and temporal models, data is simulated and fitted
multiple times to derive distributions and pull-distributions of the reconstructed
model parameters. This pipeline should also monitor the required cpu and memory usage.
This first prototype can be used to evaluate the optimal bin-size (with the best

This comment has been minimized.

Copy link
@registerrier

registerrier Jul 26, 2019

Contributor

the bin size will be valid for a set or IRFs right?

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
Copy link
Contributor

left a comment

Hi!

I'm a bit puzzled by your code example: first, you compute npred for each of the source components, draw a Poisson sample from that, and then sample the individual events. Then, after having stacked all the events, you somehow apply the PSF and EDISP. But usually the MapDataset will be defined with a PSF and EDISP already, and so npred is already convolved with the IRFs.

I also don't quite understand why you need to compute npred at all. Can't you sample the events from the models directly, and then apply PSF and EDISP to go from true to reconstructed quantities? My feeling is that going the detour via npred will also introduce a lot of statistical noise.

The rest of the proposal looks OK to me, but this is a major point that should be clarified or changed.

Best,
Lars

@cdeil

This comment has been minimized.

Copy link
Member

commented Aug 2, 2019

@lmohrmann

I think for time models the idea is to sample directly, that should be clarified.

For spatial models, it would be possible to sample directly, but would require analytical "sample" methods on each model. Possible, but not what's proposed here.

The most difficult part is to know how many events to sample and the energy distribution. There one has to use npred = flux * exposure, where exposure = effective_area * obs_time. That's the proposal here, to use this kind of npred, which doesn't have PSF or EDISP applied.

An alternative would be to sample the flux distributions, and then to reject sampled events based on exposure. But that's difficult to do efficiently, e.g. with a power-law spectrum all the flux is at low energies but there the exposure is almost zero.

@lmohrmann - OK to stick with the method to sample npred = flux * exposure and purely npred map-based sampling?

@adonath @fabiopintore - Please clarify the method used (and clarify or remove the code example). If I see correctly, currently the words "exposure" or "effective area" never appear in the PIG -- make it clearer what npred is.

@lmohrmann

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

@cdeil @fabiopintore

I see the advantage of using the existing code to calculate some npred and then sample from that rather than adding sample methods on each model. But it does not become clear from the PIG that npred is supposed to be just flux * exposure. So it should be stated clearly in the PIG that that is the case and npred does not involve any folding with psf or edisp. Usually psf and edisp are considered when calculating npred.

fabiopintore added 4 commits Aug 6, 2019
@fabiopintore

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2019

@lmohrmann @cdeil
As correctly answered by @cdeil , npred is calculated without invoking PSF or Edisp.
I've included some sentences to describe better npred.

fabiopintore and others added 4 commits Aug 30, 2019
@adonath adonath force-pushed the fabiopintore:event_sampling_pig branch from ad96adc to 895a74e Aug 30, 2019
@adonath

This comment has been minimized.

Copy link
Member

commented Aug 30, 2019

The review period for this PIG expired August 20. All remaining comment were addressed.
I changed the PIG status and filled the decision section. I'll go ahead and merge this PR now.

Thank you @fabiopintore for the work on this PIG and thanks @cdeil, @registerrier and @lmohrmann for the feedback!

@adonath adonath merged commit 35384fc into gammapy:master Aug 30, 2019
7 of 9 checks passed
7 of 9 checks passed
Codacy/PR Quality Review Hang in there, Codacy is reviewing your Pull request.
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Scrutinizer Analysis: No new issues – Tests: passed
Details
gammapy.gammapy Build #20190830.13 succeeded
Details
gammapy.gammapy (DevDocs) DevDocs succeeded
Details
gammapy.gammapy (Lint) Lint succeeded
Details
gammapy.gammapy (Test Python36) Test Python36 succeeded
Details
gammapy.gammapy (Test Windows36) Test Windows36 succeeded
Details
gammapy.gammapy (Test Windows37) Test Windows37 succeeded
Details
@cdeil

This comment has been minimized.

Copy link
Member

commented Aug 30, 2019

🎉

@adonath - Can you please add this RST file to the PIG index.rst toctree, so that is shows up in the docs? I usually do that in the PR right before merging and check the HTML. I think this underline is too long and will give a Sphinx warning?
https://github.com/gammapy/gammapy/pull/2136/files#diff-c6913f0944e70c096408aef9be140befR38

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.