Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIG 9 - Event sampling #2136

Merged
merged 51 commits into from Aug 30, 2019
Merged

Conversation

fabiopintore
Copy link
Contributor

@fabiopintore fabiopintore commented May 9, 2019

This PR addresses #761, which is related to an event sampler for gammapy.

@adonath adonath self-assigned this May 9, 2019
@adonath adonath added the pig label May 9, 2019
@adonath adonath added this to the 0.13 milestone May 9, 2019
@cdeil cdeil changed the title Event sampling pig PIG 9 - Event sampling May 13, 2019
Copy link
Contributor

@cdeil cdeil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiopintore @adonath - Thanks!

Some first superficial comments inline.

The basic method proposed here is to sample from binned distributions using inverse CDF, i.e. draw from the npred histogram. That's fine, at least as a first step, Gammapy at the moment is a binned analysis framework and probably other methods aren't within reach at the moment.

But there should be a section (either at the start in an "introduction" or under "alternatives") mentioning that the existing gamma-ray codes that have event samplers do it differently. I don't know much about how the Fermi ST and ctools work, but I do remember that in Gammalib one had to write an mc method for every new model to support event sampling. You can see here an example with an analytical way to sample and here an example where no analytical sampling method exists, and a rejection sampling method is used.

Please start by adding a section with a bit of information, and then we'll try to gather information how others do event sampling - e.g. I'm sure there are HEP papers as well where they describe how they sample events, e.g. UNURAN was added in ROOT for a reason.

Most likely we will for now stick with the method you propose here. Other methods are much more work to implement (requiring a "sample" method on every model), and there's the technical difficulty that with Numpy really only inverse CDF sampling can be done efficiently, because e.g. for rejection sampling (I think) you'd need a Python for loop which is super slow. But these things might come in Gammapy in the future, e.g. with Numba it's possible to implement quite easily.

My point is: this PIG should contain a bit more information why we chose the binned distribution sampling method, and why for now we didn't implement the well-established solution in gamma-ray astronomy, i.e. what Fermi ST and ctools do.

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
@cdeil
Copy link
Contributor

cdeil commented Jun 7, 2019

@adonath - FYI: We discussed this in the Gammapy dev call for 5 min today.

My recommendation would be that you finish writing the PIG and post it for review with a 1 week deadline first, or present and discuss it in the dev call next week, and then after that initial feedback circulate it widely for review.

I don't understand what happened in #2204 . The PR description says InverseCDFSampler is added, but I don't think that's what happened in the end and was merged!? Also, looking at the task list in this PIG, it's full of "?", so it's not clear to me what the next PRs will be.

@fabiopintore
Copy link
Contributor Author

fabiopintore commented Jun 7, 2019 via email

@adonath adonath force-pushed the event_sampling_pig branch 3 times, most recently from 362c5d9 to 97f9ae8 Compare June 28, 2019 11:55
Copy link
Member

@adonath adonath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @fabiopintore! This is the last round if comments from my side. After you've implemented those the PIG is ready to circulate from my side.

propose to implement a framework for event simulation in Gammapy.

The proposed framework consists of individual building blocks, represented by
classes, that can be chained together to achieve a full simulation of an event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"by classes" -> "by classes and methods"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Inverse CDF sampling (inverseCDF_) is an established method to sample from discrete
probability mass functions. However it is not the method of choice for other existing
event samplers such as the the Fermi-LAT Science Tools (gtobsim) and CTOOLS (ctobssim).
The inverse CDF method is also successfully used by `ASTRIsim` (_astrisim), the event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this sentence after "...probability mass function." and shorten to "It is used by ASTRIsim..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


We propose to include in `gammapy.cube` an high level interface (HLI) class, labelled as
`MapDatasetEventSampler`. This class handles the complete event sampling process,
including the corrections related to the IRF and source temporal variability.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "...for a given GTI / observation."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

uses a Poisson distribution, with mean equal to the predicted counts, to estimate the
random number of sampled events.

We propose to add a `sample` method in `~gamampy.cube.MapDataset` that will be the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should be Map.sample(n_events=, random_state=)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

class described in `PR#2229`_ . The output will be an `~astropy.table.Table` with columns:
`TRUE_RA`, `TRUE_DEC` and `TRUE_ENERGIES`.

Then, the time will be sampled independently. This will be done through a `sample`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe write LightCurveTableModel.sample(n_events=, random_state=), to give some information one the function signature...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

time of the events. In the case the temporal model is not provided, the time is uniformly
sampled in the time range Tstart-Tstop.

The IRF correction can now be applied to sampled events. We propose to add a `distribute`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again add the signature of the function call: EdispMap.distribute(events=), where events is a Table object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

The method interpolates the "correct" IRF at the position of a given event and
applies it. In more detail the class will calculate the psf and the energy dispersion
for each event true position and true energy. The IRFs are assumed to be constant
and not time-dependent. The output will be an `Astropy.Table` with reconstructed event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that .distribute() requires the true positions as well as energies. So it takes a table object and adds the RA, DEC and ENERGY columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the text.

Finally, the times and the energies/coordinates of the events will be merged into a
unique `~astropy.table.Table` with the columns:`RA`, `DEC` and `ENERGY and `TIME` .

The `MapDatasetEventSampler` can be used to sample background events as well, although
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give a little bit more information: background events are drawn using Map.sample(n_events=, random_state=) and a constant event rate. The IRF distribution is not applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


::

for src in dataset[source1, source2, ...sourceN]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code should rather be as following:

def sample(dataset, random_state)
    """Sample events from a `MapDataset`"""

    events_list = []
    
    for evaluator in dataset.evaluators:
        npred = evaluator.compute_npred()
        n_events = random_state.poisson(npred.data.sum())
        events = npred.sample(n_events, random_state)
        events_list.append(events)
    
    events_src = vstack(events_list)
    events_src = dataset.psf.distribute(events_src, random_state)
    events_src = dataset.edisp.distribute(events_src, random_state)
    
    n_events_bkg = random_state.poisson(dataset.background_model.map.data.sum())
    events_bkg = dataset.background_model.sample(n_events, random_state)
    
    events_total = vstack([events_src, events_bkg])
    events_total.meta = get_events_meta_data(dataset)
    return EventList(events_total)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I've modified the code.

files decribed on `gamma-astro-data-formats`_. The `MapDatasetSampler` will be designed to work
alternatively for input flux or counts maps.

The basic code structure is the following:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the code example above, this could go...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@cdeil cdeil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiopintore @adonath - Thank you for all your work. This should be ready to circulate very soon.

Please see my inline comments.


As a first step, the User should create a `MapDataset` object, as described below:

# 20 lines of code with the `MapDataset` object setup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiopintore - What's this about 20 lines of code? Just remove this comment? Or did you want to add something?

* Created: May 03, 2019
* Accepted:
* Status: Draft
* Discussion: `PR 2136`_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, for consistency, use "GH" instead of "PR", throughout this PIG.

Example: https://raw.githubusercontent.com/gammapy/gammapy/master/docs/development/pigs/pig-001.rst

We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset`
that will be the core of the sampling process. The `sample` is based on the
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENERGIES_TRUE -> ENERGY_TRUE


We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset`
that will be the core of the sampling process. The `sample` is based on the
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please here and everywhere when referencing classes, just write this:

``InverseCDFSampler``

The single-backticks and ~ are Sphinx-syntax to create links to API docs. But if the API changes in the coming years, we don't want to have to go back and update PIGs. So from PIGs we don't create these links.

MapDatasetEventSampler
----------------------

The general design of the `sample` method is as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adonath @fabiopintore - I thought you wanted to get MC_ID. This sample method doesn't loop over source components, you will not have SOURCE_ID or MC_ID.

So either show how it roughly works, or remove the code example?

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
events_list.append(events)

events_src = vstack(events_list)
events_src = dataset.psf.distribute(events_src, random_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to name this method also "sample", not "distribute".
It samples the EDISP or PSF distribution functions, it is a kind of sampling.
I just think if all sampling-related code has the same name it's easier to grok and find.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

registerrier
registerrier previously approved these changes Jul 26, 2019
Copy link
Contributor

@registerrier registerrier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fabiopintore ! This looks good to me.
I have added a number of inline comments for typos and a few clarifications connected to:

  • how/when do you use exposure information?
  • how are the LightCurveTableModelgenerated times associated to the relevant simulated events?

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved

An event sampler of gamma events is of high importance in exploiting the potentiality
of the future Cherenkov Telescope Array (CTA). It will allows us to simulate sources
with different spectral and morphological properties adn e.g. investigating the best
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and .e.g. to investigate

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
An event sampler for gamma events is an important part of the science tools
of the future Cherenkov Telescope Array (CTA) observatory. It will allow users
to simulate sources with different spectral, morphological and temporal properties
and predict the performance of CTA on the simulated data e.g. to support observation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the simulated events

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.

Then, the time will be sampled independently. This will be done through a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the light curve model is connected to a specific source in the sky model, you might say a word on how this will be done in practice?


The `MapDatasetEventSampler` can be used to sample background events using the
`Map.sample(n_events=, random_state=)` as well. The time of the events is sampled
assuming a costant event rate. Finally, the IRF corrections are not applied to sampled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant

====================================
To evaluate the precision and performance of the described framework we propose
to implement a prototype for a simulation / fitting pipeline. Starting from a
selection of spatial, spectral and temporal models, data is simulated and fitted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data are simulated

selection of spatial, spectral and temporal models, data is simulated and fitted
multiple times to derive distributions and pull-distributions of the reconstructed
model parameters. This pipeline should also monitor the required cpu and memory usage.
This first prototype can be used to evaluate the optimal bin-size (with the best
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bin size will be valid for a set or IRFs right?

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@lmohrmann lmohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

I'm a bit puzzled by your code example: first, you compute npred for each of the source components, draw a Poisson sample from that, and then sample the individual events. Then, after having stacked all the events, you somehow apply the PSF and EDISP. But usually the MapDataset will be defined with a PSF and EDISP already, and so npred is already convolved with the IRFs.

I also don't quite understand why you need to compute npred at all. Can't you sample the events from the models directly, and then apply PSF and EDISP to go from true to reconstructed quantities? My feeling is that going the detour via npred will also introduce a lot of statistical noise.

The rest of the proposal looks OK to me, but this is a major point that should be clarified or changed.

Best,
Lars

@cdeil
Copy link
Contributor

cdeil commented Aug 2, 2019

@lmohrmann

I think for time models the idea is to sample directly, that should be clarified.

For spatial models, it would be possible to sample directly, but would require analytical "sample" methods on each model. Possible, but not what's proposed here.

The most difficult part is to know how many events to sample and the energy distribution. There one has to use npred = flux * exposure, where exposure = effective_area * obs_time. That's the proposal here, to use this kind of npred, which doesn't have PSF or EDISP applied.

An alternative would be to sample the flux distributions, and then to reject sampled events based on exposure. But that's difficult to do efficiently, e.g. with a power-law spectrum all the flux is at low energies but there the exposure is almost zero.

@lmohrmann - OK to stick with the method to sample npred = flux * exposure and purely npred map-based sampling?

@adonath @fabiopintore - Please clarify the method used (and clarify or remove the code example). If I see correctly, currently the words "exposure" or "effective area" never appear in the PIG -- make it clearer what npred is.

@lmohrmann
Copy link
Contributor

@cdeil @fabiopintore

I see the advantage of using the existing code to calculate some npred and then sample from that rather than adding sample methods on each model. But it does not become clear from the PIG that npred is supposed to be just flux * exposure. So it should be stated clearly in the PIG that that is the case and npred does not involve any folding with psf or edisp. Usually psf and edisp are considered when calculating npred.

@fabiopintore
Copy link
Contributor Author

@lmohrmann @cdeil
As correctly answered by @cdeil , npred is calculated without invoking PSF or Edisp.
I've included some sentences to describe better npred.

@adonath
Copy link
Member

adonath commented Aug 30, 2019

The review period for this PIG expired August 20. All remaining comment were addressed.
I changed the PIG status and filled the decision section. I'll go ahead and merge this PR now.

Thank you @fabiopintore for the work on this PIG and thanks @cdeil, @registerrier and @lmohrmann for the feedback!

@adonath adonath merged commit 35384fc into gammapy:master Aug 30, 2019
@cdeil
Copy link
Contributor

cdeil commented Aug 30, 2019

🎉

@adonath - Can you please add this RST file to the PIG index.rst toctree, so that is shows up in the docs? I usually do that in the PR right before merging and check the HTML. I think this underline is too long and will give a Sphinx warning?
https://github.com/gammapy/gammapy/pull/2136/files#diff-c6913f0944e70c096408aef9be140befR38

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants