PIG 9 - Event sampling #2136

fabiopintore · 2019-05-09T19:58:10Z

This PR addresses #761, which is related to an event sampler for gammapy.

cdeil

Some first superficial comments inline.

The basic method proposed here is to sample from binned distributions using inverse CDF, i.e. draw from the npred histogram. That's fine, at least as a first step, Gammapy at the moment is a binned analysis framework and probably other methods aren't within reach at the moment.

But there should be a section (either at the start in an "introduction" or under "alternatives") mentioning that the existing gamma-ray codes that have event samplers do it differently. I don't know much about how the Fermi ST and ctools work, but I do remember that in Gammalib one had to write an mc method for every new model to support event sampling. You can see here an example with an analytical way to sample and here an example where no analytical sampling method exists, and a rejection sampling method is used.

Please start by adding a section with a bit of information, and then we'll try to gather information how others do event sampling - e.g. I'm sure there are HEP papers as well where they describe how they sample events, e.g. UNURAN was added in ROOT for a reason.

Most likely we will for now stick with the method you propose here. Other methods are much more work to implement (requiring a "sample" method on every model), and there's the technical difficulty that with Numpy really only inverse CDF sampling can be done efficiently, because e.g. for rejection sampling (I think) you'd need a Python for loop which is super slow. But these things might come in Gammapy in the future, e.g. with Numba it's possible to implement quite easily.

My point is: this PIG should contain a bit more information why we chose the binned distribution sampling method, and why for now we didn't implement the well-established solution in gamma-ray astronomy, i.e. what Fermi ST and ctools do.

docs/development/pigs/pig-009.rst

cdeil · 2019-06-07T09:13:44Z

@adonath - FYI: We discussed this in the Gammapy dev call for 5 min today.

My recommendation would be that you finish writing the PIG and post it for review with a 1 week deadline first, or present and discuss it in the dev call next week, and then after that initial feedback circulate it widely for review.

I don't understand what happened in #2204 . The PR description says InverseCDFSampler is added, but I don't think that's what happened in the end and was merged!? Also, looking at the task list in this PIG, it's full of "?", so it's not clear to me what the next PRs will be.

fabiopintore · 2019-06-07T12:13:23Z

Hi Chris, I'm traveling in this moment, hence sorry if i give you only a quick explanation. In practice, I opened the PR #2204 <#2204> and I actually added the InverseCDF class. However, since it was the first time I was opening a PR, yesterday Axel told me that it's better to split the PR in smaller steps. For this reason, we considered this PR only up to: "This PR introduce a renaming of the gammapy.utils.distributions' into gammapy.utils.random'.

The content of the current `gammapy.utils.random' is moved in the new folder. Then the currently unused GeneralRandomArray class is removed from gammapy.utils.random' ..."

and we didn't included the InverseCDF class yet. I'm going to open, as quick as possible, a new pull requests for the InverseCDF alone. Sorry for the confusion. Also, looking at the task list in this PIG, it's full of "?", so it's not

clear to me what the next PRs will be.

We have to still fix some issues in the PIG. However, after the PR for the InverseCDF class, the next one would have been the pull request #4

"4. Remove the current `GeneralRandom` class and adapt `gammapy.astro.population.simulate' accordingly."

However, about this last point I'll discuss with Axel your suggestion on how to face with it. If you have any question, I'll try to reply as soon as possible. Cheers, F. Il giorno ven 7 giu 2019 alle ore 11:13 Christoph Deil < notifications@github.com> ha scritto:

…

@adonath <https://github.com/adonath> - FYI: We discussed this in the Gammapy dev call for 5 min today. My recommendation would be that you finish writing the PIG and post it for review with a 1 week deadline first, or present and discuss it in the dev call next week, and then after that initial feedback circulate it widely for review. I don't understand what happened in #2204 <#2204> . The PR description says InverseCDFSampler is added, but I don't think that's what happened in the end and was merged!? Also, looking at the task list in this PIG, it's full of "?", so it's not clear to me what the next PRs will be. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2136?email_source=notifications&email_token=AJMDXANIM6MMLIDYAFNWLL3PZIREVA5CNFSM4HL5RIF2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXFJJ7Q#issuecomment-499815678>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJMDXAKQL6FKRZO33XEYCILPZIREVANCNFSM4HL5RIFQ> .

adonath

Thanks a lot @fabiopintore! This is the last round if comments from my side. After you've implemented those the PIG is ready to circulate from my side.

adonath · 2019-07-24T11:23:42Z

docs/development/pigs/pig-009.rst

+propose to implement a framework for event simulation in Gammapy.
+
+The proposed framework consists of individual building blocks, represented by
+classes, that can be chained together to achieve a full simulation of an event


"by classes" -> "by classes and methods"

adonath · 2019-07-24T11:26:26Z

docs/development/pigs/pig-009.rst

+Inverse CDF sampling (inverseCDF_) is an established method to sample from discrete
+probability mass functions. However it is not the method of choice for other existing
+event samplers such as the the Fermi-LAT Science Tools (gtobsim) and CTOOLS (ctobssim).
+The inverse CDF method is also successfully used by `ASTRIsim` (_astrisim), the event


Move this sentence after "...probability mass function." and shorten to "It is used by ASTRIsim..."

adonath · 2019-07-24T11:28:13Z

docs/development/pigs/pig-009.rst

+
+We propose to include in `gammapy.cube` an high level interface (HLI) class, labelled as
+`MapDatasetEventSampler`. This class handles the complete event sampling process, 
+including the corrections related to the IRF and source temporal variability.


Add "...for a given GTI / observation."

adonath · 2019-07-24T11:34:19Z

docs/development/pigs/pig-009.rst

+uses a Poisson distribution, with mean equal to the predicted counts, to estimate the
+random number of sampled events.
+
+We propose to add a `sample` method in `~gamampy.cube.MapDataset` that will be the


I guess this should be Map.sample(n_events=, random_state=)...

adonath · 2019-07-24T11:35:49Z

docs/development/pigs/pig-009.rst

+class described in `PR#2229`_ . The output will be an `~astropy.table.Table` with columns: 
+`TRUE_RA`, `TRUE_DEC` and `TRUE_ENERGIES`.
+
+Then, the time will be sampled independently. This will be done through a `sample` 


Maybe write LightCurveTableModel.sample(n_events=, random_state=), to give some information one the function signature...

adonath · 2019-07-24T11:38:31Z

docs/development/pigs/pig-009.rst

+time of the events. In the case the temporal model is not provided, the time is uniformly 
+sampled in the time range Tstart-Tstop.
+
+The IRF correction can now be applied to sampled events. We propose to add a `distribute`


Again add the signature of the function call: EdispMap.distribute(events=), where events is a Table object.

adonath · 2019-07-24T11:39:30Z

docs/development/pigs/pig-009.rst

+The method interpolates the "correct" IRF at the position of a given event and
+applies it. In more detail the class will calculate the psf and the energy dispersion 
+for each event true position and true energy. The IRFs are assumed to be constant
+and not time-dependent. The output will be an `Astropy.Table` with reconstructed event 


Note that .distribute() requires the true positions as well as energies. So it takes a table object and adds the RA, DEC and ENERGY columns.

I've updated the text.

adonath · 2019-07-24T11:41:20Z

docs/development/pigs/pig-009.rst

+Finally, the times and the energies/coordinates of the events will be merged into a 
+unique `~astropy.table.Table` with the columns:`RA`, `DEC` and `ENERGY and `TIME` .
+
+The `MapDatasetEventSampler` can be used to sample background events as well, although 


Give a little bit more information: background events are drawn using Map.sample(n_events=, random_state=) and a constant event rate. The IRF distribution is not applied.

adonath · 2019-07-24T11:55:37Z

docs/development/pigs/pig-009.rst

+
+::
+
+    for src in dataset[source1, source2, ...sourceN]:


I think the code should rather be as following:

def sample(dataset, random_state) """Sample events from a `MapDataset`""" events_list = [] for evaluator in dataset.evaluators: npred = evaluator.compute_npred() n_events = random_state.poisson(npred.data.sum()) events = npred.sample(n_events, random_state) events_list.append(events) events_src = vstack(events_list) events_src = dataset.psf.distribute(events_src, random_state) events_src = dataset.edisp.distribute(events_src, random_state) n_events_bkg = random_state.poisson(dataset.background_model.map.data.sum()) events_bkg = dataset.background_model.sample(n_events, random_state) events_total = vstack([events_src, events_bkg]) events_total.meta = get_events_meta_data(dataset) return EventList(events_total)

okay, I've modified the code.

adonath · 2019-07-24T12:17:26Z

docs/development/pigs/pig-009.rst

+files decribed on `gamma-astro-data-formats`_. The `MapDatasetSampler` will be designed to work 
+alternatively for input flux or counts maps.
+
+The basic code structure is the following:


With the code example above, this could go...

cdeil

@fabiopintore @adonath - Thank you for all your work. This should be ready to circulate very soon.

Please see my inline comments.

cdeil · 2019-07-26T09:05:35Z

docs/development/pigs/pig-009.rst

+
+As a first step, the User should create a `MapDataset` object, as described below:
+
+# 20 lines of code with the `MapDataset` object setup.


@fabiopintore - What's this about 20 lines of code? Just remove this comment? Or did you want to add something?

cdeil · 2019-07-26T09:06:39Z

docs/development/pigs/pig-009.rst

+* Created: May 03, 2019
+* Accepted: 
+* Status: Draft
+* Discussion: `PR 2136`_


Please, for consistency, use "GH" instead of "PR", throughout this PIG.

Example: https://raw.githubusercontent.com/gammapy/gammapy/master/docs/development/pigs/pig-001.rst

cdeil · 2019-07-26T09:08:33Z

docs/development/pigs/pig-009.rst

+We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset` 
+that will be the core of the sampling process. The `sample` is based on the 
+`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output 
+will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.


ENERGIES_TRUE -> ENERGY_TRUE

cdeil · 2019-07-26T09:09:56Z

docs/development/pigs/pig-009.rst

+
+We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset` 
+that will be the core of the sampling process. The `sample` is based on the 
+`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output 


Please here and everywhere when referencing classes, just write this:

``InverseCDFSampler``

The single-backticks and ~ are Sphinx-syntax to create links to API docs. But if the API changes in the coming years, we don't want to have to go back and update PIGs. So from PIGs we don't create these links.

cdeil · 2019-07-26T09:11:29Z

docs/development/pigs/pig-009.rst

+MapDatasetEventSampler
+----------------------
+
+The general design of the `sample` method is as follows:


@adonath @fabiopintore - I thought you wanted to get MC_ID. This sample method doesn't loop over source components, you will not have SOURCE_ID or MC_ID.

So either show how it roughly works, or remove the code example?

docs/development/pigs/pig-009.rst

cdeil · 2019-07-26T09:14:51Z

docs/development/pigs/pig-009.rst

+            events_list.append(events)
+
+       events_src = vstack(events_list)
+       events_src = dataset.psf.distribute(events_src, random_state)


Suggest to name this method also "sample", not "distribute".
It samples the EDISP or PSF distribution functions, it is a kind of sampling.
I just think if all sampling-related code has the same name it's easier to grok and find.

registerrier

Thanks @fabiopintore ! This looks good to me.
I have added a number of inline comments for typos and a few clarifications connected to:

how/when do you use exposure information?
how are the LightCurveTableModelgenerated times associated to the relevant simulated events?

docs/development/pigs/pig-009.rst

registerrier · 2019-06-14T08:06:25Z

docs/development/pigs/pig-009.rst

+
+An event sampler of gamma events is of high importance in exploiting the potentiality 
+of the future Cherenkov Telescope Array (CTA). It will allows us to simulate sources 
+with different spectral and morphological properties adn e.g. investigating the best 


and .e.g. to investigate

docs/development/pigs/pig-009.rst

registerrier · 2019-07-26T08:27:03Z

docs/development/pigs/pig-009.rst

+An event sampler for gamma events is an important part of the science tools
+of the future Cherenkov Telescope Array (CTA) observatory. It will allow users
+to simulate sources with different spectral, morphological and temporal properties
+and predict the performance of CTA on the simulated data e.g. to support observation


on the simulated events

docs/development/pigs/pig-009.rst

registerrier · 2019-07-26T09:28:50Z

docs/development/pigs/pig-009.rst

+`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output 
+will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.
+
+Then, the time will be sampled independently. This will be done through a 


Since the light curve model is connected to a specific source in the sky model, you might say a word on how this will be done in practice?

registerrier · 2019-07-26T09:29:55Z

docs/development/pigs/pig-009.rst

+
+The `MapDatasetEventSampler` can be used to sample background events using the 
+`Map.sample(n_events=, random_state=)` as well. The time of the events is sampled
+assuming a costant event rate. Finally, the IRF corrections are not applied to sampled


registerrier · 2019-07-26T09:30:21Z

docs/development/pigs/pig-009.rst

+====================================
+To evaluate the precision and performance of the described framework we propose
+to implement a prototype for a simulation / fitting pipeline. Starting from a
+selection of spatial, spectral and temporal models, data is simulated and fitted


data are simulated

registerrier · 2019-07-26T09:31:13Z

docs/development/pigs/pig-009.rst

+selection of spatial, spectral and temporal models, data is simulated and fitted
+multiple times to derive distributions and pull-distributions of the reconstructed
+model parameters. This pipeline should also monitor the required cpu and memory usage.
+This first prototype can be used to evaluate the optimal bin-size (with the best


the bin size will be valid for a set or IRFs right?

docs/development/pigs/pig-009.rst

lmohrmann

Hi!

I'm a bit puzzled by your code example: first, you compute npred for each of the source components, draw a Poisson sample from that, and then sample the individual events. Then, after having stacked all the events, you somehow apply the PSF and EDISP. But usually the MapDataset will be defined with a PSF and EDISP already, and so npred is already convolved with the IRFs.

I also don't quite understand why you need to compute npred at all. Can't you sample the events from the models directly, and then apply PSF and EDISP to go from true to reconstructed quantities? My feeling is that going the detour via npred will also introduce a lot of statistical noise.

The rest of the proposal looks OK to me, but this is a major point that should be clarified or changed.

Best,
Lars

cdeil · 2019-08-02T09:01:09Z

@lmohrmann

I think for time models the idea is to sample directly, that should be clarified.

For spatial models, it would be possible to sample directly, but would require analytical "sample" methods on each model. Possible, but not what's proposed here.

The most difficult part is to know how many events to sample and the energy distribution. There one has to use npred = flux * exposure, where exposure = effective_area * obs_time. That's the proposal here, to use this kind of npred, which doesn't have PSF or EDISP applied.

An alternative would be to sample the flux distributions, and then to reject sampled events based on exposure. But that's difficult to do efficiently, e.g. with a power-law spectrum all the flux is at low energies but there the exposure is almost zero.

@lmohrmann - OK to stick with the method to sample npred = flux * exposure and purely npred map-based sampling?

@adonath @fabiopintore - Please clarify the method used (and clarify or remove the code example). If I see correctly, currently the words "exposure" or "effective area" never appear in the PIG -- make it clearer what npred is.

lmohrmann · 2019-08-02T09:15:32Z

@cdeil @fabiopintore

I see the advantage of using the existing code to calculate some npred and then sample from that rather than adding sample methods on each model. But it does not become clear from the PIG that npred is supposed to be just flux * exposure. So it should be stated clearly in the PIG that that is the case and npred does not involve any folding with psf or edisp. Usually psf and edisp are considered when calculating npred.

fabiopintore · 2019-08-06T14:46:00Z

@lmohrmann @cdeil
As correctly answered by @cdeil , npred is calculated without invoking PSF or Edisp.
I've included some sentences to describe better npred.

adonath · 2019-08-30T12:38:10Z

The review period for this PIG expired August 20. All remaining comment were addressed.
I changed the PIG status and filled the decision section. I'll go ahead and merge this PR now.

Thank you @fabiopintore for the work on this PIG and thanks @cdeil, @registerrier and @lmohrmann for the feedback!

cdeil · 2019-08-30T12:41:35Z

🎉

@adonath - Can you please add this RST file to the PIG index.rst toctree, so that is shows up in the docs? I usually do that in the PR right before merging and check the HTML. I think this underline is too long and will give a Sphinx warning?
https://github.com/gammapy/gammapy/pull/2136/files#diff-c6913f0944e70c096408aef9be140befR38

adonath self-assigned this May 9, 2019

adonath added the pig label May 9, 2019

adonath added this to the 0.13 milestone May 9, 2019

adonath assigned fabiopintore May 10, 2019

cdeil requested review from adonath, registerrier and cdeil May 13, 2019 20:14

cdeil changed the title ~~Event sampling pig~~ PIG 9 - Event sampling May 13, 2019

cdeil mentioned this pull request May 13, 2019

Implement basic event list sampler #761

Closed

cdeil suggested changes May 13, 2019

View reviewed changes

cdeil reviewed May 13, 2019

View reviewed changes

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved

cdeil reviewed May 13, 2019

View reviewed changes

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved

cdeil reviewed May 13, 2019

View reviewed changes

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved

cdeil reviewed May 13, 2019

View reviewed changes

docs/development/pigs/pig-009.rst Outdated Show resolved Hide resolved

fabiopintore force-pushed the event_sampling_pig branch from 5124d4d to d211293 Compare June 6, 2019 12:09

This was referenced Jun 8, 2019

Adding the InverseCDFSampler #2226

Closed

Implement InverseCDFSampler class #2229

Merged

adonath force-pushed the event_sampling_pig branch 3 times, most recently from 362c5d9 to 97f9ae8 Compare June 28, 2019 11:55

Mac_di_Fabio and others added 8 commits July 4, 2019 14:55

Creates event sample PIG

09b38d9

Further improve to the event sampler PIG

48dffea

Further improve to the event sampler PIG

955e765

Further improve to the event sampler PIG

297a587

update of the PIG-009

da79169

update of the PIG-009

16e2bff

update of the PIG-009

14e036f

update of the PIG-009

edc8466

Update pig-009.rst

f1c17a8

fabiopintore dismissed cdeil’s stale review via f1c17a8 July 18, 2019 10:14

fabiopintore added 2 commits July 18, 2019 12:16

Update pig-009.rst

725bbbd

Update pig-009.rst

6ac4c5d

cdeil unassigned adonath Jul 19, 2019

Update pig-009.rst

16e22b5

adonath requested changes Jul 24, 2019

View reviewed changes

fabiopintore added 2 commits July 24, 2019 15:52

Axel comments implemented

d4398cb

Update pig-009.rst

4cb584d

cdeil suggested changes Jul 26, 2019

View reviewed changes

registerrier previously approved these changes Jul 26, 2019

View reviewed changes

Address some more comments

ea1b942

adonath dismissed registerrier’s stale review via ea1b942 July 27, 2019 16:48

Addressed Regis' comments

bd5e7cc

lmohrmann reviewed Jul 30, 2019

View reviewed changes

fabiopintore added 4 commits August 6, 2019 16:27

The comments from Lars and Christoph were included

6e50dae

Update pig-009.rst

930e6ad

Minor changes

582af75

Update pig-009.rst

6317106

fabiopintore and others added 4 commits August 30, 2019 10:00

Update pig-009.rst

d664b15

Update pig-009.rst

41a86dd

Minor polishing of PIG 9

20fa772

Change PIG 9 status to accepted

895a74e

adonath force-pushed the event_sampling_pig branch from ad96adc to 895a74e Compare August 30, 2019 12:36

adonath merged commit 35384fc into gammapy:master Aug 30, 2019


		As a first step, the User should create a `MapDataset` object, as described below:

		# 20 lines of code with the `MapDataset` object setup.

PIG 9 - Event sampling #2136

PIG 9 - Event sampling #2136

Conversation

fabiopintore commented May 9, 2019 • edited

cdeil left a comment

Choose a reason for hiding this comment

cdeil commented Jun 7, 2019

fabiopintore commented Jun 7, 2019 via email

adonath left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cdeil left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

registerrier left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmohrmann left a comment

Choose a reason for hiding this comment

cdeil commented Aug 2, 2019

lmohrmann commented Aug 2, 2019

fabiopintore commented Aug 6, 2019

adonath commented Aug 30, 2019

cdeil commented Aug 30, 2019

fabiopintore commented May 9, 2019 •

edited