PIG 9 - Event sampling #2136
@fabiopintore @adonath - Thanks!
Some first superficial comments inline.
The basic method proposed here is to sample from binned distributions using the inverse CDF, i.e. to draw from the npred histogram. That's fine, at least as a first step: Gammapy is currently a binned analysis framework, and other methods probably aren't within reach yet.
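For readers less familiar with the method: inverse CDF sampling from a binned distribution amounts to building the cumulative sum of the histogram, drawing uniform numbers and locating them with a binary search. A minimal NumPy sketch, assuming a 1D npred spectrum with known bin edges; the function name is hypothetical, and spreading samples uniformly inside each bin is a simplifying assumption:

```python
import numpy as np


def sample_binned_inverse_cdf(counts, edges, n_samples, rng):
    """Draw samples from a 1D histogram (e.g. an npred spectrum) by inverse CDF.

    counts : non-negative predicted counts per bin, shape (n_bins,)
    edges : bin edges, shape (n_bins + 1,)
    """
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)
    cdf = np.cumsum(counts)
    cdf /= cdf[-1]  # normalised CDF, last value exactly 1
    u = rng.uniform(size=n_samples)
    # First bin whose CDF value exceeds u (empty bins are never selected)
    idx = np.searchsorted(cdf, u, side="right")
    # Place each sample uniformly inside its bin
    lo, hi = edges[idx], edges[idx + 1]
    return lo + rng.uniform(size=n_samples) * (hi - lo)


rng = np.random.default_rng(42)
energy_edges = np.logspace(-1, 2, 31)      # TeV, toy binning
npred = 1e3 * energy_edges[:-1] ** -2.0    # toy npred per bin
energies = sample_binned_inverse_cdf(npred, energy_edges, 1000, rng)
```

Because every step is vectorised (cumsum, searchsorted), this runs fast in plain NumPy, which is one practical reason to prefer the binned approach in Gammapy.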
But there should be a section (either at the start in an "introduction" or under "alternatives") mentioning that the existing gamma-ray codes that have event samplers do it differently. I don't know much about how the Fermi ST and ctools work, but I do remember that in GammaLib one had to write an mc method for every new model to support event sampling. Some models can be sampled analytically, while for others no analytical sampling method exists and rejection sampling is used instead.
Please start by adding a section with a bit of information, and then we'll try to gather information on how others do event sampling. I'm sure there are also HEP papers describing how they sample events, e.g. UNURAN was added to ROOT for a reason.
For now we will most likely stick with the method you propose here. Other methods are much more work to implement (requiring a "sample" method on every model), and there's the technical difficulty that with Numpy only inverse CDF sampling can really be done efficiently, because e.g. for rejection sampling (I think) you'd need a Python for loop, which is very slow. But these things might come to Gammapy in the future, e.g. with Numba they are quite easy to implement.
My point is: this PIG should contain a bit more information on why we chose the binned distribution sampling method, and why for now we didn't implement the well-established solution in gamma-ray astronomy, i.e. what the Fermi ST and ctools do.
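To illustrate the two per-model strategies mentioned in this comment (an analytical sampling method versus rejection sampling), a minimal NumPy sketch. It is not GammaLib code: the function names are hypothetical, the power-law inverse CDF assumes index != 1, and the rejection sampler assumes a known upper bound pdf_max on [x_min, x_max] and processes candidates in batches:

```python
import numpy as np


def sample_power_law(index, e_min, e_max, n_samples, rng):
    """Analytical inverse-CDF sampling of dN/dE ~ E**-index (index != 1)."""
    a = 1.0 - index
    u = rng.uniform(size=n_samples)
    return (e_min**a + u * (e_max**a - e_min**a)) ** (1.0 / a)


def sample_rejection(pdf, x_min, x_max, pdf_max, n_samples, rng, batch_size=10_000):
    """Rejection sampling for any pdf bounded by pdf_max on [x_min, x_max].

    Candidates are drawn uniformly and accepted with probability pdf(x) / pdf_max.
    Candidates are generated in NumPy batches until enough are accepted.
    """
    accepted = []
    n_accepted = 0
    while n_accepted < n_samples:
        x = rng.uniform(x_min, x_max, size=batch_size)
        keep = rng.uniform(0.0, pdf_max, size=batch_size) < pdf(x)
        accepted.append(x[keep])
        n_accepted += keep.sum()
    return np.concatenate(accepted)[:n_samples]


rng = np.random.default_rng(0)
energies = sample_power_law(2.0, 1.0, 10.0, 20_000, rng)
# Toy profile f(x) ~ 1 + cos(x) on [0, pi]: its CDF (x + sin x) / pi has no closed-form inverse
x = sample_rejection(lambda x: 1.0 + np.cos(x), 0.0, np.pi, 2.0, 20_000, rng)
```

The analytical route needs a hand-derived inverse CDF per model, while rejection sampling only needs the pdf and a bound, at the cost of wasted candidates when the bound is loose.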
@adonath - FYI: We discussed this in the Gammapy dev call for 5 min today. My recommendation would be that you finish writing the PIG and post it for review with a 1 week deadline first, or present and discuss it in the dev call next week, and then, after that initial feedback, circulate it widely for review. I don't understand what happened in #2204. The PR description says InverseCDFSampler is added, but I don't think that's what was merged in the end!? Also, looking at the task list in this PIG, it's full of "?", so it's not clear to me what the next PRs will be.
Hi Chris,
I'm travelling at the moment, so sorry for giving you only a quick
explanation.
In practice, I opened PR #2204 and I actually added the
InverseCDF class.
However, since it was the first time I was opening a PR, yesterday Axel
told me that it's better to split the PR into smaller steps.
For this reason, we limited this PR to:
"This PR renames `gammapy.utils.distributions` to
`gammapy.utils.random`.
The content of the current `gammapy.utils.random` is moved into the new
folder.
Then the currently unused GeneralRandomArray class is removed from `gammapy.utils.random`
..."
and we didn't include the InverseCDF class yet.
I'm going to open a new pull request for the InverseCDF class alone as
soon as possible. Sorry for the confusion.
> Also, looking at the task list in this PIG, it's full of "?", so it's not
> clear to me what the next PRs will be.
We still have to fix some issues in the PIG. However, after the PR for the
InverseCDF class, the next one would have been pull request #4 of the task list:
"4. Remove the current `GeneralRandom` class and adapt
`gammapy.astro.population.simulate` accordingly."
However, I'll discuss your suggestion on how to deal with this last point
with Axel.
If you have any questions, I'll try to reply as soon as possible.
Cheers,
F.
(Replying to Christoph Deil's comment of Fri, 7 Jun 2019, 11:13.)
Thanks a lot @fabiopintore! This is the last round of comments from my side. Once you've addressed those, the PIG is ready to circulate as far as I'm concerned.
docs/development/pigs/pig-009.rst
propose to implement a framework for event simulation in Gammapy.

The proposed framework consists of individual building blocks, represented by
classes, that can be chained together to achieve a full simulation of an event
"by classes" -> "by classes and methods"
Done.
docs/development/pigs/pig-009.rst
Inverse CDF sampling (inverseCDF_) is an established method to sample from discrete
probability mass functions. However it is not the method of choice for other existing
event samplers such as the the Fermi-LAT Science Tools (gtobsim) and CTOOLS (ctobssim).
The inverse CDF method is also successfully used by `ASTRIsim` (_astrisim), the event
Move this sentence after "...probability mass function." and shorten to "It is used by ASTRIsim ..."
Done.
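The InverseCDFSampler class named in the PIG text can be pictured as the N-dimensional generalisation of inverse CDF sampling: flatten the npred cube, sample flat indices from its CDF and convert them back to pixel coordinates. The sketch below is hypothetical; the constructor arguments and the sample() return format are assumptions for illustration, not the API from PR#2229:

```python
import numpy as np


class InverseCDFSampler:
    """Hypothetical sketch of an N-dimensional inverse-CDF sampler.

    pdf : array of non-negative weights, e.g. an npred cube (energy, lat, lon).
    """

    def __init__(self, pdf, random_state=0):
        self.pdf = np.asarray(pdf, dtype=float)
        # default_rng accepts a seed or an existing Generator
        self.rng = np.random.default_rng(random_state)
        cdf = np.cumsum(self.pdf.ravel())
        self.cdf = cdf / cdf[-1]

    def sample(self, size):
        """Return fractional pixel coordinates, shape (ndim, size).

        Pixel centres are at integer coordinates; a uniform offset in
        [-0.5, 0.5) spreads the events within each pixel.
        """
        u = self.rng.uniform(size=size)
        flat_idx = np.searchsorted(self.cdf, u, side="right")
        idx = np.unravel_index(flat_idx, self.pdf.shape)
        jitter = self.rng.uniform(-0.5, 0.5, size=(self.pdf.ndim, size))
        return np.array(idx, dtype=float) + jitter


npred = np.zeros((3, 4))
npred[1, 2] = 10.0
coords = InverseCDFSampler(npred, random_state=0).sample(100)
```

Returning pixel coordinates keeps the sampler independent of any map geometry; converting them to sky coordinates and energies would be the job of the map object that owns the cube.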
docs/development/pigs/pig-009.rst
We propose to include in `gammapy.cube` an high level interface (HLI) class, labelled as
`MapDatasetEventSampler`. This class handles the complete event sampling process,
including the corrections related to the IRF and source temporal variability.
Add "...for a given GTI / observation."
Done.
docs/development/pigs/pig-009.rst
uses a Poisson distribution, with mean equal to the predicted counts, to estimate the
random number of sampled events.

We propose to add a `sample` method in `~gamampy.cube.MapDataset` that will be the
I guess this should be Map.sample(n_events=, random_state=) ...
Done.
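A minimal sketch of the two steps discussed in this thread: the number of events is drawn from a Poisson distribution with mean equal to the total npred, and then true coordinates are drawn from the npred cube. Assumptions: the function name is hypothetical, the cube is (energy, lat, lon) on a flat RA/DEC grid (no projection), npred is positive somewhere, and a plain dict of columns stands in for the astropy Table with RA_TRUE, DEC_TRUE and ENERGY_TRUE:

```python
import numpy as np


def sample_npred_cube(npred, energy_edges, lon_edges, lat_edges, rng):
    """Hypothetical sketch: sample true event coordinates from an npred cube."""
    npred = np.asarray(npred, dtype=float)
    energy_edges = np.asarray(energy_edges, dtype=float)
    lon_edges = np.asarray(lon_edges, dtype=float)
    lat_edges = np.asarray(lat_edges, dtype=float)

    # The number of events is itself Poisson distributed around the total npred
    n_events = rng.poisson(npred.sum())

    # Inverse CDF on the flattened cube, then back to (energy, lat, lon) indices
    cdf = np.cumsum(npred.ravel())
    cdf /= cdf[-1]
    flat_idx = np.searchsorted(cdf, rng.uniform(size=n_events), side="right")
    i_e, i_lat, i_lon = np.unravel_index(flat_idx, npred.shape)

    def uniform_in_bin(edges, idx):
        lo, hi = edges[idx], edges[idx + 1]
        return lo + rng.uniform(size=idx.size) * (hi - lo)

    return {
        "RA_TRUE": uniform_in_bin(lon_edges, i_lon),
        "DEC_TRUE": uniform_in_bin(lat_edges, i_lat),
        "ENERGY_TRUE": uniform_in_bin(energy_edges, i_e),
    }


rng = np.random.default_rng(0)
npred = np.zeros((2, 3, 4))  # (energy, lat, lon)
npred[1, 0, 3] = 1000.0
events = sample_npred_cube(npred, [1.0, 10.0, 100.0], np.arange(5.0), np.arange(4.0), rng)
```

Drawing the total with Poisson before sampling positions is what makes repeated simulations fluctuate in the number of events, rather than always producing exactly round(npred) events.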
docs/development/pigs/pig-009.rst
class described in `PR#2229`_ . The output will be an `~astropy.table.Table` with columns:
`TRUE_RA`, `TRUE_DEC` and `TRUE_ENERGIES`.

Then, the time will be sampled independently. This will be done through a `sample`
Maybe write LightCurveTableModel.sample(n_events=, random_state=), to give some information on the function signature...
Done.
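Sampling the event times is a 1D inverse-CDF problem on the light curve. A sketch, assuming the light curve is given as (time, normalisation) arrays interpreted as piecewise linear with a strictly positive rate inside (t_start, t_stop]; lc_time and lc_norm are hypothetical stand-ins for the content of a LightCurveTableModel, whose real attributes are not assumed here. Without a temporal model, times fall back to uniform in Tstart-Tstop, as described in the PIG:

```python
import numpy as np


def sample_times(n_events, t_start, t_stop, rng, lc_time=None, lc_norm=None):
    """Hypothetical sketch: sample event times in [t_start, t_stop]."""
    if lc_time is None:
        # No temporal model: constant rate, uniform times
        return rng.uniform(t_start, t_stop, size=n_events)

    # Evaluate the light curve on a fine grid (np.interp clamps outside lc_time)
    grid = np.linspace(t_start, t_stop, 1001)
    rate = np.interp(grid, lc_time, lc_norm)
    # Trapezoidal integration of the rate gives the unnormalised CDF
    steps = 0.5 * (rate[1:] + rate[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]
    # Invert the CDF by linear interpolation
    return np.interp(rng.uniform(size=n_events), cdf, grid)


rng = np.random.default_rng(0)
t_const = sample_times(1000, 0.0, 10.0, rng)
t_ramp = sample_times(20_000, 0.0, 10.0, rng, lc_time=[0.0, 10.0], lc_norm=[0.0, 1.0])
```

Since the times are sampled independently of position and energy, the number of events passed in would be the one already drawn for the corresponding source component.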
docs/development/pigs/pig-009.rst
time of the events. In the case the temporal model is not provided, the time is uniformly
sampled in the time range Tstart-Tstop.

The IRF correction can now be applied to sampled events. We propose to add a `distribute`
Again add the signature of the function call: EdispMap.distribute(events=), where events is a Table object.
Done.
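Applying the energy dispersion per event means drawing a reconstructed-energy bin from the migration probabilities of the event's true-energy bin. A vectorised sketch, assuming a migration matrix indexed as [true bin, reco bin] with no all-zero rows, true energies inside the true-energy edges, and rows renormalised to 1 (i.e. events migrating outside the reco range are not dropped); the function name and arguments are hypothetical, not the EdispMap.distribute signature:

```python
import numpy as np


def sample_energy_dispersion(energy_true, e_true_edges, e_reco_edges, migration, rng):
    """Hypothetical sketch: draw a reconstructed energy for each event."""
    energy_true = np.asarray(energy_true, dtype=float)
    e_true_edges = np.asarray(e_true_edges, dtype=float)
    e_reco_edges = np.asarray(e_reco_edges, dtype=float)
    migration = np.asarray(migration, dtype=float)

    # True-energy bin of each event
    i_true = np.digitize(energy_true, e_true_edges) - 1
    # Per-event CDF over reco bins, from the event's row of the migration matrix
    cdf = np.cumsum(migration[i_true], axis=1)
    cdf /= cdf[:, -1:]
    u = rng.uniform(size=(energy_true.size, 1))
    # Index of the first reco bin whose CDF exceeds u
    j_reco = (cdf <= u).sum(axis=1)
    lo, hi = e_reco_edges[j_reco], e_reco_edges[j_reco + 1]
    # Log-uniform placement inside the reco bin
    return lo * (hi / lo) ** rng.uniform(size=energy_true.size)


edges = np.array([1.0, 2.0, 4.0, 8.0])
energy_true = np.array([1.5, 3.0, 5.0])
e_reco = sample_energy_dispersion(energy_true, edges, edges, np.eye(3), np.random.default_rng(0))
```

With an identity migration matrix every event stays in its own energy bin, which is a convenient sanity check for the sampling step.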
docs/development/pigs/pig-009.rst
The method interpolates the "correct" IRF at the position of a given event and
applies it. In more detail the class will calculate the psf and the energy dispersion
for each event true position and true energy. The IRFs are assumed to be constant
and not time-dependent. The output will be an `Astropy.Table` with reconstructed event
Note that .distribute() requires the true positions as well as energies. So it takes a table object and adds the RA, DEC and ENERGY columns.
I've updated the text.
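Similarly for the PSF: the reconstructed position is the true position plus a random offset drawn from the PSF evaluated for that event. A sketch for the simplest case, assuming a 2D Gaussian PSF with per-event width sigma (in deg) and a small-angle, flat-sky approximation; for a 2D Gaussian the radial offset follows a Rayleigh distribution and the position angle is uniform. The function is hypothetical and does not reproduce any Gammapy PSF class:

```python
import numpy as np


def sample_gaussian_psf(ra_true, dec_true, sigma, rng):
    """Hypothetical sketch: smear true positions (deg) with a 2D Gaussian PSF.

    sigma (deg) may be a scalar or one value per event, e.g. interpolated
    from a PSF map at each event's true position and true energy.
    """
    ra_true = np.asarray(ra_true, dtype=float)
    dec_true = np.asarray(dec_true, dtype=float)
    n = ra_true.size
    # Radial offset of a 2D Gaussian: Rayleigh distribution (inverse CDF)
    r = sigma * np.sqrt(-2.0 * np.log1p(-rng.uniform(size=n)))
    # Position angle, measured from north through east
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    dec = dec_true + r * np.cos(phi)
    # Small-angle approximation: RA offsets scale with 1 / cos(dec)
    ra = ra_true + r * np.sin(phi) / np.cos(np.radians(dec_true))
    return ra, dec


rng = np.random.default_rng(0)
n = 20_000
ra_true = np.full(n, 83.63)
dec_true = np.full(n, 22.01)
ra, dec = sample_gaussian_psf(ra_true, dec_true, 0.1, rng)
```

For a tabulated PSF the Rayleigh step would be replaced by an inverse-CDF draw from the radial profile at each event's true energy, with the position angle step unchanged.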
docs/development/pigs/pig-009.rst
Finally, the times and the energies/coordinates of the events will be merged into a
unique `~astropy.table.Table` with the columns:`RA`, `DEC` and `ENERGY and `TIME` .

The `MapDatasetEventSampler` can be used to sample background events as well, although
Give a little bit more information: background events are drawn using Map.sample(n_events=, random_state=) and a constant event rate. The IRF distribution is not applied.
Done.
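For background events, the procedure reduces to sampling the background cube directly (no IRF step) plus uniform arrival times for a constant rate, after which source and background events are stacked. A sketch with hypothetical function names; plain dicts of columns stand in for astropy Tables, the cube is assumed to be (energy, lat, lon) on a flat RA/DEC grid, and the final sort by TIME is an illustrative choice:

```python
import numpy as np


def sample_background_events(bkg, energy_edges, lon_edges, lat_edges, t_start, t_stop, rng):
    """Hypothetical sketch: background events from a background-counts cube.

    bkg is already in reconstructed coordinates, so no PSF or energy
    dispersion is applied.
    """
    bkg = np.asarray(bkg, dtype=float)
    n_events = rng.poisson(bkg.sum())
    cdf = np.cumsum(bkg.ravel())
    cdf /= cdf[-1]
    idx = np.searchsorted(cdf, rng.uniform(size=n_events), side="right")
    i_e, i_lat, i_lon = np.unravel_index(idx, bkg.shape)

    def in_bin(edges, i):
        edges = np.asarray(edges, dtype=float)
        return edges[i] + rng.uniform(size=i.size) * (edges[i + 1] - edges[i])

    return {
        "RA": in_bin(lon_edges, i_lon),
        "DEC": in_bin(lat_edges, i_lat),
        "ENERGY": in_bin(energy_edges, i_e),
        # Constant event rate: arrival times are uniform in [t_start, t_stop]
        "TIME": rng.uniform(t_start, t_stop, size=n_events),
    }


def merge_events(*event_sets):
    """Stack dicts of column arrays and sort by TIME (stand-in for astropy vstack)."""
    merged = {name: np.concatenate([ev[name] for ev in event_sets]) for name in event_sets[0]}
    order = np.argsort(merged["TIME"])
    return {name: col[order] for name, col in merged.items()}


rng = np.random.default_rng(0)
bkg_events = sample_background_events(
    np.full((2, 3, 4), 50.0), [1.0, 10.0, 100.0], np.arange(5.0), np.arange(4.0), 0.0, 3600.0, rng
)
src_events = {"RA": np.array([1.5]), "DEC": np.array([0.5]), "ENERGY": np.array([2.0]), "TIME": np.array([10.0])}
all_events = merge_events(src_events, bkg_events)
```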
docs/development/pigs/pig-009.rst
::

    for src in dataset[source1, source2, ...sourceN]:
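I think the code should rather be as follows:

    from astropy.table import vstack
    from gammapy.data import EventList

    def sample(dataset, random_state):
        """Sample events from a `MapDataset`.

        Design sketch: `evaluators`, `compute_npred`, `Map.sample`,
        `distribute` and `get_events_meta_data` are proposed, hypothetical API.
        """
        events_list = []
        # One set of true-coordinate events per source component
        for evaluator in dataset.evaluators:
            npred = evaluator.compute_npred()
            n_events = random_state.poisson(npred.data.sum())
            events = npred.sample(n_events, random_state)
            events_list.append(events)

        # Apply the IRFs to go from true to reconstructed quantities
        events_src = vstack(events_list)
        events_src = dataset.psf.distribute(events_src, random_state)
        events_src = dataset.edisp.distribute(events_src, random_state)

        # Background is sampled directly in reconstructed coordinates
        n_events_bkg = random_state.poisson(dataset.background_model.map.data.sum())
        events_bkg = dataset.background_model.sample(n_events_bkg, random_state)

        events_total = vstack([events_src, events_bkg])
        events_total.meta = get_events_meta_data(dataset)
        return EventList(events_total)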
okay, I've modified the code.
docs/development/pigs/pig-009.rst
files decribed on `gamma-astro-data-formats`_. The `MapDatasetSampler` will be designed to work
alternatively for input flux or counts maps.

The basic code structure is the following:
With the code example above, this could go...
Done.
@fabiopintore @adonath - Thank you for all your work. This should be ready to circulate very soon.
Please see my inline comments.
docs/development/pigs/pig-009.rst
As a first step, the User should create a `MapDataset` object, as described below:

# 20 lines of code with the `MapDataset` object setup.
@fabiopintore - What's this about 20 lines of code? Just remove this comment? Or did you want to add something?
docs/development/pigs/pig-009.rst
* Created: May 03, 2019
* Accepted:
* Status: Draft
* Discussion: `PR 2136`_
Please, for consistency, use "GH" instead of "PR", throughout this PIG.
Example: https://raw.githubusercontent.com/gammapy/gammapy/master/docs/development/pigs/pig-001.rst
docs/development/pigs/pig-009.rst
We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset`
that will be the core of the sampling process. The `sample` is based on the
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.
ENERGIES_TRUE -> ENERGY_TRUE
docs/development/pigs/pig-009.rst
We propose to add a `Map.sample(n_events=, random_state=)` method in `~gamampy.cube.MapDataset`
that will be the core of the sampling process. The `sample` is based on the
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
Please, here and everywhere else, when referencing classes just write this:
``InverseCDFSampler``
The single-backticks and ~ are Sphinx-syntax to create links to API docs. But if the API changes in the coming years, we don't want to have to go back and update PIGs. So from PIGs we don't create these links.
docs/development/pigs/pig-009.rst
MapDatasetEventSampler
----------------------

The general design of the `sample` method is as follows:
@adonath @fabiopintore - I thought you wanted to get MC_ID. This sample method doesn't loop over source components, so you will not have SOURCE_ID or MC_ID.
So either show how it roughly works, or remove the code example?
docs/development/pigs/pig-009.rst
    events_list.append(events)

events_src = vstack(events_list)
events_src = dataset.psf.distribute(events_src, random_state)
Suggest to name this method also "sample", not "distribute".
It samples the EDISP or PSF distribution functions, it is a kind of sampling.
I just think if all sampling-related code has the same name it's easier to grok and find.
+1
Done
Thanks @fabiopintore ! This looks good to me.
I have added a number of inline comments for typos and a few clarifications connected to:
- how/when do you use exposure information?
- how are the times generated by LightCurveTableModel associated with the relevant simulated events?
docs/development/pigs/pig-009.rst
An event sampler of gamma events is of high importance in exploiting the potentiality
of the future Cherenkov Telescope Array (CTA). It will allows us to simulate sources
with different spectral and morphological properties adn e.g. investigating the best
and e.g. to investigate
docs/development/pigs/pig-009.rst
An event sampler for gamma events is an important part of the science tools
of the future Cherenkov Telescope Array (CTA) observatory. It will allow users
to simulate sources with different spectral, morphological and temporal properties
and predict the performance of CTA on the simulated data e.g. to support observation
on the simulated events
docs/development/pigs/pig-009.rst
`~gammapy.utils.random.InverseCDFSampler` class described in `PR#2229`_ . The output
will be an `~astropy.table.Table` with columns: `RA_TRUE`, `DEC_TRUE` and `ENERGIES_TRUE`.

Then, the time will be sampled independently. This will be done through a
Since the light curve model is connected to a specific source in the sky model, could you say a word on how this will be done in practice?
docs/development/pigs/pig-009.rst
The `MapDatasetEventSampler` can be used to sample background events using the
`Map.sample(n_events=, random_state=)` as well. The time of the events is sampled
assuming a costant event rate. Finally, the IRF corrections are not applied to sampled
constant
docs/development/pigs/pig-009.rst
====================================
To evaluate the precision and performance of the described framework we propose
to implement a prototype for a simulation / fitting pipeline. Starting from a
selection of spatial, spectral and temporal models, data is simulated and fitted
data are simulated
docs/development/pigs/pig-009.rst
selection of spatial, spectral and temporal models, data is simulated and fitted
multiple times to derive distributions and pull-distributions of the reconstructed
model parameters. This pipeline should also monitor the required cpu and memory usage.
This first prototype can be used to evaluate the optimal bin-size (with the best
The bin size will be valid for a given set of IRFs, right?
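For the validation pipeline, the core quantity is the pull, (fitted - true) / error, whose distribution should be close to a standard normal if simulation and fit are consistent. A toy sketch with a single free amplitude, where the Poisson maximum-likelihood fit has the closed-form solution amplitude_hat = N / T with error sqrt(N) / T (N: total simulated counts, T: summed template); the function is hypothetical and only illustrates the pull computation, not the proposed Gammapy pipeline:

```python
import numpy as np


def pull_distribution(template, amplitude_true, n_trials, rng):
    """Toy simulate-and-fit loop for one free parameter (a flux amplitude).

    Counts are simulated as Poisson(amplitude_true * template) and fitted
    with amplitude * template by Poisson maximum likelihood.
    """
    template = np.asarray(template, dtype=float)
    total_template = template.sum()
    pulls = np.empty(n_trials)
    for i in range(n_trials):
        counts = rng.poisson(amplitude_true * template)
        n = counts.sum()
        amplitude_hat = n / total_template       # ML estimate
        sigma = np.sqrt(n) / total_template      # from the curvature of log L
        pulls[i] = (amplitude_hat - amplitude_true) / sigma
    return pulls


rng = np.random.default_rng(0)
pulls = pull_distribution(np.full(100, 10.0), 1.0, 2000, rng)
```

A pull mean far from 0 would indicate a biased estimator or simulation, and a width far from 1 would indicate mis-estimated errors; the same checks apply unchanged when the fit is a full MapDataset likelihood fit.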
Hi!
I'm a bit puzzled by your code example: first, you compute npred for each of the source components, draw a Poisson sample from that, and then sample the individual events. Then, after having stacked all the events, you somehow apply the PSF and EDISP. But usually the MapDataset will be defined with a PSF and EDISP already, and so npred is already convolved with the IRFs.
I also don't quite understand why you need to compute npred at all. Can't you sample the events from the models directly, and then apply PSF and EDISP to go from true to reconstructed quantities? My feeling is that going the detour via npred will also introduce a lot of statistical noise.
The rest of the proposal looks OK to me, but this is a major point that should be clarified or changed.
Best,
Lars
I think for time models the idea is to sample directly; that should be clarified. For spatial models it would be possible to sample directly, but that would require analytical "sample" methods on each model. Possible, but not what's proposed here. The most difficult part is to know how many events to sample and the energy distribution. There one has to use …

An alternative would be to sample the flux distributions, and then to reject sampled events based on …

@lmohrmann - OK to stick with the method to sample …

@adonath @fabiopintore - Please clarify the method used (and clarify or remove the code example). If I see correctly, currently the words "exposure" or "effective area" never appear in the PIG; make it clearer what …
I see the advantage of using the existing code to calculate some …
@lmohrmann @cdeil
The review period for this PIG expired August 20. All remaining comments were addressed. Thank you @fabiopintore for the work on this PIG and thanks @cdeil, @registerrier and @lmohrmann for the feedback!
🎉 @adonath - Can you please add this RST file to the PIG index.rst toctree, so that it shows up in the docs? I usually do that in the PR right before merging and check the HTML. I think this underline is too long and will give a Sphinx warning?
This PR addresses #761, which is related to an event sampler for gammapy.