New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SpectrumSimulation class #647

Merged
merged 3 commits into from Jul 31, 2016

Conversation

Projects
None yet
3 participants
@joleroi
Contributor

joleroi commented Jul 25, 2016

This PR adds the class SpectrumSimulation that can simulated a SpectrumObservation given a source model and IRFs.

@joleroi joleroi added the feature label Jul 25, 2016

@joleroi joleroi changed the title from WIP: Add gammapy.spectrum.SpectrumSimulation to Add gammapy.spectrum.SpectrumSimulation Jul 28, 2016

@joleroi

This comment has been minimized.

Show comment
Hide comment
@joleroi

joleroi Jul 28, 2016

Contributor

@cdeil please review

Contributor

joleroi commented Jul 28, 2016

@cdeil please review

@cdeil cdeil added this to the 0.5 milestone Jul 28, 2016

@cdeil cdeil self-assigned this Jul 28, 2016

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jul 28, 2016

Member

Very nice, thanks!

Did you remove apply_energy_cut on purpose? Is it superfluous or is that functionality is available in some other way?

I think handling livetime and npred in this way might be non-ideal.
You assume model, aeff and edisp will not change, you force livetime to be the same, and you cache npred (although it will become invalid if model or aeff or edisp change).

Without having thought about it much, my recommendation would be: handle livetime like model, aeff and edisp, and don't cache npred, always compute it on property access (no _npred data member).
Then it should be more flexible and less dangerous of getting incorrect npred, no?

Another option might be to pass livetime into simulate_obs and not have it as a data member.

It's hard to say what the best pattern is without having usage examples. In this spirit: my suggestion would be to take the usage example out of the class docstring, move it into a high-level docs page, and there also show an example how one runs several (say 3 or 10, so that it's still fast) simulations for a little study.
Maybe from seeing this use case the best way to handle obstime and npred will be clear?
(and maybe not ... it will just be specific to this one way to use it).

I hope this helps a bit.
In any case, feel free to merge any time.
This will be extended (e.g. to simulate background) in the future anyways, no need to get it perfect at the start.

Member

cdeil commented Jul 28, 2016

Very nice, thanks!

Did you remove apply_energy_cut on purpose? Is it superfluous or is that functionality is available in some other way?

I think handling livetime and npred in this way might be non-ideal.
You assume model, aeff and edisp will not change, you force livetime to be the same, and you cache npred (although it will become invalid if model or aeff or edisp change).

Without having thought about it much, my recommendation would be: handle livetime like model, aeff and edisp, and don't cache npred, always compute it on property access (no _npred data member).
Then it should be more flexible and less dangerous of getting incorrect npred, no?

Another option might be to pass livetime into simulate_obs and not have it as a data member.

It's hard to say what the best pattern is without having usage examples. In this spirit: my suggestion would be to take the usage example out of the class docstring, move it into a high-level docs page, and there also show an example how one runs several (say 3 or 10, so that it's still fast) simulations for a little study.
Maybe from seeing this use case the best way to handle obstime and npred will be clear?
(and maybe not ... it will just be specific to this one way to use it).

I hope this helps a bit.
In any case, feel free to merge any time.
This will be extended (e.g. to simulate background) in the future anyways, no need to get it perfect at the start.

@joleroi

This comment has been minimized.

Show comment
Hide comment
@joleroi

joleroi Jul 29, 2016

Contributor

Did you remove apply_energy_cut on purpose?

Superfluous, this was an early attempt at obs stacking.

I think handling livetime and npred in this way might be non-ideal.

I had the same doubts. The reasons why I wanted to cache npred is that it is the most computationally expensive step in the simulation (due to the interpolation of the IRFS, I think). But when simulation and fitting this actually is negligible (at the moment, compared to other things). But I cannot harm to have a fast simulator, no? 1000 simulations per set of IRFS and livetime is not a crazy thing to do, is it?

So maybe there are these options

  • Keep the cached npred. Make livetime, aeff, edisp properties so nothing can go wrong (I think this is conceptually nice, because then every SpectrumSimulation instance is a different configuration. I you want so simulate 5 different aeff, you need 5 instances)
  • always calc npred on demans (as you propse). Pure anarchy, but safe.

Ok, if I implement the first option?

Contributor

joleroi commented Jul 29, 2016

Did you remove apply_energy_cut on purpose?

Superfluous, this was an early attempt at obs stacking.

I think handling livetime and npred in this way might be non-ideal.

I had the same doubts. The reasons why I wanted to cache npred is that it is the most computationally expensive step in the simulation (due to the interpolation of the IRFS, I think). But when simulation and fitting this actually is negligible (at the moment, compared to other things). But I cannot harm to have a fast simulator, no? 1000 simulations per set of IRFS and livetime is not a crazy thing to do, is it?

So maybe there are these options

  • Keep the cached npred. Make livetime, aeff, edisp properties so nothing can go wrong (I think this is conceptually nice, because then every SpectrumSimulation instance is a different configuration. I you want so simulate 5 different aeff, you need 5 instances)
  • always calc npred on demans (as you propse). Pure anarchy, but safe.

Ok, if I implement the first option?

@joleroi

This comment has been minimized.

Show comment
Hide comment
@joleroi

joleroi Jul 29, 2016

Contributor

my suggestion would be to take the usage example out of the class docstring, move it into a high-level docs page, and there also show an example how one runs several (say 3 or 10, so that it's still fast) simulations for a little study.

👍

Contributor

joleroi commented Jul 29, 2016

my suggestion would be to take the usage example out of the class docstring, move it into a high-level docs page, and there also show an example how one runs several (say 3 or 10, so that it's still fast) simulations for a little study.

👍

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jul 29, 2016

Member

In Python, if you want to have "safe" classes and especially if you use caching, you have to make every data member a hidden private member and either prevent setting, or implement getters, setters and do validation in the setters.

I'm -1 on doing this here ... you'll spend a lot of time to figure this out and get it right. And in the end, this is a class that will be used more by experts and there it's good to not be prevented from doing whatever you want.

So my suggestion would be to make npred a property, not a cached lazyproperty.
If the user really needs to cache this for efficiency, they still can in a local variable outside this class. But probably it's not needed anyways.

If you prefer something else, fine, do whatever you want here and keep using and changing the class to suit your own spectrum simulation needs.

Member

cdeil commented Jul 29, 2016

In Python, if you want to have "safe" classes and especially if you use caching, you have to make every data member a hidden private member and either prevent setting, or implement getters, setters and do validation in the setters.

I'm -1 on doing this here ... you'll spend a lot of time to figure this out and get it right. And in the end, this is a class that will be used more by experts and there it's good to not be prevented from doing whatever you want.

So my suggestion would be to make npred a property, not a cached lazyproperty.
If the user really needs to cache this for efficiency, they still can in a local variable outside this class. But probably it's not needed anyways.

If you prefer something else, fine, do whatever you want here and keep using and changing the class to suit your own spectrum simulation needs.

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jul 29, 2016

Member

PS: I'm considering making the Gammapy 0.5 release once more tomorrow or Monday latest.
This is not a blocker, but if you can, merge in open branches and then continue new ones later.

Member

cdeil commented Jul 29, 2016

PS: I'm considering making the Gammapy 0.5 release once more tomorrow or Monday latest.
This is not a blocker, but if you can, merge in open branches and then continue new ones later.

@joleroi

This comment has been minimized.

Show comment
Hide comment
@joleroi

joleroi Jul 29, 2016

Contributor

Thanks for the feedback. I will remove the _npred attribute and think about it again if I run into efficiency problems later.

Contributor

joleroi commented Jul 29, 2016

Thanks for the feedback. I will remove the _npred attribute and think about it again if I run into efficiency problems later.

@joleroi

This comment has been minimized.

Show comment
Hide comment
@joleroi

joleroi Jul 29, 2016

Contributor

From my side this is ready to be merge.

I included a png file in the little tutorial but didn't add it to the repo yet. It's only 100 K so ok to add?

Contributor

joleroi commented Jul 29, 2016

From my side this is ready to be merge.

I included a png file in the little tutorial but didn't add it to the repo yet. It's only 100 K so ok to add?

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jul 29, 2016

Member

I would prefer if you include the PNG in gammapy-extra, e.g. here:
https://github.com/gammapy/gammapy-extra/tree/master/figures

Even if this image is just 100k, we will have 10s or 100s of images and many versions over time, and the code repo would grow a lot.

If you can't get the image loaded from the docs build, no worries, I can do it tomorrow.

Member

cdeil commented Jul 29, 2016

I would prefer if you include the PNG in gammapy-extra, e.g. here:
https://github.com/gammapy/gammapy-extra/tree/master/figures

Even if this image is just 100k, we will have 10s or 100s of images and many versions over time, and the code repo would grow a lot.

If you can't get the image loaded from the docs build, no worries, I can do it tomorrow.

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jul 29, 2016

Member

You have to make another commit to fix the Sphinx build on travis-ci anyways:
https://travis-ci.org/gammapy/gammapy/jobs/148311915#L1626

Member

cdeil commented Jul 29, 2016

You have to make another commit to fix the Sphinx build on travis-ci anyways:
https://travis-ci.org/gammapy/gammapy/jobs/148311915#L1626

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jul 29, 2016

Member

I left some minor inline comment.
Otherwise 👍 to merge.
Implementation and docs look good, thanks!

Member

cdeil commented Jul 29, 2016

I left some minor inline comment.
Otherwise 👍 to merge.
Implementation and docs look good, thanks!

@joleroi

This comment has been minimized.

Show comment
Hide comment
@joleroi

joleroi Jul 30, 2016

Contributor

I adressed your last set of comments. I didn't try to include the pic in the docs, can you please do it?

Contributor

joleroi commented Jul 30, 2016

I adressed your last set of comments. I didn't try to include the pic in the docs, can you please do it?

hi_threshold = hi_threshold or 50 * u.TeV
rand = get_random_state(seed)
on_counts = rand.poisson(self.npred.data)

This comment has been minimized.

@adonath

adonath Jul 31, 2016

Member

Just a comment on the random number generation:

The way it works right now (if you use a constant seed) you'll always get the same result, when you call simulate_obs() repeatedly (like in the docs example above). When you use the option 'random-init' to avoid that, the results won't be reproducible any more.

I think the "correct" way to simulate N different Poisson realisations of some model a with numpy is as following:

import numpy as np
from gammapy.utils.random import get_random_state
rand = get_random_state(0)
a = np.array([1, 2, 3, 4])
N = 5
rand.poisson(a * np.ones(N).reshape((N, 1)))

Which returns an array of shape (len(a), N), with N different Poisson realisations of a.

This solves the performance issue as well. But you may have to rethink the API...

@adonath

adonath Jul 31, 2016

Member

Just a comment on the random number generation:

The way it works right now (if you use a constant seed) you'll always get the same result, when you call simulate_obs() repeatedly (like in the docs example above). When you use the option 'random-init' to avoid that, the results won't be reproducible any more.

I think the "correct" way to simulate N different Poisson realisations of some model a with numpy is as following:

import numpy as np
from gammapy.utils.random import get_random_state
rand = get_random_state(0)
a = np.array([1, 2, 3, 4])
N = 5
rand.poisson(a * np.ones(N).reshape((N, 1)))

Which returns an array of shape (len(a), N), with N different Poisson realisations of a.

This solves the performance issue as well. But you may have to rethink the API...

This comment has been minimized.

@cdeil

cdeil Jul 31, 2016

Member

The current API has a seed parameter.
When using it, you could just pass e.g. obs_id from np.arange to get independent, yet reproducible draws.
So I think it's OK and no need to change the API or complicate the implementation by drawing N simulations together.

+1 to do that from the docs instead of calling with seed='random-init' like you do now.
I'll make that change now and merge this PR now.

@cdeil

cdeil Jul 31, 2016

Member

The current API has a seed parameter.
When using it, you could just pass e.g. obs_id from np.arange to get independent, yet reproducible draws.
So I think it's OK and no need to change the API or complicate the implementation by drawing N simulations together.

+1 to do that from the docs instead of calling with seed='random-init' like you do now.
I'll make that change now and merge this PR now.

@cdeil cdeil merged commit b020b52 into gammapy:master Jul 31, 2016

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Jul 31, 2016

Member

I added one commit (6289b56) with minor changes and merged this locally.

@joleroi - Thanks!

Member

cdeil commented Jul 31, 2016

I added one commit (6289b56) with minor changes and merged this locally.

@joleroi - Thanks!

@joleroi joleroi deleted the joleroi:simulation branch Aug 8, 2016

@cdeil cdeil changed the title from Add gammapy.spectrum.SpectrumSimulation to Add SpectrumSimulation class Aug 23, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment