-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement SpectrumDatasetOnOff class #2111
Implement SpectrumDatasetOnOff class #2111
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @registerrier, this looks very good to me! I've left a few comments regarding style. What is the general plan with this transition? Do you want to build the new class structure at the side? I'd prefer to refactor existing code as much as possible and try to keep existing tests instead of copying them over. At some point we also have delete existing code...
gammapy/spectrum/__init__.py
Outdated
@@ -12,3 +12,4 @@ | |||
from .fit import * | |||
from .results import * | |||
from .sensitivity import * | |||
from .spectrum_dataset import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just name the submodule .dataset
, because it already lives in the gammapy.spectrum
sub-package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
gammapy/spectrum/spectrum_dataset.py
Outdated
] | ||
|
||
|
||
class ONOFFSpectrumDataset(Dataset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current naming scheme we have have for Gammapy we rather use suffixes and PEP 8 recommends camel- case for class names. So I'd prefer SpectrumDatasetOnOff
as a name. I actually also liked the SpectrumDatasetOGIP
name from the prototype notebook, because the data structure is much closer to the OGIP format, with the aeff
and livetime
parameterisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Changed.
I have used the ONOFF
terminology precisely because I did not want to confuse the Dataset
with a convention for storing spectra. Of course, it is a bit more than a data format since there is, e.g., the prescription that IRFs are described as arf
, edisp
, livetime
rather than e.g. exposure
and edisp
. Also we don't store a alpha
parameter but deduce it from ON.backscal
and OFF.backscal
.
Conversely, you can have an OGIP
compliant spectrum that has no OFF
spectrum, but rather a model for background. In a sense, the current SpectrumDataset
could also be stored in OGIP
-compliant files. Also, not all OGIP
files would be fitted with wstat
.
So I have mixed views regarding the best approach here. Any opinion? @adonath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure either what's best to do here. However I think the current solution, i.e. having the OGIP compliant data model and using it for our analyses is not a bad compromise. In general I'd prefer the data model based on alpha
and exposure
, just because it's more common in Gamma astronomy. But the only solution I see would be to introduce both SpectrumDataset
and SpectrumDatasetOGIP
as well as SpectrumDatasetOnOff
and SpectrumDatasetsOGIPOnOff
, which comes at the cost of some code duplication and proliferation of spectral dataset classes.
gammapy/spectrum/spectrum_dataset.py
Outdated
---------- | ||
model : `~gammapy.spectrum.models.SpectralModel` | ||
Fit model | ||
ONcounts : `~gammapy.spectrum.PHACountsSpectrum` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I think we rather use suffixes...so maybe rename to counts_on
and counts_off
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
gammapy/spectrum/spectrum_dataset.py
Outdated
ON Counts spectrum | ||
OFFcounts : `~gammapy.spectrum.PHACountsSpectrum` | ||
ON Counts spectrum | ||
livetime : float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make lifetime
a Quantity
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
gammapy/spectrum/spectrum_dataset.py
Outdated
return np.sum(stat, dtype=np.float64) | ||
|
||
@classmethod | ||
def read_from_ogip(cls, filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just call this method .read()
? If we add other I/O formats in future, they'll probably be handled internally and not by multiple read_...
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I implemented a specific method for OGIP files because the initial plan for Dataset
I/O was using yaml files.
I guess I can make .read_from_ogip()
private and call it from .read()
so that we can later fill in other I/O methods. Would that be OK? Or would you prefer make it the read
method and later modify its behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both is fine by me. You could either introduce a private _read_folder()
and call it from .read()
or just keep the implementation in .read()
and we'll modify it later...
gammapy/spectrum/spectrum_dataset.py
Outdated
livetime=on_vector.livetime | ||
) | ||
|
||
def export_to_ogip(self, outdir=None, overwrite=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I think it's fine to just call this .write()
. Multiple I/O formats can be handled internally and must not be reflected in the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment and question as above.
|
||
|
||
|
||
class Test_ONOFFSpectrumDataset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the underscore...class names are recommended to use camel-case in PEP8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
The addition of the Once we change |
The PR is now complete. Major evolutions since the review:
|
I moved the |
|
||
|
||
class SpectrumDatasetOnOff(Dataset): | ||
"""Compute spectral model fit statistic on a ON OFF Spectrum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring should probably be adapted right? Why does a class called SomethingDataset
compute spectral model fit statistics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the point of the Dataset
framework is to provide the likelihood for a given reduced dataset and a model.
See the introduction of PIG 8:
'A dataset bundles the reduced data with a parameteric model and fit statistics function. It evaluates the model and log-likelihood and passes it on to the fit object. Datasets can be combined by adding their log-likelihood values and concatenating their model parameters.'
Maybe the term 'fit statistic' is a bit mis-leading, we preferred it to log-likelihood for instance, because it is never really a log-likelihood but rather Cash or wstat statistics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @mackaiver, that we probably should take some time before v0.12 to unify and improve the docstrings of our dataset classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the part of the PIG you quoted sounds like a good start for a class docstring
A dataset bundles the reduced data with a parameteric model and fit statistics function. It evaluates the model and log-likelihood and passes it on to the fit object. Datasets can be combined by adding their log-likelihood values and concatenating their model parameters.'
This is a first PR to introduce a
Dataset
adapted to ON - OFF spectra withPHACountsSpectrum
members.Parts of the tests are adapted from the
SpectrumFit
test.This is a first set of commit, the PR is not finished, but is open for comments.