Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SpectrumDatasetOnOff class #2111

Merged
merged 15 commits into from Apr 9, 2019

Conversation

@registerrier
Copy link
Contributor

@registerrier registerrier commented Apr 5, 2019

This is a first PR to introduce a Dataset adapted to ON - OFF spectra with PHACountsSpectrum members.

Parts of the tests are adapted from the SpectrumFit test.

This is a first set of commit, the PR is not finished, but is open for comments.

@registerrier registerrier requested a review from adonath Apr 5, 2019
@registerrier registerrier self-assigned this Apr 5, 2019
@registerrier registerrier added this to To do in gammapy.makers via automation Apr 5, 2019
@registerrier registerrier added this to In progress in gammapy.modeling via automation Apr 5, 2019
@registerrier registerrier added this to the 0.12 milestone Apr 5, 2019
Copy link
Member

@adonath adonath left a comment

Thanks @registerrier, this looks very good to me! I've left a few comments regarding style. What is the general plan with this transition? Do you want to build the new class structure at the side? I'd prefer to refactor existing code as much as possible and try to keep existing tests instead of copying them over. At some point we also have delete existing code...

@@ -12,3 +12,4 @@
from .fit import *
from .results import *
from .sensitivity import *
from .spectrum_dataset import *
Copy link
Member

@adonath adonath Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just name the submodule .dataset, because it already lives in the gammapy.spectrum sub-package?

Copy link
Contributor Author

@registerrier registerrier Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

]


class ONOFFSpectrumDataset(Dataset):
Copy link
Member

@adonath adonath Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current naming scheme we have have for Gammapy we rather use suffixes and PEP 8 recommends camel- case for class names. So I'd prefer SpectrumDatasetOnOff as a name. I actually also liked the SpectrumDatasetOGIP name from the prototype notebook, because the data structure is much closer to the OGIP format, with the aeff and livetime parameterisation.

Copy link
Contributor Author

@registerrier registerrier Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Changed.

I have used the ONOFF terminology precisely because I did not want to confuse the Dataset with a convention for storing spectra. Of course, it is a bit more than a data format since there is, e.g., the prescription that IRFs are described as arf, edisp, livetime rather than e.g. exposure and edisp. Also we don't store a alpha parameter but deduce it from ON.backscal and OFF.backscal.

Conversely, you can have an OGIP compliant spectrum that has no OFF spectrum, but rather a model for background. In a sense, the current SpectrumDataset could also be stored in OGIP-compliant files. Also, not all OGIP files would be fitted with wstat.

So I have mixed views regarding the best approach here. Any opinion? @adonath

Copy link
Member

@adonath adonath Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure either what's best to do here. However I think the current solution, i.e. having the OGIP compliant data model and using it for our analyses is not a bad compromise. In general I'd prefer the data model based on alpha and exposure, just because it's more common in Gamma astronomy. But the only solution I see would be to introduce both SpectrumDataset and SpectrumDatasetOGIP as well as SpectrumDatasetOnOff and SpectrumDatasetsOGIPOnOff, which comes at the cost of some code duplication and proliferation of spectral dataset classes.

----------
model : `~gammapy.spectrum.models.SpectralModel`
Fit model
ONcounts : `~gammapy.spectrum.PHACountsSpectrum`
Copy link
Member

@adonath adonath Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I think we rather use suffixes...so maybe rename to counts_on and counts_off?

Copy link
Contributor Author

@registerrier registerrier Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

ON Counts spectrum
OFFcounts : `~gammapy.spectrum.PHACountsSpectrum`
ON Counts spectrum
livetime : float
Copy link
Member

@adonath adonath Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make lifetime a Quantity?

Copy link
Contributor Author

@registerrier registerrier Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

return np.sum(stat, dtype=np.float64)

@classmethod
def read_from_ogip(cls, filename):
Copy link
Member

@adonath adonath Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just call this method .read()? If we add other I/O formats in future, they'll probably be handled internally and not by multiple read_... methods.

Copy link
Contributor Author

@registerrier registerrier Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I implemented a specific method for OGIP files because the initial plan for Dataset I/O was using yaml files.
I guess I can make .read_from_ogip() private and call it from .read() so that we can later fill in other I/O methods. Would that be OK? Or would you prefer make it the read method and later modify its behavior?

Copy link
Member

@adonath adonath Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both is fine by me. You could either introduce a private _read_folder() and call it from .read() or just keep the implementation in .read() and we'll modify it later...

livetime=on_vector.livetime
)

def export_to_ogip(self, outdir=None, overwrite=False):
Copy link
Member

@adonath adonath Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I think it's fine to just call this .write(). Multiple I/O formats can be handled internally and must not be reflected in the API.

Copy link
Contributor Author

@registerrier registerrier Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment and question as above.




class Test_ONOFFSpectrumDataset:
Copy link
Member

@adonath adonath Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the underscore...class names are recommended to use camel-case in PEP8.

Copy link
Contributor Author

@registerrier registerrier Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@registerrier registerrier changed the title Introduce ONOFFSpectrumDataset Introduce SpectrumDatasetONOFF Apr 8, 2019
@registerrier
Copy link
Contributor Author

@registerrier registerrier commented Apr 8, 2019

The addition of the Dataset itself can be done fully in parallel to the existing code, by adapting tests in test_fit.py to the Dataset instead of SpectrumFit.

Once we change SpectrumExtract, we can remove SpectrumFit and SpectrumObservation.
I would keep that for a subsequent PR.

@registerrier
Copy link
Contributor Author

@registerrier registerrier commented Apr 8, 2019

The PR is now complete. Major evolutions since the review:

  1. A dataset can now be created from SpectrumObservation. There are still quite a few functionalities defined on SpectrumObservation and SpectrumObservationList that are useful, e.g. stacking, safe energy range computation, computing spectrum statistics etc. It is not clear yet that we can get rid of them.
  2. The main missing feature to completely replace SpectrumFit is the fit range.

@adonath adonath changed the title Introduce SpectrumDatasetONOFF Introduce SpectrumDatasetOnOff Apr 9, 2019
adonath
adonath approved these changes Apr 9, 2019
@adonath adonath merged commit 53bfa18 into gammapy:master Apr 9, 2019
9 checks passed
gammapy.makers automation moved this from To do to Done Apr 9, 2019
gammapy.modeling automation moved this from In progress to Done Apr 9, 2019
@adonath
Copy link
Member

@adonath adonath commented Apr 9, 2019

I moved the SpectrumDataset to spectrum.dataset in a follow up commit tin master 286b289.



class SpectrumDatasetOnOff(Dataset):
"""Compute spectral model fit statistic on a ON OFF Spectrum.
Copy link
Contributor

@kbruegge kbruegge Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring should probably be adapted right? Why does a class called SomethingDataset compute spectral model fit statistics?

Copy link
Contributor Author

@registerrier registerrier Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the point of the Dataset framework is to provide the likelihood for a given reduced dataset and a model.
See the introduction of PIG 8:
'A dataset bundles the reduced data with a parameteric model and fit statistics function. It evaluates the model and log-likelihood and passes it on to the fit object. Datasets can be combined by adding their log-likelihood values and concatenating their model parameters.'

Maybe the term 'fit statistic' is a bit mis-leading, we preferred it to log-likelihood for instance, because it is never really a log-likelihood but rather Cash or wstat statistics.

Copy link
Member

@adonath adonath Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mackaiver, that we probably should take some time before v0.12 to unify and improve the docstrings of our dataset classes.

Copy link
Contributor

@kbruegge kbruegge Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the part of the PIG you quoted sounds like a good start for a class docstring

A dataset bundles the reduced data with a parameteric model and fit statistics function. It evaluates the model and log-likelihood and passes it on to the fit object. Datasets can be combined by adding their log-likelihood values and concatenating their model parameters.'

@adonath adonath changed the title Introduce SpectrumDatasetOnOff Introduce SpectrumDatasetOnOff class May 27, 2019
@adonath adonath changed the title Introduce SpectrumDatasetOnOff class Implement SpectrumDatasetOnOff class May 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants