Implement SpectrumDatasetOnOff class #2111

Merged
merged 15 commits into from Apr 9, 2019


@registerrier
Contributor

commented Apr 5, 2019

This is a first PR to introduce a Dataset adapted to ON - OFF spectra with PHACountsSpectrum members.

Parts of the tests are adapted from the SpectrumFit test.

This is a first set of commits; the PR is not finished, but it is open for comments.

@registerrier registerrier requested a review from adonath Apr 5, 2019

@registerrier registerrier self-assigned this Apr 5, 2019

@registerrier registerrier added the feature label Apr 5, 2019

@registerrier registerrier added this to To do in Spectrum analysis (1D) via automation Apr 5, 2019

@registerrier registerrier added this to In progress in Modeling via automation Apr 5, 2019

@registerrier registerrier added this to the 0.12 milestone Apr 5, 2019

@adonath
Member

left a comment

Thanks @registerrier, this looks very good to me! I've left a few comments regarding style. What is the general plan with this transition? Do you want to build the new class structure on the side? I'd prefer to refactor existing code as much as possible and try to keep existing tests instead of copying them over. At some point we also have to delete existing code...

@@ -12,3 +12,4 @@
from .fit import *
from .results import *
from .sensitivity import *
from .spectrum_dataset import *

@adonath

adonath Apr 7, 2019

Member

Maybe just name the submodule .dataset, because it already lives in the gammapy.spectrum sub-package?

@registerrier

registerrier Apr 8, 2019

Author Contributor

Done

]


class ONOFFSpectrumDataset(Dataset):

@adonath

adonath Apr 7, 2019

Member

In the naming scheme we currently have for Gammapy, we prefer suffixes, and PEP 8 recommends camel case for class names. So I'd prefer SpectrumDatasetOnOff as a name. I actually also liked the SpectrumDatasetOGIP name from the prototype notebook, because the data structure is much closer to the OGIP format, with the aeff and livetime parameterisation.

@registerrier

registerrier Apr 8, 2019

Author Contributor

OK. Changed.

I have used the ONOFF terminology precisely because I did not want to confuse the Dataset with a convention for storing spectra. Of course, it is a bit more than a data format, since there is, for instance, the prescription that IRFs are described as arf, edisp and livetime rather than, e.g., exposure and edisp. Also, we don't store an alpha parameter but deduce it from ON.backscal and OFF.backscal.

Conversely, you can have an OGIP compliant spectrum that has no OFF spectrum, but rather a model for background. In a sense, the current SpectrumDataset could also be stored in OGIP-compliant files. Also, not all OGIP files would be fitted with wstat.

So I have mixed views regarding the best approach here. Any opinion? @adonath
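To make the alpha point above concrete, here is a minimal sketch of deducing alpha from the two backscal arrays instead of storing it. The `CountsSpectrum` class and the helper names are hypothetical stand-ins for illustration, not the Gammapy `PHACountsSpectrum` API, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CountsSpectrum:
    """Hypothetical stand-in for a PHA counts spectrum (not the Gammapy class)."""

    counts: List[float]    # counts per energy bin
    backscal: List[float]  # background scaling factor per energy bin


def compute_alpha(on: CountsSpectrum, off: CountsSpectrum) -> List[float]:
    """Deduce alpha per bin as ON.backscal / OFF.backscal."""
    return [b_on / b_off for b_on, b_off in zip(on.backscal, off.backscal)]


def compute_excess(on: CountsSpectrum, off: CountsSpectrum) -> List[float]:
    """Excess per bin: n_on - alpha * n_off."""
    alpha = compute_alpha(on, off)
    return [n_on - a * n_off for n_on, a, n_off in zip(on.counts, alpha, off.counts)]


# Illustrative values: the OFF region is four times larger than the ON region.
on = CountsSpectrum(counts=[12, 7, 3], backscal=[1.0, 1.0, 1.0])
off = CountsSpectrum(counts=[20, 8, 4], backscal=[4.0, 4.0, 4.0])

alpha = compute_alpha(on, off)
excess_counts = compute_excess(on, off)
```

With this layout alpha is never a stored attribute, so it cannot get out of sync with the backscal values that are written to file.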

@adonath

adonath Apr 9, 2019

Member

I'm not 100% sure either what's best to do here. However, I think the current solution, i.e. having the OGIP-compliant data model and using it for our analyses, is not a bad compromise. In general I'd prefer the data model based on alpha and exposure, just because it's more common in gamma-ray astronomy. But the only solution I see would be to introduce both SpectrumDataset and SpectrumDatasetOGIP as well as SpectrumDatasetOnOff and SpectrumDatasetsOGIPOnOff, which comes at the cost of some code duplication and a proliferation of spectral dataset classes.

----------
model : `~gammapy.spectrum.models.SpectralModel`
Fit model
ONcounts : `~gammapy.spectrum.PHACountsSpectrum`

@adonath

adonath Apr 7, 2019

Member

Again, I think we prefer suffixes, so maybe rename these to counts_on and counts_off?

@registerrier

registerrier Apr 8, 2019

Author Contributor

OK

ON Counts spectrum
OFFcounts : `~gammapy.spectrum.PHACountsSpectrum`
OFF Counts spectrum
livetime : float

@adonath

adonath Apr 7, 2019

Member

Make livetime a Quantity?

@registerrier

registerrier Apr 8, 2019

Author Contributor

Right.

return np.sum(stat, dtype=np.float64)

@classmethod
def read_from_ogip(cls, filename):

@adonath

adonath Apr 7, 2019

Member

Just call this method .read()? If we add other I/O formats in the future, they'll probably be handled internally and not by multiple read_... methods.

@registerrier

registerrier Apr 8, 2019

Author Contributor

OK. I implemented a specific method for OGIP files because the initial plan for Dataset I/O was to use YAML files.
I guess I can make .read_from_ogip() private and call it from .read() so that we can later fill in other I/O methods. Would that be OK? Or would you prefer to make it the read method and modify its behavior later?

@adonath

adonath Apr 8, 2019

Member

Either is fine by me. You could either introduce a private _read_folder() and call it from .read(), or just keep the implementation in .read() and we'll modify it later...
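The first option discussed above (a public .read() that delegates to a private, format-specific reader) could look roughly like the sketch below. The class name, the `fmt` argument and the `_read_ogip` helper are assumptions made for illustration; this is not the code that was merged.

```python
from pathlib import Path


class SpectrumDatasetSketch:
    """Hypothetical illustration only; not the Gammapy implementation."""

    def __init__(self, source, fmt):
        self.source = source
        self.fmt = fmt

    @classmethod
    def read(cls, filename, fmt="ogip"):
        """Public entry point; the file format is handled internally."""
        readers = {"ogip": cls._read_ogip}  # more formats can be registered later
        if fmt not in readers:
            raise ValueError(f"Unsupported format: {fmt!r}")
        return readers[fmt](Path(filename))

    @classmethod
    def _read_ogip(cls, path):
        # A real reader would parse the PHA, ARF and RMF files here;
        # this placeholder only records where the data would come from.
        return cls(source=path, fmt="ogip")


dataset = SpectrumDatasetSketch.read("pha_obs1.fits")
```

Adding a new format then only means registering another private reader; callers keep using .read() unchanged.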

livetime=on_vector.livetime
)

def export_to_ogip(self, outdir=None, overwrite=False):

@adonath

adonath Apr 7, 2019

Member

Again, I think it's fine to just call this .write(). Multiple I/O formats can be handled internally and need not be reflected in the API.

@registerrier

registerrier Apr 8, 2019

Author Contributor

Same comment and question as above.




class Test_ONOFFSpectrumDataset:

@adonath

adonath Apr 7, 2019

Member

Remove the underscore; PEP 8 recommends camel case for class names.

@registerrier

registerrier Apr 8, 2019

Author Contributor

OK

@registerrier registerrier changed the title from "Introduce ONOFFSpectrumDataset" to "Introduce SpectrumDatasetONOFF" Apr 8, 2019

@registerrier

Contributor Author

commented Apr 8, 2019

The addition of the Dataset itself can be done fully in parallel to the existing code, by adapting tests in test_fit.py to the Dataset instead of SpectrumFit.

Once we change SpectrumExtract, we can remove SpectrumFit and SpectrumObservation.
I would keep that for a subsequent PR.

@registerrier

Contributor Author

commented Apr 8, 2019

The PR is now complete. Major changes since the review:

  1. A dataset can now be created from SpectrumObservation. There are still quite a few functionalities defined on SpectrumObservation and SpectrumObservationList that are useful, e.g. stacking, safe energy range computation, computing spectrum statistics etc. It is not clear yet that we can get rid of them.
  2. The main missing feature to completely replace SpectrumFit is the fit range.

@adonath adonath changed the title from "Introduce SpectrumDatasetONOFF" to "Introduce SpectrumDatasetOnOff" Apr 9, 2019

@adonath

adonath approved these changes Apr 9, 2019

@adonath adonath merged commit 53bfa18 into gammapy:master Apr 9, 2019

9 checks passed

Codacy/PR Quality Review Up to standards. A positive pull request.
Scrutinizer Analysis: 3 new issues, 28 updated code elements – Tests: passed
continuous-integration/travis-ci/pr The Travis CI build passed
gammapy.gammapy Build #20190408.5 succeeded
gammapy.gammapy (DevDocs) DevDocs succeeded
gammapy.gammapy (Lint) Lint succeeded
gammapy.gammapy (Test Python35) Test Python35 succeeded
gammapy.gammapy (Test Windows35) Test Windows35 succeeded
gammapy.gammapy (Test Windows37) Test Windows37 succeeded

Spectrum analysis (1D) automation moved this from To do to Done Apr 9, 2019

Modeling automation moved this from In progress to Done Apr 9, 2019

@adonath

Member

commented Apr 9, 2019

I moved the SpectrumDataset to spectrum.dataset in a follow-up commit in master, 286b289.



class SpectrumDatasetOnOff(Dataset):
"""Compute spectral model fit statistic on a ON OFF Spectrum.

@mackaiver

mackaiver Apr 9, 2019

Contributor

This docstring should probably be adapted, right? Why does a class called SomethingDataset compute spectral model fit statistics?

@registerrier

registerrier Apr 9, 2019

Author Contributor

Well the point of the Dataset framework is to provide the likelihood for a given reduced dataset and a model.
See the introduction of PIG 8:
'A dataset bundles the reduced data with a parametric model and fit statistics function. It evaluates the model and log-likelihood and passes it on to the fit object. Datasets can be combined by adding their log-likelihood values and concatenating their model parameters.'

Maybe the term 'fit statistic' is a bit misleading. We preferred it to log-likelihood, for instance, because it is never really a log-likelihood but rather a Cash or wstat statistic.
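As an illustration of what "fit statistic" means here, the sketch below evaluates the Cash statistic, 2 * (mu - n * ln(mu)) per bin, for observed counts n and predicted counts mu, and combines datasets by adding their values, as the PIG 8 quote describes. The function names and the list-based data layout are assumptions for this example, not the Gammapy API; wstat is omitted because it additionally profiles over the unknown background.

```python
import math


def cash(n_on, mu_on):
    """Cash statistic per bin: 2 * (mu - n * ln(mu)); requires mu > 0."""
    return [2.0 * (mu - n * math.log(mu)) for n, mu in zip(n_on, mu_on)]


def dataset_stat(n_on, mu_on):
    """Total fit statistic of one dataset: the sum over its energy bins."""
    return sum(cash(n_on, mu_on))


def joint_stat(datasets):
    """Combine datasets by adding their fit statistic values."""
    return sum(dataset_stat(n_on, mu_on) for n_on, mu_on in datasets)


# Two hypothetical datasets, each given as (observed counts, predicted counts).
datasets = [([1, 2], [1.5, 2.5]), ([3], [2.0])]
total = joint_stat(datasets)
```

A fit would minimise `total` over the model parameters that determine the predicted counts; for a single bin the minimum lies at mu = n.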

@adonath

adonath Apr 9, 2019

Member

I agree with @mackaiver that we should probably take some time before v0.12 to unify and improve the docstrings of our dataset classes.

@mackaiver

mackaiver Apr 9, 2019

Contributor

I think the part of the PIG you quoted sounds like a good start for a class docstring:

'A dataset bundles the reduced data with a parametric model and fit statistics function. It evaluates the model and log-likelihood and passes it on to the fit object. Datasets can be combined by adding their log-likelihood values and concatenating their model parameters.'

@adonath adonath changed the title from "Introduce SpectrumDatasetOnOff" to "Introduce SpectrumDatasetOnOff class" May 27, 2019

@adonath adonath changed the title from "Introduce SpectrumDatasetOnOff class" to "Implement SpectrumDatasetOnOff class" May 27, 2019
