
Add Datasets class #2030

Merged
merged 10 commits into from Feb 15, 2019

Conversation

@adonath (Member) commented Feb 14, 2019

This PR introduces the Datasets object to enable combining multiple datasets for fitting. There is one important API change involved: the covariance now has to be accessed via result.parameters.covariance.
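The API change can be illustrated with a minimal sketch. The `Parameters` and `FitResult` classes below are stand-ins for the gammapy objects, not the actual library code; the point is only where the covariance now lives:

```python
import numpy as np


class Parameters:
    """Stand-in for the joined parameter list, which carries the covariance."""

    def __init__(self, covariance):
        self.covariance = covariance


class FitResult:
    """Stand-in for the object returned by Fit.optimize()."""

    def __init__(self, parameters):
        self.parameters = parameters


# Before this PR the covariance was reached through the model on the result;
# after this PR it lives on the joined parameter list of the result.
result = FitResult(Parameters(np.eye(2)))
covariance = result.parameters.covariance
```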

@adonath adonath self-assigned this Feb 14, 2019
@adonath adonath added this to the 0.11 milestone Feb 14, 2019
@adonath adonath requested review from AtreyeeS and registerrier Feb 14, 2019
@AtreyeeS (Member) left a comment

This looks really good @adonath! Will you address the issue of unused parameters in likelihood() in a follow-up PR?

return self.datasets[0].model

@lazyproperty
def is_all_same_type(self):
@AtreyeeS (Member) commented Feb 14, 2019

Since you are providing this option, maybe it would also be helpful to have a property that returns the types of the datasets present?

@adonath (Member, Author) commented Feb 14, 2019

Good suggestion, I'll add such a property.
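A sketch of how such a `types` property could sit next to the same-type check (illustrative only, not the verbatim gammapy code; the real class uses astropy's `@lazyproperty` where a plain `@property` is shown here):

```python
class Datasets:
    """Minimal sketch of a dataset container."""

    def __init__(self, datasets):
        self.datasets = datasets

    @property
    def types(self):
        """Class names of the contained datasets, in order."""
        return [type(dataset).__name__ for dataset in self.datasets]

    @property
    def is_all_same_type(self):
        """True if every contained dataset is of the same type."""
        return len(set(self.types)) == 1


# Dummy dataset classes to exercise the container
class MapDataset:
    pass


class SpectrumDataset:
    pass


datasets = Datasets([MapDataset(), SpectrumDataset()])
print(datasets.types)             # ['MapDataset', 'SpectrumDataset']
print(datasets.is_all_same_type)  # False
```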


@property
def model(self):
return self.datasets[0].model
@AtreyeeS (Member) commented Feb 14, 2019

I am a bit confused. The same model need not reside on all datasets, right? So what is the utility of this function?

@adonath (Member, Author) commented Feb 14, 2019

I decided to remove this again and break backwards compatibility. The result object returned by Fit.optimize() no longer contains a reference to the model but instead to the joined parameter list. I think this is a justified change, because there is not necessarily a single model across many datasets (e.g. if there is a background model per dataset).

@registerrier (Contributor) left a comment

Thanks @adonath . This looks good.

The global mask might be a problem if we don't have a simple way to test that the datasets have the same dimensions.
I suppose also that a global model must be used, or we need to introduce at some point a list of models or a Models object.

return self.datasets[0].model

@lazyproperty
def is_all_same_type(self):
@registerrier (Contributor) commented Feb 14, 2019

If the mask is to be applied to likelihood_per_bin, then it is not enough to test the type of each member of the list; you also need to test their dimensions, no?

@adonath (Member, Author) commented Feb 14, 2019

Thanks, I modified the check accordingly.
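The modified check could look roughly like the following standalone sketch, testing both the type and the data_shape of each member (the helper names and `FakeDataset` are illustrative assumptions, not the actual gammapy implementation):

```python
def is_all_same_type(datasets):
    """True if every dataset in the list has the same class."""
    return len({type(dataset) for dataset in datasets}) == 1


def is_all_same_shape(datasets):
    """True if every dataset has the same data_shape.

    Both checks are needed before a single global mask can be applied
    to the stacked per-bin likelihoods.
    """
    return len({dataset.data_shape for dataset in datasets}) == 1


class FakeDataset:
    """Toy dataset exposing only a data_shape attribute."""

    def __init__(self, shape):
        self.data_shape = shape


same = [FakeDataset((10, 10)), FakeDataset((10, 10))]
mixed = [FakeDataset((10, 10)), FakeDataset((30,))]
print(is_all_same_shape(same))   # True
print(is_all_same_shape(mixed))  # False
```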


@property
def model(self):
return self.datasets[0].model
@registerrier (Contributor) commented Feb 14, 2019

Since the model is not attached to the list of datasets but to each Dataset in particular, why would you return the first of the list?
OK, I understand, this is for compatibility reasons.

@adonath (Member, Author) commented Feb 14, 2019

See answer to @AtreyeeS above.

@adonath adonath dismissed stale reviews from registerrier and AtreyeeS via 07658c1 Feb 14, 2019
@adonath (Member, Author) commented Feb 15, 2019

@registerrier @AtreyeeS Any further comments? If not I'd like to go ahead and merge...

@registerrier (Contributor) left a comment

Some comments, but they can probably be addressed later when the various dataset classes are introduced.

@property
def data_shape(self):
"""Shape of the counts data"""
return self.counts.data.shape
@registerrier (Contributor) commented Feb 15, 2019

I think this is fine for the moment.
Ultimately, it is probably better to compare the MapGeom, because that is what needs to be identical. The issue is that a FluxPointsDataset or a SpectrumDataset doesn't have a Geom.

@registerrier (Contributor) commented Feb 15, 2019

Shouldn't there be a test specifically for MapDataset testing the properties as well as the likelihood methods?

@adonath (Member, Author) commented Feb 15, 2019

Indeed I'll add a test for MapDataset...

@@ -1082,6 +1082,10 @@ def __init__(self, model, data, mask=None, likelihood="chi2"):
" either 'chi2' or 'chi2assym'"
)

def data_shape(self):
"""Shape of the flux points data"""
return self.data.e_ref.shape
@registerrier (Contributor) commented Feb 15, 2019

Same comment here. Should it be the shape or e_ref itself that is equal?

@adonath (Member, Author) commented Feb 15, 2019

Yes, e_ref is the reference energy for the data...

import pytest
import numpy as np
from numpy.testing import assert_allclose
from .test_fit import MyDataset
@registerrier (Contributor) commented Feb 15, 2019

Should the test rely only on a fake Dataset, or should there be specific tests for each type of dataset and also for a heterogeneous set? Maybe this is for later.

@adonath (Member, Author) commented Feb 15, 2019

I'll add a test for multiple MapDataset now, the joint fit for SpectrumDataset and FluxPointsDataset could come later...
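A minimal sketch of what a joint-likelihood test over multiple fake datasets could look like (`MyFakeDataset`, `total_likelihood`, and the test name are illustrative assumptions, not the actual gammapy test code, which uses the `MyDataset` fixture imported above):

```python
from numpy.testing import assert_allclose


class MyFakeDataset:
    """Toy dataset whose fit statistic is a fixed constant."""

    def __init__(self, value):
        self.value = value

    def likelihood(self):
        return self.value


def total_likelihood(datasets):
    """Joint fit statistic: the sum of the per-dataset likelihoods."""
    return sum(dataset.likelihood() for dataset in datasets)


def test_likelihood_multiple_datasets():
    datasets = [MyFakeDataset(1.5), MyFakeDataset(2.5)]
    assert_allclose(total_likelihood(datasets), 4.0)
```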

@adonath adonath merged commit d696043 into gammapy:master Feb 15, 2019
8 checks passed