Add MapDatasetMetaData container #4853
Thanks a lot, @AtreyeeS. For information, I am working on the
Thanks @AtreyeeS.
Maybe a very general comment first. We already see that information is replicated across many metadata objects. It also seems useless to validate the same information in many different places.
This means that we need finer granularity in the metadata structures. Our ObjectMetaData classes will then mostly contain lower-level metadata objects plus some information specific to the object.
Stacking would then mostly consist of building a list of e.g. ObsInfoMetaData and EventMetaData.
It would also be good to identify entries that are specific to the Dataset, if there are any.
Maybe the name, possibly the target_name.
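As a rough illustration of the finer-grained layout suggested above, here is a minimal sketch in which a dataset-level container holds lower-level metadata objects plus a few dataset-specific entries. It uses plain dataclasses as a stand-in for the pydantic models; the field names and the stack helper are assumptions made for illustration, not the gammapy API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObsInfoMetaData:
    """Hypothetical stand-in for a lower-level, per-observation metadata object."""
    obs_id: int
    telescope: Optional[str] = None


@dataclass
class DatasetMetaData:
    """Hypothetical dataset-level container: sub-objects plus dataset-specific entries."""
    name: Optional[str] = None          # specific to the dataset
    target_name: Optional[str] = None   # possibly specific to the dataset
    obs_info: List[ObsInfoMetaData] = field(default_factory=list)

    def stack(self, other: "DatasetMetaData") -> None:
        # Stacking only extends the list of lower-level objects;
        # no per-field merge logic is needed here.
        self.obs_info.extend(other.obs_info)


first = DatasetMetaData(name="stacked", obs_info=[ObsInfoMetaData(obs_id=1)])
second = DatasetMetaData(obs_info=[ObsInfoMetaData(obs_id=2, telescope="hess")])
first.stack(second)
print([info.obs_id for info in first.obs_info])
```

With this layout the dataset-level class stays small, and each lower-level object can be reused by other containers.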
gammapy/datasets/metadata.py
Outdated
event_type: Optional[Union[int, list[int]]]
optional: Optional[dict]

@validator("creation")
I am not sure you need a validation here.
Technically, in general, I think the CreatorMetaData can be None until the object is serialized (unless the information is read from disk).
I thought MapDataset.empty() could add the CreatorMetaData, with the rest of the information filled in by the MapDatasetMaker.
gammapy/datasets/metadata.py
Outdated
f"Incorrect pointing. Expect CreatorMetaData got {type(v)} instead."
)

@validator("instrument")
Here no validation is needed either, as str is a standard type.
Yes, I will remove this. I wanted to be sure we forbid a list of strings here.
gammapy/datasets/metadata.py
Outdated
telescope: Optional[Union[str, list[str]]]
observation_mode: Optional[Union[str, list]]
pointing: Optional[Union[SkyCoord, list[SkyCoord]]]
obs_ids: Optional[Union[int, list[int]]]
Are we sure obs_id can only be int?
I had the same question about event types as well.
gammapy/datasets/metadata.py
Outdated
@validator("obs_ids", "event_type")
def validate_obs_ids(cls, v):
    if v is None:
        return -999
Why would you use a value like -999? It could stay None, no?
With None, I get a pydantic.error_wrappers.ValidationError:
none is not an allowed value (type=type_error.none.not_allowed)
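One way to keep None instead of a sentinel such as -999, sketched without pydantic: declare the field as optional with a None default and let the check pass None through untouched. The class and helper names are hypothetical, and this sketch does not attempt to reproduce or diagnose the pydantic error quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


def check_obs_ids(value):
    """Hypothetical check: None means 'not set' and is passed through unchanged."""
    if value is None:
        return None
    values = value if isinstance(value, list) else [value]
    for item in values:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(item, bool) or not isinstance(item, int):
            raise TypeError(f"Expected int, got {type(item).__name__} instead.")
    return value


@dataclass
class ObsIdsHolder:
    obs_ids: Optional[Union[int, List[int]]] = None

    def __post_init__(self):
        self.obs_ids = check_obs_ids(self.obs_ids)


print(ObsIdsHolder().obs_ids)               # stays None, no sentinel needed
print(ObsIdsHolder(obs_ids=[1, 2]).obs_ids)
```

Keeping None means downstream code can test for "not set" explicitly, instead of having to know which magic number was chosen.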
Force-pushed from 6231912 to 9ac0487.
I updated to use pydantic 2.0. I think the present stacking logic is too repetitive. Can it be simplified, @adonath?
Thanks a lot @AtreyeeS! Probably the following code structure is simpler?

    from pydantic import BaseModel, RootModel
    from typing import List, Optional


    class MapDatasetMetaDataItem(BaseModel):
        name: str
        event_type: Optional[str] = None


    class MapDatasetMetaData(RootModel):
        root: List[MapDatasetMetaDataItem]
        instrument: Optional[str] = None
        creation: ...

        def __iter__(self):
            return iter(self.root)

        def __getitem__(self, item):
            # could use the obs id here
            return self.root[item]

        def to_table(self):
            pass

        def from_table(self):
            pass

        def stack(self, item):
            # just append the list here
            pass


    meta = MapDatasetMetaData(
        [
            {"name": "hess"},
            {"name": "magic"},
            {"name": "veritas"},
            {"name": "fact"},
            {"name": "cta"},
        ]
    )
    meta[3].name

Having parallel lists is usually not a good idea, as it leads to these repetitive code statements. Introducing two separate classes might be the better solution here.
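To make the stack placeholder above a little more concrete, here is one possible sketch of a single list of items replacing parallel lists. It deliberately uses dataclasses instead of pydantic's RootModel so that it runs without third-party packages; MetaItem, MetaList, and the lookup by name are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class MetaItem:
    name: str
    event_type: Optional[str] = None


@dataclass
class MetaList:
    items: List[MetaItem] = field(default_factory=list)

    def __iter__(self) -> Iterator[MetaItem]:
        return iter(self.items)

    def __getitem__(self, key):
        # Accept either a position or an item name
        if isinstance(key, str):
            for item in self.items:
                if item.name == key:
                    return item
            raise KeyError(key)
        return self.items[key]

    def stack(self, other: "MetaList") -> None:
        # Extending one list keeps name and event_type of each item together
        self.items.extend(other.items)


meta = MetaList([MetaItem("hess"), MetaItem("magic")])
meta.stack(MetaList([MetaItem("veritas", event_type="1")]))
print(meta[2].name, meta["veritas"].event_type)
```

Because every attribute of an entry lives on the same item, stacking cannot leave two lists with different lengths or mismatched order.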
If we follow the idea of @adonath, maybe one should keep the same naming scheme between the class and the metadata class, i.e.: In the list of instruments, one could add hawc, fermilat, astri... Otherwise, I am wondering whether this class with this content is more of a DatasetMetaData class... It seems very generic to all dataset child classes. Is it not?
Codecov Report
Coverage Diff (main vs. #4853)

             main    #4853     +/-
Coverage   75.00%   75.04%  +0.04%
Files         232      233      +1
Lines       34554    34622     +68
Hits        25917    25982     +65
Misses       8637     8640      +3
I added a So the current behaviour is that there is a mechanism
Thanks @AtreyeeS . I have left a few comments.
gammapy/datasets/metadata.py
Outdated
"instrument": "INSTRUM",
"telescope": "TELESCOP",
"observation_mode": "OBS_MODE",
"pointing": "POINTING",
You don't need this, since "pointing" is already a MetaData object.
This now uses the
gammapy/datasets/metadata.py
Outdated
"creation": "CREATION",
"event_types": "EVT_TYPE",
"optional": "OPTIONAL",
"pointing": "POINTING",
Without adding pointing, to_header fails if altaz is not present.
OK, I see. This is then a bug. Initially, with SkyCoord set to np.nan, the serialization wouldn't fail because v.alt and v.az would exist. This is no longer the case...
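One possible way to make header serialization robust against missing optional coordinates is to skip entries that are not set instead of accessing their attributes. The sketch below is hypothetical: the Pointing class, its fields, and the header keywords are illustrative and are not taken from the gammapy code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Pointing:
    radec_mean: Optional[Tuple[float, float]] = None  # (ra, dec) in deg
    altaz_mean: Optional[Tuple[float, float]] = None  # (alt, az) in deg

    def to_header(self) -> Dict[str, float]:
        header = {}
        if self.radec_mean is not None:
            header["RA_PNT"], header["DEC_PNT"] = self.radec_mean
        # Only touch alt/az when they were actually provided
        if self.altaz_mean is not None:
            header["ALT_PNT"], header["AZ_PNT"] = self.altaz_mean
        return header


print(Pointing(radec_mean=(83.63, 22.01)).to_header())
```

The trade-off is that readers of the header must then tolerate absent keywords rather than NaN-valued ones.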
Thanks @AtreyeeS . Looks good to me! No further comment.
I checked that the meta information is not serialised with the dataset object.
gammapy/utils/metadata.py
Outdated
  _tag: ClassVar[Literal["obs_info"]] = "obs_info"

- obs_id: int
+ obs_id: str
Why is it changed to str?
We discussed previously that obs_ids don't necessarily need to be integers and can be strings, which is why the MapDatasetMetaData was taking strings for obs_id.
Having a str accepts both int and str on input.
Currently the GADF requirement is that this is an int. That's why I kept this constraint when introducing the ObsInfoMetaData.
Unless there is a strong constraint for the MapDatasetMetaData, I'd rather keep the same solution and update later if necessary. No?
I have no opinion on this. I can then change the behaviour for the MapDataset as well...
One should take care with this type, as obs_id is used everywhere... I think that most of the time an int is used.
If we plan to change this for our internal data (here, we do not care about GADF), a great deal of code would have to be checked and changed.
But it is true that having a str internally is more generic (although it requires many changes in our code).
@bkhelifi The current implementation allows reading int, as seen in the tests.
Indeed, but pydantic is changing the type to str.
For now, I think it is simpler to stick to int.
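The trade-off discussed in this thread can be illustrated without pydantic. In the sketch below, two hypothetical normalisers show what happens to an obs_id under each convention: coercing to str changes the stored type, while sticking to int keeps it but rejects non-numeric identifiers. Whether and how pydantic performs such coercion depends on its version and configuration, which this example does not model.

```python
def obs_id_as_str(value):
    """Hypothetical 'str' convention: accept int or str, always store str."""
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"Unsupported obs_id type: {type(value).__name__}")
    return str(value)


def obs_id_as_int(value):
    """Hypothetical 'int' convention: store int, reject non-numeric strings."""
    if isinstance(value, bool):
        raise TypeError("Unsupported obs_id type: bool")
    return int(value)  # raises ValueError for input such as "run-23523"


stored = obs_id_as_str(23523)
# The stored value is now a str and no longer compares equal to the int
print(type(stored).__name__, stored == 23523)
print(obs_id_as_int("23523") == 23523)
```

Code that compares or looks up obs_id values against integers would need to be adapted under the str convention, which is the cost raised in the comments above.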
Force-pushed from b0a7627 to ab8fbae.
A test is failing:
Signed-off-by: Atreyee Sinha <asinha@ucm.es>
Force-pushed from ab8fbae to 4818e37.
Thanks @AtreyeeS . This looks good! No further comment from my side.
Signed-off-by: Régis Terrier <rterrier@apc.in2p3.fr>
I have corrected an issue with
Thanks @AtreyeeS . Let's wait for the CI to run and merge.
This PR is associated with issue #4791.
Addresses #4792.
This PR adds a meta container on the MapDataset object.
- optional dict keys are just stacked, and then users can decide what to do with them
- maker can be added? To know how this dataset was created, e.g. the outcome of some maker process or created by hand by the user. Or is that provenance information, @bkhelifi?
- However, isinstance(np.nan, int) returns False, which raises ValidationError. I have put -999 as default for the event types; any better solution?

Opinions please @registerrier @adonath @QRemy @maxnoe
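The isinstance result mentioned above follows from NaN being a floating-point value. A minimal check using the standard library's math.nan, which like np.nan is a plain Python float, shows why NaN cannot serve as the default of an integer-typed field:

```python
import math

# NaN is a float, so it can never satisfy an int type check
print(isinstance(math.nan, int))
print(isinstance(math.nan, float))

# NaN cannot be converted to int either: int(math.nan) raises ValueError
try:
    int(math.nan)
    converted = True
except ValueError:
    converted = False
print(converted)
```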
Still needs in future PRs: