-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the metadata keys that uniquely identify a DataArray (DataID) configurable per reader #1088
Make the metadata keys that uniquely identify a DataArray (DataID) configurable per reader #1088
Conversation
PS: In the shape this PR is now, I'm expecting this to break a lot of stuff. I'm throwing it here to hopefully get insightful comments that will show if I'm going in the right direction before I spend more time on this. IE, please give some feedback! |
I wonder if this might help to make composite requirements more flexible. I have a use case in fogpy where I have defined a composite that needs:
Currently, the former can match either by name ( |
I don't think dynamic datasets would help with this and I wouldn't want it to evolve to support this functionality in composites. DatasetIDs are identifiers. This requires/provides system is something separate. To full support this we'd have to add |
The bad calibration test was moved to the dataset id tests, as it is now handled there.
I see two benefits from the top of my head:
As for serialisation, there are already a couple of methods for it implemented, the idea being that the enum is converted back to text on dump, and recreated on the fly on load. |
Fixed per reader though, right? What if a reader defines a new calibration that is in the middle of the existing/common calibrations? That means that
That's cool, but it gets back to what was discussed on slack: it is hard to determine the order/sorting of things in a DataID. What is best, min/max resolution, lower/higher calibration? Having this be explicit rather than the result of some dynamic Enum ordering seems like a more maintainable solution.
This isn't useful if it only works inside the DataID class if the enum also shows up in the |
We can find a way to serialise that doesn't bring along any enum, just the name of the calibration level will be passed (also in the data attrs). Adding flexibility by making the dataid configureable per reader makes the code more abstract in some places indeed. I'm trying to make this as clear as possible with the current code base, but I understand that the code can be harder to follow. Still, I feel it's an improvement over having hardcoded DatasetID keys. Also, having acceptance tests that verify what sift expects from satpy would be useful to ensure that unacceptable changes don't go unnoticed. Edit: remove grumpy comments, sorry about those. |
This is now addressed by passing only the name of the enum to the dataarray attrs and when serializing the dataid.
It is explicit in the reader configs, as we explicitly give the calibration level in order of preference. I don't see how we can make clearer without hardcoding all the use cases.
As mentioned above, this has been addressed in the latest commit. |
Also, just to clarify why I think the enums aren't so problematic here: |
Co-authored-by: Gerrit Holl <gerrit.holl@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned on slack:
- I have NOT read through the code.
- I think it is good to merge. Any additional changes can be moved to future pull requests. I think there may be too much code to review and at this point you are the one defining what "good enough" code is anyway.
We've had multiple people test this PR and I've been using it for a couple days. Everything we've tested, works. Anything we discover from now on will require an additional PR and we are more likely to find them if this is in the master branch.
Great job and thanks for taking the time and effort to get this done. I think this will really open the door for future flexibility in Satpy.
One thing: Could you update the title so someone reading the changelog will get a good idea/summary of what has changed (maybe mention DataID
).
Thanks all for looking at this! |
🎉 Great effort! Probably the biggest PR in satpy's history. |
This is mostly a commit from David Hoese (see 6a380df - "Add support for Satpy 0.23+") which has been updated to match current Satpy master, i.e., to take commit "Refactor scene to privatise some attributes and methods" pytroll/satpy@e8f2218 into account. NOTE: From this commit MTG-SIFT doesn't work with Satpy versions <= 0.22 since it adapts to API changes introduced with the Satpy pull request "Make the metadata keys that uniquely identify a DataArray (DataID) configurable per reader #1088" pytroll/satpy#1088
This is an attempt at making
DatasetID
s dynamic.DatasetID
objects are used in Satpy to index datasets in aScene
object. They are namedtuples that contain a fixed set of properties that we considered a common denominator for all datasets readable by Satpy. However, it has been shown on multiple occasions that it is becoming harder to make all the properties relevant for all the datasets. For example, polarization and level are such properties that only apply to one dataset type each (respectively SAR-C and Geocat). At the same time, more dataset types are coming into Satpy that need new properties, eg SLSTR which has nadir and oblique views that we need to include in theDatasetID
if we want to discriminate the datasets efficiently.To avoid having to extend the set of properties needed to identify all datasets in Satpy, I propose here to make the
DatasetID
objects dynamic in that the set of properties they use to identify datasets can be tuned per reader. At the same time we are still able to compareDatasetID
s from different datasets. Moreover, properties that are irrelevant to a given dataset are now omitted entirely. For example,wavelength
doesn't make sense for alongitude
dataset, and hence will not be a part of theDatasetID
needed to identifylongitude
, even if the reader configuration defines it as a possible property for idenfying datasets.For backwards compatibility, a default
DatasetID
is still available, but hopefully should be removed soon.flake8 satpy
DatasetID
is usedDATASET_KEYS
listDatasetID
creation