New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-34175: Simplify ingest with extended exposure table and related dimensions #415
Conversation
205c660
to
06a5551
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but I'd like to get @timj 's feedback on it well, because I don't think I can channel him well on the question of where per-instrument ingest extension points should generally go.
"""Additional records that must be inserted into the | ||
`~lsst.daf.butler.Registry` prior to ingesting the exposure ``record`` | ||
(e.g., to satisfy foreign key constraints), indexed by the dimension name. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect the structure of this requires that the extra records be one-to-one with exposure
. Would it be worthwhile to make this Dict[str, Set[DimensionRecord]]
to generalize it to support e.g. exposure
+detector
records?
Edit: I gather this is actually for records that exposure
depends on, and hence these have a one-to-many relationship with exposure in general. If that's the case, we wouldn't want to generalize this, but we might want to have a dict-of-sets type for other dimension records that could be inserted after the exposure
record (see separate comment, later).
python/lsst/obs/base/ingest.py
Outdated
``makeExposureRecord`` in constructing an exposure record, and whether they | ||
are required. | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to either make this a classmethod
that returns a copy of the dictionary, a classmethod
that returns a pair of required/optional sets, or a similar pair of frozenset
class variables. Having a multiple type as a class variable might tempt a derived class to modify it directly (changing the base class) instead of modifying its own copy.
Looking how it's used, I actually think one of the pair-of-sets variants might be a bit better than the dictionary one.
universe : `DimensionUniverse` | ||
Set of all known dimensions. | ||
**kwargs | ||
Additional field values for this record. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the idea is that most subclasses will just delegate to super
while passing kwargs, I'd say so here.
@@ -1078,6 +1136,8 @@ def ingestFiles( | |||
) | |||
|
|||
try: | |||
for name, record in exposure.extraRecords.items(): | |||
self.butler.registry.syncDimensionData(name, record, update=update_exposure_records) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I know enough about your use case to guess why you want to insert these dimension records before the exposure records (because your exposure
has a foreign key referencing one of the extra dimensions, right?). But I think that's not necessarily the most common extension case. So, I think we should at least document that this is how the extraRecords
field works, and maybe rename it to something like "dependencyRecords", leaving room for a "dependentRecords" field that's inserted only after the exposure
record is. That doesn't need to happen on this ticket, but I'd like to leave naming room for it to happen later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct about my use case.
Renaming...
doc/changes/DM-34175.feature.rst
Outdated
@@ -0,0 +1,7 @@ | |||
* Made choice of `ObservationInfo` subclass and required properties configurable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is what is happening any longer given the current state of lsst/astro_metadata_translator#54
When instruments need additional exposure properties and/or dimensions, they need a way to provide that information to the RawIngestTask. We already have the ability to subclass the RawIngestTask (via the --ingest-task command-line argument), but there was no way for the subtask to specify the additional information. Moved the list of required ObservationInfo fields into class variables, and added methods to provide the required registry records. An important conceptual addition is the idea of 'extra records': records which are added to the registry in addition to the exposure record. These can be used to satisfy foreign key constraints when the exposure relates to dimensions beyond the standard set.
cce360a
to
8ffaabf
Compare
When instruments need additional exposure properties and/or
dimensions, they need a way to provide that information to the
RawIngestTask. We already have the ability to subclass the
RawIngestTask (via the --ingest-task command-line argument),
but there was no way for the subtask to specify the additional
information.
Moved some defaults (the ObservationInfo class and the list of
required ObservationInfo fields) into class variables, and added
methods to provide the required registry records. An important
conceptual addition is the idea of 'extra records': records
which are added to the registry in addition to the exposure
record. These can be used to satisfy foreign key constraints
when the exposure relates to dimensions beyond the standard set.
Checklist
doc/changes