Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-34175: Simplify ingest with extended exposure table and related dimensions #415

Merged
merged 1 commit into from Apr 5, 2022

Conversation

PaulPrice
Copy link
Contributor

@PaulPrice PaulPrice commented Mar 24, 2022

When instruments need additional exposure properties and/or
dimensions, they need a way to provide that information to the
RawIngestTask. We already have the ability to subclass the
RawIngestTask (via the --ingest-task command-line argument),
but there was no way for the subtask to specify the additional
information.

Moved some defaults (the ObservationInfo class and the list of
required ObservationInfo fields) into class variables, and added
methods to provide the required registry records. An important
conceptual addition is the idea of 'extra records': records
which are added to the registry in addition to the exposure
record. These can be used to satisfy foreign key constraints
when the exposure relates to dimensions beyond the standard set.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@PaulPrice PaulPrice force-pushed the tickets/DM-34175 branch 4 times, most recently from 205c660 to 06a5551 Compare March 26, 2022 00:26
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I'd like to get @timj 's feedback on it well, because I don't think I can channel him well on the question of where per-instrument ingest extension points should generally go.

"""Additional records that must be inserted into the
`~lsst.daf.butler.Registry` prior to ingesting the exposure ``record``
(e.g., to satisfy foreign key constraints), indexed by the dimension name.
"""
Copy link
Member

@TallJimbo TallJimbo Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the structure of this requires that the extra records be one-to-one with exposure. Would it be worthwhile to make this Dict[str, Set[DimensionRecord]] to generalize it to support e.g. exposure+detector records?

Edit: I gather this is actually for records that exposure depends on, and hence these have a one-to-many relationship with exposure in general. If that's the case, we wouldn't want to generalize this, but we might want to have a dict-of-sets type for other dimension records that could be inserted after the exposure record (see separate comment, later).

``makeExposureRecord`` in constructing an exposure record, and whether they
are required.
"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to either make this a classmethod that returns a copy of the dictionary, a classmethod that returns a pair of required/optional sets, or a similar pair of frozenset class variables. Having a multiple type as a class variable might tempt a derived class to modify it directly (changing the base class) instead of modifying its own copy.

Looking how it's used, I actually think one of the pair-of-sets variants might be a bit better than the dictionary one.

universe : `DimensionUniverse`
Set of all known dimensions.
**kwargs
Additional field values for this record.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the idea is that most subclasses will just delegate to super while passing kwargs, I'd say so here.

@@ -1078,6 +1136,8 @@ def ingestFiles(
)

try:
for name, record in exposure.extraRecords.items():
self.butler.registry.syncDimensionData(name, record, update=update_exposure_records)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I know enough about your use case to guess why you want to insert these dimension records before the exposure records (because your exposure has a foreign key referencing one of the extra dimensions, right?). But I think that's not necessarily the most common extension case. So, I think we should at least document that this is how the extraRecords field works, and maybe rename it to something like "dependencyRecords", leaving room for a "dependentRecords" field that's inserted only after the exposure record is. That doesn't need to happen on this ticket, but I'd like to leave naming room for it to happen later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct about my use case.
Renaming...

@@ -0,0 +1,7 @@
* Made choice of `ObservationInfo` subclass and required properties configurable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is what is happening any longer given the current state of lsst/astro_metadata_translator#54

When instruments need additional exposure properties and/or
dimensions, they need a way to provide that information to the
RawIngestTask. We already have the ability to subclass the
RawIngestTask (via the --ingest-task command-line argument),
but there was no way for the subtask to specify the additional
information.

Moved the list of required ObservationInfo fields into class
variables, and added methods to provide the required registry
records. An important conceptual addition is the idea of 'extra
records': records which are added to the registry in addition
to the exposure record. These can be used to satisfy foreign
key constraints when the exposure relates to dimensions beyond
the standard set.
@PaulPrice PaulPrice merged commit c18b5ea into main Apr 5, 2022
@PaulPrice PaulPrice deleted the tickets/DM-34175 branch April 5, 2022 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants