-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-42435: Migrate DM schemas from ccdVisitId to (visit, detector) #211
Conversation
b72365e
to
3beba3a
Compare
@@ -1,7 +1,7 @@ | |||
--- | |||
name: "ApdbSchema" | |||
"@id": "#apdbSchema" | |||
version: "0.1.1" | |||
version: "1.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 1.0 and not, say, 0.2? What makes this version more or less "pre-release" than the old one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using a sort-of semantic versioning system here, and this is a fully breaking non-backwards compatible change, hence the major number bump. Version 0.1.1 is the first "there are versions", while anything before that is considered 0.1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantic versioning does not require that versions 0.x be mutually compatible, so that's no argument either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the system we're using (following the one in daf_butler) in the link above: this is a major number bump, per that definition (@andy-slac can correct me if I'm wrong here).
It's similar to the versioning system for the alert packets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to add that the exact meaning of the minor version number is not so well-defined now. Major and patch roles are very clear as non-compatible and compatible (and change on this PR is clearly non-compatible). Minor version role is defined now as "backward-compatible", but it is not super-clear for now what that means, as it probably implies that something on the client side may be responsible for that compatibility. In Butlerland we define minor version compatibility as "read-compatible", but for APDB that notion probably does not matter at all as APDB's primary client is AP pipelines which are always writing.
Anyways, I think we may eventually figure out minor version meaning, for now I expect changes on schema side that are either non-compatible or fully-compatible.
Additionally, PPDB schema is supposed to reflect APDB schema, so there are probably similar versioning issues on that side, but I'm not ready to discuss those now. And potential schema migration for PPDB will be a separate huge topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, for the alerts schemas we are using the FORWARD_TRANSITIVE type of the Confluent compatibility model--see https://dmtn-093.lsst.io/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, few minor comments. While we are at it, could we drop Instrument
table from APDB schema? It was never used by dax_apdb and we now have a generic metadata table that should be used to store instrument name.
yml/apdb.yaml
Outdated
- name: IDX_DiaSource_detectorVisit | ||
"@id": "#IDX_DiaSource_detectorVisit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change the name to IDX_DiaSource_visitDetector
to reflect the order of the columns in the index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks! I've tried to be consistent about visit,detector wherever I can (e.g. in function signatures), but missed this one.
yml/apdb.yaml
Outdated
- "#DiaForcedSource.detector" | ||
- "#DiaForcedSource.visit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the order to visit, detector. May not be very important, but it's better to keep it consistent with DiaSource. Also it can be useful for queries like SELECT * from DiaForcedSources WHERE visit = 1000
. I don't think that query SELECT * from DiaForcedSources WHERE detector = 10
is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're definitely more likely to search on visit, but I suspect searches on just detector will be useful for debugging, e.g. "does detector 10 always have some kind of failure mode in its sources?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If those queries are important/frequent, then maybe worth adding an extra index? I guess we'll need a systematic approach to figure out performance issues later.
description: Id of the detector on the instrument that took this visit. | ||
Datatype short instead of byte because of DB concerns about unsigned bytes. | ||
ivoa:ucd: meta.id;obs.image | ||
# TODO: we probably need to add an index on the above two for this table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primary key would be nice to have too. Are there multiple processings for visit/detector? How are they distinguished here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebellm : do we have a plan for multiple processings to go into the same APDB, and if so, how will we distinguish them in DetectorVisitProcessingSummary
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can leave this question for the implementation ticket of DetectorVisitProcessingSummary
, since we aren't writing anything to it? That would be DM-39497.
Although, bundling as many schema changes now as we can would be good, it might be better to not block on a table we're not yet using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table is meant to allow multiple processings of the same raws (that's why it's a processing summary), so there should indeed be a key on processing. If we go the direction we're thinking on DM-24858, that could be the "release ID" that us used in the ID generators (0 for AP in PP, 1 or 2 for a reprocessing).
Offline/precursor reprocessings go in separate schemas entirely, of course.
I think it's fine to defer this to DM-39497.
This functionality will be implemented in the metadata table.
(note added by @gpdf 2024-06-28)
This ticket arose from RFC-975 (Switch APDB schema from ccdVisitId to visit and detector) but was extended during the implementation to cover the Source-like DRP tables as well.