Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42435: Migrate DM schemas from ccdVisitId to (visit, detector) #211

Merged
merged 4 commits into from
May 6, 2024

Conversation

parejkoj
Copy link
Contributor

@parejkoj parejkoj commented Apr 26, 2024

(note added by @gpdf 2024-06-28)

This ticket arose from RFC-975 (Switch APDB schema from ccdVisitId to visit and detector) but was extended during the implementation to cover the Source-like DRP tables as well.

@parejkoj parejkoj force-pushed the tickets/DM-42435 branch 5 times, most recently from b72365e to 3beba3a Compare May 1, 2024 17:49
@@ -1,7 +1,7 @@
---
name: "ApdbSchema"
"@id": "#apdbSchema"
version: "0.1.1"
version: "1.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be 1.0 and not, say, 0.2? What makes this version more or less "pre-release" than the old one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using a sort-of semantic versioning system here, and this is a fully breaking non-backwards compatible change, hence the major number bump. Version 0.1.1 is the first "there are versions", while anything before that is considered 0.1.0.

https://dmtn-269.lsst.io#version-compatibility

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantic versioning does not require that versions 0.x be mutually compatible, so that's no argument either way.

Copy link
Contributor Author

@parejkoj parejkoj May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the system we're using (following the one in daf_butler) in the link above: this is a major number bump, per that definition (@andy-slac can correct me if I'm wrong here).

It's similar to the versioning system for the alert packets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to add that the exact meaning of the minor version number is not so well-defined now. Major and patch roles are very clear as non-compatible and compatible (and change on this PR is clearly non-compatible). Minor version role is defined now as "backward-compatible", but it is not super-clear for now what that means, as it probably implies that something on the client side may be responsible for that compatibility. In Butlerland we define minor version compatibility as "read-compatible", but for APDB that notion probably does not matter at all as APDB's primary client is AP pipelines which are always writing.

Anyways, I think we may eventually figure out minor version meaning, for now I expect changes on schema side that are either non-compatible or fully-compatible.

Additionally, PPDB schema is supposed to reflect APDB schema, so there are probably similar versioning issues on that side, but I'm not ready to discuss those now. And potential schema migration for PPDB will be a separate huge topic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, for the alerts schemas we are using the FORWARD_TRANSITIVE type of the Confluent compatibility model--see https://dmtn-093.lsst.io/

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, few minor comments. While we are at it, could we drop Instrument table from APDB schema? It was never used by dax_apdb and we now have a generic metadata table that should be used to store instrument name.

yml/apdb.yaml Outdated
Comment on lines 2087 to 2088
- name: IDX_DiaSource_detectorVisit
"@id": "#IDX_DiaSource_detectorVisit"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the name to IDX_DiaSource_visitDetector to reflect the order of the columns in the index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks! I've tried to be consistent about visit,detector wherever I can (e.g. in function signatures), but missed this one.

yml/apdb.yaml Outdated
- "#DiaForcedSource.detector"
- "#DiaForcedSource.visit"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the order to visit, detector. May not be very important, but it's better to keep it consistent with DiaSource. Also it can be useful for queries like SELECT * from DiaForcedSources WHERE visit = 1000. I don't think that query SELECT * from DiaForcedSources WHERE detector = 10 is useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're definitely more likely to search on visit, but I suspect searches on just detector will be useful for debugging, e.g. "does detector 10 always have some kind of failure mode in its sources?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those queries are important/frequent, then maybe worth adding an extra index? I guess we'll need a systematic approach to figure out performance issues later.

yml/apdb.yaml Outdated Show resolved Hide resolved
description: Id of the detector on the instrument that took this visit.
Datatype short instead of byte because of DB concerns about unsigned bytes.
ivoa:ucd: meta.id;obs.image
# TODO: we probably need to add an index on the above two for this table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primary key would be nice to have too. Are there multiple processings for visit/detector? How are they distinguished here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebellm : do we have a plan for multiple processings to go into the same APDB, and if so, how will we distinguish them in DetectorVisitProcessingSummary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can leave this question for the implementation ticket of DetectorVisitProcessingSummary, since we aren't writing anything to it? That would be DM-39497.

Although, bundling as many schema changes now as we can would be good, it might be better to not block on a table we're not yet using.

Copy link
Contributor

@ebellm ebellm May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table is meant to allow multiple processings of the same raws (that's why it's a processing summary), so there should indeed be a key on processing. If we go the direction we're thinking on DM-24858, that could be the "release ID" that us used in the ID generators (0 for AP in PP, 1 or 2 for a reprocessing).

Offline/precursor reprocessings go in separate schemas entirely, of course.

I think it's fine to defer this to DM-39497.

@parejkoj parejkoj merged commit 120a2ea into main May 6, 2024
4 checks passed
@parejkoj parejkoj deleted the tickets/DM-42435 branch May 6, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants