New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-23778: Update to support obs_lsst 2to3 conversion #210
Conversation
@@ -459,7 +464,12 @@ def run(self, root: str, collections: List[str], *, | |||
# DM-21246 should let us fix this, assuming we actually want to keep | |||
# the transaction open that long. | |||
if self.config.doRegisterInstrument: | |||
self.instrument.register(self.registry) | |||
# Allow registration to fail on the assumption that this means |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should be a bit cleverer here and run queryDimension and see if the instrument has been registered already. (I see there is a config option to skip this but I think we can probably remove the option completely).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd vote to defer this until we can use Database.sync
to do it via a new Registry
method (see other comment on this topic); just doing queryDimensions
yourself would be subject to race conditions, and while that probably isn't a big deal now, it means it definitely wouldn't be future-proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had implemented the queryDimension fix already. I think I'll leave it in for now since for gen2to3 the primary use case is a one to one conversion. Anyone converting multiple gen2 to one gen3 will likely be rare, especially doing it all in bulk simultaneously.
@@ -503,7 +503,10 @@ def run(self, root: str, collections: List[str], *, | |||
# to convert, because Gen2 alone doesn't know enough about the | |||
# relationships between data IDs. | |||
for converter in converters: | |||
converter.insertDimensionData() | |||
try: | |||
converter.insertDimensionData() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a registry.insertDimensionData()
option for doing slow inserts where it calls queryDimensionData for each record first and complains if there is a difference? Then only inserting the ones that don't already exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I was already planning to add one on one of the "fix ingest" tickets. We already have the underlying primitive in Database.sync
, but we may need to think about how to identify which fields to use for lookup, which fields to compare, and which fields to just set when inserting. We may already have all of the information we need for that in the dimension definitions, but I'm not certain.
The big catch with this is that each sync
needs to start its own transaction, and hence can't be nested inside another transaction. AFAICT, there is no generic-SQL way to implement it in a concurrent and transaction-nesting-friendly way.
# a new instance version. Without this each subclass would add | ||
# to the same set. | ||
if not self.collections: | ||
self.collections = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this feels like an strong indication that the base class should be using abstractmethod
instead of relying on class attributes as customization hooks. I'm not sure fixing that is in scope for this ticket, but perhaps we should start doing that for the new detectorKey
and exposureKey
hooks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The detector key and exposure key are fine because they are string constants. I probably should change all the lists to read only empty tuples and maybe change collections to a frozenset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to like the extra checks and clarity of abstractmethod
better anyway, but I certainly agree that the non-constant values are the dangerous ones.
And on that note, I think this problem is now breaking the obs_decam
tests on the DM-21849 branch (https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31380/tests). If you're about ready to merge this ticket, I'm happy to just wait and rebase, or to take care of making these all immutable (or abstractmethod
-based) on DM-21849 myself. It just doesn't make sense for both of us to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could merge pretty quickly. I think the curated calibrations discussion is not technically blocking this since I haven't yet moved the code to obs_base for that. I can take a look on Sunday at your other comments.
c4d0734
to
6023e73
Compare
The collections default was a class variable but every time setUp was called the instrument would be added to it. Now if it's empty we create a new set per test.
We hope for now that these simply mean we are reusing a gen3 repo.
No description provided.