Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-23778: Update to support obs_lsst 2to3 conversion #210

Merged
merged 8 commits into from Mar 23, 2020
Merged

Conversation

timj
Copy link
Member

@timj timj commented Mar 20, 2020

No description provided.

@@ -459,7 +464,12 @@ def run(self, root: str, collections: List[str], *,
# DM-21246 should let us fix this, assuming we actually want to keep
# the transaction open that long.
if self.config.doRegisterInstrument:
self.instrument.register(self.registry)
# Allow registration to fail on the assumption that this means
Copy link
Member Author

@timj timj Mar 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should be a bit cleverer here and run queryDimension and see if the instrument has been registered already. (I see there is a config option to skip this but I think we can probably remove the option completely).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd vote to defer this until we can use Database.sync to do it via a new Registry method (see other comment on this topic); just doing queryDimensions yourself would be subject to race conditions, and while that probably isn't a big deal now, it means it definitely wouldn't be future-proof.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had implemented the queryDimension fix already. I think I'll leave it in for now since for gen2to3 the primary use case is a one to one conversion. Anyone converting multiple gen2 to one gen3 will likely be rare, especially doing it all in bulk simultaneously.

@@ -503,7 +503,10 @@ def run(self, root: str, collections: List[str], *,
# to convert, because Gen2 alone doesn't know enough about the
# relationships between data IDs.
for converter in converters:
converter.insertDimensionData()
try:
converter.insertDimensionData()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a registry.insertDimensionData() option for doing slow inserts where it calls queryDimensionData for each record first and complains if there is a difference? Then only inserting the ones that don't already exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I was already planning to add one on one of the "fix ingest" tickets. We already have the underlying primitive in Database.sync, but we may need to think about how to identify which fields to use for lookup, which fields to compare, and which fields to just set when inserting. We may already have all of the information we need for that in the dimension definitions, but I'm not certain.

The big catch with this is that each sync needs to start its own transaction, and hence can't be nested inside another transaction. AFAICT, there is no generic-SQL way to implement it in a concurrent and transaction-nesting-friendly way.

# a new instance version. Without this each subclass would add
# to the same set.
if not self.collections:
self.collections = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this feels like an strong indication that the base class should be using abstractmethod instead of relying on class attributes as customization hooks. I'm not sure fixing that is in scope for this ticket, but perhaps we should start doing that for the new detectorKey and exposureKey hooks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The detector key and exposure key are fine because they are string constants. I probably should change all the lists to read only empty tuples and maybe change collections to a frozenset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to like the extra checks and clarity of abstractmethod better anyway, but I certainly agree that the non-constant values are the dangerous ones.

And on that note, I think this problem is now breaking the obs_decam tests on the DM-21849 branch (https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31380/tests). If you're about ready to merge this ticket, I'm happy to just wait and rebase, or to take care of making these all immutable (or abstractmethod-based) on DM-21849 myself. It just doesn't make sense for both of us to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could merge pretty quickly. I think the curated calibrations discussion is not technically blocking this since I haven't yet moved the code to obs_base for that. I can take a look on Sunday at your other comments.

@timj timj force-pushed the tickets/DM-23778 branch 2 times, most recently from c4d0734 to 6023e73 Compare March 23, 2020 01:14
@timj timj merged commit 99824a6 into master Mar 23, 2020
@timj timj deleted the tickets/DM-23778 branch March 23, 2020 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants