DM-36312: Deprecate support for component datasets in Registry #736

Merged
merged 5 commits into from Oct 5, 2022

Member

@TallJimbo TallJimbo commented Sep 21, 2022

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

codecov bot commented Sep 21, 2022

Codecov Report

Base: 84.75% // Head: 84.74% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (c7f82e6) compared to base (738e63c).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #736      +/-   ##
==========================================
- Coverage   84.75%   84.74%   -0.01%     
==========================================
  Files         254      254              
  Lines       32946    32960      +14     
  Branches     5632     5639       +7     
==========================================
+ Hits        27922    27933      +11     
- Misses       3801     3802       +1     
- Partials     1223     1225       +2     
Impacted Files Coverage Δ
python/lsst/daf/butler/registry/_registry.py 72.82% <ø> (ø)
...lsst/daf/butler/registry/queries/_query_backend.py 64.10% <ø> (ø)
tests/test_simpleButler.py 97.81% <ø> (-0.11%) ⬇️
python/lsst/daf/butler/registries/sql.py 81.59% <100.00%> (+0.29%) ⬆️
python/lsst/daf/butler/registry/_exceptions.py 100.00% <100.00%> (ø)
.../butler/registry/datasets/byDimensions/_manager.py 90.09% <100.00%> (-0.86%) ⬇️
...n/lsst/daf/butler/registry/interfaces/_datasets.py 76.31% <100.00%> (+0.20%) ⬆️
...ython/lsst/daf/butler/registry/queries/_results.py 86.08% <100.00%> (ø)
python/lsst/daf/butler/registry/tests/_registry.py 98.95% <100.00%> (+0.01%) ⬆️
tests/test_butler.py 97.67% <100.00%> (+<0.01%) ⬆️
... and 4 more


@TallJimbo TallJimbo force-pushed the tickets/DM-36312 branch 2 times, most recently from 791db09 to 3335bb4 Compare September 22, 2022 17:57
@TallJimbo TallJimbo force-pushed the tickets/DM-36313 branch 2 times, most recently from f306dac to 82fa9c0 Compare September 22, 2022 19:16
@TallJimbo TallJimbo force-pushed the tickets/DM-36313 branch 2 times, most recently from fc9f3f2 to c88edb3 Compare September 26, 2022 17:49
@TallJimbo TallJimbo force-pushed the tickets/DM-36313 branch 2 times, most recently from 66f318f to 82d223c Compare September 28, 2022 17:36
@TallJimbo TallJimbo force-pushed the tickets/DM-36312 branch 3 times, most recently from 4b57dd7 to 41ae776 Compare October 2, 2022 14:16
Base automatically changed from tickets/DM-36313 to main October 3, 2022 19:30
@TallJimbo TallJimbo marked this pull request as ready for review October 4, 2022 14:40
@TallJimbo TallJimbo requested a review from timj October 4, 2022 14:43
Member

@timj timj left a comment

Looks good.

else:
    storage = self._managers.datasets[datasetType]
    parent_name, component = DatasetType.splitDatasetTypeName(datasetType)
Member

It's a shame this logic is duplicated from getDatasetType but it looks like you need storage.find to work. How about:

if not isinstance(datasetType, DatasetType):  # or is a str
    registryDatasetType = self.getDatasetType(datasetType)
else:
    registryDatasetType = datasetType
parent_name, component = registryDatasetType.nameAndComponent()
storage = self._managers.datasets[parent_name]

?

Member Author

See next comment; thanks to that, duplication is much reduced.

storage = self._managers.datasets[parent_name]
datasetType = storage.datasetType
if component is not None:
    datasetType.makeComponentDatasetType(component)
Member

This line seems like we should be capturing the return value.
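
The concern is easiest to see with a small example: if a method returns a new object instead of modifying its receiver, calling it without keeping the result has no effect. A minimal sketch, using a hypothetical `FakeDatasetType` stand-in (an assumption for illustration, not the real `DatasetType` class):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeDatasetType:
    """Hypothetical stand-in for a dataset type; not the real daf_butler class."""

    name: str

    def make_component(self, component: str) -> "FakeDatasetType":
        # Returns a NEW object; the receiver is left unchanged.
        return replace(self, name=f"{self.name}.{component}")


parent = FakeDatasetType("calexp")

# Result discarded: `parent` is unchanged, so this call does nothing useful.
parent.make_component("wcs")

# Result captured: `child` is the component dataset type.
child = parent.make_component("wcs")
```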

Member Author

Aha! We don't actually use datasetType after this, just storage.datasetType and component. So I can remove this check, and the line above it, and I can move the storage = line outside the if. And then there's actually very little duplicated between this method and getDatasetType, addressing your other comment.

That does mean that findDataset does not respect a storage class override that's passed to it, but that seems to have been the behavior before (that's handled in Butler instead). As it turns out, the Registry.query methods do respect storage class overrides, but that was largely an accident. I'm formalizing that on DM-31725, and it'll be easier to make findDataset respect overrides then, too.
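
The simplification described above depends on needing only the parent dataset type's storage plus the component name. A rough sketch of that lookup follows; the `split_name` helper, the `"."` separator, and the plain dict standing in for the manager are assumptions for illustration (the code under review uses `DatasetType.splitDatasetTypeName` and `self._managers.datasets`):

```python
from typing import Optional, Tuple


def split_name(dataset_type_name: str) -> Tuple[str, Optional[str]]:
    """Split 'parent.component' into its parts (assumes '.' is the separator)."""
    parent, sep, component = dataset_type_name.partition(".")
    return parent, (component if sep else None)


# Hypothetical storage mapping, keyed by PARENT dataset type name only.
storages = {"calexp": "<storage for calexp>"}


def find_storage(dataset_type_name: str):
    """Return the parent's storage and the component name (or None)."""
    parent_name, component = split_name(dataset_type_name)
    # The storage lookup happens once, outside any component check.
    storage = storages[parent_name]
    return storage, component
```

With this shape, no component dataset type object has to be built just to find the right storage.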

deprecation_message = (
"Querying for component datasets via Registry query methods is deprecated in favor of using "
"DatasetRef and DatasetType methods on parent datasets. Only components=False will be supported "
"after v26."
Member

Maybe clarify that the parameter will be removed completely after v27 (which is a long time from now). We will need a Jira ticket linked to a release somehow.

Member Author

I've created DM-36457 for the post-v27 removals, but haven't tried to create release tickets for v26 or v27 (neither exist yet).
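
For readers unfamiliar with the pattern, a deprecation like this is typically surfaced with `warnings.warn`. The sketch below reuses the message text from the diff; the `query_dataset_types` function, the choice of `FutureWarning`, and the condition that triggers the warning are assumptions for illustration, not the code in this PR:

```python
import warnings
from typing import Optional

DEPRECATION_MESSAGE = (
    "Querying for component datasets via Registry query methods is deprecated in favor of using "
    "DatasetRef and DatasetType methods on parent datasets. Only components=False will be supported "
    "after v26."
)


def query_dataset_types(expression: str, components: Optional[bool] = None) -> list:
    """Hypothetical query function; only the warning logic is the point here."""
    if components is not False:
        # stacklevel=2 attributes the warning to the caller, not this function.
        warnings.warn(DEPRECATION_MESSAGE, FutureWarning, stacklevel=2)
    return [expression]
```

Callers that already pass `components=False` see no warning and would need no change when the other values stop being supported.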

This test exports component datasets and then checks via the parent
dataset type, since this is the only valid way to handle component
exports (since Registry cannot record just a component without a
parent, even if Datastore can).
Member

I'm effectively disabling the ability of datastore to store a bare component in #737.

Member Author

Where does that leave this test? It seems to still be working.

Member

I removed the test in datastore, although I didn't make datastore.put reject a component dataset type. Maybe that was a mistake; I hadn't realized we had other tests that did disassembly outside datastore.

Member Author

Sounds like I should probably keep it around for now, then, until you get a chance to figure out the code paths involved and disable them more fully?

Member

I'm a bit confused what is happening in this test since registry doesn't do composite disassembly.

Member Author

Ah, it looks like the docstring had just gotten out of date, and I took it at face value when I updated it. It just exports the composite and checks that the components are accessible after import. But this ticket makes that check for components impossible to do directly, and doing it indirectly via the parent datasets just makes this test the same as some others, so I'll delete it.

Member Author

TallJimbo commented Oct 4, 2022

@timj, I've addressed all review comments (in fixup commits for now), and added a new commit for the problem @kfindeisen reported on Slack. Can you take a look at that?

findDataset and getDatasetType are now the only registry methods that
support components; removing support from those would cause a lot of
inconvenience downstream, and the complexity cost in Registry for
keeping it there is quite low. But we want to get it out of the
manager classes to get rid of that unpleasant copy.copy on the storage
object, and to keep component support from accidentally spreading to
other places where we don't want to maintain it.

At the same time, I've changed the exception raised for missing
dataset types in the manager to be MissingDatasetTypeError for
consistency with the query methods, while adding KeyError as a base
class of that exception to maintain backwards compatibility.
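
The backwards-compatibility argument can be illustrated with a short sketch. Only the names `MissingDatasetTypeError` and `KeyError` come from the commit message; the `DatasetTypeError` base class and the `get_storage` helper are hypothetical and assumed here for illustration:

```python
class DatasetTypeError(Exception):
    """Hypothetical base class for dataset-type errors (assumed for this sketch)."""


class MissingDatasetTypeError(DatasetTypeError, KeyError):
    """Raised when a dataset type is not registered.

    Inheriting from KeyError keeps older `except KeyError` handlers working.
    """


def get_storage(name: str, storages: dict):
    """Hypothetical lookup that raises the more specific exception."""
    try:
        return storages[name]
    except KeyError:
        raise MissingDatasetTypeError(f"Dataset type {name!r} is not registered.") from None


# Legacy caller written before the new exception existed; it still works.
try:
    get_storage("nonexistent", {})
except KeyError as err:
    legacy_caught = err
```

New code can catch `MissingDatasetTypeError` directly, while existing handlers for `KeyError` keep behaving as before.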

Unlike queryDataIds and queryDimensionRecords, one missing dataset type
here should not doom the entire query.

Originally this test appeared to test that component datasets could be
exported directly; that's what the docstring still said prior to this
commit, though it was actually just exporting the parent datasets and
querying for the component datasets on import.  Now querying for the
components is deprecated, so this test is redundant with a few others
that already check basic export-import round-trip.
@@ -2666,6 +2666,16 @@ def testQueryResultSummaries(self):
self.assertEqual(query5.count(exact=True), 0)
messages = list(query5.explain_no_results())
self.assertFalse(messages)
# This query should yield results from one dataset type but not the
# other, which is not registered.
query5 = registry.queryDatasets(["bias", "nonexistent"], collections=["biases"])
Member

Perfect, thanks!
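
The behavior this test checks, where an unregistered dataset type is skipped instead of failing the whole query, can be sketched roughly as follows. The `REGISTERED` mapping and the `query_datasets` function are hypothetical in-memory stand-ins, not the Registry API:

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical registry contents: dataset type name -> dataset identifiers.
REGISTERED: Dict[str, List[str]] = {"bias": ["bias_1", "bias_2"]}


def query_datasets(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (results, missing) without failing on unregistered names."""
    results: List[str] = []
    missing: List[str] = []
    for name in names:
        if name not in REGISTERED:
            # Remember the name for diagnostics, but keep going.
            missing.append(name)
            continue
        results.extend(REGISTERED[name])
    return results, missing


results, missing = query_datasets(["bias", "nonexistent"])
```

Keeping the list of skipped names around makes it possible to explain later why a query returned fewer results than expected.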

@TallJimbo TallJimbo merged commit a21ee8f into main Oct 5, 2022
@TallJimbo TallJimbo deleted the tickets/DM-36312 branch October 5, 2022 01:41