DM-36111: miscellaneous improvements to registry support classes. #731

TallJimbo · 2022-09-06T16:47:25Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2022-09-06T17:08:39Z

Codecov Report

Base: 84.68% // Head: 84.73% // Increases project coverage by +0.05% 🎉

Coverage data is based on head (fd68878) compared to base (4ca570b).
Patch coverage: 86.77% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
+ Coverage   84.68%   84.73%   +0.05%     
==========================================
  Files         244      244              
  Lines       32156    32095      -61     
  Branches     6040     6040              
==========================================
- Hits        27230    27196      -34     
+ Misses       3752     3724      -28     
- Partials     1174     1175       +1

Impacted Files	Coverage Δ
python/lsst/daf/butler/registries/remote.py	`0.00% <0.00%> (ø)`
python/lsst/daf/butler/registries/sql.py	`81.08% <ø> (-0.04%)`	⬇️
...hon/lsst/daf/butler/registry/dimensions/caching.py	`94.73% <ø> (ø)`
...on/lsst/daf/butler/registry/dimensions/governor.py	`93.82% <ø> (ø)`
...ython/lsst/daf/butler/registry/dimensions/query.py	`74.24% <ø> (ø)`
...thon/lsst/daf/butler/registry/dimensions/skypix.py	`76.66% <ø> (ø)`
...n/lsst/daf/butler/registry/interfaces/_datasets.py	`75.89% <0.00%> (ø)`
...sst/daf/butler/registry/interfaces/_collections.py	`77.61% <53.33%> (-3.07%)`	⬇️
...n/lsst/daf/butler/registry/databases/postgresql.py	`79.67% <78.57%> (+0.49%)`	⬆️
python/lsst/daf/butler/core/timespan.py	`81.62% <79.24%> (-0.47%)`	⬇️
... and 29 more

andy-slac

Looks good, few minor comments.

andy-slac · 2022-09-12T21:11:59Z

python/lsst/daf/butler/registry/queries/_structs.py

-    imposed by the string expression or data ID
-    (`GovernorDimensionRestriction`).
+    imposed by the string expression and/or data ID
+    (`Mapping` [ ``set`,  `~collections.abc.Set` ] ]).


Double backtick, and set should be str?

andy-slac · 2022-09-12T21:42:53Z

python/lsst/daf/butler/registry/datasets/byDimensions/summaries.py

+            summary.dataset_types.add(datasetType)
            for dimension in self._tables.dimensions:
                value = row[dimension.name]
                if value is not None:
-                    summary.dimensions.add(dimension, value)
+                    summary.governors.setdefault(dimension.name, set()).add(value)


Should be this using summary.add_data_ids() instead of modifying data members directly?

No, I left the data member as public (with a public annotation as a mutable type) because I wasn't bothered by external code doing its own updates to it, and trying to make this block produce data IDs just so they can be unpacked into a different structure didn't seem worthwhile; I prefer the transparency of seeing exactly what's going on over encapsulation in this case.

andy-slac · 2022-09-12T21:47:09Z

python/lsst/daf/butler/registry/queries/expressions/check.py

-                governor = branch.dataIdKey
-                value = summary.governors.setdefault(governor, branch.dataIdValue)
-                if value != branch.dataIdValue:
+            # Test whether this branch has a form like '<dimension>=<value' (or


Angle bracket missing after <value.

andy-slac · 2022-09-12T22:29:27Z

python/lsst/daf/butler/registry/interfaces/_database.py

+
+        Notes
+        -----
+        This is used by `Session.upload` to decide when `constant_rows` can be


I don't see Session.upload, is it in one of your other branches?

It's something I ended up removing, because I found that calling code always wanted to manually inspect this value to decide whether to make a temporary table or use constant_rows itself (since one involves a resource whose lifetime should be managed, and the other does not). I've dropped this first sentence from the docs and reworded the next accordingly.

andy-slac · 2022-09-12T23:42:39Z

python/lsst/daf/butler/core/timespan.py

+
+        Returns
+        -------
+        timespan


Add : Timespan, optional?

Actually want it to be "Timespan or None" in this case, but yes.

andy-slac · 2022-09-12T23:57:31Z

python/lsst/daf/butler/registry/datasets/byDimensions/_storage.py

@@ -396,6 +397,14 @@ def select(
            # create a dummy subquery that we know will fail.
            tags_query = SimpleQuery()
            tags_query.join(self._tags, **kwargs)
+            # If the timespan is requested or contrained, simulate a


typo: contrained

andy-slac · 2022-09-13T00:08:13Z

python/lsst/daf/butler/registry/interfaces/_database.py

@@ -265,6 +265,31 @@ def dropTemporaryTable(self, table: sqlalchemy.schema.Table) -> None:
        else:
            raise TypeError(f"Table {table.key} was not created by makeTemporaryTable.")

+    @contextmanager
+    def temporary_table(self, spec: ddl.TableSpec, name: Optional[str]) -> Iterator[sqlalchemy.schema.Table]:


I don't see this used anywhere, I guess it's also for one of the other branches?

Correct. An upcoming branch will use it. I suppose I could also use it to simplify the existing [Something]QueryResult.materialize context managers, but I'll be replacing them completely in one of those upcoming branches, too.
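
For readers following this thread, here is a minimal sketch of the pattern under discussion: a context manager that creates a temporary table and always drops it, combined with contextlib.ExitStack. It uses the standard-library sqlite3 module as a stand-in; the function names, table names, and signature below are illustrative assumptions and do not reproduce the `Database.temporary_table` API from this PR.

```python
import sqlite3
from contextlib import ExitStack, contextmanager
from typing import Iterator, List


@contextmanager
def temporary_table(conn: sqlite3.Connection, name: str, columns: str) -> Iterator[str]:
    """Create a temporary table and drop it on exit, even after an error.

    Hypothetical stand-in for the wrapper discussed above.
    """
    conn.execute(f"CREATE TEMPORARY TABLE {name} ({columns})")
    try:
        yield name
    finally:
        conn.execute(f"DROP TABLE {name}")


def temp_table_names(conn: sqlite3.Connection) -> List[str]:
    """List the temporary tables that currently exist on this connection."""
    rows = conn.execute("SELECT name FROM sqlite_temp_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
with ExitStack() as stack:
    # ExitStack lets a variable number of tables share one cleanup scope.
    first = stack.enter_context(temporary_table(conn, "tmp_a", "x INTEGER"))
    second = stack.enter_context(temporary_table(conn, "tmp_b", "y TEXT"))
    inside = temp_table_names(conn)
after = temp_table_names(conn)
```

Both tables exist inside the `with` block and are dropped in reverse order when it exits, which is the convenience the wrapper is meant to provide.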

andy-slac · 2022-09-13T00:18:15Z

python/lsst/daf/butler/registry/interfaces/_database.py

+        try:
+            yield table
+        finally:
+            self.dropTemporaryTable(table)


I think Coverity tries to say that there is no unit test for the whole temporary_table but somehow it only points to a single line.

It took me a while to figure out why the set-like GovernorDimensionRestriction class kept needing new set-op method variants for ~every context in which it was being used, but the answer is that it was really a mapping of sets (one for each governor dimension), and hence when instances have different dimensions (mapping keys), their relationship is not set-like. The best solution seems to be to remove that class in favor of the kind of mapping-of-sets that backed it, and move its convenience functionality that's important for CollectionSummary to CollectionSummary itself.

I changed some attribute names and the module name for collection summary to get them consistent with now-preferred style; it made sense to do this at the same time because I wanted to find all code using CollectionSummary in order to make sure it was updated appropriately, and renames made MyPy very good at that. I also switched from Dimension instance keys to `str` dimension name keys, since I think we'll be doing more of that with RFC-834 adopted.

Finally, I've changed the logic in the user-expression checking code to track all dimensions, and then filter down to just the governor dimensions at the boundary with the rest of the query system. This just seemed the best for overall code simplicity.
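
To make the "mapping of sets is not set-like" point concrete, here is a small sketch in plain Python. The helper `intersect_governors`, the example values, and the rule that a missing key means "unconstrained" are all assumptions made for illustration; none of them are taken from the butler code.

```python
from __future__ import annotations

from collections.abc import Mapping, Set


def intersect_governors(
    a: Mapping[str, Set[str]], b: Mapping[str, Set[str]]
) -> dict[str, set[str]]:
    """Combine two governor restrictions, keeping values allowed by both.

    Assumption for this sketch: a dimension absent from one mapping is
    unconstrained there, so its values pass through from the other mapping.
    """
    result: dict[str, set[str]] = {}
    for name in a.keys() | b.keys():
        if name in a and name in b:
            result[name] = set(a[name]) & set(b[name])
        else:
            # Only one side constrains this dimension; copy its values.
            result[name] = set(a.get(name, b.get(name, ())))
    return result


summary_governors: dict[str, set[str]] = {}
# Same update pattern as in the reviewed diff: one set per dimension name.
summary_governors.setdefault("instrument", set()).add("HSC")
summary_governors.setdefault("instrument", set()).add("LATISS")
summary_governors.setdefault("skymap", set()).add("hsc_rings_v1")

query_governors = {"instrument": {"HSC"}}
combined = intersect_governors(summary_governors, query_governors)
```

Because the two mappings have different keys, a plain set intersection has no single obvious meaning; the per-key handling has to be chosen by the caller, which is why a dedicated set-like class kept needing new variants.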

This removes the analogous expression class for spatial regions, as well as the base class they shared. We need an abstraction layer above SQLAlchemy's for timespans, since those can have a representation that involves multiple columns in some databases, and SQLAlchemy's ColumnElement can't handle that. But our regions are always just an opaque blob column, so that abstraction layer wasn't carrying its weight, despite the nice symmetry between regions and timespans in other respects. A new classmethod constructor has been added to FieldSpec to take care of the only important thing this class did: holding the defaults for how region columns should be defined.

That results in a number of abstract methods now being first defined in TimespanDatabaseRepresentation. For the most part, these are just transfers from the old base class with slightly adjusted docs, but there are a few small updates:

- fromLiteral now accepts None as a way to support NULL expressions;
- fromSelectable has been replaced by from_columns, since SQLAlchemy is moving towards being much more strict about the difference between the columns of a SELECT statement and the columns of a FromClause, such as a table or an aliased subquery, and in the future we won't be able to access a ".columns" across the board (e.g. on Select objects, it'll be "exported_columns" instead);
- we now use tuples instead of generators for a few methods, since generators for 1- or 2-element results involve a lot more overhead with no real gain.

(and TimespanDatabaseRepresentation.overlaps). This makes the Python interface more consistent with the string expression language, which uses OVERLAPS for timespan-time arguments, and it will sidestep a problem with function name resolution in the new daf_relation expression system, since normally that looks in the operator module first and a method second, and "contains" (but not "overlaps") is a member of the operator module.
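
The name-resolution issue can be sketched as follows. The `resolve_function` helper is a hypothetical approximation of "look in the operator module first and a method second"; the actual daf_relation logic is not shown in this PR, and `Interval` is a toy stand-in rather than the butler Timespan class.

```python
import operator
from typing import Any, Callable


def resolve_function(name: str, target: object) -> Callable[..., Any]:
    """Hypothetical resolver: try the operator module first, then a method."""
    func = getattr(operator, name, None)
    if func is not None:
        return func
    return getattr(target, name)


class Interval:
    """Toy half-open interval used only to demonstrate the lookup order."""

    def __init__(self, begin: int, end: int) -> None:
        self.begin, self.end = begin, end

    def overlaps(self, other: "Interval") -> bool:
        return self.begin < other.end and other.begin < self.end


span = Interval(0, 10)
# "contains" resolves to operator.contains, which would shadow any method
# of the same name on the target object.
contains_func = resolve_function("contains", span)
# "overlaps" is absent from the operator module, so the method is found.
overlaps_func = resolve_function("overlaps", span)
```

With a lookup order like this, a method named `contains` would never be reached, while `overlaps` resolves to the method as intended.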

The timespan for a dataset subquery for a non-CALIBRATION collection is hereby defined to be a fully unbounded timespan, which will let us UNION those subqueries against CALIBRATION-collection queries for the same dataset before doing a temporal join against a dimension like exposure or visit. It would make sense to change the behavior of queryDatasetAssociations to associate non-CALIBRATION collections with a full timespan instead of None at the same time, but that's an API change that I don't want to tackle right now. Eventually I'd like to replace queryDatasetAssociations entirely with a method-chaining interface to get associations from queryDatasets, and I think that RFC will be the time to tackle this.
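
A rough illustration of why an unbounded timespan composes well with a temporal join, written in plain Python rather than SQL. The `Span` class, the row layout, and the dataset names are assumptions for this sketch; `None` stands for an unbounded end.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Span(NamedTuple):
    """Toy half-open interval; None means unbounded on that side."""

    begin: Optional[float]
    end: Optional[float]

    def overlaps(self, other: Span) -> bool:
        starts_before_other_ends = (
            self.begin is None or other.end is None or self.begin < other.end
        )
        other_starts_before_end = (
            other.begin is None or self.end is None or other.begin < self.end
        )
        return starts_before_other_ends and other_starts_before_end


# Rows as they might look after a UNION of per-collection subqueries:
# CALIBRATION rows carry validity ranges, other rows are fully unbounded.
rows = [
    ("CALIBRATION", "bias_a", Span(0.0, 10.0)),
    ("CALIBRATION", "bias_b", Span(10.0, 20.0)),
    ("RUN", "bias_c", Span(None, None)),
]

exposure = Span(12.0, 13.0)
# One overlap test handles every row; no special case for missing timespans.
matches = [name for _, name, span in rows if span.overlaps(exposure)]
```

Had the non-CALIBRATION row carried a null timespan instead, the join condition would need an extra "or timespan is null" branch to keep that row.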

This case-based approach is much closer to Andy Salnikov's initial implementation of find-first dataset queries; I think it's likely that changing it was at most unnecessary performance-wise (I was looking for ways to simplify a query the planner was having trouble with, but this construct, despite being hard for humans to parse when reading the SQL, probably wasn't playing a role in that). And some other query system changes have made it preferable from a code-organization standpoint, provided it's no worse from a performance standpoint. To that end, in some quick benchmarks (on PostgreSQL, querying 16 collections with the same dataset types and a spatial constraint to make the overall query nontrivial), this was consistently faster than the old, UNION-based rank expression, but only barely; I'm by no means certain that difference is significant, but I'm at least pretty confident that this is no worse for performance.
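
As a sketch of what a case-based rank for a find-first search can look like, the example below uses the standard-library sqlite3 module. The table layout, collection names, and the way the lowest rank is selected are assumptions for illustration; the real query is built with SQLAlchemy against the butler schema and its exact form is not shown in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (dataset_id INTEGER, data_id TEXT, collection TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [
        (1, "visit=1", "run_old"),
        (2, "visit=1", "run_new"),
        (3, "visit=2", "run_old"),
    ],
)

# Collections in search order; earlier entries win.
search_order = ["run_new", "run_old"]

# A single CASE expression maps each collection to its position in the
# search order, instead of a UNION of one subquery per collection.
case_sql = " ".join(f"WHEN ? THEN {rank}" for rank in range(len(search_order)))
placeholders = ", ".join("?" for _ in search_order)
query = f"""
WITH ranked AS (
    SELECT dataset_id, data_id,
           CASE collection {case_sql} END AS search_rank
    FROM dataset
    WHERE collection IN ({placeholders})
)
SELECT r.data_id, r.dataset_id
FROM ranked AS r
WHERE r.search_rank = (
    SELECT MIN(r2.search_rank) FROM ranked AS r2 WHERE r2.data_id = r.data_id
)
ORDER BY r.data_id
"""
# Parameters are bound in order of appearance: CASE branches, then IN list.
first_found = conn.execute(query, search_order + search_order).fetchall()
```

For each data ID, only the dataset from the earliest collection in the search order survives, so `visit=1` resolves to the `run_new` dataset even though `run_old` also has one.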

TallJimbo · 2022-09-20T19:35:55Z

@andy-slac, I'm sending this back to you for another look because Jenkins revealed a problem that led to a non-trivial change, and then I decided to throw in a couple more miscellaneous fixes. In particular:

- 465ebeb has been modified to define the timespan for a non-CALIBRATION collection to be a fully unbounded one, instead of NULL. That's logically the behavior we want in the query system anyway, and it avoids the need for special IS NULL handling in queries to address that downstream problem Jenkins discovered.
- That change led to d6df482, fixing a preexisting bug in how SQLAlchemy converts literals in queries (see also "Literals in SELECT columns incorrectly converted to strings", sqlalchemy/sqlalchemy#8540).
- 5aa83fa is a fix for a bug reported on Slack a couple of weeks ago that I was going to put on this branch but forgot about as I disappeared for vacation.
- 9cb3aec, 38c22a6, and 3909cdf are tiny fixes for things I noticed while I was working through review comments.

andy-slac

Looks good, no new comments.

TallJimbo force-pushed the tickets/DM-36111 branch 3 times, most recently from bb7ba89 to 07a6120 Compare September 6, 2022 17:03

TallJimbo force-pushed the tickets/DM-36111 branch 4 times, most recently from c09da8f to aadd939 Compare September 9, 2022 17:48

TallJimbo marked this pull request as ready for review September 9, 2022 21:04

andy-slac approved these changes Sep 13, 2022

TallJimbo force-pushed the tickets/DM-36111 branch 2 times, most recently from dac47f1 to 03d4779 Compare September 19, 2022 16:46

TallJimbo added 9 commits September 19, 2022 12:50

Add __eq__, __hash__, and __repr__ to CollectionRecord.

3efff6b

Make DatasetAssociation immutable and hashable.

ad3f0a5

Allow custom SQLAlchemy types to be initialized with no size.

b8ca84b

This is useful for adding types to literals, which don't need to be sized the same way that columns in tables do.

Add Database interface for using constant rows in queries.

0c75125

Export Database's Session helper class symbol.

7b04dfa

Add quoting to SQLite string-column check constraints.

b22661b

This was previously blocking column names with non-standard SQL characters.

Add missing copyright header.

11ef767

Support region intersections in DataCoordinate.

7732979

This is enabled by new sphgeom functionality added on DM-34721.

TallJimbo force-pushed the tickets/DM-36111 branch from 03d4779 to 4bef3e0 Compare September 19, 2022 16:50

TallJimbo added 5 commits September 20, 2022 11:14

Remove superfluous region arguments in Query methods.

9fb1ebe

These allowed the query's internal region to be overridden, but they were a solution in search of a problem.

Forward correct region from QueryBuilder to Query.

24cc1cc

Add SQL cast to fix from-SQL conversion of timespan literals.

d6df482

TallJimbo force-pushed the tickets/DM-36111 branch 2 times, most recently from 49fc32f to 3212de8 Compare September 20, 2022 15:58

TallJimbo added 8 commits September 20, 2022 12:49

Add temporary table context manager to Session.

2d6731a

This is a very thin wrapper around existing methods to create and drop temporary tables, but it's a convenient one, especially when using contextlib.ExitStack.

Ensure Registry exception tests actually try to execute a query.

dc00c78

Remove workaround for SQLAlchemy bug that has been fixed.

9cb3aec

We already seem to require SQLAlchemy 1.4, which includes the fix we submitted upstream.

Fix variable-shadowing bug in dimension export kwarg-handling.

5aa83fa

Fix doc typo.

38c22a6

Fix doc return-type references.

3909cdf

Add changelog entries for publicly-visible changes.

fd68878

This branch also includes a number of behind-the-scenes changes that I don't think merit a changelog entry.

TallJimbo force-pushed the tickets/DM-36111 branch from 3212de8 to fd68878 Compare September 20, 2022 16:49

andy-slac approved these changes Sep 20, 2022

TallJimbo merged commit 1a6169c into main Sep 21, 2022

TallJimbo deleted the tickets/DM-36111 branch September 21, 2022 13:46
