New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-36111: miscellaneous improvements to registry support classes. #731
Conversation
bb7ba89
to
07a6120
Compare
Codecov ReportBase: 84.68% // Head: 84.73% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #731 +/- ##
==========================================
+ Coverage 84.68% 84.73% +0.05%
==========================================
Files 244 244
Lines 32156 32095 -61
Branches 6040 6040
==========================================
- Hits 27230 27196 -34
+ Misses 3752 3724 -28
- Partials 1174 1175 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
c09da8f
to
aadd939
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, few minor comments.
imposed by the string expression or data ID | ||
(`GovernorDimensionRestriction`). | ||
imposed by the string expression and/or data ID | ||
(`Mapping` [ ``set`, `~collections.abc.Set` ] ]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double backtick, and set
should be str
?
summary.dataset_types.add(datasetType) | ||
for dimension in self._tables.dimensions: | ||
value = row[dimension.name] | ||
if value is not None: | ||
summary.dimensions.add(dimension, value) | ||
summary.governors.setdefault(dimension.name, set()).add(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be this using summary.add_data_ids()
instead of modifying data members directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I left the data member as public (with a public annotation as a mutable type) because I wasn't bothered by external code doing its own updates to it, and trying to make this block produce data IDs just so they can be unpacked into a different structure didn't seem worthwhile; I prefer the transparency of seeing exactly what's going on over encapsulation in this case.
governor = branch.dataIdKey | ||
value = summary.governors.setdefault(governor, branch.dataIdValue) | ||
if value != branch.dataIdValue: | ||
# Test whether this branch has a form like '<dimension>=<value' (or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Angle bracket missing after <value
.
|
||
Notes | ||
----- | ||
This is used by `Session.upload` to decide when `constant_rows` can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see Session.upload
, is it in one of your other branches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's something I ended up removing, because I found that calling code always wanted to manually inspect this value to decide how whether to make a temporary table or use constant_rows
itself (since one involves a resource whose lifetime should be managed, and the other does not). I've dropped this first sentence from the docs and reworded the next accordingly.
|
||
Returns | ||
------- | ||
timespan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add : Timespan, optional
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually want it to be "Timespan
or None
" in this case, but yes.
@@ -396,6 +397,14 @@ def select( | |||
# create a dummy subquery that we know will fail. | |||
tags_query = SimpleQuery() | |||
tags_query.join(self._tags, **kwargs) | |||
# If the timespan is requested or contrained, simulate a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: contrained
@@ -265,6 +265,31 @@ def dropTemporaryTable(self, table: sqlalchemy.schema.Table) -> None: | |||
else: | |||
raise TypeError(f"Table {table.key} was not created by makeTemporaryTable.") | |||
|
|||
@contextmanager | |||
def temporary_table(self, spec: ddl.TableSpec, name: Optional[str]) -> Iterator[sqlalchemy.schema.Table]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this used anywhere, I guess it's also for one of the other branches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. An upcoming branch will use it. I suppose I could also use it to simplify the existing [Something]QueryResult.materialize
context managers, but I'll be replacing them completely in one of those upcoming branches, too.
try: | ||
yield table | ||
finally: | ||
self.dropTemporaryTable(table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Coverity tries to say that there is no unit test for the whole temporary_table
but somehow it only points to a single line.
dac47f1
to
03d4779
Compare
It took me a while to figure out why the set-like GovernorDimensionRestriction class kept needing new set-op method variants for ~every context in which it was being used, but the answer is that it was really a mapping of sets (one for each governor dimension), and hence when instances have different dimensions (mapping keys) dimensions, their relationship is not set-like. The best solution seems to be to remove that class in favor of the kind of mapping-of-sets that backed it, and move its convenience functionality that's important for CollectionSummary to CollectionSummary itself. I changed some attribute names and the module name for collecton summary to get them consistent with now-preferred style; it made sense to do this at the same time because I wanted to find all code using CollectionSummary in order to make sure it was updated appropriately, and renames made MyPy very good at that. I also switched from Dimension instance keys to `str` dimension name keys, since I think we'll be doing more of that with RFC-834 adopted. Finally, I've changed the logic in the user-expression checking code to track all dimensions, and then filter down to just the governor dimensions at the boundary with the rest of the query system. This just seemed the best for overall code simplicity.
This is useful for adding types to literals, which don't need to be sized the same way that columns in tables do.
This was previously blocking column names with non-standard SQL characters.
This is enabled by new sphgeom functionality added on DM-34721.
03d4779
to
4bef3e0
Compare
This removes the analogous expression class for spatial regions, as well as the base class they shared. We need an abstraction layer above SQLAlchemy's for timespans, since those can have a representation that involves multiple columns in some databases, and SQLAlchemy's ColumnElement can't handle that. But our regions are always just an opaque blob column, so that abstraction layer wasn't carrying its weight, despite the nice symmetry between regions and timespans in other respects. A new classmethod constructor has been added to FieldSpec to take care of the only important thing this class did - holding the defaults for how region columns should be defined. That results in a number of abstract methods now being first defined in TimespanDatabaseRepresentation. For the most part, these are just transfers from the old base class with slightly adjusted docs, but there are a few small updates: - fromLiteral now accepts None as a way to support NULL expressions; - fromSelectable has been replaced from_columns, since SQLAlchemy is moving towards being much more strict about the difference between the columns of a SELECT statement and the columns of a FromClause, such as a table or an aliased subquery, and in the future we won't be able to access a ".columns" across the board (e.g. on Select objects, it'll be "exported_columns" instead). - we now use tuples instead of generators for a few methods, since generators for 1- or 2-element results involves a lot more overhead with no real gain.
(and TimespanDatabaseRepresentation.overlaps). This makes the Python interface more consistent with the string expression language, which uses OVERLAPS for timespan-time arguments, and it will sidestep a problem with function name resolution in the new daf_relation expression system, since normally that looks in the operator module first and a method second, and "contains" (but not "overlaps") is a member of the operator module.
These allowed the query's internal region to be overridden, but they were a solution in search of a problem.
49fc32f
to
3212de8
Compare
The timespan for a dataset subquery for a non-CALIBRATION collection is hereby defined to be a fully unbounded timespan, which will let us UNION those subqueries against CALIBRATION-collection queries for the same dataset before doing a temporal join against a dimension like exposure or visit. It would make sense to change the behavior of queryDatasetAssociations to associate non-CALIBRATION collections with a full timespan instead of None at the same time, but that's an API change that I don't want to tackle right now. Eventually I'd like to replace queryDatasetAssocations entirely with a method-chaining interface to get associations from queryDatasets, and I think that RFC wil be the time to tackle this.
This case-based approach is much closer to Andy Salnikov's initial implementation of find-first dataset queries; I think it's likely that changing it was at most unnecessary performance-wise (I was looking for ways to simplify a query the planner was having trouble with, but this construct - despite being hard for humans to parse when reading the SQL - probably wasn't playing a role in that). And some other query system changes have made it preferable from a code-organization standpoint, providing it's no worse from a performance standpoint. To that end, in some quick benchmarks (on PostgreSQL, querying 16 collections with the same dataset types and a spatial constraint to make the overall query nontrivial), this was consistently faster than the old, UNION-based rank expression, but only barely; I'm by no means certain that difference is significant, but I'm at least pretty confident that this is no worse for performance.
This is a very thin wrapper around existing methods to create and drop temporary tables, but it's a convenient one, especially when using contextlib.ExitStack.
We already seem to require SQLAlchemy 1.4, which includes the fix we submitted upstream.
This branch also includes a number of behind-the-scenes changes that I don't think merit a changelog entry.
3212de8
to
fd68878
Compare
@andy-slac , I'm sending this back to you for another look because Jenkins revealed a problem that led to a non-trivial change, and then I decided to throw in a couple more miscellaneous fixes. In particular:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, no new comments.
Checklist
doc/changes