DM-41158: (mostly) implement QueryDriver for DirectButler #915

Merged: 23 commits, Apr 5, 2024

Commits
61f438d  Move NameShrinker out of registry and add container methods.  (TallJimbo, Feb 26, 2024)
6ebc934  Accept NameShrinker in ColumnSpec.to_sql_spec.  (TallJimbo, Feb 26, 2024)
2504bcf  Guard against long field names in dimensions manager.  (TallJimbo, Feb 26, 2024)
ece2431  Default QueryTree argument to Query constructor.  (TallJimbo, Mar 15, 2024)
ec57484  Move 'dimensions' attribute out of QueryBase/Query.  (TallJimbo, Mar 19, 2024)
3c04617  Fix truncated docstring in queries.tree.ColumnSet.  (TallJimbo, Mar 19, 2024)
53092e0  Make queries.ExpressionFactory friendlier to type checkers.  (TallJimbo, Mar 19, 2024)
08639c7  Fix doc typo that inverted meaning in Database interface.  (TallJimbo, Mar 19, 2024)
c73556e  Guard against skypix regions in result specs.  (TallJimbo, Mar 21, 2024)
0015626  Export InvalidQueryError at package scope.  (TallJimbo, Mar 27, 2024)
2546a02  Implement QueryDriver for DirectButler.  (TallJimbo, Jan 31, 2024)
b76888c  Implement Butler._query.  (TallJimbo, Mar 15, 2024)
722ae4c  Fix typo in comment.  (TallJimbo, Mar 28, 2024)
1c3729e  Don't lift 'queries' symbols into package scope (yet).  (TallJimbo, Mar 28, 2024)
b29f78a  Fix incorrect query-result type imports.  (TallJimbo, Mar 28, 2024)
2ca5451  Reduce duplication in query result processing.  (TallJimbo, Apr 4, 2024)
3852316  Open caching context in query context.  (TallJimbo, Apr 4, 2024)
b8d4454  Fix incorrect code comment about query transactions.  (TallJimbo, Apr 4, 2024)
f045ba3  Use dataclass for DirectQueryDriver materialization state.  (TallJimbo, Apr 4, 2024)
98fe0c7  Drop redundant cursor close.  (TallJimbo, Apr 4, 2024)
e4b6b13  Defer guards on QueryBuilder distinct vs. group_by to make them useful.  (TallJimbo, Apr 4, 2024)
f5a5b56  Assert on rather than cast SqlColumnVisitor types.  (TallJimbo, Apr 4, 2024)
8a03dd5  Add test for DirectQueryDriver with PostgreSQL.  (TallJimbo, Apr 5, 2024)
8 changes: 4 additions & 4 deletions python/lsst/daf/butler/__init__.py
@@ -75,12 +75,9 @@

# Do not import or lift symbols from mapping_factory and persistence_content,
# as those are internal.
# Only lift 'Progress' from 'progess'; the module is imported as-is above
# Only lift 'Progress' from 'progress'; the module is imported as-is above
from .progress import Progress

# Only import the main public symbols from queries
from .queries import DataCoordinateQueryResults, DatasetRefQueryResults, DimensionRecordQueryResults, Query

# Do not import or lift symbols from 'server' or 'server_models'.
# Import the registry subpackage directly for other symbols.
from .registry import (
@@ -93,6 +90,9 @@
from .transfers import RepoExportContext, YamlRepoExportBackend, YamlRepoImportBackend
from .version import *

# Do not import or lift symbols from 'queries' until they are public.


# Do not import or lift symbols from 'repo_relocation'.

# Do not export the utility routines from utils.
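
The net effect of this hunk is that the query classes are no longer lifted to package scope. Until they are made public, downstream code experimenting with the new interfaces would import them from the subpackage directly; a minimal sketch (the import path comes from the line removed above, but the public surface may still change):

# Sketch only: these names are no longer re-exported as lsst.daf.butler.Query
# etc., so experimental code imports them from the subpackage instead.
from lsst.daf.butler.queries import (
    DataCoordinateQueryResults,
    DatasetRefQueryResults,
    DimensionRecordQueryResults,
    Query,
)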
19 changes: 13 additions & 6 deletions python/lsst/daf/butler/column_spec.py
@@ -53,6 +53,7 @@

from . import arrow_utils, ddl
from ._timespan import Timespan
from .name_shrinker import NameShrinker

ColumnType: TypeAlias = Literal[
"int", "string", "hash", "float", "datetime", "bool", "uuid", "timespan", "region"
@@ -79,11 +80,14 @@ class _BaseColumnSpec(pydantic.BaseModel, ABC):
description="Whether the column may be ``NULL``.",
)

def to_sql_spec(self, **kwargs: Any) -> ddl.FieldSpec:
def to_sql_spec(self, name_shrinker: NameShrinker | None = None, **kwargs: Any) -> ddl.FieldSpec:
Contributor:
Should this nameshrinker argument be non-optional? I would think that if it is ever needed, it will always be needed... otherwise different parts of the code can think names are different. Just creates the potential for certain columns to work with some types of queries/operations but not others.

Member Author:
I prefer to do that at a higher level, because the set of names that need shrinking is limited to only those constructed from dataset type names, and that keeps name-shrinker arguments from sprouting up in many more places.

"""Convert this specification to a SQL-specific one.

Parameters
----------
name_shrinker : `NameShrinker`, optional
Object that should be used to shrink the field name to ensure it
fits within database-specific limits.
**kwargs
Forwarded to `ddl.FieldSpec`.

@@ -92,7 +96,10 @@ def to_sql_spec(self, **kwargs: Any) -> ddl.FieldSpec:
sql_spec : `ddl.FieldSpec`
A SQL-specific version of this specification.
"""
return ddl.FieldSpec(name=self.name, dtype=ddl.VALID_CONFIG_COLUMN_TYPES[self.type], **kwargs)
name = self.name
if name_shrinker is not None:
name = name_shrinker.shrink(name)
return ddl.FieldSpec(name=name, dtype=ddl.VALID_CONFIG_COLUMN_TYPES[self.type], **kwargs)

@abstractmethod
def to_arrow(self) -> arrow_utils.ToArrow:
@@ -162,9 +169,9 @@ class StringColumnSpec(_BaseColumnSpec):
length: int
"""Maximum length of strings."""

def to_sql_spec(self, **kwargs: Any) -> ddl.FieldSpec:
def to_sql_spec(self, name_shrinker: NameShrinker | None = None, **kwargs: Any) -> ddl.FieldSpec:
# Docstring inherited.
return super().to_sql_spec(length=self.length, **kwargs)
return super().to_sql_spec(length=self.length, name_shrinker=name_shrinker, **kwargs)

def to_arrow(self) -> arrow_utils.ToArrow:
# Docstring inherited.
@@ -182,9 +189,9 @@ class HashColumnSpec(_BaseColumnSpec):
nbytes: int
"""Number of bytes for the hash."""

def to_sql_spec(self, **kwargs: Any) -> ddl.FieldSpec:
def to_sql_spec(self, name_shrinker: NameShrinker | None = None, **kwargs: Any) -> ddl.FieldSpec:
# Docstring inherited.
return super().to_sql_spec(nbytes=self.nbytes, **kwargs)
return super().to_sql_spec(nbytes=self.nbytes, name_shrinker=name_shrinker, **kwargs)

def to_arrow(self) -> arrow_utils.ToArrow:
# Docstring inherited.
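A minimal sketch of how the new name_shrinker argument might be used. Only the shrink() call appears in the diff above; the StringColumnSpec fields and the NameShrinker constructor argument below are assumptions for illustration, not the verified API:

# Hypothetical usage of the new to_sql_spec signature; the constructor
# arguments here are assumptions, not taken from this PR.
from lsst.daf.butler.column_spec import StringColumnSpec
from lsst.daf.butler.name_shrinker import NameShrinker

spec = StringColumnSpec(name="a_field_named_after_a_very_long_dataset_type", length=64)
shrinker = NameShrinker(63)  # assumed: the database's maximum identifier length
field_spec = spec.to_sql_spec(name_shrinker=shrinker)
# field_spec.name is shrunk only if it exceeds the configured limit.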
9 changes: 8 additions & 1 deletion python/lsst/daf/butler/direct_butler.py
@@ -66,6 +66,7 @@
from ._timespan import Timespan
from .datastore import Datastore, NullDatastore
from .dimensions import DataCoordinate, Dimension
from .direct_query_driver import DirectQueryDriver
from .progress import Progress
from .queries import Query
from .registry import (
@@ -2136,7 +2137,13 @@ def dimensions(self) -> DimensionUniverse:
@contextlib.contextmanager
def _query(self) -> Iterator[Query]:
# Docstring inherited.
raise NotImplementedError("TODO DM-41159")
driver = DirectQueryDriver(
self._registry._db, self.dimensions, self._registry._managers, self._registry.defaults
)
query = Query(driver)
Contributor:
I wonder if we also want to call with self._caching_context here to enable caching of collection info.

Member Author:
Yes, good idea. Long term I think this can probably be the only way a caching context gets entered, which will hopefully let us get the caching out of the managers entirely, but entering the current caching context here is what makes sense right now.
with self._caching_context():
with driver:
yield query

def _preload_cache(self) -> None:
"""Immediately load caches that are used for common operations."""
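A hedged sketch of how the newly wired-up context manager might be exercised. Only the driver construction and the nested context managers appear in the diff above; the repository path and the dimension_records call below are assumptions drawn from the result classes named elsewhere in this PR:

# Hypothetical caller of the new Butler._query context manager; the method
# called on `query` is an assumption, not part of this diff.
from lsst.daf.butler import Butler

butler = Butler("some/repo")  # assumed repository location
with butler._query() as query:
    for record in query.dimension_records("detector"):
        print(record)
# Exiting the block closes both the caching context and the DirectQueryDriver.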
30 changes: 30 additions & 0 deletions python/lsst/daf/butler/direct_query_driver/__init__.py
@@ -0,0 +1,30 @@
# This file is part of daf_butler.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (http://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This software is dual licensed under the GNU General Public License and also
# under a 3-clause BSD license. Recipients may choose which of these licenses
# to use; please see the files gpl-3.0.txt and/or bsd_license.txt,
# respectively. If you choose the GPL option then the following text applies
# (but note that there is still no warranty even if you opt for BSD instead):
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

from ._driver import DirectQueryDriver
from ._postprocessing import Postprocessing
from ._query_builder import QueryBuilder, QueryJoiner