Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-36325L Allow list or tuple as bound value for IN operator #745

Merged
merged 3 commits into from
Oct 26, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/changes/DM-36325.feature.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Bind values in Registry queries can now specify list/tuple of numbers for identifiers appearing on the right-hand side of `IN` expression.
41 changes: 41 additions & 0 deletions doc/lsst.daf.butler/queries.rst
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,46 @@ schema and how SQL query is built). A simple identifier with a name
``ingest_date`` is used to reference dataset ingest time, which can be used to
filter query results based on that property of datasets.

Registry methods accepting user expressions also accept a ``bind`` parameter, which is a mapping from identifier name to its corresponding value.
Identifiers appearing in user expressions will be replaced with the corresponding value from this mapping.
Using the ``bind`` parameter is encouraged when possible to simplify rendering of the query strings.
A partial example of comparing two approaches, without and with ``bind``:

.. code-block:: Python

instrument_name = "LSST"
visit_id = 12345

# Direct rendering of query not using bind
result = registry.queryDatasets(
...,
where=f"instrument = '{instrument_name}' AND visit = {visit_id}",
)

# Same functionality using bind parameter
result = registry.queryDatasets(
...,
where="instrument = instrument_name AND visit = visit_id",
bind={"instrument_name": instrument_name, "visit_id": visit_id},
)

Types of values provided in a ``bind`` mapping must correspond to the expected type of the expression, which is usually a scalar type, one of ``int``, ``float``, ``str``, etc.
There is one context where a bound value can specify a list, tuple or set of values: an identifier appearing in the right-hand side of :ref:`expressions-in-operator`.
Note that parentheses after ``IN`` are still required when identifier is bound to a list or a tuple.
An example of this feature:

.. code-block:: Python

instrument_name = "LSST"
visit_ids = (12345, 12346, 12350)
result = registry.queryDatasets(
...,
where="instrument = instrument_name AND visit IN (visit_ids)",
bind={"instrument_name": instrument_name, "visit_ids": visit_ids},
)



Unary arithmetic operators
^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand All @@ -190,6 +230,7 @@ expressions.
.. note :: The equality comparison operator is a single ``=`` like in SQL, not
double ``==`` like in Python or C++.

.. _expressions-in-operator:

IN operator
^^^^^^^^^^^
Expand Down
16 changes: 13 additions & 3 deletions python/lsst/daf/butler/registry/queries/expressions/check.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,8 @@
)

import dataclasses
from typing import TYPE_CHECKING, Any, List, Mapping, Optional, Sequence, Set, Tuple
from collections.abc import Mapping, Sequence, Set
from typing import TYPE_CHECKING, Any, List, Optional, Tuple

from ....core import (
DataCoordinate,
Expand Down Expand Up @@ -72,7 +73,7 @@ def update(self, other: InspectionSummary) -> None:
in this branch (`NamedValueSet` [ `Dimension` ]).
"""

columns: NamedKeyDict[DimensionElement, Set[str]] = dataclasses.field(default_factory=NamedKeyDict)
columns: NamedKeyDict[DimensionElement, set[str]] = dataclasses.field(default_factory=NamedKeyDict)
"""Dimension element tables whose columns were referenced anywhere in this
branch (`NamedKeyDict` [ `DimensionElement`, `set` [ `str` ] ]).
"""
Expand Down Expand Up @@ -184,7 +185,16 @@ def visitTimeLiteral(self, value: astropy.time.Time, node: Node) -> TreeSummary:
def visitIdentifier(self, name: str, node: Node) -> TreeSummary:
# Docstring inherited from TreeVisitor.visitIdentifier
if name in self.bind:
return TreeSummary(dataIdValue=self.bind[name])
value = self.bind[name]
if isinstance(value, (list, tuple, Set)):
# This can happen on rhs of IN operator, if there is only one
# element in the list then take it.
if len(value) == 1:
return TreeSummary(dataIdValue=next(iter(value)))
else:
return TreeSummary()
else:
return TreeSummary(dataIdValue=value)
constant = categorizeConstant(name)
if constant is ExpressionConstant.INGEST_DATE:
return TreeSummary(hasIngestDate=True)
Expand Down
66 changes: 61 additions & 5 deletions python/lsst/daf/butler/registry/queries/expressions/convert.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
import operator
import warnings
from abc import ABC, abstractmethod
from collections.abc import Set
from datetime import datetime
from typing import (
TYPE_CHECKING,
Expand Down Expand Up @@ -98,7 +99,7 @@ def convertExpressionToSql(
elements : `NamedKeyMapping`
`DimensionElement` instances and their associated tables.
bind : `Mapping`
Mapping from string names to literal values that should be subsituted
Mapping from string names to literal values that should be substituted
for those names when they appear (as identifiers) in the expression.
TimespanReprClass : `type`; subclass of `TimespanDatabaseRepresentation`
Class that encapsulates the representation of `Timespan` objects in
Expand Down Expand Up @@ -147,7 +148,7 @@ def __init__(self, literal: datetime):
def compile_timestamp_literal_sqlite(element: Any, compiler: Any, **kw: Mapping[str, Any]) -> str:
"""Compilation of TIMESTAMP literal for SQLite.

SQLite defines ``datetiem`` function that can be used to convert timestamp
SQLite defines ``datetime`` function that can be used to convert timestamp
value to Unix seconds.
"""
return compiler.process(func.datetime(sqlalchemy.sql.literal(element._literal)), **kw)
Expand Down Expand Up @@ -215,7 +216,7 @@ def finish(self, node: Node) -> sqlalchemy.sql.ColumnElement:
Parameters
----------
node : `Node`
Original expression tree nodethis converter represents; used only
Original expression tree node this converter represents; used only
for error reporting.

Returns
Expand Down Expand Up @@ -379,6 +380,57 @@ def categorizeForIn(
literals.append(self.column)


class SequenceWhereClauseConverter(WhereClauseConverter):
"""Implementation of WhereClauseConverter, for expressions that represent
a sequence of `sqlalchemy.sql.ColumnElement` instance.

This converter is intended for bound identifiers whose bind value is a
sequence (but not string), which should only appear in the right hand side
of ``IN`` operator. It should be constructed by calling `fromLiteral`
method.

Parameters
----------
columns : `list` [ `ScalarWhereClauseConverter` ]
Converters for items in the sequence.
"""

def __init__(self, scalars: List[ScalarWhereClauseConverter]):
self.scalars = scalars

@classmethod
def fromLiteral(cls, values: Iterable[Any]) -> SequenceWhereClauseConverter:
"""Construct from an iterable of Python literals.

Parameters
----------
values : `list`
The Python literals to wrap.

Returns
-------
converter : `SequenceWhereClauseConverter`
Converter instance that wraps ``values``.
"""
return cls([ScalarWhereClauseConverter.fromLiteral(value) for value in values])

@property
def dtype(self) -> type:
# Docstring inherited.
return list

def categorizeForIn(
self,
literals: List[sqlalchemy.sql.ColumnElement],
ranges: List[Tuple[int, int, int]],
dtype: type,
node: Node,
) -> None:
# Docstring inherited.
for scalar in self.scalars:
scalar.categorizeForIn(literals, ranges, dtype, node)


class TimespanWhereClauseConverter(WhereClauseConverter):
"""Implementation of WhereClauseConverter for `Timespan` expressions.

Expand Down Expand Up @@ -615,7 +667,7 @@ def coerceTimes(cls, *args: ScalarWhereClauseConverter) -> List[ScalarWhereClaus
Returns
-------
converters : `list` [ `ScalarWhereClauseConverter` ]
List of converters in the same order as they appera in argument
List of converters in the same order as they appear in argument
list, some of them can be coerced to `datetime` type, non-coerced
arguments are returned without any change.
"""
Expand Down Expand Up @@ -1006,7 +1058,7 @@ class WhereClauseConverterVisitor(TreeVisitor[WhereClauseConverter]):
elements: `NamedKeyMapping`
`DimensionElement` instances and their associated tables.
bind: `Mapping`
Mapping from string names to literal values that should be subsituted
Mapping from string names to literal values that should be substituted
for those names when they appear (as identifiers) in the expression.
TimespanReprClass: `type`; subclass of `TimespanDatabaseRepresentation`
Class that encapsulates the representation of `Timespan` objects in
Expand Down Expand Up @@ -1052,6 +1104,10 @@ def visitIdentifier(self, name: str, node: Node) -> WhereClauseConverter:
value = self.bind[name]
if isinstance(value, Timespan):
return TimespanWhereClauseConverter(self._TimespanReprClass.fromLiteral(value))
elif isinstance(value, (list, tuple, Set)):
# Only accept list, tuple, and Set, general test for Iterables
# is not reliable (e.g. astropy Time is Iterable).
return SequenceWhereClauseConverter.fromLiteral(value)
return ScalarWhereClauseConverter.fromLiteral(value)
constant = categorizeConstant(name)
if constant is ExpressionConstant.INGEST_DATE:
Expand Down