Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-36325L Allow list or tuple as bound value for IN operator #745

Merged
merged 3 commits into from
Oct 26, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/changes/DM-36325.feature.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Bind values in Registry queries can now specify list/tuple of numbers for identifiers appearing on the right-hand side of `IN` expression.
41 changes: 41 additions & 0 deletions doc/lsst.daf.butler/queries.rst
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,46 @@ schema and how SQL query is built). A simple identifier with a name
``ingest_date`` is used to reference dataset ingest time, which can be used to
filter query results based on that property of datasets.

Registry methods accepting user expressions also accept ``bind`` parameter which is a mapping from identifier name to its corresponding value.
andy-slac marked this conversation as resolved.
Show resolved Hide resolved
Identifiers appearing in a user expressions will be replaced with the corresponding value from this mapping.
andy-slac marked this conversation as resolved.
Show resolved Hide resolved
Use of ``bind`` parameter is encouraged when possible to simplify rendering of the query strings.
andy-slac marked this conversation as resolved.
Show resolved Hide resolved
A partial example of comparing two approaches, without and with ``bind``:

.. code-block:: Python

instrument_name = "LSST"
visit_id = 12345

# Direct rendering of query not using bind
result = registry.queryDatasets(
...,
where=f"instrument = '{instrument_name}' AND visit = {visit_id}",
)

# Same functionality using bind parameter
result = registry.queryDatasets(
...,
where="instrument = instrument_name AND visit = visit_id",
bind={"instrument_name": instrument_name, "visit_id": visit_id},
)

Types of values provided in a ``bind`` mapping must correspond to the expected type of the expression, which is usually a scalar type, one of ``int``, ``float``, ``str``, etc.
There is one context where a bound value can specify a list or a tuple of values, this can be provided for an identifier appearing in the right-hand side of :ref:`expressions-in-operator`.
andy-slac marked this conversation as resolved.
Show resolved Hide resolved
Note that parentheses after ``IN`` are still required when identifier is bound to a list or a tuple.
An example of this feature:

.. code-block:: Python

instrument_name = "LSST"
visit_ids = (12345, 12346, 12350)
result = registry.queryDatasets(
...,
where="instrument = instrument_name AND visit IN (visit_ids)",
bind={"instrument_name": instrument_name, "visit_ids": visit_ids},
)



Unary arithmetic operators
^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand All @@ -190,6 +230,7 @@ expressions.
.. note :: The equality comparison operator is a single ``=`` like in SQL, not
double ``==`` like in Python or C++.

.. _expressions-in-operator:

IN operator
^^^^^^^^^^^
Expand Down
11 changes: 10 additions & 1 deletion python/lsst/daf/butler/registry/queries/expressions/check.py
Original file line number Diff line number Diff line change
Expand Up @@ -184,7 +184,16 @@ def visitTimeLiteral(self, value: astropy.time.Time, node: Node) -> TreeSummary:
def visitIdentifier(self, name: str, node: Node) -> TreeSummary:
# Docstring inherited from TreeVisitor.visitIdentifier
if name in self.bind:
return TreeSummary(dataIdValue=self.bind[name])
value = self.bind[name]
if isinstance(value, (list, tuple)):
# This can happen on rhs of IN operator, if there is only one
# element in the list then take it.
if len(value) == 1:
return TreeSummary(dataIdValue=value[0])
else:
return TreeSummary()
else:
return TreeSummary(dataIdValue=value)
constant = categorizeConstant(name)
if constant is ExpressionConstant.INGEST_DATE:
return TreeSummary(hasIngestDate=True)
Expand Down
65 changes: 60 additions & 5 deletions python/lsst/daf/butler/registry/queries/expressions/convert.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ def convertExpressionToSql(
elements : `NamedKeyMapping`
`DimensionElement` instances and their associated tables.
bind : `Mapping`
Mapping from string names to literal values that should be subsituted
Mapping from string names to literal values that should be substituted
for those names when they appear (as identifiers) in the expression.
TimespanReprClass : `type`; subclass of `TimespanDatabaseRepresentation`
Class that encapsulates the representation of `Timespan` objects in
Expand Down Expand Up @@ -147,7 +147,7 @@ def __init__(self, literal: datetime):
def compile_timestamp_literal_sqlite(element: Any, compiler: Any, **kw: Mapping[str, Any]) -> str:
"""Compilation of TIMESTAMP literal for SQLite.

SQLite defines ``datetiem`` function that can be used to convert timestamp
SQLite defines ``datetime`` function that can be used to convert timestamp
value to Unix seconds.
"""
return compiler.process(func.datetime(sqlalchemy.sql.literal(element._literal)), **kw)
Expand Down Expand Up @@ -215,7 +215,7 @@ def finish(self, node: Node) -> sqlalchemy.sql.ColumnElement:
Parameters
----------
node : `Node`
Original expression tree nodethis converter represents; used only
Original expression tree node this converter represents; used only
for error reporting.

Returns
Expand Down Expand Up @@ -379,6 +379,57 @@ def categorizeForIn(
literals.append(self.column)


class SequenceWhereClauseConverter(WhereClauseConverter):
"""Implementation of WhereClauseConverter, for expressions that represent
a sequence of `sqlalchemy.sql.ColumnElement` instance.

This converter is intended for bound identifiers whose bind value is a
sequence (but not string), which should only appear in the right hand side
of ``IN`` operator. It should be constructed by calling `fromLiteral`
method.

Parameters
----------
columns : `list` [ `ScalarWhereClauseConverter` ]
Converters for items in the sequence.
"""

def __init__(self, scalars: List[ScalarWhereClauseConverter]):
self.scalars = scalars

@classmethod
def fromLiteral(cls, values: Iterable[Any]) -> SequenceWhereClauseConverter:
"""Construct from an iterable of Python literals.

Parameters
----------
values : `list`
The Python literals to wrap.

Returns
-------
converter : `SequenceWhereClauseConverter`
Converter instance that wraps ``values``.
"""
return cls([ScalarWhereClauseConverter.fromLiteral(value) for value in values])

@property
def dtype(self) -> type:
# Docstring inherited.
return list

def categorizeForIn(
self,
literals: List[sqlalchemy.sql.ColumnElement],
ranges: List[Tuple[int, int, int]],
dtype: type,
node: Node,
) -> None:
# Docstring inherited.
for scalar in self.scalars:
scalar.categorizeForIn(literals, ranges, dtype, node)


class TimespanWhereClauseConverter(WhereClauseConverter):
"""Implementation of WhereClauseConverter for `Timespan` expressions.

Expand Down Expand Up @@ -615,7 +666,7 @@ def coerceTimes(cls, *args: ScalarWhereClauseConverter) -> List[ScalarWhereClaus
Returns
-------
converters : `list` [ `ScalarWhereClauseConverter` ]
List of converters in the same order as they appera in argument
List of converters in the same order as they appear in argument
list, some of them can be coerced to `datetime` type, non-coerced
arguments are returned without any change.
"""
Expand Down Expand Up @@ -1006,7 +1057,7 @@ class WhereClauseConverterVisitor(TreeVisitor[WhereClauseConverter]):
elements: `NamedKeyMapping`
`DimensionElement` instances and their associated tables.
bind: `Mapping`
Mapping from string names to literal values that should be subsituted
Mapping from string names to literal values that should be substituted
for those names when they appear (as identifiers) in the expression.
TimespanReprClass: `type`; subclass of `TimespanDatabaseRepresentation`
Class that encapsulates the representation of `Timespan` objects in
Expand Down Expand Up @@ -1052,6 +1103,10 @@ def visitIdentifier(self, name: str, node: Node) -> WhereClauseConverter:
value = self.bind[name]
if isinstance(value, Timespan):
return TimespanWhereClauseConverter(self._TimespanReprClass.fromLiteral(value))
elif isinstance(value, (list, tuple)):
# Only accept lists and tuples, general test for Iterables is
# not reliable (e.g. astropy Time is Iterable).
return SequenceWhereClauseConverter.fromLiteral(value)
return ScalarWhereClauseConverter.fromLiteral(value)
constant = categorizeConstant(name)
if constant is ExpressionConstant.INGEST_DATE:
Expand Down