DM-42733: minor improvement spin-offs from query system rewrite #954

TallJimbo · 2024-01-31T04:37:55Z

Checklist

ran Jenkins
~~added a release note for user-visible changes to doc/changes~~ (nothing here should be user-visible)

TallJimbo · 2024-01-31T13:30:16Z

Docker build is failing because I apparently used a feature (string_agg) that wasn't in SQLite until 3.44.2.

Can we bump the SQLite version in the docker build easily, or should I add a new Database method to use string_agg in PostgreSQL and group_agg in SQLite?

codecov · 2024-01-31T15:43:05Z

Codecov Report

Attention: 45 lines in your changes are missing coverage. Please review.

Comparison is base (6118ee5) 88.41% compared to head (0d1d136) 88.39%.

❗ Current head 0d1d136 differs from pull request most recent head fb90c4a. Consider uploading reports for the commit fb90c4a to get more accurate results

Files	Patch %	Lines
python/lsst/daf/butler/arrow_utils.py	67.08%	26 Missing ⚠️
python/lsst/daf/butler/pydantic_utils.py	85.18%	6 Missing and 2 partials ⚠️
python/lsst/daf/butler/column_spec.py	78.94%	4 Missing ⚠️
...n/lsst/daf/butler/registry/databases/postgresql.py	78.94%	3 Missing and 1 partial ⚠️
python/lsst/daf/butler/nonempty_mapping.py	93.75%	0 Missing and 2 partials ⚠️
python/lsst/daf/butler/ddl.py	87.50%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
- Coverage   88.41%   88.39%   -0.03%     
==========================================
  Files         303      307       +4     
  Lines       39163    39508     +345     
  Branches     8256     8316      +60     
==========================================
+ Hits        34625    34922     +297     
- Misses       3332     3373      +41     
- Partials     1206     1213       +7

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

dhirving · 2024-01-31T17:28:06Z

Can we bump the SQLite version in the docker build easily

It doesn't look like sqlite is one of the packages available in debian-backports, so "easily" no because the version you want isn't available in the package manager. If there's some other way to replace the system sqlite library with a newer version we can use it.

We don't "have" to run Debian if you need bleeding edge versions of packages, that's just what SQRE is doing for all of their services. I decided to follow them so we can free-ride on them knowing what is going on with security and compatibility with other tooling.

It should be relatively straightforward to switch it over to be based on minimamba with rubin-env, just the image will get bigger (which doesn't really matter other than there's more surface area for security issues). Also I don't know enough about conda so I'm not sure if it's possible to get a reproducible set of packages from installing rubin-env because they don't pin versions in their recipe.

TallJimbo · 2024-01-31T19:08:36Z

I generally like the idea of getting this container based on rubin-env ones, because it seems like that shrinks the space of environments we're trying to support. But it does sound like more than I wanted to do on this branch. @timj, I think it's your call.

timj · 2024-01-31T19:35:54Z

I see that Debian trixie (the next version) has 3.45 so there is hope for the future. How much work is it to support the old sqlite? Does @rra have any thoughts on trying to get a newer sqlite into the contanier?

rra · 2024-01-31T19:44:56Z

I agree with basically everything @dhirving said: that version of SQLite is extremely new, does not appear in any released version of Debian, and hasn't been backported. If you want to use a prepackaged version, you would need to build a Docker image from Debian testing (an unreleased version of Debian). I'm not sure you want to do that.

You could instead switch to the stack image, which will get you a much older base OS image with dubious security support but a conda environment, if conda packages the new version of SQLite. SQuaRE avoids the stack images whenever possible because they are huge and are not supported by the broader open source community, so involve some reinventing of wheels to get things onto them, but the deeper you get into stack package land, the more it would make sense to go down that path.

Another option would be to install SQLite via some other mechanism, but you would still need to find a source of packages that is newer than the Debian stable release from last summer.

TallJimbo · 2024-01-31T20:03:41Z

Sounds like I should just support the old SQLite; I don't think it should be hard, and I only kept this conversation going in case we thought it was better otherwise to use the stack images.

dhirving · 2024-01-31T20:22:20Z

Sounds like I should just support the old SQLite

Another option that isn't too bad is just building sqlite from source in the Docker container. Let me know if you want me to set it up.

RUN wget https://www.sqlite.org/2024/sqlite-autoconf-3450100.tar.gz
RUN tar -xzf sqlite-autoconf-3450100.tar.gz
WORKDIR sqlite-autoconf-3450100
RUN ./configure
RUN make
RUN make install
RUN ldconfig /usr/local/lib
CMD python -c "import sqlite3;print(sqlite3.sqlite_version)"
# 3.45.1

timj · 2024-01-31T18:02:08Z

python/lsst/daf/butler/_query_results.py

        """Make the iterator return limited number of records.

        Parameters
        ----------
-        limit : `int`
+        limit : `int` or `None`, optional


Maybe state that None means all records are returned?

timj · 2024-01-31T18:18:19Z

python/lsst/daf/butler/nonempty_mapping.py

+
+
+class NonemptyMapping(Mapping[_K, _V]):
+    """A `Mapping` that implicitly adds values (like `defaultdict`) but treats


Suggested change

"""A `Mapping` that implicitly adds values (like `defaultdict`) but treats

"""A `Mapping` that implicitly adds values (like `~collections.defaultdict`) but treats

(which will likely make the line too long so this change can't be accepted as-is).

timj · 2024-01-31T18:19:24Z

python/lsst/daf/butler/nonempty_mapping.py

+
+    Notes
+    -----
+    Unlike `defaultdict`, this class implements only `collections.abc.Mapping`,


~collections.defaultdict

timj · 2024-01-31T18:19:56Z

python/lsst/daf/butler/nonempty_mapping.py

+    -----
+    Unlike `defaultdict`, this class implements only `collections.abc.Mapping`,
+    not `~collections.abc.MutableMapping`, and hence it can be modified only by
+    invoking ``__getitem__`` with a key that does not exist.  Is is expected


Suggested change

invoking ``__getitem__`` with a key that does not exist. Is is expected

invoking ``__getitem__`` with a key that does not exist. It is expected

?

timj · 2024-01-31T20:36:59Z

tests/test_nonempt_mapping.py

@@ -0,0 +1,52 @@
+# This file is part of daf_butler.


Maybe rename this file to nonempty rather than nonempt?

Apparently nobody has ever tried to multiply anything inside a butler query expression, because it wouldn't have worked. Note that we don't actually need an entry for "/" in this mapping because that requires special-casing to deal with floating-point vs. integer division anyway.

The wrapper class itself is gnarly and full of advanced Pydantic usage and metaprogramming, but it's general and it keeps the messiness outside of both the serialized types and the validated types without duplication or two-stage validation.

We mixed up which of these should be allowed to be None in the old interface, but we don't need to repeat the mistake in the new one, though some of the fix will have to wait until new interfaces stop delegating to the old ones.

We just concatenate the encoded strings with a ":" (could be anything that isn't in the base64 alphabet) then split them in Python before decoding. This will let use GROUP BY on dimension key columns for deduplication while preserving region columns needed for Python-side region-overlap postprocessing. Rename union_agg to union_aggregate.

SQLite does not require SELECT columns to be in aggregates if they are not in the GROUP BY clause. What PostgreSQL can do depends on the version.

TallJimbo · 2024-02-08T18:08:54Z

Supporting older SQLite was even easier than I expected, because SQLAlchemy already had an abstraction that resolved to the right thing for each dialect.

TallJimbo force-pushed the tickets/DM-42733 branch 2 times, most recently from a7be8cc to 826ff1e Compare January 31, 2024 04:49

TallJimbo force-pushed the tickets/DM-42733 branch from 826ff1e to 457cc63 Compare January 31, 2024 15:32

TallJimbo force-pushed the tickets/DM-42733 branch from 457cc63 to a76a037 Compare January 31, 2024 15:49

TallJimbo marked this pull request as ready for review January 31, 2024 16:09

timj approved these changes Jan 31, 2024

View reviewed changes

TallJimbo added 8 commits February 8, 2024 09:26

Add function for getting DimensionUniverse from validation context.

ac84152

Add pydantic serialization for DimensionGroup.

95f2436

Add Pydantic serialization for DimensionElement.

bd95eab

Add ColumnSpec and ToArrow variants for astropy.time.Time.

3c319db

Enumerate supported column types in a single Literal type alias.

18fdce4

Add ColumnSpec and ToArrow variants for uuid.UUID.

ac01be3

TallJimbo force-pushed the tickets/DM-42733 branch from 7161411 to 0d1d136 Compare February 8, 2024 17:38

TallJimbo added 7 commits February 8, 2024 10:08

Allow limit=None (sort of) but not offset=None.

4b73e3c

We mixed up which of these should be allowed to be None in the old interface, but we don't need to repeat the mistake in the new one, though some of the fix will have to wait until new interfaces stop delegating to the old ones.

Add NonemptyMapping utility class.

2ccf797

Get PostgreSQL server version on engine startup.

a308fbc

Add feature flags and abstraction for DISTINCT ON and ANY_VALUE.

9fc0fa1

SQLite does not require SELECT columns to be in aggregates if they are not in the GROUP BY clause. What PostgreSQL can do depends on the version.

Add TimespanDatabaseRepresentation.apply_any_aggregate.

cd1727d

Add tests for new SQL aggregate functionality.

a31864d

TallJimbo added 2 commits February 8, 2024 10:08

Drop removed symbol from doc build skips.

38c3e67

Add some previously-omitted modules to doc build.

fb90c4a

TallJimbo force-pushed the tickets/DM-42733 branch from 0d1d136 to fb90c4a Compare February 8, 2024 18:09

TallJimbo merged commit 3fee28c into main Feb 8, 2024
16 checks passed

TallJimbo deleted the tickets/DM-42733 branch February 8, 2024 19:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-42733: minor improvement spin-offs from query system rewrite #954

DM-42733: minor improvement spin-offs from query system rewrite #954

TallJimbo commented Jan 31, 2024 •

edited

TallJimbo commented Jan 31, 2024

codecov bot commented Jan 31, 2024 •

edited

dhirving commented Jan 31, 2024

TallJimbo commented Jan 31, 2024

timj commented Jan 31, 2024

rra commented Jan 31, 2024

TallJimbo commented Jan 31, 2024

dhirving commented Jan 31, 2024

timj Jan 31, 2024

timj Jan 31, 2024

timj Jan 31, 2024

timj Jan 31, 2024

timj Jan 31, 2024

TallJimbo commented Feb 8, 2024



		class NonemptyMapping(Mapping[_K, _V]):
		"""A `Mapping` that implicitly adds values (like `defaultdict`) but treats

	invoking ``__getitem__`` with a key that does not exist. Is is expected
	invoking ``__getitem__`` with a key that does not exist. It is expected

DM-42733: minor improvement spin-offs from query system rewrite #954

DM-42733: minor improvement spin-offs from query system rewrite #954

Conversation

TallJimbo commented Jan 31, 2024 • edited

Checklist

TallJimbo commented Jan 31, 2024

codecov bot commented Jan 31, 2024 • edited

Codecov Report

dhirving commented Jan 31, 2024

TallJimbo commented Jan 31, 2024

timj commented Jan 31, 2024

rra commented Jan 31, 2024

TallJimbo commented Jan 31, 2024

dhirving commented Jan 31, 2024

timj Jan 31, 2024

Choose a reason for hiding this comment

timj Jan 31, 2024

Choose a reason for hiding this comment

timj Jan 31, 2024

Choose a reason for hiding this comment

timj Jan 31, 2024

Choose a reason for hiding this comment

timj Jan 31, 2024

Choose a reason for hiding this comment

TallJimbo commented Feb 8, 2024

TallJimbo commented Jan 31, 2024 •

edited

codecov bot commented Jan 31, 2024 •

edited