Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42733: minor improvement spin-offs from query system rewrite #954

Merged
merged 17 commits into from Feb 8, 2024

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Jan 31, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes (nothing here should be user-visible)

@TallJimbo TallJimbo force-pushed the tickets/DM-42733 branch 2 times, most recently from a7be8cc to 826ff1e Compare January 31, 2024 04:49
@TallJimbo
Copy link
Member Author

Docker build is failing because I apparently used a feature (string_agg) that wasn't in SQLite until 3.44.2.

Can we bump the SQLite version in the docker build easily, or should I add a new Database method to use string_agg in PostgreSQL and group_agg in SQLite?

Copy link

codecov bot commented Jan 31, 2024

Codecov Report

Attention: 45 lines in your changes are missing coverage. Please review.

Comparison is base (6118ee5) 88.41% compared to head (0d1d136) 88.39%.

❗ Current head 0d1d136 differs from pull request most recent head fb90c4a. Consider uploading reports for the commit fb90c4a to get more accurate results

Files Patch % Lines
python/lsst/daf/butler/arrow_utils.py 67.08% 26 Missing ⚠️
python/lsst/daf/butler/pydantic_utils.py 85.18% 6 Missing and 2 partials ⚠️
python/lsst/daf/butler/column_spec.py 78.94% 4 Missing ⚠️
...n/lsst/daf/butler/registry/databases/postgresql.py 78.94% 3 Missing and 1 partial ⚠️
python/lsst/daf/butler/nonempty_mapping.py 93.75% 0 Missing and 2 partials ⚠️
python/lsst/daf/butler/ddl.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
- Coverage   88.41%   88.39%   -0.03%     
==========================================
  Files         303      307       +4     
  Lines       39163    39508     +345     
  Branches     8256     8316      +60     
==========================================
+ Hits        34625    34922     +297     
- Misses       3332     3373      +41     
- Partials     1206     1213       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TallJimbo TallJimbo marked this pull request as ready for review January 31, 2024 16:09
@dhirving
Copy link
Contributor

Can we bump the SQLite version in the docker build easily

It doesn't look like sqlite is one of the packages available in debian-backports, so "easily" no because the version you want isn't available in the package manager. If there's some other way to replace the system sqlite library with a newer version we can use it.

We don't "have" to run Debian if you need bleeding edge versions of packages, that's just what SQRE is doing for all of their services. I decided to follow them so we can free-ride on them knowing what is going on with security and compatibility with other tooling.

It should be relatively straightforward to switch it over to be based on minimamba with rubin-env, just the image will get bigger (which doesn't really matter other than there's more surface area for security issues). Also I don't know enough about conda so I'm not sure if it's possible to get a reproducible set of packages from installing rubin-env because they don't pin versions in their recipe.

@TallJimbo
Copy link
Member Author

I generally like the idea of getting this container based on rubin-env ones, because it seems like that shrinks the space of environments we're trying to support. But it does sound like more than I wanted to do on this branch. @timj, I think it's your call.

@timj
Copy link
Member

timj commented Jan 31, 2024

I see that Debian trixie (the next version) has 3.45 so there is hope for the future. How much work is it to support the old sqlite? Does @rra have any thoughts on trying to get a newer sqlite into the contanier?

@rra
Copy link
Member

rra commented Jan 31, 2024

I agree with basically everything @dhirving said: that version of SQLite is extremely new, does not appear in any released version of Debian, and hasn't been backported. If you want to use a prepackaged version, you would need to build a Docker image from Debian testing (an unreleased version of Debian). I'm not sure you want to do that.

You could instead switch to the stack image, which will get you a much older base OS image with dubious security support but a conda environment, if conda packages the new version of SQLite. SQuaRE avoids the stack images whenever possible because they are huge and are not supported by the broader open source community, so involve some reinventing of wheels to get things onto them, but the deeper you get into stack package land, the more it would make sense to go down that path.

Another option would be to install SQLite via some other mechanism, but you would still need to find a source of packages that is newer than the Debian stable release from last summer.

@TallJimbo
Copy link
Member Author

Sounds like I should just support the old SQLite; I don't think it should be hard, and I only kept this conversation going in case we thought it was better otherwise to use the stack images.

@dhirving
Copy link
Contributor

Sounds like I should just support the old SQLite

Another option that isn't too bad is just building sqlite from source in the Docker container. Let me know if you want me to set it up.

RUN wget https://www.sqlite.org/2024/sqlite-autoconf-3450100.tar.gz
RUN tar -xzf sqlite-autoconf-3450100.tar.gz
WORKDIR sqlite-autoconf-3450100
RUN ./configure
RUN make
RUN make install
RUN ldconfig /usr/local/lib
CMD python -c "import sqlite3;print(sqlite3.sqlite_version)"
# 3.45.1

"""Make the iterator return limited number of records.

Parameters
----------
limit : `int`
limit : `int` or `None`, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe state that None means all records are returned?



class NonemptyMapping(Mapping[_K, _V]):
"""A `Mapping` that implicitly adds values (like `defaultdict`) but treats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""A `Mapping` that implicitly adds values (like `defaultdict`) but treats
"""A `Mapping` that implicitly adds values (like `~collections.defaultdict`) but treats

(which will likely make the line too long so this change can't be accepted as-is).


Notes
-----
Unlike `defaultdict`, this class implements only `collections.abc.Mapping`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~collections.defaultdict

-----
Unlike `defaultdict`, this class implements only `collections.abc.Mapping`,
not `~collections.abc.MutableMapping`, and hence it can be modified only by
invoking ``__getitem__`` with a key that does not exist. Is is expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
invoking ``__getitem__`` with a key that does not exist. Is is expected
invoking ``__getitem__`` with a key that does not exist. It is expected

?

@@ -0,0 +1,52 @@
# This file is part of daf_butler.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename this file to nonempty rather than nonempt?

Apparently nobody has ever tried to multiply anything inside a butler
query expression, because it wouldn't have worked.

Note that we don't actually need an entry for "/" in this mapping
because that requires special-casing to deal with floating-point vs.
integer division anyway.
The wrapper class itself is gnarly and full of advanced Pydantic usage
and metaprogramming, but it's general and it keeps the messiness
outside of both the serialized types and the validated types without
duplication or two-stage validation.
We mixed up which of these should be allowed to be None in the old
interface, but we don't need to repeat the mistake in the new one,
though some of the fix will have to wait until new interfaces stop
delegating to the old ones.
We just concatenate the encoded strings with a ":" (could be anything
that isn't in the base64 alphabet) then split them in Python before
decoding.

This will let use GROUP BY on dimension key columns for deduplication
while preserving region columns needed for Python-side region-overlap
postprocessing.

Rename union_agg to union_aggregate.
SQLite does not require SELECT columns to be in aggregates if they are
not in the GROUP BY clause.  What PostgreSQL can do depends on the
version.
@TallJimbo
Copy link
Member Author

Supporting older SQLite was even easier than I expected, because SQLAlchemy already had an abstraction that resolved to the right thing for each dialect.

@TallJimbo TallJimbo merged commit 3fee28c into main Feb 8, 2024
16 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-42733 branch February 8, 2024 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants