New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-42733: minor improvement spin-offs from query system rewrite #954
Conversation
a7be8cc
to
826ff1e
Compare
Docker build is failing because I apparently used a feature ( Can we bump the SQLite version in the docker build easily, or should I add a new |
826ff1e
to
457cc63
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #954 +/- ##
==========================================
- Coverage 88.41% 88.39% -0.03%
==========================================
Files 303 307 +4
Lines 39163 39508 +345
Branches 8256 8316 +60
==========================================
+ Hits 34625 34922 +297
- Misses 3332 3373 +41
- Partials 1206 1213 +7 ☔ View full report in Codecov by Sentry. |
457cc63
to
a76a037
Compare
It doesn't look like sqlite is one of the packages available in debian-backports, so "easily" no because the version you want isn't available in the package manager. If there's some other way to replace the system sqlite library with a newer version we can use it. We don't "have" to run Debian if you need bleeding edge versions of packages, that's just what SQRE is doing for all of their services. I decided to follow them so we can free-ride on them knowing what is going on with security and compatibility with other tooling. It should be relatively straightforward to switch it over to be based on minimamba with rubin-env, just the image will get bigger (which doesn't really matter other than there's more surface area for security issues). Also I don't know enough about conda so I'm not sure if it's possible to get a reproducible set of packages from installing rubin-env because they don't pin versions in their recipe. |
I generally like the idea of getting this container based on rubin-env ones, because it seems like that shrinks the space of environments we're trying to support. But it does sound like more than I wanted to do on this branch. @timj, I think it's your call. |
I see that Debian trixie (the next version) has 3.45 so there is hope for the future. How much work is it to support the old sqlite? Does @rra have any thoughts on trying to get a newer sqlite into the contanier? |
I agree with basically everything @dhirving said: that version of SQLite is extremely new, does not appear in any released version of Debian, and hasn't been backported. If you want to use a prepackaged version, you would need to build a Docker image from Debian testing (an unreleased version of Debian). I'm not sure you want to do that. You could instead switch to the stack image, which will get you a much older base OS image with dubious security support but a conda environment, if conda packages the new version of SQLite. SQuaRE avoids the stack images whenever possible because they are huge and are not supported by the broader open source community, so involve some reinventing of wheels to get things onto them, but the deeper you get into stack package land, the more it would make sense to go down that path. Another option would be to install SQLite via some other mechanism, but you would still need to find a source of packages that is newer than the Debian stable release from last summer. |
Sounds like I should just support the old SQLite; I don't think it should be hard, and I only kept this conversation going in case we thought it was better otherwise to use the stack images. |
Another option that isn't too bad is just building sqlite from source in the Docker container. Let me know if you want me to set it up.
|
"""Make the iterator return limited number of records. | ||
|
||
Parameters | ||
---------- | ||
limit : `int` | ||
limit : `int` or `None`, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe state that None
means all records are returned?
|
||
|
||
class NonemptyMapping(Mapping[_K, _V]): | ||
"""A `Mapping` that implicitly adds values (like `defaultdict`) but treats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""A `Mapping` that implicitly adds values (like `defaultdict`) but treats | |
"""A `Mapping` that implicitly adds values (like `~collections.defaultdict`) but treats |
(which will likely make the line too long so this change can't be accepted as-is).
|
||
Notes | ||
----- | ||
Unlike `defaultdict`, this class implements only `collections.abc.Mapping`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~collections.defaultdict
----- | ||
Unlike `defaultdict`, this class implements only `collections.abc.Mapping`, | ||
not `~collections.abc.MutableMapping`, and hence it can be modified only by | ||
invoking ``__getitem__`` with a key that does not exist. Is is expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
invoking ``__getitem__`` with a key that does not exist. Is is expected | |
invoking ``__getitem__`` with a key that does not exist. It is expected |
?
tests/test_nonempt_mapping.py
Outdated
@@ -0,0 +1,52 @@ | |||
# This file is part of daf_butler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this file to nonempty rather than nonempt?
Apparently nobody has ever tried to multiply anything inside a butler query expression, because it wouldn't have worked. Note that we don't actually need an entry for "/" in this mapping because that requires special-casing to deal with floating-point vs. integer division anyway.
The wrapper class itself is gnarly and full of advanced Pydantic usage and metaprogramming, but it's general and it keeps the messiness outside of both the serialized types and the validated types without duplication or two-stage validation.
7161411
to
0d1d136
Compare
We mixed up which of these should be allowed to be None in the old interface, but we don't need to repeat the mistake in the new one, though some of the fix will have to wait until new interfaces stop delegating to the old ones.
We just concatenate the encoded strings with a ":" (could be anything that isn't in the base64 alphabet) then split them in Python before decoding. This will let use GROUP BY on dimension key columns for deduplication while preserving region columns needed for Python-side region-overlap postprocessing. Rename union_agg to union_aggregate.
SQLite does not require SELECT columns to be in aggregates if they are not in the GROUP BY clause. What PostgreSQL can do depends on the version.
Supporting older SQLite was even easier than I expected, because SQLAlchemy already had an abstraction that resolved to the right thing for each dialect. |
0d1d136
to
fb90c4a
Compare
Checklist
added a release note for user-visible changes to(nothing here should be user-visible)doc/changes