Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-31725: stand up initial version of package #1

Merged
merged 7 commits into from
Dec 6, 2022
Merged

Conversation

TallJimbo
Copy link
Member

No description provided.

@gpdf
Copy link

gpdf commented Aug 20, 2022

Is this something being factored out of daf_butler, or ctrl_mpexec? (I happened to see some automated mail from GitHub about it.)

@TallJimbo
Copy link
Member Author

It's new code that would otherwise probably go into daf_butler: it's a rewrite of much of the query expression handling.

@codecov-commenter
Copy link

codecov-commenter commented Sep 24, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@c433d42). Click here to learn what that means.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main       #1   +/-   ##
=======================================
  Coverage        ?   88.57%           
=======================================
  Files           ?        1           
  Lines           ?       35           
  Branches        ?        2           
=======================================
  Hits            ?       31           
  Misses          ?        3           
  Partials        ?        1           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@TallJimbo TallJimbo force-pushed the tickets/DM-31725 branch 2 times, most recently from 8f379e6 to 31a832b Compare October 11, 2022 18:24
@TallJimbo TallJimbo force-pushed the tickets/DM-31725 branch 5 times, most recently from a2531bd to fdbcf8e Compare November 3, 2022 19:42
@TallJimbo TallJimbo marked this pull request as ready for review November 29, 2022 21:26
@TallJimbo TallJimbo self-assigned this Nov 29, 2022
@TallJimbo TallJimbo requested a review from timj November 29, 2022 21:39
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked at every file in details but the package is well organized, it has some nice documentation, and follows all our conventions. I'm very pleased to see 90% test coverage.

* ``perf``: A performance enhancement.
* ``doc``: A documentation improvement.
* ``removal``: An API removal or deprecation.
* ``other``: Other Changes and Additions of interest to general users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dropped "other" at some point because no-one could remember the difference between "other" and "misc" but I see I haven't updated the documentation in all the packages to reflect that (specifically daf_butler still talks about other).


The SQL engine has a non-trivial `Engine.conform` method and overrides other methods to keep relation trees in a certain order when applying new operations.
It uses a custom `Select` marker type to partition the tree into subqueries and mark a tree as conformed.
This order attempts to avoid subqueries whenever possible by pulling `Projection`, `Deduplication`, `Sort`, and `Slice` operations downstream when commutation rules permit, both to keep the SQL output simple and to reduce the loss of row-ordering imposed by `Sort` operations whenever possible (i.e. a `Sort` can only be applied in the outermost query if it is to have any effect).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These classes don't turn into links in the sphinx docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, need to add that pesky leading . to them. (By the way, since I only discovered this behavior while working on this ticket, I'm sure we've got a ton of preexisting bad links in daf_butler that need it as well.)


The `Engine` and `Payload` classes are generic over a parameter we refer to as the "logical column type".
In the simplest case, the logical column type is just `sqlalchemy.sql.ColumnElement`, and this is what the default implementations of most `Engine` methods assume.
Custom subclasses of the SQL `Engine` can use other types, such as wrappers that hold one or more `sqlalchemy.sql.ColumnElement` objects, as long as they override a few `Engine` methods to handle them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to include some sqlalchemy index in the sphinx configuration for these links to work?

Copy link
Member Author

@TallJimbo TallJimbo Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, and I think I was meaning to ask @jonathansick about it. I don't see anything about intersphinx in either the package-local conf.py or the one in https://github.com/lsst/pipelines_lsst_io, so I assume this is something we import from documenteer in those files.

I also haven't been able to find the right URL to associate with the sqlalchemy package in the intersphinx mapping, which may mean there isn't one - their docs sure look like they've been built with sphinx, but I don't think that means they necessarily publish the interface that intersphinx needs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add sqlalchemy to the documenteer configuration for pipelines packages. It does use Sphinx for the documentation sites of individual versions. DM-37194

Here's background on the Sphinx configuration for Rubin to clarify how conf.py files work and this documentation page specifically on the Sphinx configuration settings (via a page in the Developer Guide).

@codecov
Copy link

codecov bot commented Dec 2, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@c433d42). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main       #1   +/-   ##
=======================================
  Coverage        ?   97.10%           
=======================================
  Files           ?       15           
  Lines           ?     1242           
  Branches        ?      117           
=======================================
  Hits            ?     1206           
  Misses          ?       19           
  Partials        ?       17           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

This includes a large number of related changes:

- Markers (including Transfer and Materialization) are now Relation
  types, not operation types.

- UnaryOperation.apply is now final, with a number of new hooks for
  customization by both derived operations and engines, with
  backtracking dedicated to Engines.

- The SQL engine overrides these hooks to partition and reorder its
  trees into subqueries indicated by the new Select marker.  This lets
  it preserve row-order from sorts much better.  It does not attempt to
  support backtracking at all.

- The iteration engine does support backtracking, and since it always
  preserves order, we can drop all ordering checks (and the
  preserves_order method itself) in the backtracking algorithm.

- Relations now use immutable concrete containers (frozenset, tuple)
  instead of their collections.abc generalizations to ensure
  hashability (though this is just in type annotations - users still
  have to actually pass in hashable containers at runtime).

And, finally, there's now a pretty complete battery of unit tests,
with missing coverage limited to:

- Simple exception-raising checks for logic bugs in calling code that
  just didn't seem worth that much attention.

- Defensive code that is at least very difficult to trigger, if not
  impossible, without violating other rules (e.g. getting a
  not-already-conformed Relation tree in the SQL engine to call its
  conform method on).
@TallJimbo TallJimbo merged commit 619ccee into main Dec 6, 2022
@TallJimbo TallJimbo deleted the tickets/DM-31725 branch December 6, 2022 18:15
always returns the operand relation passed to it.
`Join` (`BinaryOperation`) / `Relation.join`
Perform a natural join: combine two relations by matching rows with the same values in their common columns (and satisfying an optional column expression, via a `Predicate`), producing a new relation whose columns are the union of the columns of its operands.
This is equivalent to [``INNER``] ``JOIN`` in SQL.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also need outer joins, I think I saw at least one case in daf_butler?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants