Allow selectinload to skip join and filter directly on the target table #4340

Closed
sqlalchemy-bot opened this issue Sep 24, 2018 · 3 comments

Comments

@sqlalchemy-bot

Migrated issue, originally created by Jayson Reis (@jaysonsantos)

This is a proposal to change how selectinload builds its query for one-to-many (1:N) relationships.

I put together an example [1] that shows roughly how my data is structured, but the gist is:

I have a main table (like customers) and another table (like orders) that holds the detailed data, and I created a view that summarizes each customer’s orders.

The problem is that when I use selectinload, it runs the query against the view joined to the customer table, so PostgreSQL cannot use the proper indexes and the query runs slowly.

Here [2] you can see the output of the example, with EXPLAIN ANALYZE for both possible queries.

When I run query(User).options(selectinload(SummarizedOrder)).all(), it emits a query like this:

SELECT "fields"
FROM "user" AS "user_1"
       JOIN "view_summarized_order" ON "user_1"."id" = "view_summarized_order"."user_id"
WHERE "user_1"."id" IN (%(primary_keys_1)s, %(primary_keys_2)s)
ORDER BY "user_1"."id"

But to make it faster, it could be this:

SELECT "fields"
FROM "view_summarized_order"
WHERE "view_summarized_order"."user_id" IN (%(primary_keys_1)s, %(primary_keys_2)s)
ORDER BY "view_summarized_order"."user_id"

In my production database, the first query takes around 8 seconds to run and the second around 100 ms.
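
To make the comparison concrete, here is a minimal sketch showing that the two query shapes return the same rows. It uses Python's built-in sqlite3 module rather than PostgreSQL, so it says nothing about the timings; the table layout, column names, and sample data are assumptions made up for illustration and are not taken from the gist.

```python
import sqlite3

# In-memory database with a hypothetical schema: users, orders, and a
# view that summarizes orders per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order" (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES "user" (id),
        amount INTEGER
    );
    CREATE VIEW view_summarized_order AS
        SELECT user_id, COUNT(*) AS order_count, SUM(amount) AS total
        FROM "order"
        GROUP BY user_id;
    INSERT INTO "user" VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO "order" (user_id, amount)
        VALUES (1, 10), (1, 5), (2, 7), (3, 1);
""")

# Shape 1: join the parent table to the view, filter on the parent's key.
joined = conn.execute("""
    SELECT v.user_id, v.order_count, v.total
    FROM "user" AS user_1
    JOIN view_summarized_order AS v ON user_1.id = v.user_id
    WHERE user_1.id IN (?, ?)
    ORDER BY user_1.id
""", (1, 2)).fetchall()

# Shape 2: skip the join and filter on the view's foreign key column.
direct = conn.execute("""
    SELECT v.user_id, v.order_count, v.total
    FROM view_summarized_order AS v
    WHERE v.user_id IN (?, ?)
    ORDER BY v.user_id
""", (1, 2)).fetchall()

print(joined)
print(direct)
```

Both queries are filtered by the primary keys of parents that were already loaded, so the join to the parent table adds no information here; it only changes how the database has to plan the query.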

On the mailing list, Michael Bayer suggested adding a flag called omit_join to selectinload to handle this edge case without breaking compatibility.
Here [3] is a work-in-progress pull request.

[1] https://gist.github.com/jaysonsantos/e19af47ac5d57aa5e2e2a7ed2a950994

[2] https://gist.github.com/jaysonsantos/e19af47ac5d57aa5e2e2a7ed2a950994#file-2_output-txt

[3] https://bitbucket.org/zzzeek/sqlalchemy/pull-requests/7/selectinload-omit-join/diff#comment-76901579

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

OK! very close at https://gerrit.sqlalchemy.org/#/c/zzzeek/sqlalchemy/+/885

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

selectinload omit join

The "selectin" loader strategy now omits the JOIN in the case of a
simple one-to-many load, where it instead relies upon the foreign key
columns of the related table in order to match up to primary keys in
the parent table. This optimization can be disabled by setting
the :paramref:`.relationship.omit_join` flag to False.
Many thanks to Jayson Reis for the efforts on this.

As part of this change, horizontal shard no longer relies upon
the _mapper_zero() method to get the query-bound mapper, instead
using the more generalized _bind_mapper() (which will use mapper_zero
if no explicit FROM is present). A short check for the particular
recursive condition is added to BundleEntity and it no longer assigns
itself as the "namespace" to its ColumnEntity objects which creates
a reference cycle.

Co-authored-by: Mike Bayer <mike_mp@zzzcomputing.com>
Fixes: #4340
Change-Id: I649587e1c07b684ecd63f7d10054cd165891baf4
Pull-request: https://bitbucket.org/zzzeek/sqlalchemy/pull-requests/7

21fbb5e
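
As a rough illustration of the matching that the commit message describes: once the JOIN is omitted, the related rows carry only their foreign key, so they have to be grouped by that column and attached to the parents that were already loaded. The sketch below is plain Python with hypothetical names (`attach_children`, the `summaries` key, dict-shaped rows); it is not SQLAlchemy's implementation, only the idea.

```python
from collections import defaultdict


def attach_children(parents, child_rows, fk="user_id"):
    """Group child rows by foreign key and attach them to their parents.

    `parents` is a list of dicts with an "id" key; `child_rows` is a list
    of dicts that each carry the foreign key column named by `fk`.
    """
    by_fk = defaultdict(list)
    for row in child_rows:
        by_fk[row[fk]].append(row)

    for parent in parents:
        # Parents without matching rows get an empty collection.
        parent["summaries"] = by_fk.get(parent["id"], [])
    return parents


users = [{"id": 1}, {"id": 2}, {"id": 3}]
rows = [
    {"user_id": 1, "total": 15},
    {"user_id": 2, "total": 7},
]
loaded = attach_children(users, rows)
print(loaded)
```

The foreign key value alone is enough to find the owning parent, which is why the parent table does not need to appear in the second query for a simple one-to-many load.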

@sqlalchemy-bot

Changes by Michael Bayer (@zzzeek):

  • changed status to closed
