New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some issues with prefetching and related queries #106

Closed
jasonmyers opened this Issue Nov 20, 2014 · 3 comments

Comments

Projects
None yet
2 participants
@jasonmyers
Contributor

jasonmyers commented Nov 20, 2014

(Using pony-0.6, Python 3.3)

I'm seeing some extra queries I don't understand during prefetching and getting related entities. Setup is a simple Forum entity with possible related Topics:

from pony.orm import *

sql_debug(True)
db = Database('sqlite', ':memory:')

class Forum(db.Entity):
    topics = Set('Topic')

class Topic(db.Entity):
    forum = Required('Forum')

db.generate_mapping(create_tables=True)

with db_session:
    # Forum with a topic
    forum_a = Forum()
    Topic(forum=forum_a)

    # Another forum with a topic
    forum_b = Forum()
    Topic(forum=forum_b)

    # Forum with no topics
    forum_c = Forum()

with db_session:
    forums = select(forum for forum in Forum)

    # Print forums and their topics
    for forum in forums:
        print(forum, forum.topics)

I see the following queries

SELECT "forum"."id"
FROM "Forum" "forum"

SELECT "id", "forum"
FROM "Topic"
WHERE "forum" = ?
[1]

SELECT "id", "forum"
FROM "Topic"
WHERE "forum" IN (?, ?)
[2, 3]

Forum[1] TopicSet([Topic[1]])
Forum[2] TopicSet([Topic[2]])
Forum[3] TopicSet([])

I assume there are 2 extra Topic queries to get-related because you don't know the first time if I am going to ask for related topics from other Forums, but when pony hits the second forum.topics, it goes ahead and loads the rest (for [2, 3])?

If so, I figured I could add a .prefetch() then to tell Pony to just load Topics for all 3 Forums one query:

        forums = select(forum for forum in Forum).prefetch(Forum.topics)

and expected to see just one extra query, like

SELECT "id", "forum"
FROM "Topic"
WHERE "forum" IN (?, ?)
[1, 2, 3]

but instead see all of these queries now

SELECT "forum"."id"
FROM "Forum" "forum"

SELECT "id"
FROM "Forum"
WHERE "id" IN (?, ?, ?)
[1, 2, 3]

SELECT "id", "forum"
FROM "Topic"
WHERE "forum" = ?
[1]

SELECT "id"
FROM "Forum"
WHERE "id" IN (?, ?, ?)
[2, 1, 3]

SELECT "id", "forum"
FROM "Topic"
WHERE "forum" IN (?, ?)
[2, 3]

SELECT "id"
FROM "Forum"
WHERE "id" IN (?, ?, ?)
[3, 2, 1]

At least some of those are logically duplicates, so I'm not sure what's happening there. After further testing, if I add a single field into the Forum model (without even using it), like this:

class Forum(db.Entity):
    topics = Set('Topic')
    name = Optional(str)

The queries go back to the first example (loading [1], then [2, 3]). Adding the field to Topic doesn't fix the issue. In any cases I couldn't get it to prefetch all related with a single query.

Further, I have a second (and possibly related) issue. Setup is similar, except instead of Forum -> Topic, I have nested forums, so Forum -> Forum:

from pony.orm import *

sql_debug(True)
db = Database('sqlite', ':memory:')

class Forum(db.Entity):
    parent = Optional('Forum')
    children = Set('Forum')

db.generate_mapping(create_tables=True)

with db_session:
    # Forum with a child forum
    forum_a = Forum()
    Forum(parent=forum_a)

    # Another forum with a child forum
    forum_b = Forum()
    Forum(parent=forum_b)

    # Forum with no children
    forum_c = Forum()

with db_session:
    # Select only "top level" forums without a parent
    forums = select(forum for forum in Forum if forum.parent is None)

    # Print top level forums and their children
    for forum in forums:
        print(forum, forum.children)

And I see these queries

SELECT "forum"."id", "forum"."parent"
FROM "Forum" "forum"
WHERE "forum"."parent" IS NULL

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" = ?
[1]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[3, 2, 5]

Forum[1] ForumSet([Forum[2]])
Forum[3] ForumSet([Forum[4]])
Forum[5] ForumSet([])

Again 2 extra queries (without using prefetch), but in [3, 2, 5]: why is pony fetching sub-forums for 2 when Forum[2] is not in the initial results (since Forum[2].parent is not None)? I added .prefetch() to see what would happen

forums = select(forum for forum in Forum if forum.parent is None).prefetch(Forum.children)

and get:

SELECT "forum"."id", "forum"."parent"
FROM "Forum" "forum"
WHERE "forum"."parent" IS NULL

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" = ?
[1]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[3, 2, 5]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" = ?
[4]

So now not only Forum[2] but Forum[4] children is being pre-fetched, when I expect a single extra query with just

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[1, 3, 5]

@kozlovsky kozlovsky self-assigned this Nov 20, 2014

@kozlovsky kozlovsky added this to the 0.6.1 milestone Nov 20, 2014

@kozlovsky kozlovsky added the bug label Nov 20, 2014

@kozlovsky kozlovsky closed this in 469894b Nov 21, 2014

@kozlovsky

This comment has been minimized.

Show comment
Hide comment
@kozlovsky

kozlovsky Nov 21, 2014

Member

I assume there are 2 extra Topic queries to get-related because you don't know the first time if I am going to ask for related topics from other Forums, but when pony hits the second forum.topics, it goes ahead and loads the rest (for [2, 3])?

Yes, exactly.

but instead see all of these queries now

Thanks for the reporting, these extra queries were caused by a bug. Roughly speaking, after the retrieving of just primary key column Pony does not believed that this was the entire object and tries to fetch it again. Now this behavior should be fixed, and no object should be fetched more then once.

I couldn't get it to prefetch all related with a single query

When the prefetch method was introduced, its purpose was not to reduce the number of queries. It was added to allow using of ORM objects outside of their db_session. Personally I think that it is better to always use ORM objects within the corresponding db_session. For example, when these objects are used to generate some HTML template, the generation should be done inside the db_session. But some people requested the possibility to use objects after their db_session has ended. In this case, we need to be sure that all necessary objects are loaded with all necessary sub-objects and relations. The prefetch method allows to guarantee this.

It is possible to optimize logic of prefetch method in the future in order to produce more effective queries, but currently prefetch just walk recursively and load all specified relations, using standard optimization heuristics which Pony uses for typical queries which user code generates.

So now not only Forum[2] but Forum[4] children is being pre-fetched

It is because the prefetching goes recursively. When you call .prefetch(Forum.children) that means that this collection should be loaded for all objects, in order to guarantee that for any object this attribute can be accessible after the db_session has ended. Actually this logic goes wrong for recursive relations, because this way all table rows can be pulled unexpectedly, but I just don't come up with a good API for this case yet. In some cases it is desirable to prefetch not only the top level of objects, but also its related sub-objects, and so forth.

Anyway, now after the major bug in query logic was fixed the number of queries should be much lower then you initially experienced even without using of prefetching. Tell me what do you think, and thanks again for the reporting!

Member

kozlovsky commented Nov 21, 2014

I assume there are 2 extra Topic queries to get-related because you don't know the first time if I am going to ask for related topics from other Forums, but when pony hits the second forum.topics, it goes ahead and loads the rest (for [2, 3])?

Yes, exactly.

but instead see all of these queries now

Thanks for the reporting, these extra queries were caused by a bug. Roughly speaking, after the retrieving of just primary key column Pony does not believed that this was the entire object and tries to fetch it again. Now this behavior should be fixed, and no object should be fetched more then once.

I couldn't get it to prefetch all related with a single query

When the prefetch method was introduced, its purpose was not to reduce the number of queries. It was added to allow using of ORM objects outside of their db_session. Personally I think that it is better to always use ORM objects within the corresponding db_session. For example, when these objects are used to generate some HTML template, the generation should be done inside the db_session. But some people requested the possibility to use objects after their db_session has ended. In this case, we need to be sure that all necessary objects are loaded with all necessary sub-objects and relations. The prefetch method allows to guarantee this.

It is possible to optimize logic of prefetch method in the future in order to produce more effective queries, but currently prefetch just walk recursively and load all specified relations, using standard optimization heuristics which Pony uses for typical queries which user code generates.

So now not only Forum[2] but Forum[4] children is being pre-fetched

It is because the prefetching goes recursively. When you call .prefetch(Forum.children) that means that this collection should be loaded for all objects, in order to guarantee that for any object this attribute can be accessible after the db_session has ended. Actually this logic goes wrong for recursive relations, because this way all table rows can be pulled unexpectedly, but I just don't come up with a good API for this case yet. In some cases it is desirable to prefetch not only the top level of objects, but also its related sub-objects, and so forth.

Anyway, now after the major bug in query logic was fixed the number of queries should be much lower then you initially experienced even without using of prefetching. Tell me what do you think, and thanks again for the reporting!

@jasonmyers

This comment has been minimized.

Show comment
Hide comment
@jasonmyers

jasonmyers Nov 21, 2014

Contributor

Thanks! The main bug appears fixed.

I created a couple of patches to reduce the queries further in my specific situation. I don't expect you to merge these necessarily, but here they are in any case (tests are passing, added a couple):

  1. https://github.com/jasonmyers/pony/compare/prefetch-all
    Bypass nplus1_threshold when prefetching, which combines the [1], [2, 3] queries into a single [1, 2, 3] query when using .prefetch()

  2. https://github.com/jasonmyers/pony/compare/prefetch-norecursive
    Add recursive flag to .prefetch() (defaults True), which if False will only prefetch one level deep for recursive objects

  3. https://github.com/jasonmyers/pony/compare/prefetch-all-norecursive
    Both of the above patches combined

Example output, given the same script as before:

from pony.orm import *
sql_debug(True)
db = Database('sqlite', ':memory:')

class Forum(db.Entity):
    parent = Optional('Forum')
    children = Set('Forum')

db.generate_mapping(create_tables=True)

with db_session:
    # Forum with a child forum
    forum_a = Forum()
    forum_aa = Forum(parent=forum_a)

    # Another forum with a child forum
    forum_b = Forum()
    forum_bb = Forum(parent=forum_b)

    # Forum with no children
    forum_c = Forum()

with db_session:
    # Select all "top level" forums without a parent
    forums = select(forum for forum in Forum if forum.parent is None).prefetch(Forum.children)
    # Print forums and their topics
    for forum in forums:
        print(forum, forum.children)

Without patches:

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" = ?
[1]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[3, 2, 5]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" = ?
[4]

With prefetch-all:

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[1, 3, 5]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?)
[2, 4]

With prefetch-all-norecursive and .prefetch(Forum.children, recursive=False):

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[1, 3, 5]

(These patches only affect queries that use prefetch() as a hint for reduced queries, and don't apply if .prefetch() isn't used)

Contributor

jasonmyers commented Nov 21, 2014

Thanks! The main bug appears fixed.

I created a couple of patches to reduce the queries further in my specific situation. I don't expect you to merge these necessarily, but here they are in any case (tests are passing, added a couple):

  1. https://github.com/jasonmyers/pony/compare/prefetch-all
    Bypass nplus1_threshold when prefetching, which combines the [1], [2, 3] queries into a single [1, 2, 3] query when using .prefetch()

  2. https://github.com/jasonmyers/pony/compare/prefetch-norecursive
    Add recursive flag to .prefetch() (defaults True), which if False will only prefetch one level deep for recursive objects

  3. https://github.com/jasonmyers/pony/compare/prefetch-all-norecursive
    Both of the above patches combined

Example output, given the same script as before:

from pony.orm import *
sql_debug(True)
db = Database('sqlite', ':memory:')

class Forum(db.Entity):
    parent = Optional('Forum')
    children = Set('Forum')

db.generate_mapping(create_tables=True)

with db_session:
    # Forum with a child forum
    forum_a = Forum()
    forum_aa = Forum(parent=forum_a)

    # Another forum with a child forum
    forum_b = Forum()
    forum_bb = Forum(parent=forum_b)

    # Forum with no children
    forum_c = Forum()

with db_session:
    # Select all "top level" forums without a parent
    forums = select(forum for forum in Forum if forum.parent is None).prefetch(Forum.children)
    # Print forums and their topics
    for forum in forums:
        print(forum, forum.children)

Without patches:

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" = ?
[1]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[3, 2, 5]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" = ?
[4]

With prefetch-all:

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[1, 3, 5]

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?)
[2, 4]

With prefetch-all-norecursive and .prefetch(Forum.children, recursive=False):

SELECT "id", "parent"
FROM "Forum"
WHERE "parent" IN (?, ?, ?)
[1, 3, 5]

(These patches only affect queries that use prefetch() as a hint for reduced queries, and don't apply if .prefetch() isn't used)

@kozlovsky

This comment has been minimized.

Show comment
Hide comment
@kozlovsky

kozlovsky Nov 21, 2014

Member

Thanks, the patches looks interesting, I'll think a bit and most likely merge them

Member

kozlovsky commented Nov 21, 2014

Thanks, the patches looks interesting, I'll think a bit and most likely merge them

@kozlovsky kozlovsky reopened this Nov 21, 2014

@kozlovsky kozlovsky closed this in 5282b2d Feb 20, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment