Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache key calc for loader options w/ reuse has performance issue #4270

Closed
sqlalchemy-bot opened this issue Jun 6, 2018 · 6 comments
Closed
Labels
bug Something isn't working orm
Milestone

Comments

@sqlalchemy-bot
Copy link
Collaborator

Migrated issue, originally created by Michael Bayer (@zzzeek)

copied from the mailing list

We have just upgraded to sqlalchemy 1.2.7 (from 1.1.14), and had a performance issue with a query that uses a lot of joinedloads that was caused by the automatic baking of all relationship queries that was introduced in 1.2.

Say we have a set of tables with relationships Book.pages, Page.font and Font.layout. We have a query of this form:

pages = joinedload(Book.pages)
option1 = pages.joinedload(Page.font)
option2 = pages.joinedload(Page.layout)

query = session().query(Book).options(option1, option2)

The important point here is that the pages object defined on line 1 is reused in both option1 and option2. Now suppose we fetch another relationship that wasn't joined-loaded on the returned instances. This will case another query as it is not loaded already, and this query will be baked due to the change in 1.2 to bake all relationship loads.

We found that the construction of the cache key for baking this query becomes very slow as the number of options of this form increases, and is in fact quadratic in the number of such options (we have ~25 such options in our problematic query). This is due to each option containing all of the joinedloads inside its _to_bind attribute, and _UnboundLoad._generate_cache_key has to process everything in the _to_bind list. E.g. in this example:

print([[str(i) for i in load.path] for load in option1._to_bind])
print([[str(i) for i in load.path] for load in option2._to_bind])

[['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']]

[['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']]

Therefore, when generating the key for each option we are processing the joinedloads from all of the options, leading to the quadratic performance degradation.

We fixed it by avoiding reusing the joinedload for Book.pages by doing this:

option1 = joinedload(Book.pages).joinedload(Page.font)
option2 = joinedload(Book.pages).joinedload(Page.layout)

The resulting query is unchanged, but the cache key function is now just linear in the number of joinedloads as each option has only its relationships in its _to_bind attribute. In our case, this completely solved the performance issue.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

here's a test script too!

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import joinedload, relationship

Base = declarative_base()

class Book(Base):
    __tablename__ = 'book'
    id = Column(Integer, primary_key=True)
    pages = relationship('Page', backref='book')

class Font(Base):
    __tablename__ = 'font'
    id = Column(Integer, primary_key=True)

class Layout(Base):
    __tablename__ = 'layout'
    id = Column(Integer, primary_key=True)


class Page(Base):
    __tablename__ = 'page'
    id = Column(Integer, primary_key=True)
    book_id = Column(ForeignKey('book.id'))
    font_id = Column(ForeignKey('font.id'))
    layout_id = Column(ForeignKey('layout.id'))

    font = relationship(Font)
    layout = relationship(Layout)


pages = joinedload(Book.pages)
option1 = pages.joinedload(Page.font)
option2 = pages.joinedload(Page.layout)

print([[str(i) for i in load.path] for load in option1._to_bind])
print([[str(i) for i in load.path] for load in option2._to_bind])

option3 = joinedload(Book.pages).joinedload(Page.font)
option4 = joinedload(Book.pages).joinedload(Page.layout)

print([[str(i) for i in load.path] for load in option3._to_bind])
print([[str(i) for i in load.path] for load in option4._to_bind])

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

in this use case I wonder if even without the cache key if we are generating redundant Load options that are applied to the query.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

reviews going through at https://gerrit.sqlalchemy.org/#/q/I955fe2f50186abd8e753ad490fd3eb8f017e26f9 which also include a separate fix for the re-application of loader options issue.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Iterate options per path for baked cache key

Fixed an issue that was both a performance regression in 1.2 as well as an
incorrect result regarding the "baked" lazy loader, involving the
generation of cache keys from the original :class:.Query object's loader
options. If the loader options were built up in a "branched" style using
common base elements for multiple options, the same options would be
rendered into the cache key repeatedly, causing both a performance issue as
well as generating the wrong cache key. This is fixed, along with a
performance improvement when such "branched" options are applied via
:meth:.Query.options to prevent the same option objects from being
applied repeatedly.

Change-Id: I955fe2f50186abd8e753ad490fd3eb8f017e26f9
Fixes: #4270

006da86

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Iterate options per path for baked cache key

Fixed an issue that was both a performance regression in 1.2 as well as an
incorrect result regarding the "baked" lazy loader, involving the
generation of cache keys from the original :class:.Query object's loader
options. If the loader options were built up in a "branched" style using
common base elements for multiple options, the same options would be
rendered into the cache key repeatedly, causing both a performance issue as
well as generating the wrong cache key. This is fixed, along with a
performance improvement when such "branched" options are applied via
:meth:.Query.options to prevent the same option objects from being
applied repeatedly.

Change-Id: I955fe2f50186abd8e753ad490fd3eb8f017e26f9
Fixes: #4270
(cherry picked from commit 006da86)

99f1239

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

@sqlalchemy-bot sqlalchemy-bot added bug Something isn't working orm labels Nov 27, 2018
@sqlalchemy-bot sqlalchemy-bot added this to the 1.2.x milestone Nov 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working orm
Projects
None yet
Development

No branches or pull requests

1 participant