
optimization: paginate total count #281

Closed · wants to merge 1 commit

Conversation

@lyoe commented Apr 22, 2015

No description provided.

@davidism (Member)

I don't think this actually improves performance; count is just generally slow and may already be optimized by the database. Would you add some benchmarks for at least PostgreSQL, MySQL, and SQLite that demonstrate this is a better implementation?

@lyoe (Author) commented Apr 23, 2015

@davidism

self.order_by(None).count()

generates SQL like this:

select count(*) from (select abc, def from User) as u

But what we actually want is:

select count(*) from User
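
To make the difference concrete, here is a minimal sketch of the two shapes being compared (db and User are illustrative, as in the benchmark further down; the exact code in the PR differs):

from sqlalchemy import func

# Current behaviour: Query.count() wraps the original SELECT in a subquery,
# roughly: SELECT count(*) FROM (SELECT ... FROM user) AS anon_1
subquery_total = db.session.query(User).order_by(None).count()

# What this PR aims for: count against the table directly,
# roughly: SELECT count(user.id) FROM user
direct_total = db.session.query(func.count(User.id)).scalar()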

@lyoe (Author) commented Apr 23, 2015

self.order_by(None).count() will use a temporary table in MySQL for the derived subquery. It has to be slower than counting the table directly.
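
One quick way to check this claim on MySQL is to EXPLAIN both statements. A sketch, assuming the User model maps to a user table with id and name columns:

from sqlalchemy import text

# Older MySQL versions typically show the inner SELECT of the first statement
# as a materialized DERIVED table, while the second scans user (or an index) directly.
subquery_plan = db.session.execute(text(
    "EXPLAIN SELECT count(*) FROM (SELECT id, name FROM user) AS anon_1")).fetchall()
direct_plan = db.session.execute(text(
    "EXPLAIN SELECT count(id) FROM user")).fetchall()
print(subquery_plan)
print(direct_plan)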

@davidism (Member)

I did an informal benchmark on a PostgreSQL system I have at work. The new query performed either the same as or worse than the current query. I understand that the rendered statements are different, but I'm skeptical that the second one performs better without some numbers to back that up.

@immunda (Contributor) commented Apr 23, 2015

Thanks @davidism. Significantly worse or within the margin of error?

@doublefloat Could you provide some numbers on a few different backends?

@lyoe (Author) commented Apr 24, 2015

I ran a test comparing the two queries on MySQL:

import suite

from sqlalchemy import func
from application.models import db
from application.models.user import User
import datetime

class TestCount(suite.BaseSuite):
    def test_count(self):
        with self.app.app_context():
            for i in range(10000):
                user = User(name="name_%s" % i, password='password')
                db.session.add(user)
                if i % 200 == 0:
                    db.session.commit()

            db.session.commit()

        with self.app.app_context():
            # Current approach: Query.count(), which wraps the SELECT in a subquery
            start = datetime.datetime.now()
            for i in range(1000):
                db.session.query(User).count()

            end = datetime.datetime.now()

            # 22.756345 sec
            print((end - start).total_seconds())

            # Proposed approach: count the column directly, without a subquery
            start = datetime.datetime.now()
            query = db.session.query(User)
            for i in range(1000):
                db.session.execute(query.statement.with_only_columns([func.count(User.id)]).order_by(None)).scalar()

            end = datetime.datetime.now()

            # 5.668808 sec
            print((end - start).total_seconds())

The first 1000 queries ran in 22.76 seconds, and the second 1000 queries in 5.67 seconds.

@RonnyPfannschmidt

I have a hunch that this optimization will break on some of the more complex queries that SQLAlchemy allows (for example, combining aggregations with HAVING clauses).
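
To illustrate the kind of query where the shortcut goes wrong, a small sketch (User is illustrative; the grouped query returns one row per duplicated name, so counting the user table directly gives a different, wrong answer):

from sqlalchemy import func

q = (db.session.query(User.name, func.count(User.id))
     .group_by(User.name)
     .having(func.count(User.id) > 1))

right = q.count()  # wraps the GROUP BY/HAVING query in a subquery and counts its rows
wrong = db.session.query(func.count(User.id)).scalar()  # counts every row in user instead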

@davidism (Member)

ping @zzzeek: can you see any consequences to changing the query in this way?

@zzzeek commented Apr 24, 2015

I'd skip the session.execute(self.statement) aspect of it for sure, because it's unnecessary and you lose mapping information that is critical for session.get_bind(); at the very least, just say query.order_by(None).value(func.count()). But also, yes, the reason we do the subquery dance is that sometimes you need that subquery. For a long time we tried emitting the query you see here when possible, but as the edge cases kept coming in over the years, it eventually became not worth it to keep guessing when to subquery and when not to. You'd have to dig through the history of this change to find those breakages; it's old stuff at this point.
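
For reference, a minimal sketch of the form suggested above, which keeps the count on the Query so the Session can still resolve the right bind (User is illustrative):

from sqlalchemy import func

# Runs immediately and renders roughly: SELECT count(*) FROM user
total = db.session.query(User).order_by(None).value(func.count())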

@davidism (Member)

Perhaps we could add a without_subquery=False flag to paginate, with documentation warning that it might break certain complex queries when enabled. Then it's up to the developer to construct their query appropriately and decide whether to enable the optimization.
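
A hypothetical sketch of such a flag on a BaseQuery.paginate method (this is not the actual Flask-SQLAlchemy implementation; the real signature and Pagination constructor may differ, and error_out handling is omitted):

from sqlalchemy import func
from flask_sqlalchemy import Pagination

def paginate(self, page=1, per_page=20, error_out=True, without_subquery=False):
    items = self.limit(per_page).offset((page - 1) * per_page).all()
    if without_subquery:
        # Opt-in fast path: the caller promises the query is simple enough
        # to count without wrapping it in a subquery.
        total = self.order_by(None).value(func.count())
    else:
        # Default, safe path: Query.count() wraps the statement in a subquery.
        total = self.order_by(None).count()
    return Pagination(self, page, per_page, total, items)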

@immunda (Contributor) commented Apr 24, 2015

Goodness. I'm tempted to break the pagination API in v3, including a Pagination class with a count method that could be overridden.
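
Likewise hypothetical, a sketch of an overridable count hook (none of this is the shipped API; names are illustrative):

class Pagination(object):
    def __init__(self, query, page, per_page):
        self.query = query
        self.page = page
        self.per_page = per_page

    def count(self):
        # Safe default: the subquery-wrapped count; a subclass can override
        # this with a backend-specific fast count.
        return self.query.order_by(None).count()

    @property
    def total(self):
        return self.count()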

@davidism (Member)

I agree, we should improve pagination handling in general. Extracting the pagination functions has already been brought up in #265.

There are some open issues regarding pagination that could probably be handled at the same time. This issue also duplicates #272.

@immunda (Contributor) commented Apr 24, 2015

Let's do it. I'll open a new issue for doing this properly and close those out.

@lyoe (Author) commented Apr 25, 2015

@zzzeek Thank you for the information. It was thoughtless of me to change it.

@immunda closed this Apr 29, 2015
@immunda (Contributor) commented Apr 29, 2015

No problem, thanks for taking the time to explore this anyway.

@github-actions bot locked as resolved and limited conversation to collaborators Dec 5, 2020