
optimization: paginate total count #281

Closed · wants to merge 1 commit

Conversation

@lyoe commented Apr 22, 2015

No description provided.

@davidism (Member)

I don't think this actually improves performance; count is just generally slow and may already be optimized by the database. Would you add some benchmarks for at least PostgreSQL, MySQL, and SQLite that demonstrate this is a better implementation?

@lyoe (Author) commented Apr 23, 2015

@davidism

self.order_by(None).count()

generates SQL like this:

select count(*) from (select abc, def from User) as u

But what we actually want is:

select count(*) from User
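
To make the difference concrete, here is a minimal sketch of the two shapes being compared (db and User are illustrative, as in the benchmark further down; the exact code in the PR differs):

from sqlalchemy import func

# Current behaviour: Query.count() wraps the original SELECT in a subquery,
# roughly: SELECT count(*) FROM (SELECT ... FROM user) AS anon_1
subquery_total = db.session.query(User).order_by(None).count()

# What this PR aims for: count against the table directly,
# roughly: SELECT count(user.id) FROM user
direct_total = db.session.query(func.count(User.id)).scalar()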

@lyoe (Author) commented Apr 23, 2015

self.order_by(None).count() will use a temporary table in MySQL for the derived subquery. It has to be slower than counting the table directly.
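
One quick way to check this claim on MySQL is to EXPLAIN both statements. A sketch, assuming the User model maps to a user table with id and name columns:

from sqlalchemy import text

# Older MySQL versions typically show the inner SELECT of the first statement
# as a materialized DERIVED table, while the second scans user (or an index) directly.
subquery_plan = db.session.execute(text(
    "EXPLAIN SELECT count(*) FROM (SELECT id, name FROM user) AS anon_1")).fetchall()
direct_plan = db.session.execute(text(
    "EXPLAIN SELECT count(id) FROM user")).fetchall()
print(subquery_plan)
print(direct_plan)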

@davidism (Member)

I did an informal benchmark on a PostgreSQL system I have at work. The new query performed either the same as or worse than the current query. I understand that the rendered statements are different, but I'm skeptical that the second one performs better without some numbers to back that up.

@immunda (Contributor) commented Apr 23, 2015

Thanks @davidism. Significantly worse or within the margin of error?

@doublefloat Could you provide some numbers on a few different backends?

@lyoe (Author) commented Apr 24, 2015

I ran a test comparing the two queries on MySQL:

import suite

from sqlalchemy import func
from application.models import db
from application.models.user import User
import datetime

class TestCount(suite.BaseSuite):
    def test_count(self):
        with self.app.app_context():
            for i in range(10000):
                user = User(name="name_%s" % i, password='password')
                db.session.add(user)
                if i % 200 == 0:
                    db.session.commit()

            db.session.commit()

        with self.app.app_context():
            # Current approach: Query.count(), which wraps the SELECT in a subquery
            start = datetime.datetime.now()
            for i in range(1000):
                db.session.query(User).count()

            end = datetime.datetime.now()

            # 22.756345 sec
            print((end - start).total_seconds())

            # Proposed approach: count the column directly, without a subquery
            start = datetime.datetime.now()
            query = db.session.query(User)
            for i in range(1000):
                db.session.execute(query.statement.with_only_columns([func.count(User.id)]).order_by(None)).scalar()

            end = datetime.datetime.now()

            # 5.668808 sec
            print((end - start).total_seconds())

The first 1000 queries ran in 22.76 seconds, and the second 1000 queries in 5.67 seconds.

@RonnyPfannschmidt

I have a hunch that this optimization will break on some of the more complex queries that SQLAlchemy allows (for example, combining aggregations with HAVING clauses).
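
To illustrate the kind of query where the shortcut goes wrong, a small sketch (User is illustrative; the grouped query returns one row per duplicated name, so counting the user table directly gives a different, wrong answer):

from sqlalchemy import func

q = (db.session.query(User.name, func.count(User.id))
     .group_by(User.name)
     .having(func.count(User.id) > 1))

right = q.count()  # wraps the GROUP BY/HAVING query in a subquery and counts its rows
wrong = db.session.query(func.count(User.id)).scalar()  # counts every row in user instead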

@davidism (Member)

ping @zzzeek: can you see any consequences to changing the query in this way?

@zzzeek commented Apr 24, 2015

I'd skip the session.execute(self.statement) aspect of it for sure, because it's unnecessary and you lose mapping information that is critical for session.get_bind(); at the very least, just say query.order_by(None).value(func.count()). But also, yes, the reason we do the subquery dance is that sometimes you need that subquery. For a long time we tried emitting the query you see here when possible, but as the edge cases kept coming in over the years, it eventually became not worth it to keep guessing when to subquery and when not to. You'd have to dig through the history of this change to find those breakages; it's old stuff at this point.
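
For reference, a minimal sketch of the form suggested above, which keeps the count on the Query so the Session can still resolve the right bind (User is illustrative):

from sqlalchemy import func

# Runs immediately and renders roughly: SELECT count(*) FROM user
total = db.session.query(User).order_by(None).value(func.count())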

@davidism (Member)

Perhaps we could add a without_subquery=False flag to paginate, with documentation warning that it might break certain complex queries when enabled. Then it's up to the developer to construct their query appropriately and decide whether to enable the optimization.
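
A hypothetical sketch of such a flag on a BaseQuery.paginate method (this is not the actual Flask-SQLAlchemy implementation; the real signature and Pagination constructor may differ, and error_out handling is omitted):

from sqlalchemy import func
from flask_sqlalchemy import Pagination

def paginate(self, page=1, per_page=20, error_out=True, without_subquery=False):
    items = self.limit(per_page).offset((page - 1) * per_page).all()
    if without_subquery:
        # Opt-in fast path: the caller promises the query is simple enough
        # to count without wrapping it in a subquery.
        total = self.order_by(None).value(func.count())
    else:
        # Default, safe path: Query.count() wraps the statement in a subquery.
        total = self.order_by(None).count()
    return Pagination(self, page, per_page, total, items)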

@immunda (Contributor) commented Apr 24, 2015

Goodness. I'm tempted to break the pagination API in v3, including a Pagination class with a count method that could be overridden.
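
Likewise hypothetical, a sketch of an overridable count hook (none of this is the shipped API; names are illustrative):

class Pagination(object):
    def __init__(self, query, page, per_page):
        self.query = query
        self.page = page
        self.per_page = per_page

    def count(self):
        # Safe default: the subquery-wrapped count; a subclass can override
        # this with a backend-specific fast count.
        return self.query.order_by(None).count()

    @property
    def total(self):
        return self.count()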

@davidism (Member)

I agree, we should improve pagination handling in general. Extracting the pagination functions has already been brought up in #265.

There are some open issues regarding pagination that could probably be handled at the same time. This issue also duplicates #272.

@immunda (Contributor) commented Apr 24, 2015

Let's do it. I'll open a new issue for doing this properly and close those out.

@lyoe (Author) commented Apr 25, 2015

@zzzeek Thank you for the information. It was thoughtless of me to change it.

@immunda closed this Apr 29, 2015
@immunda (Contributor) commented Apr 29, 2015

No problem, thanks for taking the time to explore this anyway.

@github-actions bot locked as resolved and limited conversation to collaborators Dec 5, 2020