Very slow response during pagination for large datasets #286

deepanshumehndiratta opened this Issue Feb 11, 2014 · 4 comments



deepanshumehndiratta commented Feb 11, 2014

I'm experiencing an issue where a GET request to an API endpoint takes approximately 3-30 seconds (without any pre- or postprocessors attached to it). The details of the result set are below: 81399 records, 10 per page, 8140 pages.

Structure of model:

    class TimeSeries(db.Model, ValidationMixin):
        id = db.Column(db.Integer, primary_key=True)
        channel_id = db.Column(db.Integer, db.ForeignKey(''))
        sensor_id = db.Column(db.Integer, db.ForeignKey(''))
        time = db.Column(db.DateTime)
        value1 = db.Column(db.Unicode(255))
        value2 = db.Column(db.Unicode(255))
        value3 = db.Column(db.Unicode(255))
        value4 = db.Column(db.Unicode(255))

        # will validate nullability and string types

[Screenshot: API configuration page for the time_series endpoint]

The Developer Tools request time:

[Screenshot: Developer Tools request timing, 2014-02-11]

I suspect this is because of the rel="last" in the Link header.

    Link: <http://hostname/time_series?page=2&results_per_page=10>; rel="next",
          <http://hostname/time_series?page=8140&results_per_page=10>; rel="last"
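Building that rel="last" link only takes a trivial division, but the total row count it depends on comes from a COUNT over the whole table, which is where the time goes. A quick sketch with the numbers from this request:

```python
import math

total_records = 81399    # rows in the time_series table
results_per_page = 10

# The division itself is cheap; the expensive part is obtaining
# total_records, which requires a COUNT over the entire table.
last_page = math.ceil(total_records / results_per_page)
print(last_page)  # -> 8140
```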

Could it be related to this Stack Overflow question: "Why is SQLAlchemy count() much slower than the raw query?"
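For context, SQLAlchemy's Query.count() wraps the original query in a subquery (SELECT count(*) FROM (SELECT ...)), which some databases execute much more slowly than a plain count on an indexed column. A minimal, self-contained sketch of the two styles, using an in-memory SQLite database and a stripped-down stand-in for the model above:

```python
from sqlalchemy import Column, Integer, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class TimeSeries(Base):  # stripped-down stand-in for the real model
    __tablename__ = 'time_series'
    id = Column(Integer, primary_key=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all(TimeSeries() for _ in range(5))
session.commit()

# Query.count() wraps the query in a subquery:
#   SELECT count(*) FROM (SELECT time_series.id ...) AS anon_1
print(session.query(TimeSeries).count())                  # -> 5

# Counting the primary key directly avoids the subquery:
print(session.query(func.count(TimeSeries.id)).scalar())  # -> 5
```

Both forms return the same number; the difference is only in the SQL emitted, which you can confirm with query logging as described below.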


A count query on the application's SQLAlchemy model shows the same slowness:

[Screenshot: count query timing on the SQLAlchemy model, 2014-02-12]




andir commented Feb 11, 2014

Well, it seems you fixed your issue while I was trying to give a hint on how to figure out what the slow part is.
In your fix you should probably use the primary_key_name function as provided, otherwise things might go wrong (as the failing unit tests show ;) )

I'd still be interested in how much you gained from that simple fix you proposed.

Here is my written answer in case it's helpful to anyone in the future:

I've experienced the same issue with .count() being slow. A much bigger penalty (at least in my environments) was the serialization done in to_dict().
You could try enabling SQLAlchemy query logging to figure out how long the individual queries take.

You can do so like this:

    app.config['SQLALCHEMY_RECORD_QUERIES'] = True

    from flask.ext.sqlalchemy import get_debug_queries

    @app.after_request
    def queries(r):
        # print the statement, parameters and duration of every
        # query issued while handling the request
        print('\n'.join(map(str, get_debug_queries())))
        return r

Also, the ProfilerMiddleware from Werkzeug will tell you which parts of your code are the bottleneck:

    from werkzeug.contrib.profiler import ProfilerMiddleware
    app.wsgi_app = ProfilerMiddleware(app.wsgi_app)



deepanshumehndiratta commented Feb 12, 2014

Even after using the primary_key_name function, the build tests fail. I'll have to look into the build logs to isolate the bug.

Before the temporary fix, a standalone API call on that particular model took anywhere between 3 and 30 seconds. Now I haven't seen it go above 400 ms.




deepanshumehndiratta commented Feb 12, 2014

I've isolated and fixed the bug. It passes all the build tests now. You can review and pull the code.




jfinkels commented Mar 12, 2014

I have added your optimization. Thanks.

@jfinkels jfinkels closed this Mar 12, 2014
