Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow response during pagination for large datasets #286

Closed
deepanshumehndiratta opened this issue Feb 11, 2014 · 4 comments
Closed

Very slow response during pagination for large datasets #286

deepanshumehndiratta opened this issue Feb 11, 2014 · 4 comments

Comments

@deepanshumehndiratta
Copy link
Contributor

@deepanshumehndiratta deepanshumehndiratta commented Feb 11, 2014

I'm experiencing this issue where it takes approx. 3-30 seconds for a GET query on an API endpoint (Without any Pre or Post processors attached to it). The format of the result is attached below: (81399 records, 10 per page, 8140 pages)

Structure of model:

class TimeSeries(db.Model, ValidationMixin):
  id = db.Column(db.Integer, primary_key=True)
  channel_id = db.Column(db.Integer, db.ForeignKey('channel.id'))
  sensor_id = db.Column(db.Integer, db.ForeignKey('sensor.id'))
  time = db.Column(db.DateTime)
  value1 = db.Column(db.Unicode(255))
  value2 = db.Column(db.Unicode(255))
  value3 = db.Column(db.Unicode(255))
  value4 = db.Column(db.Unicode(255))

  # will validate nullability and string types
  val.validates_constraints()

petasense com api configuration time_series

The Developer Tools request time:

screenshot from 2014-02-11 23 39 23

I suspect this is because of the rel="last" in the Link header.

Link:<http://hostname/time_series?page=2&results_per_page=10>; rel="next", <http://hostname/time_series?page=8140&results_per_page=10>; rel="last"

Could it be because of this issue? Stackoverflow: Why is SQLAlchemy count() much slower than the raw query?

Edit:

A count query on the SQLAlchemy Model of the application highlighting the same:

screenshot from 2014-02-12 00 03 05

@andir
Copy link
Contributor

@andir andir commented Feb 11, 2014

Well it seems like you've fixed your issue while I was trying to give a hint on how to figure out whats the slow part.
In your fix you should probably be using the primary_key_name function as provided at https://github.com/jfinkels/flask-restless/blob/master/flask_restless/helpers.py#L188 otherwise things might go wrong (as the failing unit tests show ;) )

Would still be interested in how much you could gain from that simple fix you proposed.

Here is my written answer in case it's helpful to anyone in the future:

I've experienced the same issues with .count() and being slower. A much higher penalty (at least in my environments) was the serialization done in to_dict().
You could try enabling the sqlalchemy query logging to figure out how long the individual queries are running.

you can do so by

    app.config['SQLALCHEMY_RECORD_QUERIES'] = True

    @app.after_request
    def queries(r):
        from flask.ext.sqlalchemy import get_debug_queries

        print '\n'.join(map(str, get_debug_queries()))
        return r

Also using the Profiler Middleware from werkzeug will tell you which parts of your code are the bottleneck:

from werkzeug.contrib.profiler import ProfilerMiddleware
app.wsgi_app = ProfilerMiddleware(app.wsgi_app)

Loading

@deepanshumehndiratta
Copy link
Contributor Author

@deepanshumehndiratta deepanshumehndiratta commented Feb 12, 2014

Even after using the primary_key_name function, the build tests fail. I'll have to look into the build logs to isolate the bug.

It took anywhere between 3 - 30 seconds for a standalone API call on that particular model before the temporary fix. However now I haven't seen it go above 400 ms.

Loading

@deepanshumehndiratta
Copy link
Contributor Author

@deepanshumehndiratta deepanshumehndiratta commented Feb 12, 2014

Isolated and fixed the bug. It passes all the build tests now. You can review and pull the code.

Loading

@jfinkels
Copy link
Owner

@jfinkels jfinkels commented Mar 12, 2014

I have added your optimization. Thanks.

Loading

@jfinkels jfinkels closed this Mar 12, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants