Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive queries on load (nested schemas) #106

Closed
xarg opened this issue Mar 10, 2017 · 3 comments

Comments

@xarg
Copy link

commented Mar 10, 2017

We had this issue for a long time, but now it's starting to become really bad for us. See the example below:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy, get_debug_queries
from marshmallow import fields
from marshmallow_sqlalchemy import ModelSchema
from datetime import datetime

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
app.config['SQLALCHEMY_ECHO'] = False
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = True
app.config['SQLALCHEMY_RECORD_QUERIES'] = True
db = SQLAlchemy(app)


class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)
    email = db.Column(db.String(120), unique=True)
    created_datetime = db.Column(db.DateTime(), default=datetime.utcnow)
    updated_datetime = db.Column(db.DateTime(), default=datetime.utcnow,
                                 onupdate=datetime.utcnow)

    def __repr__(self):
        return '<User %r>' % self.username


class URL(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey(User.id))
    url = db.Column(db.String())

    user = db.relationship(User, backref=db.backref('urls'))

    def __repr__(self):
        return '<URL %r>' % self.url


def create_user():
    user1 = User(username='user1', email="user1@example.com")
    user1.urls.append(URL(url="http://a"))
    user1.urls.append(URL(url="http://b"))
    user2 = User(username='user2', email="user2@example.com")
    user2.urls.append(URL(url="http://c"))
    user2.urls.append(URL(url="http://d"))

    db.session.add(user1)
    db.session.add(user2)
    db.session.commit()


class UserSchema(ModelSchema):
    urls = fields.Nested('URLSchema', many=True)

    class Meta:
        model = User
        sqla_session = db.session


class URLSchema(ModelSchema):
    class Meta:
        model = URL
        sqla_session = db.session


user_schema = UserSchema()


@app.route("/")
def hello():
    user = User.query.options(db.joinedload('urls')).filter_by(id=1).first()
    user_dump = user_schema.dump(user)
    user_schema.load(user_dump.data, instance=user)
    for query in get_debug_queries():
        print("%s %r\n------" % (query.statement, query.parameters))

    return ''


if __name__ == "__main__":
    db.drop_all()
    db.create_all()
    create_user()
    app.run(port=9999, debug=True)

Results in:

SELECT anon_1.user_id AS anon_1_user_id, anon_1.user_username AS anon_1_user_username, anon_1.user_email AS anon_1_user_email, anon_1.user_created_datetime AS anon_1_user_created_datetime, anon_1.user_updated_datetime AS anon_1_user_updated_datetime, "URL_1".id AS "URL_1_id", "URL_1".user_id AS "URL_1_user_id", "URL_1".url AS "URL_1_url"
FROM (SELECT user.id AS user_id, user.username AS user_username, user.email AS user_email, user.created_datetime AS user_created_datetime, user.updated_datetime AS user_updated_datetime
FROM user
WHERE user.id = ?
 LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN "URL" AS "URL_1" ON anon_1.user_id = "URL_1".user_id (1, 1, 0)
------
SELECT user.id AS user_id, user.username AS user_username, user.email AS user_email, user.created_datetime AS user_created_datetime, user.updated_datetime AS user_updated_datetime
FROM user
WHERE user.id = ? (1,)
------
SELECT user.id AS user_id, user.username AS user_username, user.email AS user_email, user.created_datetime AS user_created_datetime, user.updated_datetime AS user_updated_datetime
FROM user
WHERE user.id = ? (1,)
------
SELECT "URL".id AS "URL_id", "URL".user_id AS "URL_user_id", "URL".url AS "URL_url"
FROM "URL"
WHERE "URL".id = ?
 LIMIT ? OFFSET ? (1, 1, 0)
------
SELECT "URL".id AS "URL_id", "URL".user_id AS "URL_user_id", "URL".url AS "URL_url"
FROM "URL"
WHERE "URL".id = ?
 LIMIT ? OFFSET ? (2, 1, 0)
------
127.0.0.1 - - [10/Mar/2017 10:32:47] "GET / HTTP/1.1" 200 -

It's a bit hard to parse but you might notice that the dump part is 1 query with an OUTER JOIN which is exactly what I expected, however when loading with an attached instance it performs these additional queries.

I think I understand why this happens, you need this because you might need to do an update, but if the instance is given with all the associated objects, the queries above are not really need.

In an ideal world I would expect: No queries to the DB when doing dump/load when instance is provided.

Is there a way to have no side effects when doing load? This is a constant source of big bugs for us.

@xarg

This comment has been minimized.

Copy link
Author

commented Mar 20, 2017

I did some more digging on this and it led me to: https://github.com/marshmallow-code/marshmallow-sqlalchemy/blob/dev/marshmallow_sqlalchemy/fields.py#L82

What happens here is that a RelatedField on deserialization always selects from the DB no matter what. The instance provided here (in the above example) user_schema.load(user_dump.data, instance=user) is not used. In any case it's not used for related fields.

Is it possible to make use of the instance in RelatedField deserialization?

@sloria

This comment has been minimized.

Copy link
Member

commented Mar 21, 2017

Thanks for reporting this and posting your findings. I don't have time to look into this deeply at the moment, but I would gladly review and merge a PR to avoid unnecessary querying.

@jmuhlich

This comment has been minimized.

Copy link
Contributor

commented Mar 31, 2017

I just ran into this same issue. I submitted #107 which fixes the problem for Related fields on primary keys. Running your example with this change produces no extraneous queries against user:

SELECT anon_1.user_id AS anon_1_user_id, anon_1.user_username AS anon_1_user_username, anon_1.user_email AS anon_1_user_email, anon_1.user_created_datetime AS anon_1_user_created_datetime, anon_1.user_updated_datetime AS anon_1_user_updated_datetime, "URL_1".id AS "URL_1_id", "URL_1".user_id AS "URL_1_user_id", "URL_1".url AS "URL_1_url" 
FROM (SELECT user.id AS user_id, user.username AS user_username, user.email AS user_email, user.created_datetime AS user_created_datetime, user.updated_datetime AS user_updated_datetime 
FROM user 
WHERE user.id = ?
 LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN "URL" AS "URL_1" ON anon_1.user_id = "URL_1".user_id (1, 1, 0)
------
SELECT "URL".id AS "URL_id", "URL".user_id AS "URL_user_id", "URL".url AS "URL_url" 
FROM "URL" 
WHERE "URL".id = ?
 LIMIT ? OFFSET ? (1, 1, 0)
------
SELECT "URL".id AS "URL_id", "URL".user_id AS "URL_user_id", "URL".url AS "URL_url" 
FROM "URL" 
WHERE "URL".id = ?
 LIMIT ? OFFSET ? (2, 1, 0)
------
127.0.0.1 - - [31/Mar/2017 16:25:56] "GET / HTTP/1.1" 200 -

@sloria sloria closed this Apr 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.