Migrating from `joinedload_all` to Avoid Duplicate Joins in SQLAlchemy #11356

mutazzuhairi · 2024-05-05T12:25:07Z

Hello everyone,

I'm facing a challenge with transitioning away from the joinedload_all function in SQLAlchemy, which has been deprecated. This function was key in my services for loading related objects without causing duplicate joins, and it was used extensively. Now, replacing joinedload_all with chained joinedload() requires checking each query to ensure there are no redundant joins, which is risky and impractical for me given the number of queries involved.

Here's a simple example of the issue:

Imagine I have this query:

query = session.query(Parent).join(Child).filter(Child.type == 'primary')

Then I want to load the school & teacher using joinedload_all. My code was like this:

# Current usage with joinedload_all (deprecated)
query = session.query(Parent).join(Child).options(
    joinedload_all('children.school'),
    joinedload_all('children.teacher')
).filter(Child.type == 'primary')

# Recommended approach with chained joinedload()
query = session.query(Parent).join(Child).options(
    joinedload('children').joinedload('school'),
    joinedload('children').joinedload('teacher')
).filter(Child.type == 'primary')

The issue is when 'children' is already joined manually, the new method might try to join 'children' again. This could lead to duplicate joins and is a significant concern as manually checking and updating every query is not feasible.

I'm looking for advice or solutions that could help manage these changes more effectively, or if anyone has developed utility functions that mimic the old joinedload_all behavior but without the risk of duplicate joins.

Any insights or shared experiences would be highly appreciated!


zzzeek · 2024-05-05T13:48:34Z

Maintainer

hi -

edit: your joinedload_all() isn't really correct, it only accepts a single path at a time, not comma separated.

joinedload_all() has no additional functionality compared to chained joinedload() calls, it's just a wrapper for the individual calls. The replacement for joinedload_all() is to use individual joinedload() calls only. you would not add query.join() into the mix, that's an entirely separate thing. The second query you have, remove the join(), and that's the equivalent:

# a correct use of joinedload_all
query = session.query(Parent).options(joinedload_all('children.school'), joinedload_all('children.teacher'))

# *actual* Recommended approach with chained joinedload()
query = session.query(Parent).options(
    joinedload('children').joinedload('school'),
    joinedload('children').joinedload('teacher')
)

as for mixing joinedload and manual join() and avoiding redundant joins, see the section https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#routing-explicit-joins-statements-into-eagerly-loaded-collections
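
A minimal sketch of what that section describes, assuming SQLAlchemy 1.4 or later and class-bound attributes rather than strings. The Parent/Child mappings below are stand-ins for the ones in the question, not the poster's actual models. joinedload() adds its own aliased LEFT OUTER JOIN next to the explicit join, while contains_eager() tells the ORM to populate the collection from the join that is already in the statement.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, select
from sqlalchemy.orm import (
    contains_eager,
    declarative_base,
    joinedload,
    relationship,
)

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(ForeignKey("parent.id"))
    type = Column(String)


# joinedload() ignores the explicit join and adds a second, aliased join
duplicated = (
    select(Parent)
    .join(Parent.children)
    .where(Child.type == "primary")
    .options(joinedload(Parent.children))
)

# contains_eager() routes the explicit join into Parent.children instead
routed = (
    select(Parent)
    .join(Parent.children)
    .where(Child.type == "primary")
    .options(contains_eager(Parent.children))
)

duplicated_sql = str(duplicated)
routed_sql = str(routed)
print(duplicated_sql)
print(routed_sql)
```

Keep in mind that with contains_eager() the filter on Child.type also limits which rows end up in Parent.children, since the collection is loaded from the filtered join.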

8 replies

mutazzuhairi May 6, 2024
Author

@zzzeek This could work in general, but in my case, I don't think so. Let me elaborate more. From my services, I use queries as methods with a dynamic joinloads parameter, like this:

class ChildRepository:
    @staticmethod
    def get_parents_by_child_id(child_id, joinloads=()):
        query = (
            db_session.query(Parent)
            .join(Child, Child.parent_id == Parent.id)
            .filter(Child.id == child_id)
        )

        # each entry is a dotted relationship path such as "children.school"
        for path in joinloads:
            query = query.options(joinedload_all(path))

        return query.all()

And here is an example of how it is called from the code:

parents = ChildRepository.get_parents_by_child_id(child_id, joinloads=("children.school", "children.teacher"))

Based on your answer, I should remove all of this dynamic behavior and make it static in my code, which I can't do since each method has a lot of usages, and each usage passes different joinloads. This is just one method out of hundreds, all of them dynamic, and joinedload_all was handling this perfectly by avoiding additional joins when they already exist, whether or not contains_eager is involved.

I have never had duplicate joins with joinedload_all, even when I pass 'children' again and it is not loaded with contains_eager. So my question is: if there is no equivalent functionality, why was it deprecated?

This change has blocked me from updating all of my services, and I have been stuck on it for days.

zzzeek May 6, 2024
Maintainer

I don't know how else to express that joinedload_all() does nothing that joinedload() does not. it introduces no new functionality whatsoever, of any kind. joinedload_all() goes away because the whole use of strings for loader options also goes away in 2.0, but we will leave that for a different issue.

Here is an equivalent joinedload_all() function using the dotted strings you have, based on joinedload():

def joinedload_all(name):
    opt = None
    for token in name.split("."):
        if opt is None:
            opt = joinedload(token)
        else:
            opt = opt.joinedload(token)
    return opt

That's literally all joinedload_all() has ever done. The ORM literally cannot tell based on an option if you've used joinedload_all() or joinedload().
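
Because string arguments to loader options go away in 2.0, the same loop can also be written against class-bound attributes. The following is only a sketch under assumptions: `joinedload_path` is a hypothetical helper name (not a SQLAlchemy API), it takes the root entity as an extra argument, and it assumes every token in the dotted path names a relationship. It targets SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, ForeignKey, Integer, inspect, select
from sqlalchemy.orm import declarative_base, joinedload, relationship

Base = declarative_base()


class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    bs = relationship("B")


class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(ForeignKey("a.id"))
    cs = relationship("C")


class C(Base):
    __tablename__ = "c"
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))


def joinedload_path(entity, dotted_path):
    """Hypothetical helper: turn "bs.cs" into joinedload(A.bs).joinedload(B.cs)."""
    opt = None
    current = entity
    for token in dotted_path.split("."):
        # KeyError here means the token is not a relationship on `current`
        rel = inspect(current).relationships[token]
        attr = getattr(current, token)
        opt = joinedload(attr) if opt is None else opt.joinedload(attr)
        # the next token is looked up on the class this relationship points to
        current = rel.mapper.class_
    return opt


from_string = str(select(A).options(joinedload_path(A, "bs.cs")))
by_hand = str(select(A).options(joinedload(A.bs).joinedload(B.cs)))
print(from_string)
```

As with the string version, the helper adds nothing beyond chaining joinedload() calls, so the two statements should render the same SQL.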

Please run this program under SQLAlchemy 1.3. Note the rendered SQL is identical for joinedload_all() vs. joinedload():

# SQLAlchemy 1.3 only
from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import joinedload
from sqlalchemy.orm import joinedload_all
from sqlalchemy.orm import relationship
from sqlalchemy.orm import Session

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    data = Column(String)
    bs = relationship("B")


class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    a_id = Column(ForeignKey("a.id"))
    cs = relationship("C")
    ds = relationship("D")

class C(Base):
    __tablename__ = 'c'
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))

class D(Base):
    __tablename__ = 'd'
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))

s = Session()
q1 = s.query(A).join(B).options(joinedload_all("bs.cs"), joinedload_all("bs.ds"))

q2 = s.query(A).join(B).options(joinedload("bs").joinedload("cs"), joinedload("bs").joinedload("ds"))

print(q1)
print(q2)

assert str(q1) == str(q2)

note also that both SQL statements contain two joins to b, which is what you said you don't want:

SELECT a.id AS a_id, a.data AS a_data, c_1.id AS c_1_id, c_1.b_id AS c_1_b_id, d_1.id AS d_1_id, d_1.b_id AS d_1_b_id, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id 
FROM a JOIN b ON a.id = b.a_id LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id LEFT OUTER JOIN c AS c_1 ON b_1.id = c_1.b_id LEFT OUTER JOIN d AS d_1 ON b_1.id = d_1.b_id

SELECT a.id AS a_id, a.data AS a_data, c_1.id AS c_1_id, c_1.b_id AS c_1_b_id, d_1.id AS d_1_id, d_1.b_id AS d_1_b_id, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id 
FROM a JOIN b ON a.id = b.a_id LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id LEFT OUTER JOIN c AS c_1 ON b_1.id = c_1.b_id LEFT OUTER JOIN d AS d_1 ON b_1.id = d_1.b_id

here is the same program again for SQLAlchemy 1.4, using the adapted joinedload_all() function - same assertions, same SQL, no change

from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import joinedload
# from sqlalchemy.orm import joinedload_all
from sqlalchemy.orm import relationship
from sqlalchemy.orm import Session

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    data = Column(String)
    bs = relationship("B")


class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    a_id = Column(ForeignKey("a.id"))
    cs = relationship("C")
    ds = relationship("D")

class C(Base):
    __tablename__ = 'c'
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))

class D(Base):
    __tablename__ = 'd'
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))

def joinedload_all(name):
    opt = None
    for token in name.split("."):
        if opt is None:
            opt = joinedload(token)
        else:
            opt = opt.joinedload(token)
    return opt


s = Session()
q1 = s.query(A).join(B).options(joinedload_all("bs.cs"), joinedload_all("bs.ds"))

q2 = s.query(A).join(B).options(joinedload("bs").joinedload("cs"), joinedload("bs").joinedload("ds"))

print(q1)
print(q2)

assert str(q1) == str(q2)

output:

SELECT a.id AS a_id, a.data AS a_data, c_1.id AS c_1_id, c_1.b_id AS c_1_b_id, d_1.id AS d_1_id, d_1.b_id AS d_1_b_id, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id 
FROM a JOIN b ON a.id = b.a_id LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id LEFT OUTER JOIN c AS c_1 ON b_1.id = c_1.b_id LEFT OUTER JOIN d AS d_1 ON b_1.id = d_1.b_id


SELECT a.id AS a_id, a.data AS a_data, c_1.id AS c_1_id, c_1.b_id AS c_1_b_id, d_1.id AS d_1_id, d_1.b_id AS d_1_b_id, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id 
FROM a JOIN b ON a.id = b.a_id LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id LEFT OUTER JOIN c AS c_1 ON b_1.id = c_1.b_id LEFT OUTER JOIN d AS d_1 ON b_1.id = d_1.b_id

so please, look more closely at what your program is doing, thanks!

mutazzuhairi May 6, 2024
Author

@zzzeek I just added options(contains_eager('bs')) to both of them so that they match my case. Can you please take a look at the result?

Here are the updated queries:

s = Session()
q1 = s.query(A).join(B).options(contains_eager('bs')).options(joinedload_all("bs.cs"), joinedload_all("bs.ds"))

q2 = s.query(A).join(B).options(contains_eager('bs')).options(joinedload("bs").joinedload("cs"), joinedload("bs").joinedload("ds"))

Here's the output:

SELECT b.id AS b_id, b.a_id AS b_a_id, a.id AS a_id, a.data AS a_data, c_1.id AS c_1_id, c_1.b_id AS c_1_b_id, d_1.id AS d_1_id, d_1.b_id AS d_1_b_id 
FROM a JOIN b ON a.id = b.a_id LEFT OUTER JOIN c AS c_1 ON b.id = c_1.b_id LEFT OUTER JOIN d AS d_1 ON b.id = d_1.b_id

SELECT a.id AS a_id, a.data AS a_data, c_1.id AS c_1_id, c_1.b_id AS c_1_b_id, d_1.id AS d_1_id, d_1.b_id AS d_1_b_id, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id 
FROM a JOIN b ON a.id = b.a_id LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id LEFT OUTER JOIN c AS c_1 ON b_1.id = c_1.b_id LEFT OUTER JOIN d AS d_1 ON b_1.id = d_1.b_id

You will notice that the first query does not have a duplicate join to B, but the second one does. How can they have the same behavior if adding contains_eager changes the result? Is there a way to keep the same behavior in this case?

I just checked all of my cases; they all come down to this: I had already added contains_eager for a relationship and then named the same relationship again in joinedload_all.

zzzeek May 6, 2024
Maintainer

that's just one option cancelling the other out - an idiosyncrasy in 1.3 is causing the joinedload_all() version to place the contains_eager at a higher priority. reverse the order for both and put the contains_eager() last, so it overrides whatever is there for "bs":

q1 = s.query(A).join(B).options(joinedload_all("bs.cs"), joinedload_all("bs.ds")).options(contains_eager('bs'))

q2 = s.query(A).join(B).options(joinedload("bs").joinedload("cs"), joinedload("bs").joinedload("ds")).options(contains_eager('bs'))

now you get the same query.

Bigger picture you should not use two options that match the same path. if you want to ignore a path token, use defaultload() like this:

q2 = s.query(A).join(B).options(contains_eager('bs')).options(defaultload("bs").joinedload("cs"), defaultload("bs").joinedload("ds"))
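
For readers on 1.4 or 2.0, here is a sketch of the same defaultload() pattern written with class-bound attributes instead of strings. The A/B/C/D mappings mirror the test program earlier in the thread and are re-declared only so the block runs by itself. The assumption is that the option for "bs" is set exactly once, by contains_eager(), and that defaultload() just steps through that path without changing how it is loaded.

```python
from sqlalchemy import Column, ForeignKey, Integer, select
from sqlalchemy.orm import (
    contains_eager,
    declarative_base,
    defaultload,
    relationship,
)

Base = declarative_base()


class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    bs = relationship("B")


class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(ForeignKey("a.id"))
    cs = relationship("C")
    ds = relationship("D")


class C(Base):
    __tablename__ = "c"
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))


class D(Base):
    __tablename__ = "d"
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))


stmt = (
    select(A)
    .join(A.bs)
    .options(
        # the only option that sets a loading strategy for A.bs
        contains_eager(A.bs),
        # defaultload() passes through A.bs without touching its strategy
        defaultload(A.bs).joinedload(B.cs),
        defaultload(A.bs).joinedload(B.ds),
    )
)
sql = str(stmt)
print(sql)
```

The rendered statement should contain the explicit join to b plus one LEFT OUTER JOIN each for c and d, with no second, aliased join to b.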

Answer selected by mutazzuhairi

mutazzuhairi May 6, 2024
Author

Many thanks @zzzeek,
It's clear now: all of this comes down to the order of contains_eager. I moved it to be last and everything is working as expected on my side.

Just one question: is there any functionality or configuration I can use to give joinedload a higher priority than contains_eager, the way joinedload_all had? Or should I put the joinedload last manually?

zzzeek May 7, 2024
Maintainer

this is generally not an area where there's much explicit API. the resolution order of conflicting options is probably not a strong guarantee right now, and I'm not sure we have tests for it. it's certainly doable as something we could guarantee at some point, but if you want to keep up with SQLAlchemy versions you will have more migrations to do for that code.
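
One way to avoid relying on the resolution order at all is to build the options so that no two of them set a strategy for the same path. The sketch below is an illustration only, assuming SQLAlchemy 1.4 or later: `build_loader_options` and its `already_joined` parameter are hypothetical names, not SQLAlchemy APIs, and the A/B/C/D mappings mirror the test program earlier in the thread. Relationships the query joins explicitly get a single contains_eager(); dotted paths that start with one of them pass through it with defaultload().

```python
from sqlalchemy import Column, ForeignKey, Integer, inspect, select
from sqlalchemy.orm import (
    contains_eager,
    declarative_base,
    defaultload,
    joinedload,
    relationship,
)

Base = declarative_base()


class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    bs = relationship("B")


class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(ForeignKey("a.id"))
    cs = relationship("C")
    ds = relationship("D")


class C(Base):
    __tablename__ = "c"
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))


class D(Base):
    __tablename__ = "d"
    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey("b.id"))


def build_loader_options(entity, dotted_paths, already_joined=()):
    """Hypothetical helper: one strategy per path, so nothing conflicts."""
    joined = set(already_joined)
    # one contains_eager() per relationship the query joins explicitly
    options = [contains_eager(getattr(entity, name)) for name in sorted(joined)]
    for dotted in dotted_paths:
        opt = None
        current = entity
        for token in dotted.split("."):
            attr = getattr(current, token)
            if opt is not None:
                opt = opt.joinedload(attr)
            elif token in joined:
                # leave this path to contains_eager(); only step through it
                opt = defaultload(attr)
            else:
                opt = joinedload(attr)
            current = inspect(current).relationships[token].mapper.class_
        options.append(opt)
    return options


options = build_loader_options(A, ["bs.cs", "bs.ds"], already_joined=["bs"])
sql = str(select(A).join(A.bs).options(*options))
print(sql)
```

A repository method that joins a relationship explicitly could pass that relationship's name as `already_joined` and keep accepting dotted strings from its callers.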

mutazzuhairi · 2024-05-06T10:07:56Z

Author

I updated the example in the question; could you check it again?

0 replies
