Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential tuning of joinedload subq thing #11226

Open
zzzeek opened this issue Apr 2, 2024 Discussed in #11224 · 0 comments
Open

potential tuning of joinedload subq thing #11226

zzzeek opened this issue Apr 2, 2024 Discussed in #11224 · 0 comments
Labels
bug Something isn't working loader options ORM options like joinedload(), load_only(), these are complicated and have a lot of issues orm
Milestone

Comments

@zzzeek
Copy link
Member

zzzeek commented Apr 2, 2024

this query has a group_by that throws off PG. so this is not the same as a LIMIT and multi-row-eager-loaders should likely be tuned.

we can make this 2.1 to reduce risk

from datetime import date, datetime
from decimal import Decimal
from typing import List

from sqlalchemy import ForeignKey, create_engine, func, select
from sqlalchemy.orm import (
   DeclarativeBase,
   backref,
   Mapped,
   mapped_column,
   relationship,
   sessionmaker,
)

class Base(DeclarativeBase):
   pass


class CategoryDB(Base):
   __tablename__ = 'category'

   category_id: Mapped[int] = mapped_column(primary_key=True)
   name: Mapped[str]


class ProductDB(Base):
   __tablename__ = 'product'

   product_id: Mapped[int] = mapped_column(primary_key=True)
   name: Mapped[str]
   price: Mapped[Decimal]
   category_id: Mapped[int] = mapped_column(
       ForeignKey('category.category_id'),
   )
   category: Mapped['CategoryDB'] = relationship(
       foreign_keys=[category_id],
       backref='ProductDB',
       cascade='all,delete',
       uselist=False,
       lazy='joined',
   )
   inventory = relationship(
       'InventoryDB',
       backref=backref('ProductDB', uselist=False),
       cascade='all,delete',
       foreign_keys=[product_id],
       primaryjoin='ProductDB.product_id == InventoryDB.product_id',
   )


class InventoryDB(Base):
   __tablename__ = 'inventory'

   inventory_id: Mapped[int] = mapped_column(primary_key=True)
   product_id: Mapped[int] = mapped_column(ForeignKey('product.product_id'))
   quantity: Mapped[int]


sqlalchemy_database_url = 'postgresql://scott:tiger@localhost/test'

engine = create_engine(
    sqlalchemy_database_url,echo=True
)
db = sessionmaker(
       bind=engine,
       expire_on_commit=False,
   )

Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

if __name__ == "__main__":

   quantity_query = (
       select(
           ProductDB,
           ProductDB.name,
           ProductDB.price,
           func.coalesce(func.sum(InventoryDB.quantity), 0).label(
               'quantity',
           ),
       )
       .outerjoin(
           CategoryDB,
           CategoryDB.category_id == ProductDB.category_id,
       )
       .outerjoin(
           InventoryDB,
           ProductDB.product_id == InventoryDB.product_id,
       )
       .group_by(ProductDB.product_id, CategoryDB.category_id)
   )
   with db() as session:
       products_db = session.scalars(quantity_query)

without patch:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "category_1.category_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...coalesce(sum(inventory.quantity), 0) AS quantity, category_1...
                                                             ^

[SQL: SELECT product.product_id, product.name, product.price, product.category_id, product.name AS name__1, product.price AS price__1, coalesce(sum(inventory.quantity), %(coalesce_1)s) AS quantity, category_1.category_id AS category_id_1, category_1.name AS name_1 
FROM product LEFT OUTER JOIN category ON category.category_id = product.category_id LEFT OUTER JOIN inventory ON product.product_id = inventory.product_id LEFT OUTER JOIN category AS category_1 ON category_1.category_id = product.category_id GROUP BY product.product_id, category.category_id]
[parameters: {'coalesce_1': 0}]

with this patch:

diff --git a/lib/sqlalchemy/orm/context.py b/lib/sqlalchemy/orm/context.py
index fcd01e659..2839d158c 100644
--- a/lib/sqlalchemy/orm/context.py
+++ b/lib/sqlalchemy/orm/context.py
@@ -1305,11 +1305,7 @@ class ORMSelectCompileState(ORMCompileState, SelectState):
         if self.order_by is False:
             self.order_by = None
 
-        if (
-            self.multi_row_eager_loaders
-            and self.eager_adding_joins
-            and self._should_nest_selectable
-        ):
+        if self._should_nest_selectable:
             self.statement = self._compound_eager_statement()
         else:
             self.statement = self._simple_statement()
@@ -2336,9 +2332,19 @@ class ORMSelectCompileState(ORMCompileState, SelectState):
     @property
     def _should_nest_selectable(self):
         kwargs = self._select_args
+
+        if not self.eager_adding_joins:
+            return False
+
         return (
-            kwargs.get("limit_clause") is not None
-            or kwargs.get("offset_clause") is not None
+            (
+                kwargs.get("limit_clause") is not None
+                and self.multi_row_eager_loaders
+            )
+            or (
+                kwargs.get("offset_clause") is not None
+                and self.multi_row_eager_loaders
+            )
             or kwargs.get("distinct", False)
             or kwargs.get("distinct_on", ())
             or kwargs.get("group_by", False)

then we get a working query:

SELECT anon_1.product_id, anon_1.name, anon_1.price, anon_1.category_id, anon_1.name AS name_1, anon_1.price AS price_1, anon_1.quantity, category_1.category_id AS category_id_1, category_1.name AS name_2 
FROM (SELECT product.product_id AS product_id, product.name AS name, product.price AS price, product.category_id AS category_id, product.name AS name__1, product.price AS price__1, coalesce(sum(inventory.quantity), %(coalesce_1)s) AS quantity 
FROM product LEFT OUTER JOIN category ON category.category_id = product.category_id LEFT OUTER JOIN inventory ON product.product_id = inventory.product_id GROUP BY product.product_id, category.category_id) AS anon_1 LEFT OUTER JOIN category ON category.category_id = anon_1.category_id LEFT OUTER JOIN inventory ON anon_1.product_id = inventory.product_id LEFT OUTER JOIN category AS category_1 ON category_1.category_id = anon_1.category_id

@zzzeek zzzeek added bug Something isn't working orm loader options ORM options like joinedload(), load_only(), these are complicated and have a lot of issues labels Apr 2, 2024
@zzzeek zzzeek added this to the 2.1 milestone Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working loader options ORM options like joinedload(), load_only(), these are complicated and have a lot of issues orm
Projects
None yet
Development

No branches or pull requests

1 participant