how to work around SQL server disallowing SQL expressions and even label references inside of group_by #4540

guruofgentoo · 2019-03-13T17:12:50Z

Using pyodbc for MS SQL Server connections generally works. However, I have found a case where the way SQLAlchemy passes the query to pyodbc breaks for some group_by usage.

pyodbc operates by taking the sql statement and any parameters, and constructs a prepared statement with those parameters. Any literals in the query, including in group_by, are taken as individual parameters, so the same literal given in the select and group_by sections of a query become separate parameters. SQL Server's group by validation is very strict, and in this case, it sees those parameters as different, and fails the query.

This issue was raised directly on pyodbc last year and closed: mkleehammer/pyodbc#479

Sample setup and query:

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Widget(Base):
    __tablename__ = 'widgets'

    id = sa.Column(sa.Integer, primary_key=True)
    part_number = sa.Column(sa.String(250), nullable=False)


db_url = 'mssql+pyodbc://sa:password@localhost/pyodbc_test?driver=SQL+Server+Native+Client+11.0'
# db_url = 'postgresql://postgres:postgres@localhost:5432/pyodbc_test'
engine = sa.create_engine(db_url)


Base.metadata.create_all(engine)
Base.metadata.bind = engine

DBSession = sessionmaker(bind=engine)
session = DBSession()

session.execute('delete from widgets')
session.add(Widget(part_number='000-1111'))
session.add(Widget(part_number='111-1111'))
session.add(Widget(part_number='000-2222'))
session.commit()

modified_part_number = sa.sql.literal('05') + sa.sql.func.right(Widget.part_number, 5)
query = session.query(
    modified_part_number.label('modified_part_number'),
    sa.sql.func.count('id').label('part_count'),
).group_by(
    modified_part_number
).order_by(
    modified_part_number
)

results = query.all()
assert results[0].modified_part_number == '05-1111'
assert results[0].part_count == 2
assert results[1].modified_part_number == '05-2222'
assert results[1].part_count == 1

The above query yields the following exception:

Traceback (most recent call last):
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Column 'widgets.part_number' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (8120) (SQLExecDirectW); [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. (8180)")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pyodbc_test.py", line 42, in <module>
    results = query.all()
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\orm\query.py", line 3161, in all
    return list(self)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\orm\query.py", line 3317, in __iter__
    return self._execute_and_instances(context)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\orm\query.py", line 3342, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\sql\elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\base.py", line 1466, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\util\compat.py", line 383, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\util\compat.py", line 128, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "C:\Users\Matt\.virtualenvs\pyodbc-test-ELAXx7hS\lib\site-packages\sqlalchemy\engine\default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Column 'widgets.part_number' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (8120) (SQLExecDirectW); [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. (8180)")
[SQL: SELECT ? + right(widgets.part_number, ?) AS modified_part_number, count(?) AS part_count
FROM widgets GROUP BY ? + right(widgets.part_number, ?) ORDER BY ? + right(widgets.part_number, ?)]
[parameters: ('05', 5, 'id', '05', 5, '05', 5)]
(Background on this error at: http://sqlalche.me/e/f405)

pyodbc ends up with a query that looks like this, which fails. Note that P1 and P2 are in the select, but P4 and P5 are in the group by:

declare @p1 int
set @p1=NULL
exec sp_prepexec @p1 output,N'@P1 nvarchar(4),@P2 bigint,@P3 nvarchar(4),@P4 nvarchar(4),@P5 bigint,@P6 nvarchar(4),@P7 bigint',N'SELECT @P1 + right(widgets.part_number, @P2) AS modified_part_number, count(@P3) AS part_count 
FROM widgets GROUP BY @P4 + right(widgets.part_number, @P5) ORDER BY @P6 + right(widgets.part_number, @P7)',N'05',5,N'id',N'05',5,N'05',5
select @p1

The text was updated successfully, but these errors were encountered:

zzzeek · 2019-03-13T19:57:32Z

OK, unfortunately, I was assuming SQL server would accept this query, since this is what we can do:


SELECT ? + right(widgets.part_number, ?) AS modified_part_number, count(?) AS part_count 
FROM widgets GROUP BY modified_part_number ORDER BY modified_part_number

but it doesn't. The ORDER BY can be rendered as above like this:

modified_part_number = (
    sa.sql.literal("05") + sa.sql.func.right(Widget.part_number, 5)
).label("modified_part_number")
query = (
    session.query(
        modified_part_number,
        sa.sql.func.count("id").label("part_count"),
    )
    .group_by(modified_part_number)
    .order_by(modified_part_number)
)

But we still break out the modified_part_number for GROUP BY, because SQL Server doesn't accept the label name there.

sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42S22', "[42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid column name 'modified_part_number'. (207) (SQLExecDirectW)")
[SQL: SELECT ? + right(widgets.part_number, ?) AS modified_part_number, count(?) AS part_count 
FROM widgets GROUP BY modified_part_number ORDER BY modified_part_number]

so you answer is in mkleehammer/pyodbc#479, use a subquery:

modified_part_number = (
    sa.sql.literal("05") + sa.sql.func.right(Widget.part_number, 5)
).label("modified_part_number")

subq = session.query(modified_part_number, Widget.id).subquery()

query = (
    session.query(
        subq.c.modified_part_number,
        sa.sql.func.count(subq.c.id).label("part_count"),
    )
    .group_by(subq.c.modified_part_number)
    .order_by(subq.c.modified_part_number)
)

that's what SQL Server requires so that's what you have to do. Sorry!

guruofgentoo · 2019-03-15T14:34:30Z

I see, I was hoping that at some point in the stack, the parameters could be named or arranged to mitigate this. But I've gotten some education on how pyodbc works through this: a) ODBC doesn't do named parameters, and b) the way pyodbc processes the query is indeed very direct. The SQL gets passed more or less directly to the ODBC layer as the prepared statement, so there is no place to name the params to the prepared statement.

Which is unfortunate, though all of this would be solved if SQL Server supported labels in group by...

Technically, the only way around it in pyodbc that I see is to use fully rendered SQL statements instead of parameterized versions, since pyodbc doesn't prepare those. But that would take out some primary means of preventing SQL injection, so it's a non-starter.

zzzeek · 2019-03-15T16:10:17Z

is the issue here the use of parameters? the error message suggests it's the structure of the statement and where the columns are placed.

guruofgentoo · 2019-03-15T16:58:57Z

It is the parameters, because if SQLAlchemy fully rendered the query and sent it into pyodbc as a string with no parameters (since pyodbc only prepares statements it receives with parameters), it would be something like:

select
    '05' + right(widgets.part_number, 5) as modified_part_number,
    count(id) as part_count
from widgets
group by '05' + right(widgets.part_number, 5)
order by '05' + right(widgets.part_number, 5)

Which works with SQL Server, because the '05' remains a literal in the statement and is not being interpolated from various copies in the parameters list.

But, I did think of an alternative workaround that yields the same result and appears to work: modified_part_number = sa.sql.text("'05'") + sa.sql.func.right(Widget.part_number, sa.sql.text('5')). It's hacky, but takes the parameters out of the equation by effectively dropping down to the rendering level.

zzzeek · 2019-03-16T14:39:47Z

anyway this is how pyodbc / ODBC works, I'd use the subquery approach since it is not safe to use literal parameters in a statement as well as that prepared statements are cacheable. feel free to reopen if you have more questions.

zzzeek changed the title ~~pyodbc group_by breaks with parameters~~ label_reference not included in group_by Mar 13, 2019

zzzeek added question issue where a "fix" on the SQLAlchemy side is unlikely, hence more of a usage question SQL Server Microsoft SQL Server, e.g. mssql labels Mar 13, 2019

zzzeek changed the title ~~label_reference not included in group_by~~ how to work aroudn SQL server disallowing SQL expressions and even label references inside of group_by Mar 13, 2019

guruofgentoo changed the title ~~how to work aroudn SQL server disallowing SQL expressions and even label references inside of group_by~~ how to work around SQL server disallowing SQL expressions and even label references inside of group_by Mar 15, 2019

zzzeek closed this as completed Mar 16, 2019

zzzeek removed the bug Something isn't working label Mar 16, 2019

guruofgentoo mentioned this issue Mar 28, 2019

Aggregate queries with case/when statement fails for SQL Server / sp_prepexec with differing parameters in select vs group by portion of query mkleehammer/pyodbc#479

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how to work around SQL server disallowing SQL expressions and even label references inside of group_by #4540

how to work around SQL server disallowing SQL expressions and even label references inside of group_by #4540

guruofgentoo commented Mar 13, 2019

zzzeek commented Mar 13, 2019

guruofgentoo commented Mar 15, 2019 •

edited

Loading

zzzeek commented Mar 15, 2019

guruofgentoo commented Mar 15, 2019

zzzeek commented Mar 16, 2019

how to work around SQL server disallowing SQL expressions and even label references inside of group_by #4540

how to work around SQL server disallowing SQL expressions and even label references inside of group_by #4540

Comments

guruofgentoo commented Mar 13, 2019

zzzeek commented Mar 13, 2019

guruofgentoo commented Mar 15, 2019 • edited Loading

zzzeek commented Mar 15, 2019

guruofgentoo commented Mar 15, 2019

zzzeek commented Mar 16, 2019

guruofgentoo commented Mar 15, 2019 •

edited

Loading