Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

warn when postgresql distinct columns used on other dialect #4002

Closed
sqlalchemy-bot opened this issue May 30, 2017 · 30 comments
Closed

warn when postgresql distinct columns used on other dialect #4002

sqlalchemy-bot opened this issue May 30, 2017 · 30 comments
Labels
bug Something isn't working postgresql sql
Milestone

Comments

@sqlalchemy-bot
Copy link
Collaborator

Migrated issue, originally created by Michael Bayer (@zzzeek)

e.g. query.distinct()

also add doc note.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Implement get_unique_constraints, get_check_constraints for Oracle

Pull-request: zzzeek/sqlalchemy#326
Pull-request: zzzeek/sqlalchemy#342

Fixes: #4002
Change-Id: I221fe8ba305fc455a03e3a5d15f803bf8ee2e8fb

f8b4f72

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

gerrit was incorrectly targeted here, that gerrit is for #4003

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed status to reopened

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "1.1.x" to "1.x.xx"

@sqlalchemy-bot sqlalchemy-bot added bug Something isn't working sql labels Nov 27, 2018
@sqlalchemy-bot sqlalchemy-bot added this to the 1.x.xx milestone Nov 27, 2018
@zzzeek
Copy link
Member

zzzeek commented May 18, 2019

this just came up again in #4687. it is strange that we just ignore the ON here. We should generate some kind of token and produce SQL that won't run on non-PG databases, or make this a StrSQLCompiler thing, e.g. it renders "ON ", and default compiler raises.

temporarily making this 1.4 since it is bugging me but I might blow this out again.

@zzzeek zzzeek modified the milestones: 1.x.xx, 1.4 May 18, 2019
@CaselIT
Copy link
Member

CaselIT commented Jan 25, 2020

I'll try this one.

The plan is to just warn or to produce non runnable sql?
Currently the distinct is rendered by SQLCompiler.get_select_precolumns, that is custom for each backend. It may be useful to refactor the distinct rendering to in its own method, called by get_select_precolumns (since I guess different backend may want precolumn stuff in different order).

Then if we want it to warn we would have a default implementation that warns if distinct on is used, and the postgres dialect would overrid this method.
If the plan is to just render non runnable sql then I think the default implementation should be the one of postgress, that would not work on the other backends

@zzzeek
Copy link
Member

zzzeek commented Jan 25, 2020

I'll try this one.

The plan is to just warn or to produce non runnable sql?

so this has been like this for many years, so what we do is to not disrupt whoever might be relying upon this by just emitting a warning. If it were day one then maybe it should have raised but there's no urgency to make it do that right now.

Currently the distinct is rendered by SQLCompiler.get_select_precolumns, that is custom for each backend. It may be useful to refactor the distinct rendering to in its own method, called by get_select_precolumns (since I guess different backend may want precolumn stuff in different order).

It's not that trivial to change some of the big "overridden" methods like these because we have to worry about all the external dialects that might be using this. Also, while the compiler can always choose to break a coarse grained template method into a series of smaller ones, and the downstream consuming classes can change gradually, we can't go back in the other direction, that is, if some step uses methods A,B and C, we can't change the compiler to combine steps A, B, C into a single method D without still calling A, B, C because the consuming classes still use that. as always, I'm cautious about making methods more fine grained because they are the primary source of performance overhead in cPython.

However, looking at how this works right now, it's hard to see other avenues that are not worse. It seems like there should be a "render_distinct()" method and then a "render_distinct_on()" method , each are called based on if "select._distinct" / "select._distinct_on" is not None, and the latter one raises unless its on Postgresql.

but this still does concern me. The SQL Server and Sybase dialects use get_select_precolumns also to render the "TOP" keyword, and it's not clear if other dialects have a more complex interaction going on in terms of the "get_select_precolumns()" thing. if we leave get_select_precolumns() in place and then have that method call onto "get_distinct()" / "get_distinct_on()", this retains backwards compatibility. It also adds the most method calls. but we'd have to say that these methods only occur if DISTINCT / DISTINCT ON is present which is not the majority of queries so it probably is OK.

the other way to go would be having a dialect level flag, "supports_distinct_on", and the compiler just looks at that in conjunction with "select._distinct_on" being present.

Then if we want it to warn we would have a default implementation that warns if distinct on is used, and the postgres dialect would overrid this method.
If the plan is to just render non runnable sql then I think the default implementation should be the one of postgress, that would not work on the other backends

well whichver way, the addition of the new methods is how we'd get there so we start with the most minimal approach which is the warning.

@CaselIT
Copy link
Member

CaselIT commented Jan 25, 2020

I agree regarding the method.

We should generate some kind of token and produce SQL that won't run on non-PG databases

I was suggesting it because with the current implementation, if the plan was to emit some sql that would not run in other db, it was the best option.

That being said, if the plan is to just warn or raise while compiling, then I think a dialect flag is better.
This would be checked in SQLCompiler.visit_select that should be run in most cases (it seems that sometimes mssql does its own thing)

@zzzeek
Copy link
Member

zzzeek commented Jan 25, 2020

I agree regarding the method.

We should generate some kind of token and produce SQL that won't run on non-PG databases

I was suggesting it because with the current implementation, if the plan was to emit some sql that would not run in other db, it was the best option.

Oh I think my comment up there maybe was trying to say there'd be a DistinctOn object or something like that. that might work too, but I don't care either way.

That being said, if the plan is to just warn or raise while compiling, then I think a dialect flag is better.
This would be checked in SQLCompiler.visit_select that should be run in most cases (it seems that sometimes mssql does its own thing)

what bothers me about that is that a dialect has to not only implement the rendering for "distinct on" but also has to have the flag that's just for this one little check, and also that this method is inconsistent vs. how we handle other things that aren't supported, like returning, CTEs, multiple table UPDATE, DELETE FROM, sequence increments, empty set expressions (I just went through compiler.py to find all the examples of raising for not-supported SQL elements). having a second way to do something that doesn't match how it's done for half a dozen other similar things makes new dialects more difficult to build and adds complexity.

@CaselIT
Copy link
Member

CaselIT commented Jan 25, 2020

Makes sense. How do we solve it?

@zzzeek
Copy link
Member

zzzeek commented Jan 25, 2020

I think we make get_select_precolumns like this:

def get_select_distinct(self, select, **kw):
    return "DISTINCT "  # note trailing space
def get_select_distinct_on(self, select, **kw):
    raise CompileError("...")
def get_select_precolumns(self, select, **kw):
    if select._distinct_on:
        return self.get_select_distinct_on(select, **kw):
    elif select._distinct:
        return self.get_select_distinct(select, **kw)
    else:
        return ""

then we change for example SQL Server to be like this:

    def get_select_precolumns(self, select, **kw):
        """ MS-SQL puts TOP, it's version of LIMIT here """

        text = compiler.SQLCompiler.get_select_precolumns(self, select, **kw)

        if select._simple_int_limit and (
            select._offset_clause is None
            or (select._simple_int_offset and select._offset == 0)
        ):
            # ODBC drivers and possibly others
            # don't support bind params in the SELECT clause on SQL Server.
            # so have to use literal here.
            kw["literal_execute"] = True
            text += "TOP %s " % self.process(select._limit_clause, **kw)
        return text

that is, we change all the dialects to make use of compiler's version of this method. the only dialect doing something weird here is MySQL with some ancient deprecated thing that isn't emitting a warning (that's a bug).

so what is not working here, but I'm willing to shrug it off, is if a downstream dialect has a get_select_precolumns() method, and they don't change theirs, they will not get the warning here until they change to use the method in the slightly newer way. I'm kind of OK with that.

@CaselIT
Copy link
Member

CaselIT commented Jan 25, 2020

ok. why the two new methods (get_select_distinct and get_select_distinct_on) though?
If we make the dialects call SQLCompiler.get_select_precolumns the methods would not be needed.
Postgres would not do the super call, making it different from the other, but we avoid two new methods.

@zzzeek
Copy link
Member

zzzeek commented Jan 26, 2020

OK.... so get_select_precolumns raises if distinct_on is present ? ok i think part of it was it wasn't clear what "distinct" could mean and you'll notice MySQL dialect has an alternative interpretation of it, which is ancient and we should take out, but it would have to emit a deprecation warning first

@CaselIT
Copy link
Member

CaselIT commented Jan 26, 2020

Ok. I'll raise a warning there, and keep the function for now.

It should not be hard to do

        if isinstance(select._distinct, util.string_types):
            # warn here
            return select._distinct.upper() + " "
        return SQLCompiler.get_select_precolumns(...)

Also I've noticed that you use SQLCompiler.get_select_precolumns(self, ...) in place of super. Is there a particular reason?

@zzzeek
Copy link
Member

zzzeek commented Jan 26, 2020

I have no idea, that is code from 2007 at the latest. super() probably didn't work that intuitively in python 2.3 or something.

@CaselIT
Copy link
Member

CaselIT commented Jan 26, 2020

ok. I'll keep it. if ain't broke etc

@zzzeek
Copy link
Member

zzzeek commented Jan 26, 2020

I would change it probably because it's non-idiomatic as it is :)

@CaselIT
Copy link
Member

CaselIT commented Jan 26, 2020

maybe a general style update would be better done in its own issue though

@zzzeek
Copy link
Member

zzzeek commented Jan 26, 2020

you mean like all the places where super() should be used but isnt? OK. with "major version" changes like these, as opposed to point-release single issue fixes, I tend to take more liberties with getting things cleaned up as I go but it is not critical either way.

@CaselIT
Copy link
Member

CaselIT commented Jan 26, 2020

I meant in general. Expecially since v2 is py3 only. A lot of code could be cleaned up. another example is class foo(object). But it's not exactly critical by any definition

@zzzeek
Copy link
Member

zzzeek commented Jan 26, 2020

oh yes the "class Foo(object):" thing will be changed using automated tools for the py3-only conversion. that's a consistent pattern.

this thing though is not really consistent, and there are some places where super() is not used because I needed to control exactly what superclass is being hit, so that will not be as easy to get using an automated process. I think when we find something like this FooClass.do_thing(self, ..) call that isn't idiomatic anywhere and we are changing the function anyway, we just fix that part for now.

@sqla-tester
Copy link
Collaborator

Federico Caselli has proposed a fix for this issue in the master branch:

Disallow DISTINCT ON when not targeting PostgreSQL https://gerrit.sqlalchemy.org/1687

CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Feb 4, 2020
…h-4002

Change-Id: I5702fc90e901c6b5fd8fd9893fac44d169e52f74
CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Mar 10, 2020
…h-4002

Change-Id: Icb028babf2c747660b379bd96bbc43817e205054
@CaselIT
Copy link
Member

CaselIT commented Mar 10, 2020

I forgot that I still had to change the implementation from raise to warn.
Should be ready for review

openstack-gerrit pushed a commit to openstack/openstack that referenced this issue Mar 25, 2020
* Update nova from branch 'master'
  - Merge "remove DISTINCT ON SQL instruction that does nothing on MySQL"
  - remove DISTINCT ON SQL instruction that does nothing on MySQL
    
    The SQLAlchemy ORM call ``.distinct('host')`` indicates that
    an expression such as "DISTINCT ON host" should be rendered.
    However, this syntax is only available on PostgreSQL.   When run
    on any other backend, the expression delivers SQL "DISTINCT"
    and the additional expressions are ignored.
    
    An upcoming version of SQLAlchemy will begin to warn when
    "DISTINCT ON" is called for on a backend that does not support it,
    as the semantics of "DISTINCT ON <col>" are quite different from
    "DISTINCT".   As Openstack targets MySQL primarily with SQLite
    used for unit tests, it should already be guaranteed that
    this query only needs to be rendering "DISTINCT" in order to pass
    current tests and use cases, since that's all that's being rendered.
    
    Change-Id: I267c6f772d514b442c8c3356c2babc1fe98a8b97
    References: sqlalchemy/sqlalchemy#4002
openstack-gerrit pushed a commit to openstack/nova that referenced this issue Mar 25, 2020
The SQLAlchemy ORM call ``.distinct('host')`` indicates that
an expression such as "DISTINCT ON host" should be rendered.
However, this syntax is only available on PostgreSQL.   When run
on any other backend, the expression delivers SQL "DISTINCT"
and the additional expressions are ignored.

An upcoming version of SQLAlchemy will begin to warn when
"DISTINCT ON" is called for on a backend that does not support it,
as the semantics of "DISTINCT ON <col>" are quite different from
"DISTINCT".   As Openstack targets MySQL primarily with SQLite
used for unit tests, it should already be guaranteed that
this query only needs to be rendering "DISTINCT" in order to pass
current tests and use cases, since that's all that's being rendered.

Change-Id: I267c6f772d514b442c8c3356c2babc1fe98a8b97
References: sqlalchemy/sqlalchemy#4002
CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Mar 26, 2020
…h-4002

Change-Id: I1093cd696f2d94713a12d828749e16ae5ba61d84
CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Mar 31, 2020
…h-4002

Change-Id: Ie32bbd14edc60637b9d221aa978b99103c2fe7cd
CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Apr 7, 2020
…h-4002

Change-Id: I89608ab4eaeaff122576ca74f952efceb7fa7e30
CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Apr 10, 2020
…h-4002

Change-Id: I300de0d48620f47862fe61be964ad25c7d51d5ce
CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Apr 16, 2020
…h-4002

Change-Id: I52dd35818da96cfac1561209ded1b1970e03e110
@hkashi-aval
Copy link

hkashi-aval commented Jan 19, 2021

it's not fixed on subquery. when we are calling a query with .subquery() there is no Distinct On but only Distinct

@CaselIT
Copy link
Member

CaselIT commented Jan 19, 2021

Could you provide an example? I'm not sure I understand the problem

@mroleh
Copy link

mroleh commented Nov 28, 2021

Hello @CaselIT , I also faced with this issue.
For example when I run the query without subquer() I have correct result

SELECT DISTINCT ON (invoice_id, invoice_contract_id) invoice_id, invoice_contract_id, invoice_create_date, invoice_status, invoice_update_date 
FROM invoice

And when I run query with .subquery() I have next result

SELECT DISTINCT invoice_id, invoice_contract_id, invoice_create_date, invoice_status, invoice_update_date 
FROM invoice

@CaselIT
Copy link
Member

CaselIT commented Nov 28, 2021

Could you open a new issue? That seems a bug that's not related to this issue that just added a warning.

Thanks

@mroleh
Copy link

mroleh commented Nov 28, 2021

Added new issue
#7378
Thanks.
@CaselIT could you please help me with this issue?
maybe you know how I can avoid this issue and get DISTINCT ON with the subquery()?

@CaselIT
Copy link
Member

CaselIT commented Nov 29, 2021

Thanks

kadler added a commit to IBM/sqlalchemy-ibmi that referenced this issue Nov 29, 2023
This function doesn't need to do anything special that the base function
doesn't already do.

The only extra thing it's doing is some special handling if
self._distinct is a string. This seems to be some ancient, deprecated
behavior in the MySQL dialect which was probably copied in unkowningly.
Because we override this function, we don't emit the warning that we are
silently ignoring DISTINCT ON, which the DistinctOnTest expects. Since
we don't have any existing applications, we don't need any compatibility
deprecation like MySQL.

See SQLAlchemy commit 07d6d211f23f1d9d1d69fd54e8054bccd515bc8c and
sqlalchemy/sqlalchemy#4002 for more details
about the upstream change.
kadler added a commit to IBM/sqlalchemy-ibmi that referenced this issue Dec 14, 2023
This function doesn't need to do anything special that the base function
doesn't already do.

The only extra thing it's doing is some special handling if
self._distinct is a string. This seems to be some ancient, deprecated
behavior in the MySQL dialect which was probably copied in unkowningly.
Because we override this function, we don't emit the warning that we are
silently ignoring DISTINCT ON, which the DistinctOnTest expects. Since
we don't have any existing applications, we don't need any compatibility
deprecation like MySQL.

See SQLAlchemy commit 07d6d211f23f1d9d1d69fd54e8054bccd515bc8c and
sqlalchemy/sqlalchemy#4002 for more details
about the upstream change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working postgresql sql
Projects
None yet
Development

No branches or pull requests

6 participants