hi,I have some questions about subqueries,Thanks #94

Closed
littlewish opened this Issue Oct 30, 2014 · 2 comments

Projects

None yet

2 participants

@littlewish

The subqueries is

"select(c for c in Changes if (c.id,c.tag) in select( (d.id,max(d.tag)) for d in Changes if d.tag>=tag))"

It means to find the row which has the biggest tag with "group by Changes.id" .

I run the program,PonyOrm convert it to SQL like this:

SELECT `c`.`a_id`, `c`.`id`, `c`.`relation_path`, `c`.`time`, `c`.`tag`, `c`.`type`
FROM `changes` `c`
WHERE (`c`.`id`, `c`.`tag`) IN (
    SELECT `d`.`id`, MAX(`d`.`tag`)
    FROM `changes` `d`
    WHERE `d`.`tag` >= %s
      AND MAX(`d`.`tag`) IS NOT NULL
    GROUP BY `d`.`id`
    )

AND MAX(d.tag) IS NOT NULL This part makes mysql error "Invalid use of group function"

Why does the SQL contain this part?

Thanks for anwser

@kozlovsky kozlovsky closed this in 0bae832 Oct 30, 2014
@kozlovsky
Contributor

Thanks for your report!

The error was caused by the bug and I have fixed it. You can get new code from GitHub and check if it works for you.

Why does the SQL contain this part?

This part is necessary for subqueries with NOT IN condition. Consider this query:

SELECT t1.a FROM t1 WHERE t1.b NOT IN (SELECT t2.c FROM t2)

If t2.c contains some NULL values, then the result of the outer query will be empty. This is well-known problem with NOT IN-type subqueries which can return NULL values: http://stackoverflow.com/questions/129077/not-in-clause-and-null-values

Because of this, Pony adds additional check to the inner query:

SELECT t1.a FROM t1 WHERE t1.b NOT IN (SELECT t2.c FROM t2 WHERE t2.c IS NOT NULL)

This way the query works as expected. But It turns out that Pony had two bugs here:

  1. If the inner query has GROUP BY section, then this additional check should be added to HAVING section of the subquery instead of WHERE section.
  2. This NOT NULL check should be added to NOT IN-type subqueries only. Your subquery is not NOT IN type, so the additional NOT NULL check is not necessary.

Now both of this bugs are fixed, and your query should work. But I'm not sure that the query does what you expect. Because of this I want to clarify how aggregation works in this case. Consider this query which uses our standard example database with students and courses:

>>> from pony.orm.examples.presentation import *
>>> select((s.group, max(s.gpa)) for s in Student)[:]
SELECT "s"."group", MAX("s"."gpa")
FROM "Student" "s"
GROUP BY "s"."group"

As you can see, this query adds "s"."group" column to the GROUP BY section, because it is in the SELECT part of the query and is not inside any aggregate function. This query makes sense - it will calculate maximum gpa value for each group of students. But the following query doesn't make any sense at all:

>>> select((s.id, max(s.gpa)) for s in Student)[:]
SELECT "s"."id", MAX("s"."gpa")
FROM "Student" "s"
GROUP BY "s"."id"

In this query, we use "s"."id" for grouping. But each row of the table has unique id value! Because of this, the grouping doesn't work - we will get the same result as if we just run the query:

select((s.id, s.gpa) for s in Student)[:]

So, when we perform grouping, we need to group rows by the column which has duplicate values, like "s"."group" had.

From my point of view, the id attribute looks like the primary key of the Changes table. If this is the case, then your subquery select((d.id, max(d.tag)) for d in Changes) doesn't make sense, because it calculates max value for each id separately, and this is the same as just returning d.tag value directly, without aggregation.

I don't know what is the correct query in your case, because I don't quite understand the intention of your query. But the next two queries may give you some hints:

  1. A student with the maximum gpa value (the best student of the university):

    select(s for s in Student if s.gpa == max(s2.gpa for s2 in Student))[:]
  2. A student with the maximum gpa value within its group (the best student of the group):

    select(s for s in Student if s.gpa == max(s2.gpa for s2 in s.group.students))[:]

    or

    select(s for s in Student if s.gpa == max(s2.gpa for s2 in Student if s.group == s2.group))[:]

Hope this helps. Thanks again for the bug reporting.

@littlewish

👍
Thank you for the anwser.
Changes.id has duplicate values and it is not the primary key.
The query is used to get max tag in each id.

And more,Does the Pony Orm have the OO example like database DBO?

Thanks again for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment