Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sql): constant in explicit group by should not remove parallel optimisation #4384

Merged
merged 5 commits into from Apr 10, 2024

Conversation

nwoolmer
Copy link
Contributor

@nwoolmer nwoolmer commented Apr 6, 2024

Closes #4380

This is related to Clickbench Q34. Adding an explicit group by that includes an effectively constant expression would prevent generation of Async Group By node.

Change also sorted the ordering of the SqlOptimiser tests. This makes locating the relevant test hard. It is here:

public void testConstantInGroupByDoesNotPreventOptimisation() throws Exception {
assertMemoryLeak(() -> {
ddl("create table hits (\n" +
" URL string, ts timestamp\n" +
") timestamp(ts) partition by day wal");
insert("insert into hits (URL, ts) values ('abc', 0), ('abc', 1), ('def', 2), ('ghi', 3)");
drainWalQueue();
String q1 = "SELECT 1, URL, COUNT(*) AS c FROM hits ORDER BY c DESC LIMIT 10;";
String q2 = "SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY 1, URL ORDER BY c DESC LIMIT 10;";
String q3 = "SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY URL, 1 ORDER BY c DESC LIMIT 10;";
String q4 = "SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY 1, 2 ORDER BY c DESC LIMIT 10;";
String expectedSql = "1\tURL\tc\n" +
"1\tabc\t2\n" +
"1\tghi\t1\n" +
"1\tdef\t1\n";
String expectedPlan = "Sort light lo: 10\n" +
" keys: [c desc]\n" +
" VirtualRecord\n" +
" functions: [1,URL,c]\n" +
" Async Group By workers: 1\n" +
" keys: [URL]\n" +
" values: [count(*)]\n" +
" filter: null\n" +
" DataFrame\n" +
" Row forward scan\n" +
" Frame forward scan on: hits\n";
assertSql(expectedSql, q1);
assertSql(expectedSql, q2);
assertSql(expectedSql, q3);
assertSql(expectedSql, q4);
assertPlan(q1, expectedPlan);
assertPlan(q2, expectedPlan);
assertPlan(q3, expectedPlan);
assertPlan(q4, expectedPlan);
});
}

Control query:

SELECT 1, URL, COUNT(*) AS c FROM hits ORDER BY c DESC LIMIT 10;

Gives:

Sort light lo: 10
  keys: [c desc]
    VirtualRecord
      functions: [1,URL,c]
        Async Group By workers: 16
          keys: [URL]
          values: [count(*)]
          filter: null
            DataFrame
                Row forward scan
                Frame forward scan on: hits

Problematic query:

SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY 1, URL ORDER BY c DESC LIMIT 10;

Plan before:

Sort light lo: 10
  keys: [c desc]
    VirtualRecord
      functions: [1,URL,c]
        GroupBy vectorized: false
          keys: [URL,1]
          values: [count(*)]
            VirtualRecord
              functions: [URL,1]
                DataFrame
                    Row forward scan
                    Frame forward scan on: hits

Plan after:

Sort light lo: 10
  keys: [c desc]
    VirtualRecord
      functions: [1,URL,c]
        Async Group By workers: 16
          keys: [URL]
          values: [count(*)]
          filter: null
            DataFrame
                Row forward scan
                Frame forward scan on: hits

@nwoolmer nwoolmer added Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution Performance Performance improvements labels Apr 6, 2024
@nwoolmer nwoolmer marked this pull request as ready for review April 6, 2024 12:08
@nwoolmer nwoolmer requested a review from puzpuzpuz April 6, 2024 12:08
@bluestreak01
Copy link
Member

hey @nwoolmer could yuou do me a favoir and submit the tests avoiding code reformatting. I cannot read the diff unfortuantely.

@ideoma
Copy link
Collaborator

ideoma commented Apr 9, 2024

[PR Coverage check]

😍 pass : 1 / 1 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/SqlOptimiser.java 1 1 100.00%

@bluestreak01 bluestreak01 merged commit 6e02f89 into master Apr 10, 2024
24 checks passed
@bluestreak01 bluestreak01 deleted the nw_constant_in_group_by branch April 10, 2024 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Incorrect or unexpected behavior Performance Performance improvements SQL Issues or changes relating to SQL execution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GROUP BY clause may lead to worse performance
4 participants