forked from apache/druid
Commit
Add support to attempt converting to top n query when two group by dimensions present

Summary: Add support to attempt converting to top n query when two group by dimensions present

Motivation: The example query below can be executed as a TopN native query with the granularity field set to `hour`, but currently the broker can only translate it to a GroupBy native query.

```
SELECT
  SUM(ct) FILTER(WHERE eventtype = 'PIN_IMPRESSION') AS IMPRESSION,
  SUM(ct) FILTER(WHERE eventtype = 'PIN_CLOSEUP') AS CLOSEUP,
  SUM(ct) FILTER(WHERE eventtype = 'PIN_CLICKTHROUGH') AS CLICKTHROUGH,
  SUM(ct) FILTER(WHERE eventtype = 'PIN_REPIN') AS SAVE,
  SUM(ct) FILTER(WHERE eventtype = 'VIDEO_MRC_VIEWS') AS VIDEO_MRC_VIEW,
  SUM(ct) FILTER(WHERE eventtype = 'QUARTILE_95_PERCENT') AS QUARTILE_95_PERCENT_VIEW,
  SUM(ct) FILTER(WHERE eventtype = 'VIDEO_V50_WATCH_TIME_MS') AS VIDEO_V50_WATCH_TIME,
  SUM(ct) FILTER(WHERE eventtype = 'VIDEO_START') AS VIDEO_START,
  MIN(CAST(create_timestamp AS BIGINT)) AS min_create_timestamp,
  root_pin_id AS root_pin_id,
  FLOOR(__time TO HOUR) AS __time
FROM pin_stats_realtime_root
WHERE (__time >= TIMESTAMP '2020-11-18 03:51:59.000000'
  AND __time <= TIMESTAMP '2020-11-19 03:51:59.000000'
  AND (app IN ('1', '2', '3', '4', '5', '6')
    AND root_pin_id = 512284526356261619
    AND version_number = 7
    AND partner_id = 512284663771961963))
GROUP BY root_pin_id, FLOOR(__time TO HOUR)
LIMIT 100
```

The change affects SQL to native query translation on the broker. When `attemptConvertingToTopNWithTwoGroupByDimensions` is set to true in the query context, then in addition to the existing criteria for translating a SQL query to a TopN query, a SQL query is also convertible to a TopN query when it groups by two columns and one of them is a granular time. The granularity is inferred from the granular-time GROUP BY column.
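The extra eligibility rule described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the planner code from this commit: the helper name `infer_topn_plan` and the regular expression are hypothetical, GROUP BY expressions are treated as plain strings, and the existing TopN criteria are not modeled at all. It only shows the flag gating, the "two columns, one of them a granular time" check, and the inference of granularity from the time-floor expression.

```python
import re

# Context flag named in the commit message.
CONTEXT_FLAG = "attemptConvertingToTopNWithTwoGroupByDimensions"

# Hypothetical pattern for a granular-time expression such as FLOOR(__time TO HOUR).
TIME_FLOOR = re.compile(r"^FLOOR\(\s*__time\s+TO\s+(\w+)\s*\)$", re.IGNORECASE)


def infer_topn_plan(group_by_exprs, context):
    """Toy check: return (dimension, granularity) if the two-dimension rule
    applies, otherwise None. Not Druid's actual planner logic."""
    # The new rule is only considered when the flag is set in the query context.
    if not context.get(CONTEXT_FLAG, False):
        return None
    # The rule covers exactly two GROUP BY columns.
    if len(group_by_exprs) != 2:
        return None
    matches = [(expr.strip(), TIME_FLOOR.match(expr.strip())) for expr in group_by_exprs]
    time_matches = [m for _, m in matches if m]
    # Exactly one of the two columns must be a granular time.
    if len(time_matches) != 1:
        return None
    granularity = time_matches[0].group(1).lower()
    dimension = next(expr for expr, m in matches if not m)
    return dimension, granularity


if __name__ == "__main__":
    ctx = {CONTEXT_FLAG: True}
    print(infer_topn_plan(["root_pin_id", "FLOOR(__time TO HOUR)"], ctx))
```

For the example query, the sketch picks `root_pin_id` as the remaining dimension and infers `hour` as the granularity; with the flag unset it declines to convert.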
Caveats when using this mechanism:
- When executed as a GROUP BY query, the limit is applied globally, so at most `limit` rows are returned.
- When executed as a TopN query, the limit is applied within each granular time bucket rather than globally, so up to `limit` rows can be returned per bucket, i.e. at most (`limit` * number of granular time buckets) rows overall.
- When the limit is large enough, the result is the same except for potential ordering differences.

Test Plan: Unit tests and integration tests in dev cluster.

Reviewers: O1139 Druid, itallam, yyang

Reviewed By: O1139 Druid, itallam, yyang

Subscribers: jenkins, #realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D650035
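The limit caveat can be made concrete with a small simulation. This is a minimal sketch, assuming already-aggregated rows of the form `(time_bucket, dimension, metric)`; the function names and sample data are hypothetical and no Druid API is involved. The GROUP BY path is modeled as a single global cut, and the TopN path as keeping the top `limit` rows by metric inside each time bucket.

```python
from collections import defaultdict


def group_by_limit(rows, limit):
    """Global limit: at most `limit` rows in total."""
    return rows[:limit]


def topn_limit(rows, limit):
    """Per-bucket limit: at most `limit` rows in each time bucket,
    keeping the rows with the largest metric value in that bucket."""
    per_bucket = defaultdict(list)
    for row in rows:
        per_bucket[row[0]].append(row)
    result = []
    for bucket in sorted(per_bucket):
        ranked = sorted(per_bucket[bucket], key=lambda r: -r[2])
        result.extend(ranked[:limit])
    return result


# Hypothetical aggregated rows: three hourly buckets, two dimension values each.
ROWS = [
    ("00:00", "a", 5), ("00:00", "b", 7),
    ("01:00", "a", 3), ("01:00", "b", 1),
    ("02:00", "a", 4), ("02:00", "b", 4),
]

if __name__ == "__main__":
    print(len(group_by_limit(ROWS, 1)), len(topn_limit(ROWS, 1)))
    print(sorted(group_by_limit(ROWS, 100)) == sorted(topn_limit(ROWS, 100)))
```

With a small limit the two paths return different row counts, because the per-bucket cut keeps one row for every hour. Once the limit exceeds the number of rows in every bucket and overall, both paths return the same set of rows, possibly in a different order.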
Jian Wang committed Nov 20, 2020
1 parent 135fcc9, commit 3f2604d
Showing 5 changed files with 209 additions and 7 deletions.