Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting DISTINCT SQL syntax is on the way. #1486

Closed
7 tasks done
tristaZero opened this issue Nov 16, 2018 · 1 comment
Closed
7 tasks done

Supporting DISTINCT SQL syntax is on the way. #1486

tristaZero opened this issue Nov 16, 2018 · 1 comment
Assignees
Milestone

Comments

@tristaZero
Copy link
Contributor

tristaZero commented Nov 16, 2018

Supporting DISTINCT SQL syntax is on the way.

We plan to support DISTINCT SQL syntax. The following is our thought about DISTINCT, welcome to discuss this solution.

SQL Usage
None GROUP BY GROUP BY
Simple SELECT SELECT DISTINCT price FROM item; SELECT DISTINCT price FROM item GROUP BY price;
COUNT() SELECT COUNT(DISTINCT price) FROM item; SELECT COUNT(DISTINCT price) FROM item GROUP BY price;
SUM() SELECT SUM(DISTINCT price) FROM item; SELECT SUM(DISTINCT price) FROM item GROUP BY price;
AVG() SELECT AVG(DISTINCT price) FROM item; SELECT AVG(DISTINCT price) FROM item GROUP BY price;
MIN() SELECT MIN(DISTINCT price) FROM item; SELECT MIN(DISTINCT price) FROM item GROUP BY price;
MAX() SELECT MAX(DISTINCT price) FROM item; SELECT MAX(DISTINCT price) FROM item GROUP BY price;
Select Function SELECT DISTINCT price1 + price2 FROM item;
DISTINCT() SELECT DISTINCT(price) FROM item;
Processing

The basic processing procedure is as follow:

SQL Parse --> SQL Route --> SQL Rewrite --> SQL Execute --> SQL Merge

Considering those DISTINCT SQL above, we can group those SQL to handle.

Simple SELECT
  1. SQL Parse

    It is necessary to analyze the DISTINCT token.

  2. SQL Route

    To make sure where SQL should be sent to execute.

  3. SQL Rewrite

    Rewrite logic SQL to actual SQL. Let us suppose we rewrite logic SQL to 2 actual SQLs:

    SELECT DISTINCT price FROM item_0;

    SELECT DISTINCT price FROM item_1;

  4. SQL Execute

    Send actual SQLs to some of the shardings to execute. Let us suppose that we get two resultsets, r1 and r2 from two shardings.

  5. SQL Merge

    To get distinct results from r1 and r2 by DISTINCT(r1, r2).

Therefore, we can get fianl distinct results from all of the shardings.

COUNT() & SUM()

The processing of COUNT() and SUM() is same, so we take COUNT() for example.

  1. SQL Parse

    It is necessary to analyze the DISTINCT token.

  2. SQL Route

    To make sure where SQL should be sent to execute.

  3. SQL Rewrite

    Rewrite logic SQL to actual SQL. Let us suppose we rewrite logic SQL to 2 actual SQLs:

    SELECT DISTINCT price FROM item_0;

    SELECT DISTINCT price FROM item_1;

  4. SQL Execute

    Send actual SQLs to some of the shardings to execute. Let us suppose that we get two resultsets, r1 and r2 from two shardings.

  5. SQL Merge

    To get distinct results r3 from r1 and r2 by DISTINCT(r1, r2), then it is possiable to get final correct COUNT(DISTINCT price) by counting r3.

You can find instead of getting COUNT(DISTINCT price) from shardings and counting those resultsets, we get DISTINCT price from each sharding and distinguish those results again in our procedure. At last, we calculate the COUNT() of TWICE-DISTINCT results. The processing of SUM() is same with this.

AVG

Handling AVG is more complex. We need to get SUM(DISTINCT price) and COUNT(DISTINCT price) by using the processing above. After that, we can get AVG(DISTINCT price) by calculating SUM(DISTINCT price)/COUNT(DISTINCT price).

MIN() & MAX()

It is easier to process two of those functions. Here, we take MIN() for example.

  1. SQL Parse

    It is necessary to analyze the DISTINCT token.

  2. SQL Route

    To make sure where SQL should be sent to execute.

  3. SQL Rewrite

    Rewrite logic SQL to actual SQL. Let us suppose we rewrite logic SQL to 2 actual SQLs:

    SELECT MIN(DISTINCT price) FROM item_0;

    SELECT MIN(DISTINCT price) FROM item_1;

  4. SQL Execute

    Send actual SQLs to some of the shardings to execute. Let us suppose that we get two resultsets, r1 and r2 from two shardings.

  5. SQL Merge

    To get final MIN(DISTINCT price) result from r1 and r2 by MIN(r1, r2).

DISTINCT + GROUP BY

Welcome to join us to discuss this processing.

Base on the result of the above analysis, we can get the following tasks:

  • Parsing the SQL syntax.
  • Rewriting the logic SQL containing DISTINCT to the relative actual SQLs correctly.
  • Executing those actual SQLs
  • Merging those resultSets by using corresponding algorithms.
  • Support simple DISTINCT syntax
  • Support DISTINCT + Function
  • Support Aggretation + DISTINCT
@tristaZero
Copy link
Contributor Author

tristaZero commented Nov 28, 2018

Basic DISTINCT syntaxes have been supported now, here is the detail of Support SQL. Any suggestion, please contact us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

2 participants