Add support for FETCH syntax #666

Merged: 1 commit into trinodb:master, May 3, 2019

Conversation

@kasiafi (Member) commented Apr 25, 2019

Relates to #1

@martint (Member) left a comment

This is a good start. Let's add the corresponding AST node and a semantic check that fails with "unsupported" error. Otherwise, it will parse queries successfully and ignore the clause, which amounts to producing incorrect results.

if (context.FETCH() != null) {
    limit = Optional.of(limit.orElse("1")); // default FETCH row count is 1
    long limitValue = Long.parseLong(limit.get());
    check(limitValue != 0, "Invalid row count in FETCH FIRST clause", context.limit);
Member:

This is a semantic check that should be deferred until analysis.

Member Author:

The semantic difference between LIMIT and FETCH is whether they allow a row count equal to 0.

In my approach, in the course of processing the query, FETCH and LIMIT are distinguishable only as far as the AstBuilder. That is why I put the semantic check (row count > 0) there.

To avoid that, LIMIT and FETCH could be carried separately until analysis, which would enable a FETCH-specific check there, and be merged afterwards. I consider that over-complicated, as they are essentially the same.

Or I could define a new token in the grammar, like POSITIVE_INTEGER, and let the parser do the check.

Member:

I guess that works until we add percentage and WITH TIES.

Also, technically, the number can be arbitrarily large. The spec requires it to be "an exact numeric type with scale 0" (for rows -- for percent it can be any numeric type), so Long.parseLong might not be sufficient. Incidentally, specifying a value that doesn't fit in a long currently fails with a NumberFormatException. Validating this in the analyzer allows us to provide better error messages and error codes.

Another issue is that parsing/formatting doesn't roundtrip. If you have a query with "FETCH FIRST x ROWS" it will be printed as "LIMIT x" by the SqlFormatter.
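
For illustration, a minimal sketch of the kind of analyzer-side validation being argued for here. The SemanticException usage and the INVALID_FETCH_FIRST_ROW_COUNT error code follow code quoted later in this review; the FetchFirst node type, its accessor, and the method name are assumptions.

// Hedged sketch only: FetchFirst, getRowCount(), and the method name are assumed;
// SemanticException and INVALID_FETCH_FIRST_ROW_COUNT mirror snippets quoted later in this review.
private void analyzeFetchFirstRowCount(FetchFirst fetchFirst)
{
    String rowCountText = fetchFirst.getRowCount().orElse("1"); // FETCH FIRST defaults to 1 row
    long rowCount;
    try {
        rowCount = Long.parseLong(rowCountText);
    }
    catch (NumberFormatException e) {
        // values that do not fit in a long land here instead of surfacing a raw NumberFormatException
        throw new SemanticException(INVALID_FETCH_FIRST_ROW_COUNT, fetchFirst, "Invalid FETCH FIRST row count: %s", rowCountText);
    }
    if (rowCount <= 0) {
        throw new SemanticException(INVALID_FETCH_FIRST_ROW_COUNT, fetchFirst, "FETCH FIRST row count must be positive (actual value: %s)", rowCount);
    }
}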

Member Author:

OK, I see that there is more to analyze than just whether the row count is greater than zero. So the right approach is to keep LIMIT and FETCH distinct until analysis.
I think I'll keep them separate until the QueryPlanner and produce a LimitNode there for a query with FETCH.

@kasiafi (Member Author) commented Apr 25, 2019

> This is a good start. Let's add the corresponding AST node and a semantic check that fails with "unsupported" error. Otherwise, it will parse queries successfully and ignore the clause, which amounts to producing incorrect results.

There are no incorrect results, as I'm using the existing support for LIMIT so that it takes care of FETCH too.
That is why I use the same limit token in the grammar to capture the row count.

@martint (Member) commented Apr 25, 2019

> There are no incorrect results, as I'm using the existing support for LIMIT so that it takes care of FETCH too.

Ah, that's right. I overlooked the assignment to the limit variable.

@kasiafi (Member Author) commented Apr 28, 2019

Applied distinct FETCH handling up to the LogicalPlanner (the existing LIMIT implementation is used from there on), plus tests and a SqlFormatter change.

@martint self-requested a review on April 29, 2019 19:47
    throw new SemanticException(INVALID_FETCH_ROW_COUNT, node, "%s", fetch);
}
if (fetchValue <= 0) {
    throw new SemanticException(INVALID_FETCH_ROW_COUNT, node, "FETCH row count must be positive (actual value: %s)", fetchValue);
Member:

The feature is officially called "FETCH FIRST" in the specification, so let's refer to it that way in error messages, etc.

Member:

A side note: the user's query can use the FETCH NEXT syntax. Would mentioning FETCH FIRST in the error message be helpful?
Maybe FETCH FIRST/NEXT?

@kasiafi (Member Author), Apr 30, 2019:

I think @martint's comment was about complying with the official naming, not about echoing in the error message how the actual query was written.
The spec mentions "fetch first clause" and "fetch first quantity", so presumably the full name is "fetch first".

}

-    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit)
+    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit, Optional<String> fetch)
Member:

A better way to do this would be to have the analyzer record the inferred or provided limit in the query Analysis object. The planner would just use that to formulate the LimitNode and not have to deal with distinguishing between these options.

public void testFetchInvalidRowCount()
{
    assertFails(INVALID_FETCH_ROW_COUNT, "SELECT * FROM t1 FETCH FIRST 987654321098765432109876543210 ROWS ONLY");
    assertFails(INVALID_FETCH_ROW_COUNT, "SELECT * FROM T1 FETCH FIRST 0 ROWS ONLY");
Member:

lower case "t1" for consistency with other usages.

@@ -29,43 +29,49 @@
private final QueryBody queryBody;
private final Optional<OrderBy> orderBy;
private final Optional<String> limit;
private final Optional<String> fetch;
Member:

This approach makes it theoretically possible to construct syntactically illegal queries (i.e., with both LIMIT and FETCH FIRST clauses, potentially conflicting). A better approach is to replace the Optional<String> limit with an Optional<Node> limit and fill it in with either a Limit or a FetchFirst AST node (both of which need to be introduced).

This would also make the analysis more regular -- it just needs to delegate to a method that analyzes the corresponding node type (there would be two of them), each of which would be responsible for ensuring the values are legal, etc., and for recording the inferred limit in the Analysis object to be used later by the planner.
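
To make the suggestion concrete, here is a standalone sketch of a FetchFirst node of the kind proposed. The real class would extend the AST's Node base class and carry location/children/equality plumbing, which is omitted; all names are illustrative.

import java.util.Optional;

import static java.util.Objects.requireNonNull;

public class FetchFirst
{
    // FETCH FIRST [n] ROWS ONLY; an absent row count defaults to 1
    private final Optional<String> rowCount;

    public FetchFirst(Optional<String> rowCount)
    {
        this.rowCount = requireNonNull(rowCount, "rowCount is null");
    }

    public Optional<String> getRowCount()
    {
        return rowCount;
    }
}

Query / QuerySpecification would then hold a single Optional<Node> limit field containing either a Limit or a FetchFirst, which makes a query with both clauses unrepresentable.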

Member Author:

Please note that a query mixing LIMIT and FETCH FIRST clauses wouldn't get through the parser. That said, I agree that they should share a field in Query / QuerySpecification, as they are mutually exclusive.
Should I introduce a common superclass for the Limit and FetchFirst nodes? Or maybe for a future Offset node too?

Member:

> Should I introduce a common superclass for Limit and FetchFirst

If we can come up with a reasonable name for it, yes -- there may not be a good generic concept that describes both. Otherwise, it's fine to keep it as Node for now and maybe do a runtime check to ensure it's one of the supported types.

@kasiafi force-pushed the OffsetFetch branch 2 times, most recently from 11353a1 to 3a6d651 on May 1, 2019 20:46
@kasiafi (Member Author) commented May 1, 2019

Applied comments.
Works, modulo a formatting issue.

@kasiafi force-pushed the OffsetFetch branch 2 times, most recently from bc5e7dc to 0a94d2c on May 2, 2019 06:58
@kasiafi (Member Author) commented May 2, 2019

No more formatting issues.

@martint (Member) left a comment

A few more comments. It's getting there. Good stuff!

@@ -316,6 +317,16 @@ public void setOrderByExpressions(Node node, List<Expression> items)
return orderByExpressions.get(NodeRef.of(node));
}

public void setLimit(Node node, Long rowCount)
Member:

rowCount can be a primitive long.

limit.put(NodeRef.of(node), rowCount);
}

public Long getLimit(Node node)
Member:

Return a primitive long, and maybe check that an entry exists in the limit map to avoid an NPE on the off chance that there's a bug in the analyzer.

Member Author:

I can't figure out how to store LIMIT ALL with a primitive long.
With Long I used a null value, but I suppose Optional<Long> would be the neatest solution, with Optional.empty() for LIMIT ALL.

Member:

That seems reasonable, but I'd use OptionalLong instead.

Member:

@martint I was under the impression that we prefer Optional<Long> over OptionalLong unless there is a compelling (read: performance) reason to use the latter.
prestodb/presto#11302 (comment) & prestodb/presto#11302 (comment)

Member:

I guess it depends on how it’s being used. This is very localized. No strong preference either way in this case, so Optional is ok, too.
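
As a standalone sketch of the convention settled on here, with Optional.empty() standing for LIMIT ALL. The real Analysis class keys entries by NodeRef; plain Object keys and the class name below are placeholders for illustration only.

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class LimitRecorder
{
    private final Map<Object, Optional<Long>> limit = new HashMap<>();

    public void setLimit(Object node, Optional<Long> rowCount)
    {
        // Optional.empty() means LIMIT ALL, i.e. no limit to enforce
        limit.put(node, rowCount);
    }

    public Optional<Long> getLimit(Object node)
    {
        // fail loudly if the analyzer never recorded a limit for this node
        if (!limit.containsKey(node)) {
            throw new IllegalStateException("limit not analyzed for node: " + node);
        }
        return limit.get(node);
    }
}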

@@ -2113,6 +2125,65 @@ private void verifySelectDistinct(QuerySpecification node, List<Expression> outp
return orderByFields;
}

private void analyzeLimit(Query node)
Member:

These two methods are unnecessary. Aside from delegating to analyzeLimit(node, clause) they only check to make sure the limit is present. If you inline them, the check is unnecessary, as that's ensured by the if condition in the methods above.
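
A rough sketch of what the inlined call site could look like; the accessor and method names are assumed from the surrounding diff.

// Hedged sketch: getLimit() returning the optional clause node is an assumption.
if (node.getLimit().isPresent()) {
    analyzeLimit(node, node.getLimit().get());
}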


private void analyzeLimit(Node node, Node fetchFirstOrLimit)
{
    checkState(fetchFirstOrLimit instanceof FetchFirst || fetchFirstOrLimit instanceof Limit, "Invalid limit node type. Expected: FetchFirst or Limit. Actual: %s", fetchFirstOrLimit.getClass());
Member:

Use fetchFirstOrLimit.getClass().getName(). getClass() returns a Class object, whose toString() method outputs something like "class x.y.Z".

public class Limit
        extends Node
{
    private final Optional<String> limit;
Member:

limit should never be empty(), right? That would be syntactically invalid.

Member Author:

Yes. At first I thought of private final String limit;, but I decided to use Optional because originally there was an Optional<String> limit field in Query and QuerySpecification.
Shall I add a check in the constructor?

Member:

Let's just make it a private final String limit;. It was Optional<String> in Query/QuerySpecification to indicate that the clause could be missing. If there's a Limit node, the clause is present, so no need to have another level of Optional that will never be empty. The constructor should not take an Optional, either. It should be up to the caller to pass in the required value.
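
A sketch of the simplified node along these lines, again omitting the Node base-class plumbing; names are illustrative.

import static java.util.Objects.requireNonNull;

public class Limit
{
    // the clause text, e.g. "10" or "ALL"; if this node exists, the clause is always present
    private final String limit;

    public Limit(String limit)
    {
        this.limit = requireNonNull(limit, "limit is null");
    }

    public String getLimit()
    {
        return limit;
    }
}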

    rowCount = Long.parseLong(fetchFirst.getRowCount().get());
}
catch (NumberFormatException e) {
    throw new SemanticException(INVALID_FETCH_FIRST_ROW_COUNT, node, "%s", fetchFirst.getRowCount().get());
Member:

The node in the exceptions should be the fetchFirst (or limit in the next method). That's used for generating the appropriate location in the error message.

}

-    private PlanBuilder sort(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit, List<Expression> orderByExpressions)
+    private PlanBuilder sort(PlanBuilder subPlan, Optional<OrderBy> orderBy, Long limit, List<Expression> orderByExpressions)
Member:

If you associate the limit value in the analysis with the node for the clause (FetchFirst or Limit) instead of the Query/QuerySpecification, this method can take an Optional<Node> limit and the code below can just say:

if (limit.isPresent()) {
    planNode = new TopNNode(..., analysis.getLimit(limit.get()), ...);
}

}

-    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit)
+    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Long limit)
Member:

Same comment as for the sort(...) method above

@@ -113,10 +114,10 @@ protected Node visitDescribeInput(DescribeInput node, Void context)

// return the positions and types of all parameters
Row[] rows = parameters.stream().map(parameter -> createDescribeInputRow(parameter, analysis)).toArray(Row[]::new);
-        Optional<String> limit = Optional.empty();
+        Optional<String> limitRowCount = Optional.of("ALL");
Member:

Instead of generating a LIMIT ALL, use an Optional<Node> limit = Optional.empty() and assign Optional.of(new Limit("0")) in the if () block below.
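
A sketch of the suggested rewrite; the condition is taken to be the existing empty-result check inside that if () block and is only an assumption, as is the Limit constructor shape.

Optional<Node> limit = Optional.empty();
if (rows.length == 0) {
    // condition assumed from context: no parameter rows were produced, so cap the result at zero rows
    limit = Optional.of(new Limit("0"));
}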

@@ -103,12 +104,12 @@ protected Node visitDescribeOutput(DescribeOutput node, Void context)
Analyzer analyzer = new Analyzer(session, metadata, parser, accessControl, queryExplainer, parameters, warningCollector);
Analysis analysis = analyzer.analyze(statement, true);

-        Optional<String> limit = Optional.empty();
+        Optional<String> limitRowCount = Optional.of("ALL");
Member:

Same comment as in visitDescribeInput above

@kasiafi (Member Author) commented May 3, 2019

Applied comments.

@martint merged commit 9eea777 into trinodb:master on May 3, 2019
@martint (Member) commented May 3, 2019

Merged, thanks!
