New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for FETCH syntax #666
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start. Let's add the corresponding AST node and a semantic check that fails with "unsupported" error. Otherwise, it will parse queries successfully and ignore the clause, which amounts to producing incorrect results.
if (context.FETCH() != null) { | ||
limit = Optional.of(limit.orElse("1")); // default FETCH row count is 1 | ||
long limitValue = Long.parseLong(limit.get()); | ||
check(limitValue != 0, "Invalid row count in FETCH FIRST clause", context.limit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a semantic check that should be deferred until analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantic difference between LIMIT and FETCH is whether they allow row count equal to 0.
In my approach, in the course of processing the query, FETCH and LIMIT are distinguishable only as far as AstBuilder
. That is why I put the semantic check (row count > 0) there.
To avoid that, LIMIT and FETCH could be carried distinctly until analysis which would enable FETCH-specific check there, and glued together afterwards. I consider this over-complicating, as they are essentially the same.
Or I could define a new token in grammar like POSITIVE_INTEGER
and let parser do the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that works until we add percentage and WITH TIES.
Also, technically, the number can be arbitrarily large. The spec requires it to be "an exact numeric type with scale 0" (for rows -- for percent it can be any numeric type). So Long.parseLong
might not be sufficient. Incidentally, specifying a larger value will fail with a NumberFormatException
. Validating this in the analyzer allows us to provide better error messages and error codes.
Another issue is that parsing/formatting doesn't roundtrip. If you have a query with "FETCH FIRST x ROWS" it will be printed as "LIMIT x" by the SqlFormatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see that there is more to analyse than just row count greater than zero or not. So the right approach is to keep LIMIT and FETCH distinct until analysis.
I think I'll keep them separate until QueryPlanner
and there produce a LimitNode
for a query with fetch
.
There are no incorrect results, as I'm using the existing support for LIMIT so that it takes care of FETCH too. |
Ah, that's right. I overlooked the assigment to the |
Applied distinct |
throw new SemanticException(INVALID_FETCH_ROW_COUNT, node, "%s", fetch); | ||
} | ||
if (fetchValue <= 0) { | ||
throw new SemanticException(INVALID_FETCH_ROW_COUNT, node, "FETCH row count must be positive (actual value: %s)", fetchValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature is officially called "FETCH FIRST" in the specification, so let's refer to it that way in error messages, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a side note: user's query can use FETCH NEXT
syntax. Does mentioning FETCH FIRST
in error message be helpful?
Maybe FETCH FIRST/NEXT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @martint 's comment was about complying with the official naming and not about trying to recall in the error message how the actual query looked like.
Spec mentions "fetch first clause", "fetch first quantity", so probably the full name is "fetch first".
} | ||
|
||
private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit) | ||
private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit, Optional<String> fetch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better way to do this would be to have the analyzer record the inferred or provided limit in the query Analysis object. The planner would just use that to formulate the LimitNode and not have to deal with distinguishing between these options.
public void testFetchInvalidRowCount() | ||
{ | ||
assertFails(INVALID_FETCH_ROW_COUNT, "SELECT * FROM t1 FETCH FIRST 987654321098765432109876543210 ROWS ONLY"); | ||
assertFails(INVALID_FETCH_ROW_COUNT, "SELECT * FROM T1 FETCH FIRST 0 ROWS ONLY"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lower case "t1" for consistency with other usages.
@@ -29,43 +29,49 @@ | |||
private final QueryBody queryBody; | |||
private final Optional<OrderBy> orderBy; | |||
private final Optional<String> limit; | |||
private final Optional<String> fetch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach makes it theoretically possible to construct syntactically illegal queries (i.e., with both LIMIT and FETCH FIRST clauses, potentially conflicting). A better approach is to replace the Optional<String> limit
with an Optional<Node> limit
and fill it in with either a Limit
or a FetchFirst
AST node (both of which need to be introduced).
This would also make the analysis more regular -- it just needs to delegate to method to analyze the corresponding node type (there'd be two of them), each of which would be responsible for ensuring the values are legal, etc., and record the inferred limit in the Analysis object to be used later by the planner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please notice that a query mixing LIMIT and FETCH FIRST clauses wouldn't pass through the parser. Despite that I agree that they should share a field in Query
/ QuerySpecification
, as they are mutually exclusive.
Should I introduce a common superclass for Limit
and FetchFirst
nodes? Or maybe for future Offset
node too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I introduce a common superclass for Limit and FetchFirst
If we can come up with a reasonable name for it, yes -- there may not be a good generic concept that describes both. Otherwise, it's fine to keep as Node for now and maybe do a runtime check to ensure it's one of the supported types.
11353a1
to
3a6d651
Compare
Applied comments. |
bc5e7dc
to
0a94d2c
Compare
No more formatting issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments. It's getting there. Good stuff!
@@ -316,6 +317,16 @@ public void setOrderByExpressions(Node node, List<Expression> items) | |||
return orderByExpressions.get(NodeRef.of(node)); | |||
} | |||
|
|||
public void setLimit(Node node, Long rowCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rowCount can be primitive long
limit.put(NodeRef.of(node), rowCount); | ||
} | ||
|
||
public Long getLimit(Node node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return primitive long
and maybe check that the an entry exists in the limit
map to avoid NPE in the off chance that there's a bug in the analyzer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't figure out how to store LIMIT ALL with primitive long
.
With Long
I used null value, but I suppose Optional<Long>
would be the neatest solution with Optional.empty
for LIMIT ALL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable, but I'd use OptionalLong
, instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martint i was under impression that we prefer Optional<Long>
over OptionalLong
unless there is compelling (a.k.a. performance) reason to use the latter.
prestodb/presto#11302 (comment) & prestodb/presto#11302 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on how it’s being used. This is very localized. No strong preference either way in this case, so Optional is ok, too.
@@ -2113,6 +2125,65 @@ private void verifySelectDistinct(QuerySpecification node, List<Expression> outp | |||
return orderByFields; | |||
} | |||
|
|||
private void analyzeLimit(Query node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two methods are unnecessary. Aside from delegating to analyzeLimit(node, clause)
they only check to make sure the limit is present. If you inline them, the check is unnecessary, as that's ensured by the if
condition in the methods above.
|
||
private void analyzeLimit(Node node, Node fetchFirstOrLimit) | ||
{ | ||
checkState(fetchFirstOrLimit instanceof FetchFirst || fetchFirstOrLimit instanceof Limit, "Invalid limit node type. Expected: FetchFirst or Limit. Actual: %s", fetchFirstOrLimit.getClass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use fetchFirstOrLimit.getClass().getName()
. getClass()
returns a class object, whose toString()
method outputs something like "class x.y.Z"
public class Limit | ||
extends Node | ||
{ | ||
private final Optional<String> limit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit
should never be empty()
, right? That would syntactically invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and at first I thought of private final String limit;
, but I decided to use Optional
, because originally there was Optional<String> limit
field in Query
and QuerySpecification
.
Shall I add a check in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just make it a private final String limit;
. It was Optional<String>
in Query
/QuerySpecification
to indicate that the clause could be missing. If there's a Limit
node, the clause is present, so no need to have another level of Optional
that will never be empty
. The constructor should not take an Optional
, either. It should be up to the caller to pass in the required value.
rowCount = Long.parseLong(fetchFirst.getRowCount().get()); | ||
} | ||
catch (NumberFormatException e) { | ||
throw new SemanticException(INVALID_FETCH_FIRST_ROW_COUNT, node, "%s", fetchFirst.getRowCount().get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The node in the exceptions should be the fetchFirst
(or limit
in the next method). That's used for generating the appropriate location in the error message.
} | ||
|
||
private PlanBuilder sort(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit, List<Expression> orderByExpressions) | ||
private PlanBuilder sort(PlanBuilder subPlan, Optional<OrderBy> orderBy, Long limit, List<Expression> orderByExpressions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you associate the limit value in the analysis with the node for the clause (FetchFirst or Limit) instead of the Query/QuerySpecification, this method can take an Optional<Node> limit
and the code below can just say:
if (limit.isPresent()) {
planNode = new TopNNode(..., analysis.getLimit(limit.get()), ...);
}
} | ||
|
||
private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit) | ||
private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Long limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for the sort(...)
method above
@@ -113,10 +114,10 @@ protected Node visitDescribeInput(DescribeInput node, Void context) | |||
|
|||
// return the positions and types of all parameters | |||
Row[] rows = parameters.stream().map(parameter -> createDescribeInputRow(parameter, analysis)).toArray(Row[]::new); | |||
Optional<String> limit = Optional.empty(); | |||
Optional<String> limitRowCount = Optional.of("ALL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of generating a LIMIT ALL
, use an Optional<Node> limit = Optional.empty()
and assign Optional.of(new Limit("0"))
in the if ()
block below.
@@ -103,12 +104,12 @@ protected Node visitDescribeOutput(DescribeOutput node, Void context) | |||
Analyzer analyzer = new Analyzer(session, metadata, parser, accessControl, queryExplainer, parameters, warningCollector); | |||
Analysis analysis = analyzer.analyze(statement, true); | |||
|
|||
Optional<String> limit = Optional.empty(); | |||
Optional<String> limitRowCount = Optional.of("ALL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in visitDescribeInput
above
Applied comments. |
Merged, thanks! |
Relates to #1