Add support for FETCH syntax #666

Merged: 1 commit into trinodb:master, May 3, 2019

Conversation

@kasiafi (Member) commented Apr 25, 2019

Relates to #1

@martint (Member) left a comment

This is a good start. Let's add the corresponding AST node and a semantic check that fails with "unsupported" error. Otherwise, it will parse queries successfully and ignore the clause, which amounts to producing incorrect results.

if (context.FETCH() != null) {
    limit = Optional.of(limit.orElse("1")); // default FETCH row count is 1
    long limitValue = Long.parseLong(limit.get());
    check(limitValue != 0, "Invalid row count in FETCH FIRST clause", context.limit);
Member:

This is a semantic check that should be deferred until analysis.

Member Author:

The semantic difference between LIMIT and FETCH is whether they allow a row count equal to 0.

In my approach, in the course of processing the query, FETCH and LIMIT are distinguishable only as far as the AstBuilder. That is why I put the semantic check (row count > 0) there.

To avoid that, LIMIT and FETCH could be carried separately until analysis, which would enable a FETCH-specific check there, and be merged afterwards. I consider that over-complicated, as they are essentially the same.

Or I could define a new token in the grammar, like POSITIVE_INTEGER, and let the parser do the check.

Member:

I guess that works until we add percentage and WITH TIES.

Also, technically, the number can be arbitrarily large. The spec requires it to be "an exact numeric type with scale 0" (for rows -- for percent it can be any numeric type), so Long.parseLong might not be sufficient. Incidentally, specifying a value that doesn't fit in a long currently fails with a NumberFormatException. Validating this in the analyzer allows us to provide better error messages and error codes.

Another issue is that parsing/formatting doesn't roundtrip. If you have a query with "FETCH FIRST x ROWS" it will be printed as "LIMIT x" by the SqlFormatter.
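
For illustration, a minimal sketch of the kind of analyzer-side validation being argued for here. The SemanticException usage and the INVALID_FETCH_FIRST_ROW_COUNT error code follow code quoted later in this review; the FetchFirst node type, its accessor, and the method name are assumptions.

// Hedged sketch only: FetchFirst, getRowCount(), and the method name are assumed;
// SemanticException and INVALID_FETCH_FIRST_ROW_COUNT mirror snippets quoted later in this review.
private void analyzeFetchFirstRowCount(FetchFirst fetchFirst)
{
    String rowCountText = fetchFirst.getRowCount().orElse("1"); // FETCH FIRST defaults to 1 row
    long rowCount;
    try {
        rowCount = Long.parseLong(rowCountText);
    }
    catch (NumberFormatException e) {
        // values that do not fit in a long land here instead of surfacing a raw NumberFormatException
        throw new SemanticException(INVALID_FETCH_FIRST_ROW_COUNT, fetchFirst, "Invalid FETCH FIRST row count: %s", rowCountText);
    }
    if (rowCount <= 0) {
        throw new SemanticException(INVALID_FETCH_FIRST_ROW_COUNT, fetchFirst, "FETCH FIRST row count must be positive (actual value: %s)", rowCount);
    }
}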

Member Author:

OK, I see that there is more to analyze than just whether the row count is greater than zero. So the right approach is to keep LIMIT and FETCH distinct until analysis.
I think I'll keep them separate until the QueryPlanner and produce a LimitNode there for a query with FETCH.

@kasiafi (Member Author) commented Apr 25, 2019

> This is a good start. Let's add the corresponding AST node and a semantic check that fails with "unsupported" error. Otherwise, it will parse queries successfully and ignore the clause, which amounts to producing incorrect results.

There are no incorrect results, as I'm using the existing support for LIMIT so that it takes care of FETCH too.
That is why I use the same limit token in the grammar to capture the row count.

@martint (Member) commented Apr 25, 2019

> There are no incorrect results, as I'm using the existing support for LIMIT so that it takes care of FETCH too.

Ah, that's right. I overlooked the assignment to the limit variable.

@kasiafi (Member Author) commented Apr 28, 2019

Applied distinct FETCH handling up to the LogicalPlanner (the existing LIMIT implementation is used from there on), plus tests and a SqlFormatter change.

@martint self-requested a review on April 29, 2019 19:47
    throw new SemanticException(INVALID_FETCH_ROW_COUNT, node, "%s", fetch);
}
if (fetchValue <= 0) {
    throw new SemanticException(INVALID_FETCH_ROW_COUNT, node, "FETCH row count must be positive (actual value: %s)", fetchValue);
Member:

The feature is officially called "FETCH FIRST" in the specification, so let's refer to it that way in error messages, etc.

Member:

A side note: the user's query can use the FETCH NEXT syntax. Would mentioning FETCH FIRST in the error message be helpful?
Maybe FETCH FIRST/NEXT?

@kasiafi (Member Author), Apr 30, 2019:

I think @martint's comment was about complying with the official naming, not about echoing in the error message how the actual query was written.
The spec mentions "fetch first clause" and "fetch first quantity", so presumably the full name is "fetch first".

}

-    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit)
+    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit, Optional<String> fetch)
Member:

A better way to do this would be to have the analyzer record the inferred or provided limit in the query Analysis object. The planner would just use that to formulate the LimitNode and not have to deal with distinguishing between these options.

public void testFetchInvalidRowCount()
{
    assertFails(INVALID_FETCH_ROW_COUNT, "SELECT * FROM t1 FETCH FIRST 987654321098765432109876543210 ROWS ONLY");
    assertFails(INVALID_FETCH_ROW_COUNT, "SELECT * FROM T1 FETCH FIRST 0 ROWS ONLY");
Member:

lower case "t1" for consistency with other usages.

@@ -29,43 +29,49 @@
private final QueryBody queryBody;
private final Optional<OrderBy> orderBy;
private final Optional<String> limit;
private final Optional<String> fetch;
Member:

This approach makes it theoretically possible to construct syntactically illegal queries (i.e., with both LIMIT and FETCH FIRST clauses, potentially conflicting). A better approach is to replace the Optional<String> limit with an Optional<Node> limit and fill it in with either a Limit or a FetchFirst AST node (both of which need to be introduced).

This would also make the analysis more regular -- it just needs to delegate to a method that analyzes the corresponding node type (there would be two of them), each of which would be responsible for ensuring the values are legal, etc., and for recording the inferred limit in the Analysis object to be used later by the planner.
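
To make the suggestion concrete, here is a standalone sketch of a FetchFirst node of the kind proposed. The real class would extend the AST's Node base class and carry location/children/equality plumbing, which is omitted; all names are illustrative.

import java.util.Optional;

import static java.util.Objects.requireNonNull;

public class FetchFirst
{
    // FETCH FIRST [n] ROWS ONLY; an absent row count defaults to 1
    private final Optional<String> rowCount;

    public FetchFirst(Optional<String> rowCount)
    {
        this.rowCount = requireNonNull(rowCount, "rowCount is null");
    }

    public Optional<String> getRowCount()
    {
        return rowCount;
    }
}

Query / QuerySpecification would then hold a single Optional<Node> limit field containing either a Limit or a FetchFirst, which makes a query with both clauses unrepresentable.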

Member Author:

Please note that a query mixing LIMIT and FETCH FIRST clauses wouldn't get through the parser. That said, I agree that they should share a field in Query / QuerySpecification, as they are mutually exclusive.
Should I introduce a common superclass for the Limit and FetchFirst nodes? Or maybe for a future Offset node too?

Member:

> Should I introduce a common superclass for Limit and FetchFirst

If we can come up with a reasonable name for it, yes -- there may not be a good generic concept that describes both. Otherwise, it's fine to keep it as Node for now and maybe do a runtime check to ensure it's one of the supported types.

@kasiafi force-pushed the OffsetFetch branch 2 times, most recently from 11353a1 to 3a6d651 on May 1, 2019 20:46
@kasiafi (Member Author) commented May 1, 2019

Applied comments.
Works, modulo a formatting issue.

@kasiafi force-pushed the OffsetFetch branch 2 times, most recently from bc5e7dc to 0a94d2c on May 2, 2019 06:58
@kasiafi (Member Author) commented May 2, 2019

No more formatting issues.

@martint (Member) left a comment

A few more comments. It's getting there. Good stuff!

@@ -316,6 +317,16 @@ public void setOrderByExpressions(Node node, List<Expression> items)
return orderByExpressions.get(NodeRef.of(node));
}

public void setLimit(Node node, Long rowCount)
Member:

rowCount can be a primitive long.

limit.put(NodeRef.of(node), rowCount);
}

public Long getLimit(Node node)
Member:

Return a primitive long, and maybe check that an entry exists in the limit map to avoid an NPE on the off chance that there's a bug in the analyzer.

Member Author:

I can't figure out how to store LIMIT ALL with a primitive long.
With Long I used a null value, but I suppose Optional<Long> would be the neatest solution, with Optional.empty() for LIMIT ALL.

Member:

That seems reasonable, but I'd use OptionalLong instead.

Member:

@martint I was under the impression that we prefer Optional<Long> over OptionalLong unless there is a compelling (read: performance) reason to use the latter.
prestodb/presto#11302 (comment) & prestodb/presto#11302 (comment)

Member:

I guess it depends on how it’s being used. This is very localized. No strong preference either way in this case, so Optional is ok, too.
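
As a standalone sketch of the convention settled on here, with Optional.empty() standing for LIMIT ALL. The real Analysis class keys entries by NodeRef; plain Object keys and the class name below are placeholders for illustration only.

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class LimitRecorder
{
    private final Map<Object, Optional<Long>> limit = new HashMap<>();

    public void setLimit(Object node, Optional<Long> rowCount)
    {
        // Optional.empty() means LIMIT ALL, i.e. no limit to enforce
        limit.put(node, rowCount);
    }

    public Optional<Long> getLimit(Object node)
    {
        // fail loudly if the analyzer never recorded a limit for this node
        if (!limit.containsKey(node)) {
            throw new IllegalStateException("limit not analyzed for node: " + node);
        }
        return limit.get(node);
    }
}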

@@ -2113,6 +2125,65 @@ private void verifySelectDistinct(QuerySpecification node, List<Expression> outp
return orderByFields;
}

private void analyzeLimit(Query node)
Member:

These two methods are unnecessary. Aside from delegating to analyzeLimit(node, clause) they only check to make sure the limit is present. If you inline them, the check is unnecessary, as that's ensured by the if condition in the methods above.
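
A rough sketch of what the inlined call site could look like; the accessor and method names are assumed from the surrounding diff.

// Hedged sketch: getLimit() returning the optional clause node is an assumption.
if (node.getLimit().isPresent()) {
    analyzeLimit(node, node.getLimit().get());
}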


private void analyzeLimit(Node node, Node fetchFirstOrLimit)
{
    checkState(fetchFirstOrLimit instanceof FetchFirst || fetchFirstOrLimit instanceof Limit, "Invalid limit node type. Expected: FetchFirst or Limit. Actual: %s", fetchFirstOrLimit.getClass());
Member:

Use fetchFirstOrLimit.getClass().getName(). getClass() returns a Class object, whose toString() method outputs something like "class x.y.Z".

public class Limit
        extends Node
{
    private final Optional<String> limit;
Member:

limit should never be empty(), right? That would be syntactically invalid.

Member Author:

Yes. At first I thought of private final String limit;, but I decided to use Optional because originally there was an Optional<String> limit field in Query and QuerySpecification.
Shall I add a check in the constructor?

Member:

Let's just make it a private final String limit;. It was Optional<String> in Query/QuerySpecification to indicate that the clause could be missing. If there's a Limit node, the clause is present, so no need to have another level of Optional that will never be empty. The constructor should not take an Optional, either. It should be up to the caller to pass in the required value.
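
A sketch of the simplified node along these lines, again omitting the Node base-class plumbing; names are illustrative.

import static java.util.Objects.requireNonNull;

public class Limit
{
    // the clause text, e.g. "10" or "ALL"; if this node exists, the clause is always present
    private final String limit;

    public Limit(String limit)
    {
        this.limit = requireNonNull(limit, "limit is null");
    }

    public String getLimit()
    {
        return limit;
    }
}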

    rowCount = Long.parseLong(fetchFirst.getRowCount().get());
}
catch (NumberFormatException e) {
    throw new SemanticException(INVALID_FETCH_FIRST_ROW_COUNT, node, "%s", fetchFirst.getRowCount().get());
Member:

The node in the exceptions should be the fetchFirst (or limit in the next method). That's used for generating the appropriate location in the error message.

}

-    private PlanBuilder sort(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit, List<Expression> orderByExpressions)
+    private PlanBuilder sort(PlanBuilder subPlan, Optional<OrderBy> orderBy, Long limit, List<Expression> orderByExpressions)
Member:

If you associate the limit value in the analysis with the node for the clause (FetchFirst or Limit) instead of the Query/QuerySpecification, this method can take an Optional<Node> limit and the code below can just say:

if (limit.isPresent()) {
    planNode = new TopNNode(..., analysis.getLimit(limit.get()), ...);
}

}

-    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Optional<String> limit)
+    private PlanBuilder limit(PlanBuilder subPlan, Optional<OrderBy> orderBy, Long limit)
Member:

Same comment as for the sort(...) method above

@@ -113,10 +114,10 @@ protected Node visitDescribeInput(DescribeInput node, Void context)

// return the positions and types of all parameters
Row[] rows = parameters.stream().map(parameter -> createDescribeInputRow(parameter, analysis)).toArray(Row[]::new);
-        Optional<String> limit = Optional.empty();
+        Optional<String> limitRowCount = Optional.of("ALL");
Member:

Instead of generating a LIMIT ALL, use an Optional<Node> limit = Optional.empty() and assign Optional.of(new Limit("0")) in the if () block below.
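
A sketch of the suggested rewrite; the condition is taken to be the existing empty-result check inside that if () block and is only an assumption, as is the Limit constructor shape.

Optional<Node> limit = Optional.empty();
if (rows.length == 0) {
    // condition assumed from context: no parameter rows were produced, so cap the result at zero rows
    limit = Optional.of(new Limit("0"));
}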

@@ -103,12 +104,12 @@ protected Node visitDescribeOutput(DescribeOutput node, Void context)
Analyzer analyzer = new Analyzer(session, metadata, parser, accessControl, queryExplainer, parameters, warningCollector);
Analysis analysis = analyzer.analyze(statement, true);

-        Optional<String> limit = Optional.empty();
+        Optional<String> limitRowCount = Optional.of("ALL");
Member:

Same comment as in visitDescribeInput above

@kasiafi (Member Author) commented May 3, 2019

Applied comments.

@martint merged commit 9eea777 into trinodb:master on May 3, 2019
@martint (Member) commented May 3, 2019

Merged, thanks!
