Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser fails recognising set operations in correlated subqueries #918

Closed
lukaseder opened this issue Mar 1, 2018 · 9 comments · Fixed by #3317
Closed

Parser fails recognising set operations in correlated subqueries #918

lukaseder opened this issue Mar 1, 2018 · 9 comments · Fixed by #3317

Comments

@lukaseder
Copy link
Contributor

The following query runs fine in PostgreSQL:

select ((select 1 x) except (select 1 y)) t;

H2 fails to parse it with the following exception:

Syntax error in SQL statement "SELECT ((SELECT 1 X) EXCEPT[*] (SELECT 1 Y)) T "; expected "[, ::, *, /, %, +, -, ||, ~, !~, NOT, LIKE, ILIKE, REGEXP, IS, IN, BETWEEN, AND, OR, ,, )"; SQL statement:
select ((select 1 x) except (select 1 y)) t [42001-196]

As a workaround, this works:

select (select 1 x except (select 1 y)) t;

But if ORDER BY .. LIMIT would be required in the first set operation subquery, then there wouldn't be a reasonable solution, as the parentheses around that first subquery are now mandatory:

select ((select 1 x order by 1 limit 1) except (select 1 y)) t;
@lukaseder
Copy link
Contributor Author

I'm really curious how this could be fixed in a recursive descent parser, @thomasmueller / @grandinj :-) There are so many edge cases, including:

Parentheses can wrap a sum:

select (((select 1) union (select 1)) + 1)

Parentheses can wrap a row value expression (array in H2):

select (((select 1) union (select 1)), 1)

The problem is that this call to readExpression():
https://github.com/h2database/h2database/blob/version-1.4.196/h2/src/main/org/h2/command/Parser.java#L2968

Is already inside of some parentheses that really belongs to a select statement, not to an expression. In other words:

select (    ((select 1) union (select 1)) + 1    )
--     ^    ^^
--     |    ++--- Parentheses belonging to a select
--     +---- Parentheses belonging to an expression

@grandinj
Copy link
Contributor

Usually, a recursive descent parser either uses context or look-ahead to solve the problem.

What does the stack above readTerm look like?

@grandinj
Copy link
Contributor

I suspect here we will need a new method readExpressionOrSelect, which keeps peeling off parentheses until it hits something that resolves the ambiguity, we have similar methods elsewhere in that class

@lukaseder
Copy link
Contributor Author

lukaseder commented Mar 14, 2018

@grandinj

This is the stack when I'm reaching the following position:

SELECT (((SELECT 1) UNION[*] (SELECT 1)) + 1) 

Stack:

Parser.readTerm() line: 2987 [local variables unavailable]	
Parser.readFactor() line: 2390 [local variables unavailable]	
Parser.readSum() line: 2377 [local variables unavailable]	
Parser.readConcat() line: 2347 [local variables unavailable]	
Parser.readCondition() line: 2178 [local variables unavailable]	
Parser.readAnd() line: 2150 [local variables unavailable]	
Parser.readExpression() line: 2142 [local variables unavailable]	
Parser.readTerm() line: 2968 [local variables unavailable]	
Parser.readFactor() line: 2390	
Parser.readSum() line: 2377	
Parser.readConcat() line: 2347	
Parser.readCondition() line: 2178	
Parser.readAnd() line: 2150	
Parser.readExpression() line: 2142	
Parser.readTerm() line: 2968 [local variables unavailable]	
Parser.readFactor() line: 2390	
Parser.readSum() line: 2377	
Parser.readConcat() line: 2347	
Parser.readCondition() line: 2178	
Parser.readAnd() line: 2150	
Parser.readExpression() line: 2142	
Parser.parseSelectSimpleSelectPart(Select) line: 2053	
Parser.parseSelectSimple() line: 2085	
Parser.parseSelectSub() line: 1940	
Parser.parseSelectUnion() line: 1755	
Parser.parseSelect() line: 1743	
Parser.parsePrepared() line: 449	
Parser.parse(String, boolean) line: 321	
Parser.parse(String) line: 293	
Parser.prepareCommand(String) line: 258	
Session.prepareLocal(String) line: 578	
Session.prepareCommand(String, int) line: 519	
JdbcConnection.prepareCommand(String, int) line: 1204	
JdbcStatement.executeQuery(String) line: 73	
H2.main(String[]) line: 59	

And then, the parser fails because it expects a closing parenthesis rather than a UNION operation.

I suspect here we will need a new method readExpressionOrSelect, which keeps peeling off parentheses until it hits something that resolves the ambiguity, we have similar methods elsewhere in that class

I'm just doing the same thing in jOOQ's parser right now, which solves the problem. Whenever I encounter a ( token, I look ahead all the ( tokens and whitespace to see if the next token is the SELECT or WITH keyword. If it is, I try to parse a SELECT statement (including UNION and ORDER BY etc.). If that fails with an exception, I move on as normally, treating the ( token as the start of an expression.

I'm not too fond of this solution as it isn't linear and might take quite some time when parsing something extreme...

@lukaseder
Copy link
Contributor Author

If you're interested, this is a short program helping to reproduce this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class H2 {
    public static void main(String[] args) throws Exception {
        Class.forName("org.h2.Driver");

        String url = "jdbc:h2:~/test";
        String user = "sa";
        String password = "";

        try (Connection c = DriverManager.getConnection(url, user, password);
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("select (((select 1) union (select 1)) + 1)")) {

            while (rs.next())
                System.out.println(rs.getString(1));
        }
    }
}

@grandinj
Copy link
Contributor

Hmmm, it's the brackets that are confusing us.

Both
select (select 1 x except select 1 y) t;
and
select ((select 1 union select 1) + 1)

work for me. Not sure how to fix it.

@lukaseder
Copy link
Contributor Author

As you mentioned here: #918 (comment)

Essentially, you'll have to "look ahead" inside of readTerm(), where the OPEN token is consumed. You're currently correctly handling:

  • Empty parens (H2-specific arrays)
  • Scalar expressions (without subqueries), with potentially nested parens
  • H2-specific arrays (which are really row value expressions / tuples in other databases)

The way I solved it for jOOQ is before consuming the opening paren, I'm peeking into all the immediately subsequent parens for a SELECT (or WITH) keyword, which indicates that there's a subquery somewhere. If that's the case, then I try to parse an entire SELECT, including UNION / INTERSECT / EXCEPT, etc. If that succeeds, great. If that fails with an error indicating a missing closing ), then I ignore that error and try parsing ordinary expressions.

Example for:

  1. select (select 1 x except select 1 y) t
  2. select ((select 1 union select 1) + 1)
  3. select ((select 1 x) except (select 1 y)) t

Pseudo parse stack:

1

  • Parse select
    • Parse projection expression
      • Parse parenthesised expression
        • Peek select keyword: true
        • Try parsing complete select query. This includes the parentheses and parses everything before the t alias
      • Parse alias

2

  • Parse select
    • Parse projection expression
      • Parse parenthesised expression (first ()
        • Peek select keyword: true
        • Try parsing complete select query. This fails at the + token, which is invalid at that position
        • Parse ordinary expression
          • Parse sum
            • Parse parenthesised expression (second ()
              • Peek select keyword: true
              • Try parsing complete select query. This works and parses everything before the + sign
            • Parse + token
            • Parse 1

3

  • Parse select
    • Parse projection expression
      • Parse parenthesised expression (first ()
        • Peek select keyword: true
        • Try parsing complete select query. This works and parses everything before the t alias
      • Parse alias

I'm not sure if this is the most efficient way to parse this kind of syntax as we need to look ahead and try parsing a subquery until it works, or "peel off" those parentheses, assuming ordinary parenthesised expressions, until we do find a subquery. But the overhead should be manageable given that this is probably not a very frequent edge case.

@ohadpinch
Copy link

I don't see anyone assign for that bug, can you please suggest a workaround for my use case? #2050

@katzyn
Copy link
Contributor

katzyn commented Aug 6, 2019

You need to rewrite a query to make EXISTS or UNION a part of a SELECT.

SELECT … UNION | EXISTS … works. (some_query) UNION | EXISTS … is not yet supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants