Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ugly cql syntax error messages #1703

Open
nyh opened this issue Sep 26, 2016 · 25 comments
Open

Ugly cql syntax error messages #1703

nyh opened this issue Sep 26, 2016 · 25 comments
Assignees
Labels
area/cql symptom/ux Concerns regarding the user experience in working with Scylla. type/bug user request
Milestone

Comments

@nyh
Copy link
Contributor

nyh commented Sep 26, 2016

If the CQL syntax requires a certain keyword, but it's not there, we get a weird error. For example consider that the correct syntax is:

CREATE MATERIALIZED VIEW building_by_city AS SELECT * FROM buildings WHERE city IS NOT NULL  PRIMARY KEY(city, name);

Note, that the word "VIEW" is mandatory - there is no materialized anything-else. If you try to use the word "DOG" instead,

CREATE MATERIALIZED DOG building_by_city AS SELECT * FROM buildings WHERE city IS NOT NULL  PRIMARY KEY(city, name);

The error message is:

SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:20 missing K_VIEW at '<missing '">

The "missing K_VIEW" message is correct (although the user should be told about the word "VIEW", not "K_VIEW"), but the bizarre missing context message that follows is ugly.

Another example: the new IS NOT NULL syntax requires NULL - cannot be anything else. So if you try "is not 3":

SELECT * from buildings WHERE city IS NOT 3;

The result is again the ugly:

SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:42 missing K_NULL at '<missing '">

These examples use materialized views syntax, but I'm sure a non-MV syntax example can be found.

@tzach tzach added type/bug symptom/ux Concerns regarding the user experience in working with Scylla. labels Sep 27, 2016
@penberg
Copy link
Contributor

penberg commented Sep 27, 2016

@nyh So did you try the same CQL statements with Cassandra? What kind of of errors are you getting with that?

@nyh
Copy link
Contributor Author

nyh commented Sep 27, 2016

The output in today's Cassandra trunk:

cqlsh> CREATE MATERIALIZED DOG building_by_city AS SELECT * FROM buildings WHERE city IS NOT NULL  PRIMARY KEY(city, name);
SyntaxException: line 1:20 missing K_VIEW at 'DOG' (CREATE MATERIALIZED [DOG] building_by_city...)
cqlsh> SELECT * from buildings WHERE city IS NOT 3;
SyntaxException: line 1:42 mismatched input '3' expecting K_NULL (...buildings WHERE city IS NOT [3];)

As you can see it includes much more useful context on each error.

I wasn't necessarily expecting exactly the same error message, but the bizarre text "at '<missing '"> we output is definitely a bug.

@penberg
Copy link
Contributor

penberg commented Sep 28, 2016

Yes, we don't append the query snippet to the error message. There's some work to be done for that in in error_collector.

I'm not sure what the <missing... thing is.

@slivne slivne added this to the 2.0 milestone Oct 31, 2016
@nyh
Copy link
Contributor Author

nyh commented Mar 8, 2020

Just confirmed this bug (unhelpful parsing error messages) still exists in master.

@nyh nyh added the area/cql label Jun 14, 2021
@nyh
Copy link
Contributor Author

nyh commented Jun 14, 2021

Four years later, we still have this unhelpful '<missing ' context in our error messages. Someone just pasted such an error message in stackoverflow - https://stackoverflow.com/questions/67966069/cassandra-missing-at-missing.

We also have a similar complaint in #5546. Some examples from that issue:

For

SELECT * FROM users WHERE id IN (children) ALLOW FILTERING;

where the error is missing quotes on children:

SyntaxException: line 1:41 no viable alternative at input ')'

This is not very helpful. The error should be that children is wrong, not that ')' is wrong.

If we add the quotes but write one too many L's in the ALLOW FILTERING:

SELECT * FROM users WHERE id IN ('children') ALLLOW FILTERING;

the error is

SyntaxException: line 1:45  : syntax error...

which lacks any context on the error.

@nyh
Copy link
Contributor Author

nyh commented Nov 18, 2021

Another example I saw today - an incorrect query which tried to use an "OR" keyword in a way not supported in CQL:

SELECT TOKEN(k), k FROM table WHERE TOKEN(k)=-3485513579396041028 OR TOKEN(k)=8041825692956475789

The Cassandra error is helpful - it tells you the problem is the "OR":

line 1:106 no viable alternative at input 'OR' (... WHERE TOKEN(k)=-3485513579396041028 [OR]...)

The Scylla error message is just

line 1:105  : syntax error...

@nyh
Copy link
Contributor Author

nyh commented Dec 8, 2022

Another ugly example I saw today... Trying the command

ALTER TABLE tbl DROP COMPACT STORAGE

I get the unintelligible error:

line 1:71 extraneous input 'STORAGE' expecting <invalid>

The <invalid> is silly, but why did we even get an error on "STORAGE"? Is "DROP COMPACT" fine with something else, not "STORAGE"?

By the way, Cassandra recognizes this syntax, but prints a message that it's no longer supported without being separately enabled:

DROP COMPACT STORAGE is disabled. Enable in cassandra.yaml to use.

(this issue is just about the ugly error message - the fact we don't support this feature at all is a separate question - #3882).

@nyh
Copy link
Contributor Author

nyh commented Jan 26, 2023

Another ugly example I saw today:
The query

CREATE TABLE tbl  q(a int, b int, c frozen<map<int, int>>, PRIMARY KEY ((a, b)) WITH COMPACT STORAGE

Is missing a third parentheses before the "WITH" but it's hard to spot. Scylla doesn't help:

line 1:120 missing ')' at '<missing '

Again with that weird '<missing ' thing... Cassandra is more helpful:

line 1:120 mismatched input 'WITH' expecting ')' (... ((a, b)) [WITH]...)

@avikivity
Copy link
Member

We should move to antlr4. I tried before, but there's a problem with the lexer losing some capabilties, which we have to work around.

@avikivity
Copy link
Member

My work-in-progress for reference: https://github.com/avikivity/scylladb/commits/antlr4

@nyh
Copy link
Contributor Author

nyh commented Jan 26, 2023

We should move to antlr4. I tried before, but there's a problem with the lexer losing some capabilties, which we have to work around.

Why is antlr4 needed? If I understand correctly (but please correct me if I'm wrong), Cassandra also uses antl3 and managing to provide good error messages. So I thought we just have a bug in the code which produces the error messages.

@avikivity
Copy link
Member

I thought they switched to antlr4, but can't find evidence for it now.

@nyh
Copy link
Contributor Author

nyh commented Apr 27, 2023

I thought they switched to antlr4, but can't find evidence for it now.

I verified in Cassandra 4.1.1's build.xml that they use antlr 3.5.2.

I'm having a really hard time figuring out why we have much worse error messages than Casandra. The error-handling code is extremely convoluted, with the worst of all worlds of object oriented and templates, but on the other hand looks like was copied from Cassandra. I know why the nice "full context" part of the message (the parentheses at the end) is missing - we didn't translate that function to C++, but I can't figure out why the main part of the message (e.g., "mismatched input 'KE' expecting K_KEY") is missing. I'm still working on it...

@nyh
Copy link
Contributor Author

nyh commented Apr 27, 2023

The mystery deepens, and my dislike for Antlr also deepens :-( The example I'm focusing now is CREATE TABLE {test_keyspace}.{unique_name()}(a int, PRIMARY KEY((a)) WITH COMPACT STORAGE - where a third closing parentheses is missing before the WITH.
Currently Cassandra prints

line 1:83 mismatched input 'WITH' expecting ')' (... PRIMARY KEY((a)) [WITH]...)

Whereas Scylla prints

line 1:83 missing \')\' at \'<missing \'">

Three observations:

  1. The phrase "mismatched input" comes from the Java source code of Antlr, runtime/Java/src/main/java/org/antlr/runtime/BaseRecognizer.java. Many other languages have similar code in the anlr runtime directory, but... not C++.
  2. Moreover, the C++ ANTLR doesn't even recognize this error as a "mismatch" but rather a "missing token". It tells you that "(" is missing (good!) but gives no context of where it's missing.
  3. The displayRecognitionError() function in /usr/include/antlr3exception.inl (C++ Antlr's runtime is installed in /usr/include...), prints just the missing "(". It's a short, but correct, error message. Scylla tries to add to it adds to it "at ..." in error_collector.hh, but it apparently adds this garbage "<missing" thing instead. I don't know if it's a simple bug or the C++ antlr simply doesn't have access to this information and shouldn't even try....

@nyh
Copy link
Contributor Author

nyh commented Apr 30, 2023

By the way, if you're curious where our string <missing comes from, it's a simple Antlr3 bug, I opened an issue here: antlr/antlr3#218. But in any case, even if this token was correct, we shouldn't print it anyway - it's the missing token again, not the seen token.

nyh added a commit to nyh/scylla that referenced this issue Apr 30, 2023
We have known for a long time (see issue scylladb#1703) that the quality of our
CQL "syntax error" messages leave a lot to be desired, especially when
compared to Cassandra. This patch doesn't yet bring us great error
messages with great context - doing this isn't easy and it appears that
Antlr3's C++ runtime isn't as good as the Java one in this regard -
but this patch at least fixes **garbage** printed in some error messages.

Specifically, when the parser can deduce that a specific token is missing,
it used to print

    line 1:83 missing ')' at '<missing '

After this patch we get rid of the meaningless string '<missing ':

    line 1:83 : Missing ')'

Also, when the parser deduced that a specific token was unneeded, it
used to print:

    line 1:83 extraneous input ')' expecting <invalid>

Now we got rid of this silly "<invalid>" and write just:

    line 1:83 : Unexpected ')'

Refs scylladb#1703. I didn't yet marked that issue "fixed" because I think a
complete fix would also require printing the entire misparsed line and the
point of the parse failure. Scylla still prints a generic "Syntax Error"
in most cases now, and although the character number (83 in the above
example) can help, it's much more useful to see the actual failed
statement and where character 83 is.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Apr 30, 2023
We have known for a long time (see issue scylladb#1703) that the quality of our
CQL "syntax error" messages leave a lot to be desired, especially when
compared to Cassandra. This patch doesn't yet bring us great error
messages with great context - doing this isn't easy and it appears that
Antlr3's C++ runtime isn't as good as the Java one in this regard -
but this patch at least fixes **garbage** printed in some error messages.

Specifically, when the parser can deduce that a specific token is missing,
it used to print

    line 1:83 missing ')' at '<missing '

After this patch we get rid of the meaningless string '<missing ':

    line 1:83 : Missing ')'

Also, when the parser deduced that a specific token was unneeded, it
used to print:

    line 1:83 extraneous input ')' expecting <invalid>

Now we got rid of this silly "<invalid>" and write just:

    line 1:83 : Unexpected ')'

Refs scylladb#1703. I didn't yet marked that issue "fixed" because I think a
complete fix would also require printing the entire misparsed line and the
point of the parse failure. Scylla still prints a generic "Syntax Error"
in most cases now, and although the character number (83 in the above
example) can help, it's much more useful to see the actual failed
statement and where character 83 is.

Unfortunately some tests enshrine buggy error messages and had to be
fixed. Other tests enshrined strange text for a generic unexplained
error message, which used to say "  : syntax error..." (note the two
spaces and elipses) and after this patch is " : Syntax error". So
these tests are changed. Another message, "no viable alternative at
input" is deliberately kept unchanged by this patch so as not to break
many more tests which enshrined this message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue May 2, 2023
We have known for a long time (see issue #1703) that the quality of our
CQL "syntax error" messages leave a lot to be desired, especially when
compared to Cassandra. This patch doesn't yet bring us great error
messages with great context - doing this isn't easy and it appears that
Antlr3's C++ runtime isn't as good as the Java one in this regard -
but this patch at least fixes **garbage** printed in some error messages.

Specifically, when the parser can deduce that a specific token is missing,
it used to print

    line 1:83 missing ')' at '<missing '

After this patch we get rid of the meaningless string '<missing ':

    line 1:83 : Missing ')'

Also, when the parser deduced that a specific token was unneeded, it
used to print:

    line 1:83 extraneous input ')' expecting <invalid>

Now we got rid of this silly "<invalid>" and write just:

    line 1:83 : Unexpected ')'

Refs #1703. I didn't yet marked that issue "fixed" because I think a
complete fix would also require printing the entire misparsed line and the
point of the parse failure. Scylla still prints a generic "Syntax Error"
in most cases now, and although the character number (83 in the above
example) can help, it's much more useful to see the actual failed
statement and where character 83 is.

Unfortunately some tests enshrine buggy error messages and had to be
fixed. Other tests enshrined strange text for a generic unexplained
error message, which used to say "  : syntax error..." (note the two
spaces and elipses) and after this patch is " : Syntax error". So
these tests are changed. Another message, "no viable alternative at
input" is deliberately kept unchanged by this patch so as not to break
many more tests which enshrined this message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13731
@nyh
Copy link
Contributor Author

nyh commented May 2, 2023

In PR #13731, I fixed all the ugly at '<missing ' and expecting <invalid> strings that appeared in many of the "ugly error messages" shown above.

I think the only thing remaining to do before this issue can be finally closed is to copy Cassandra's idea of printing not just the line and character number where the error occurred - but also show the full line (or snippet of the line) with that specific position emphasized. I don't think we necessarily need to use the same format that Cassandra used for this snippet - it's not particularly pretty and other compilers do it nicer.

@mykaul
Copy link
Contributor

mykaul commented Jun 20, 2023

How far off are we with antlr4 conversion?

@nyh
Copy link
Contributor Author

nyh commented Jun 20, 2023

How far off are we with antlr4 conversion?

I don't think anybody ever started. Also, I don't think anyone has any idea of what kind of benefits it will give us, and specifically whether this conversion will give us better error messages.
It's probably not necessary for better error messages (Cassandra also uses Antlr3, not Antlr4, but the Java version of Antlr3 is a bit better - but I believe I can still improve the error messages even with Antlr3 with C++).

@mykaul
Copy link
Contributor

mykaul commented Jun 20, 2023

How far off are we with antlr4 conversion?

I don't think anybody ever started. Also, I don't think anyone has any idea of what kind of benefits it will give us, and specifically whether this conversion will give us better error messages. It's probably not necessary for better error messages (Cassandra also uses Antlr3, not Antlr4, but the Java version of Antlr3 is a bit better - but I believe I can still improve the error messages even with Antlr3 with C++).

No particular reason, apart from that it looks to being actively developed, I saw antlr/antlr4#4237 , etc. I did not look into the differences in depth.

@avelanarius
Copy link
Member

Another example from a user: due to a bug in Java Driver, it was generating a double negation in counter updates (essentially counter = counter - (-13)):

UPDATE tabletest.tablename SET counter=counter--13 WHERE key='key' AND member='member'

The error message returned by Scylla was unhelpful for the user to find the bug:

com.datastax.oss.driver.api.core.servererrors.SyntaxError: line 1:157  : unexpected input...

The user said: "the syntax error log seems to be unkind"

@denesb
Copy link
Contributor

denesb commented Jul 19, 2023

The full error message is:

Error message : 
com.datastax.oss.driver.api.core.servererrors.SyntaxError: line 1:157  : unexpected input...
  expected one of : Actually dude, we didn't seem to be expecting anything here, or at least
I could not work out what I was expecting, like so many of us these days!

No wonder the user thought it is "unkind".

@nyh
Copy link
Contributor Author

nyh commented Jul 19, 2023

@avelanarius I tried update tbl set counter=counter--13 where k=3, Cassandra said:

line 1:44 mismatched character \'<EOF>\' expecting set nul

Which is also completely unhelpful (even when I know where the error is, I don't know what the message means).

The error message from today's Scylla (after 57ffbcb) is:

line 1:44 : Unexpected '

I don't know why it only has one quote character, and missing the "EOF" token that the Cassandra version generated. We'll need to fix that.

But the problem here is funny (or sad, depending on how you look at it) - in CQL "--" (two minus signs) can be used to start a comment. I don't know why and who decided it but it documented https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlRefComment.html :-( So everything after the "--" was a comment, so the expression indeed ended prematurely, and the "unexpected EOF" was the right message. But "unclosed comment" message would have been much, much, more helpful.

@avikivity
Copy link
Member

-- is a SQL comment.

@fee-mendes
Copy link
Member

The full error message is:

Error message : 
com.datastax.oss.driver.api.core.servererrors.SyntaxError: line 1:157  : unexpected input...
  expected one of : Actually dude, we didn't seem to be expecting anything here, or at least
I could not work out what I was expecting, like so many of us these days!

No wonder the user thought it is "unkind".

This also happens when you provide an invalid UUID :

cqlsh:test> CREATE TABLE id (id uuid PRIMARY KEY);
cqlsh:test> SELECT * FROM id WHERE id = 00000000-0000-0000-0000-0000-000000000000 ;
SyntaxException: line 1:56  : unexpected input...
  expected one of : Actually dude, we didn't seem to be expecting anything here, or at least
I could not work out what I was expecting, like so many of us these days!

cqlsh:test> 

Normally I would open a separate issue, but it seems like we are grouping all weird CQL syntax errors here?

@fee-mendes
Copy link
Member

And a single dash, with a space... that's likely it.

UPDATE x SET y = - 1 WHERE pk=0;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cql symptom/ux Concerns regarding the user experience in working with Scylla. type/bug user request
Projects
None yet
Development

No branches or pull requests

9 participants