ESQL: change from quoting from backtick to quote #108395
For historical reasons, the source declaration inside the FROM command is treated as an identifier, using backticks (`) to escape the value. This is inconsistent, since the source is not an identifier (field name) but an index name, which has different semantics: `index` means a field named index, while "index" means a literal with that value. In the case of FROM, the index name/location is more like a literal (also in unquoted form) than an identifier (that is, a reference to a value). This PR tweaks the grammar and plugs in the quoted-string logic so that both the single-quote form (") and the triple-quote form (""") are allowed.
Pinging @elastic/es-analytical-engine (Team:Analytics)
Hi @costin, I've created a changelog YAML for you.
I didn't get to fully review the PR yet, but I noticed it is a breaking language change. We need to use our language version mechanism to avoid breaking existing queries.
  assertIdentifierAsIndexPattern("foo,test,xyz", "from foo, test,xyz");
  assertIdentifierAsIndexPattern(
      "<logstash-{now/M{yyyy.MM}}>,<logstash-{now/d{yyyy.MM.dd|+12:00}}>",
-     "from <logstash-{now/M{yyyy.MM}}>, `<logstash-{now/d{yyyy.MM.dd|+12:00}}>`"
+     "from <logstash-{now/M{yyyy.MM}}>, \"<logstash-{now/d{yyyy.MM.dd|+12:00}}>\""
  );
I am surprised we don't need to update any csv tests - I guess we should add one or two to ensure this change also works correctly end-to-end.
It'd also be great to add some tests with backticks now.
Also, we'll have to update the docs for |
Another thing I noticed is that for
FROM `testidx`
we include the backticks in the parsed index pattern/name. I think backticks should not be allowed here.
I think they should be allowed (maybe surprisingly).
On one hand, this feels good since quotes aren't allowed in index names, so it feels natural to use them for quoting. OTOH, index names feel closer to identifiers than string literals and conceptually, would be easier to understand them together, as a language user.
It'd be good to have more tests with backticks now, as well as csv-specs, as noted by Alex.
It'd also be good to get it in 8.14, if we can sync this with other clients (like Kibana).
I think it's ok to have these as strings/constants (vs. identifiers). Until now, the backticks were more like a way of emphasizing that the index names are not regular "strings", but based on your explanation, they in essence are and I agree.
Since quoting was disabled in this PR, this PR does not break bwc anymore, so my change request is moot :)
Pinging @elastic/kibana-esql (ES|QL-ui)
I've added the target for this PR to be 8.15 (no backport to 8.14 necessary).
c3dd636 to 9d97583
@astefan @bpintea @alex-spies I've merged main and updated the PR, please take a look. /cc @drewdaemon
INDEX_UNQUOTED_IDENTIFIER
    : INDEX_UNQUOTED_IDENTIFIER_PART+
This is not an identifier but rather a string, hence the rename.
-    : FROM indexIdentifier (COMMA indexIdentifier)* metadata?
+    : FROM indexString (COMMA indexString)* metadata?
I chose indexString instead of indexSource since we use source (to declare source commands).
LGTM, but maybe we could add a couple more tests.
I think this should be followed up by updating the docs for FROM; esp. they should use examples from csv tests (currently hard-coded, manual examples) and document + illustrate the use of quotes.
Since we also generalize the quoting for LOOKUP and METRICS, we should probably add some tests to that, too.
""");
assertStringAsIndexPattern("foo,test,xyz", "from foo, test,xyz");
assertStringAsIndexPattern(
    "<logstash-{now/M{yyyy.MM}}>,<logstash-{now/d{yyyy.MM.dd|+12:00}}>",
Suggestions for additional test cases:
- Maybe let's also add a simple test case without quotes but with colons (:) to illustrate that CCQ is fine:
  from logs-*,*:logs-*
- We could also add a test case with an index name containing and/or starting with a dot (.), especially without quotes.
- What happens when quotes are not properly closed, or opened? We could add a small randomized test for
  "FROM " + leftQuote + "some-idx" + rightQuote
  where the left and right quotes are a random, mismatched selection from ", """ and the empty string (unquoted).
- Other pathological cases:
  - What if a quoted string contains an un-escaped quote in the middle? E.g.
    FROM """someidx"""name"""
    or
    FROM "someidx"name"
  - What if two quoted strings are next to each other?
    FROM "someidx""name"
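As a rough sketch of the mismatched-quote suggestion above: the helper below enumerates every left/right combination instead of sampling one at random, so all six mismatched pairs are visible. The class and method names are hypothetical and not part of the Elasticsearch test suite. How the parser should treat each generated query is left to the real tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Elasticsearch code base.
final class MismatchedQuoteQueries {
    // The three quoting styles named in the review: unquoted, " and """.
    static final List<String> QUOTES = List.of("", "\"", "\"\"\"");

    // Builds every FROM query whose opening and closing quotes differ.
    static List<String> mismatched(String index) {
        List<String> queries = new ArrayList<>();
        for (String left : QUOTES) {
            for (String right : QUOTES) {
                if (!left.equals(right)) {
                    queries.add("FROM " + left + index + right);
                }
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        // Print the candidate inputs a parser test could iterate over.
        mismatched("some-idx").forEach(System.out::println);
    }
}
```

A randomized test, as proposed in the comment, could pick one element of this list per run.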
// in 8.14 ` were not allowed
// this has been relaxed in 8.15 since " is used for quoting
fragment UNQUOTED_SOURCE_PART
    : ~["=|,[\]/ \t\r\n]
    | '/' ~[*/] // allow single / but not followed by another / or * which would start a comment
I noticed that this is maybe a bit too permissive. For instance, adding the following to the statement parser tests passes just fine:
assertStringAsIndexPattern("foo/=", "from foo/=");
This example also works for the other symbols that are forbidden in line 40: prepending / allows all of them except /, in particular even ".
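The permissiveness can be reproduced outside the lexer with a regular expression that approximates the fragment. This is only a sketch: it assumes the token is one or more UNQUOTED_SOURCE_PART repetitions, and java.util.regex whole-string matching is not the same as ANTLR tokenization. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the regex only approximates the ANTLR fragment.
final class UnquotedSourceSketch {
    // Approximates ( ~["=|,[\]/ \t\r\n] | '/' ~[*/] )+
    static final Pattern UNQUOTED_SOURCE =
        Pattern.compile("(?:[^\"=|,\\[\\]/ \\t\\r\\n]|/[^*/])+");

    // True when the whole input is accepted by the approximated rule.
    static boolean matchesWhole(String s) {
        return UNQUOTED_SOURCE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "foo", "foo=", "foo/=", "foo/\"", "foo//", "foo/*" }) {
            System.out.println(s + " -> " + matchesWhole(s));
        }
    }
}
```

The second alternative accepts any character after / other than * or /, which is why a forbidden symbol such as = or " gets through when it directly follows a slash.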
@costin I have a question. In 8.14 we removed backtick support. What does that mean? Were some indices not working, and does the current PR fix them?