Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: change from quoting from backtick to quote #108395

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

costin
Copy link
Member

@costin costin commented May 8, 2024

For historical reasons, the source declaration inside FROM command is
treated as an identifier, using backticks (``) for escaping the value.
This is inconsistent since the source is not an identifier (field name)
but an index name which has different semantics.
index means a field name index while "index" means a literal with
said value.

In case of FROM, the index name/location is more like a literal (also in
unquoted form) than an identifier (that is a reference to a value).

This PR tweaks the grammar and plugs in the quoted string logic so that
both the single quote (") and triple quote (""") are allowed.

For historical reasons, the source declaration inside FROM command is
 treated as an identifier, using backticks (`) for escaping the value.
This is inconsistent since the source is not an identifier (field name)
 but an index name which has different semantics.
 `index` means a field name index while "index" means a literal with
 said value.

In case of FROM, the index name/location is more like a literal (also in
 unquoted form) than an identifier (that is a reference to a value).

This PR tweaks the grammar and plugs in the quoted string logic so that
 both the single quote (") and triple quote (""") are allowed.
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 8, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get to fully review the PR yet, but I noticed it is a breaking language change. We need to use our language version mechanism to avoid breaking existing queries.

assertIdentifierAsIndexPattern("foo,test,xyz", "from foo, test,xyz");
assertIdentifierAsIndexPattern(
"<logstash-{now/M{yyyy.MM}}>,<logstash-{now/d{yyyy.MM.dd|+12:00}}>",
"from <logstash-{now/M{yyyy.MM}}>, `<logstash-{now/d{yyyy.MM.dd|+12:00}}>`"
"from <logstash-{now/M{yyyy.MM}}>, \"<logstash-{now/d{yyyy.MM.dd|+12:00}}>\""
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised we don't need to update any csv tests - I guess we should add one or two to ensure this change also works correctly end-to-end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And use these csv tests in the docs for FROM; currently, I think they contain incorrect syntax (missing backticks/quotes):
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd also be great to add some tests with backticks now.

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4 Outdated Show resolved Hide resolved
@alex-spies alex-spies self-requested a review May 8, 2024 08:40
@alex-spies
Copy link
Contributor

Also, we'll have to update the docs for FROM.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I noticed is that for

FROM `testidx`

we include the backticks in the parsed index pattern/name. I think backticks should not be allowed here.

@alex-spies alex-spies self-requested a review May 8, 2024 10:59
@bpintea
Copy link
Contributor

bpintea commented May 8, 2024

we include the backticks in the parsed index pattern/name. I think backticks should not be allowed here.

I think they should be allowed (maybe surprisingly).
(Also, PUT "my_index" fails with: "reason": """Invalid index name ["my_index"], must not contain the following characters ['<','*','?','>','|',',','/','\',' ','"']""")

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand, this feels good since quotes aren't allowed in index names, so it feels natural to use them for quoting. OTOH, index names feel closer to identifiers than string literals and conceptually, would be easier to understand them together, as a language user.

It'd be good to have more tests with backticks now, as well as csv-specs, as noted by Alex.

It'd also be good to get it in 8.14, if we can sync this with other clients (like Kibana).

assertIdentifierAsIndexPattern("foo,test,xyz", "from foo, test,xyz");
assertIdentifierAsIndexPattern(
"<logstash-{now/M{yyyy.MM}}>,<logstash-{now/d{yyyy.MM.dd|+12:00}}>",
"from <logstash-{now/M{yyyy.MM}}>, `<logstash-{now/d{yyyy.MM.dd|+12:00}}>`"
"from <logstash-{now/M{yyyy.MM}}>, \"<logstash-{now/d{yyyy.MM.dd|+12:00}}>\""
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd also be great to add some tests with backticks now.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to have these as strings/constants (vs. identifiers). Until now, the backticks were more like a way of emphasizing that the index names are not regular "strings", but based on your explanation, they in essence are and I agree.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since quoting was disabled in this PR, this PR does not break bwc anymore, so my change request is moot :)

@costin costin removed the v8.14.1 label Jun 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@costin
Copy link
Member Author

costin commented Jun 4, 2024

I've added the target for this PR to be 8.15 (no backport to 8.14 necessary).
Since quoting (with backticks, in the old form) was removed in 8.14, this PR is just about adding regular quotes to indices for 8.15 and there's no backwards compatibility to worry about.

@costin costin force-pushed the esql/from-source-as-string-instead-of-identifier branch from c3dd636 to 9d97583 Compare June 4, 2024 21:37
@costin
Copy link
Member Author

costin commented Jun 4, 2024

@astefan @bpintea @alex-spies I've merged main and updated the PR, please take a look. /cc @drewdaemon

Comment on lines -42 to -43
INDEX_UNQUOTED_IDENTIFIER
: INDEX_UNQUOTED_IDENTIFIER_PART+
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an identifier rather a string - hence the rename.

: FROM indexIdentifier (COMMA indexIdentifier)* metadata?
: FROM indexString (COMMA indexString)* metadata?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose indexString instead of indexSource since we use source (to declare source commands).

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but maybe we could add a couple more tests.

I think this should be followed up by updating the docs for FROM; esp. they should use examples from csv tests (currently hard-coded, manual examples) and document + illustrate the use of quotes.
Since we also generalize the quoting for LOOKUP and METRICS, we should probably add some tests to that, too.

""");
assertStringAsIndexPattern("foo,test,xyz", "from foo, test,xyz");
assertStringAsIndexPattern(
"<logstash-{now/M{yyyy.MM}}>,<logstash-{now/d{yyyy.MM.dd|+12:00}}>",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions for additional test cases:

  • Maybe let's also add a simple test case without quotes but with colons : to illustrate that CCQ is fine.
from logs-*,*:logs-*
  • We could also add a test case with an index name containing and/or starting with a dot ., especially without quotes.
  • What happens when quotes are not properly closed, or opened? We could add a small randomized test for "FROM " + leftQuote + "some-idx" + rightQuote, where the left and right quotes are a random, mismatched selection from ", """ and the empty string (unquoted).
  • Other pathological cases:
    • what if a quoted string contains an un-escaped quote in the middle? E.g. FROM """someidx"""name""" or FROM "someidx"name".
    • What if two quoted strings are next to each other? FROM "someidx""name"

// in 8.14 ` were not allowed
// this has been relaxed in 8.15 since " is used for quoting
fragment UNQUOTED_SOURCE_PART
: ~["=|,[\]/ \t\r\n]
| '/' ~[*/] // allow single / but not followed by another / or * which would start a comment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this is maybe a bit too permissive. For instance, adding the following to the statement parser tests passes just fine:

assertStringAsIndexPattern("foo/=", "from foo/=");

This example also works for the other symbols that are forbidden in line 40: prepending / allows all of them except /, in particular even ".

@stratoula
Copy link

@costin I have a question. In 8.14 we removed the backticks support. What does it mean? That some indices were not working and with the current PR we are fixing them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants