Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Add OPTIONS clause to FROM command #106636

Merged
merged 11 commits into from
Mar 28, 2024
Merged

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Mar 21, 2024

This adds an OPTIONS clause to FROM command, allowing to specify search or index resolution options.
For now, the following options are supported:

  • preference
  • allow_no_indices
  • ignore_unavailable.

The OPTIONS clause take a comma-separated list of quoted AVPs. Example:

FROM index OPTIONS "preference"="_shards:1,2","allow_no_indices"="true"

This adds an OPTIONS clause to FROM, allowing to specify search or index
resolution options, such as: preference, allow_no_indices or
ignore_unavailable.
@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

requestFilter,
concreteIndices,
originalIndices,
esSourceOptions,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only added this parameter, diff grew because of breaking down of arguments line, due to line limit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about CCQ? Does it support having these preferences sent over?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they're part of the EsRelation, which is part of the Fragment which is sent across. I've added some REST tests and CSV test that's executed part of CCQ.

@bpintea bpintea marked this pull request as ready for review March 22, 2024 17:42
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 22, 2024
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, thanks @bpintea !

Added a couple of remarks, many of which are minor. The major remarks are (summarized):

  • test coverage for the preferences options
  • docs update
  • wondering whether the right hand side of "some_option"="some_value" should be in quotes, even if it is expected to be e.g. a boolean
  • wondering whether we need to register warnings in case of e.g. a shard preference for a non-existent shard @astefan chimed in and said this is consistent with what ES does in case of _search, so please disregard.

@@ -99,7 +99,20 @@ field
;

fromCommand
: FROM fromIdentifier (COMMA fromIdentifier)* metadata?
: FROM fromIdentifier (COMMA fromIdentifier)* fromOptions? metadata?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think we need to update the docs for FROM as well to reflect the grammar change.
  2. I think we should mention in the docs that FROM can use backticks, e.g. in the edge case that an index is called options or metadata - but that's maybe out of scope for this PR.
FROM `metadata`,`options` ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the docs will follow in separate PR -- I wanted to ensure we want to keep the format.

}

static void writeEsRelation(PlanStreamOutput out, EsRelation relation) throws IOException {
assert relation.children().size() == 0;
out.writeNoSource();
writeEsIndex(out, relation.index());
writeAttributes(out, relation.output());
if (out.getTransportVersion().onOrAfter(TransportVersions.ESQL_ES_SOURCE_OPTIONS)) {
if (relation.esSourceOptions() != null) {
out.writeBoolean(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is marking the existence of the source option with a boolean really necessary?
We encode the options as 3 optional strings; that handles the optionality already, why repeat this logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is marking the existence of the source option with a boolean really necessary? We encode the options as 3 optional strings

Without this boolean, how would the remote side know if it should attempt to read three (optional) strings or not? Reading an optional strings involves consuming at least a boolean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant to say: StreamOutput.writeOptionalString already prepends a boolean to say "there's a string!" or "there's no string!". We're doubling this logic.

The serialization code in this PR, however, optimizes for the (common?) case where there's no options. My point is that this is unnecessary, because we'd only be sending 1 byte (false) instead of 3 bytes (false, false, false), at the cost of slightly more complex code (additional null-handling at the level of EsSourceOptions).

I don't feel strongly about this, but I'd prefer if

  • the EsSourceOptions were not nullable (each of the options that are encoded in them is nullable already, so there's two ways to express that no options were set)
  • and if the serialization always sent the EsSourceOptions as 3 optional strings, rather than specialcasing for when the EsSourceOptions is null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, you were still addressing the nullability of the EsSourceOptions instance. Ok, took on your suggestion and applied it.

;

configOption
: string ASSIGN string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it's a bit unfortunate that this leads to quotes around e.g. booleans. In particular, one has to write ... OPTIONS "allow_no_indices"="true" instead of "allow_no_indices"=true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, ideally one would simply write smth like OPTIONS allow_no_indices=true.
But I think the rational was that (1) it might require explaining when to use value quotes and when not (not that difficult, but would come with some load) and (2) that we can relax the quoting later, if needed.

assertEquals(expectedTextBody("txt", count, null), runEsqlAsTextWithFormat(builder.apply(null), "txt", null));

// returns nothing (0 for count), given the non-existing shard as preference
String option = "\"preference\"=\"_shards:666\"";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we add more tests that demonstrate that the other possible preference options work correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We treat the values in a opaque way and this tests that these "makes it through". Testing it exhaustively involves multiple nodes, which seemed an overkill just for this feature (and I assume there are already other tests for all these values).

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review in progress. Found that FROM employees OPTIONS "preference"="" (empty string) returns org.elasticsearch.ElasticsearchException$1: Index 0 out of bounds for length 0


public void addOption(String name, String value) {
switch (name) {
case "allow_no_indices" -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could take the name of this option, at least, from IndicesOptions.WildcardOptions.ALLOW_NO_INDICES and use this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, fixed.
(I started from "preference", which is used as hardcoded string everywhere, so didn't explore further.)

}
preference = value;
}
default -> throw new IllegalArgumentException("unknown option named [" + name + "]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about offering some suggestions? StringUtils.findSimilar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added.

public void testFromOptionsInvalidIndicesOptionValue() {
expectError(
FROM + " options \"allow_no_indices\"=\"foo\"",
"line 1:20: invalid options provided: Could not convert [allow_no_indices] to boolean"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we can be more strict. From my tests, we only allow true and false (not even uppercase versions of them). So, unless we offer more options (like, t/f, 1/0, TRUE/FALSE) and I believe we shouldn't and we must follow the same behavior ES does, we can improve this message (and potentially use the same ES generates). ES itself provides more info:

    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "Could not convert [allow_no_indices] to boolean"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "Could not convert [allow_no_indices] to boolean",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Failed to parse value [TRUE] as only [true] or [false] are allowed."
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
The parsing of the options is delegated to IndicesOptions, so we follow those rules. But forgot to pipe the cause through, it's now fixed. And I also incorporated your later observation, though IMO this unpacking of the causes should be left to the client (but maybe here it's OK to do it).

requestFilter,
concreteIndices,
originalIndices,
esSourceOptions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about CCQ? Does it support having these preferences sent over?

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I've added one suggestion and, also, I would be more comfortable if there would be a csv-spec test, as well. And here I am more thinking about the mixed-cluster and multi-clusters IT tests (the csv tests are used there as well; with EsqlSpecTestCase base class). RestEsqlTestCase (the one you have in PR) is for single-node environment only.

@@ -220,7 +220,7 @@ public LogicalPlan visitFromCommand(EsqlBaseParser.FromCommandContext ctx) {
try {
esSourceOptions.addOption(name, value);
} catch (IllegalArgumentException iae) {
throw new ParsingException(source(nameContext), "invalid options provided: " + iae.getMessage());
throw new ParsingException(iae, source(nameContext), "invalid options provided: " + iae.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant with my previous suggestion was something like this:

Suggested change
throw new ParsingException(iae, source(nameContext), "invalid options provided: " + iae.getMessage());
var cause = iae.getCause() != null ? "; " + iae.getCause().getMessage() : "";
throw new ParsingException(iae, source(nameContext), "invalid options provided: " + iae.getMessage() + cause);

to have an error message like "reason": "line 1:56: invalid options provided: Could not convert [allow_no_indices] to boolean; Failed to parse value [TRUE] as only [true] or [false] are allowed.",

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left only minor remarks. Thanks @bpintea !

@@ -186,8 +186,10 @@ FROM_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET);
FROM_CLOSING_BRACKET : CLOSING_BRACKET -> type(CLOSING_BRACKET);
FROM_COMMA : COMMA -> type(COMMA);
FROM_ASSIGN : ASSIGN -> type(ASSIGN);
FROM_QUTED_STRING : QUOTED_STRING -> type(QUOTED_STRING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FROM_QUTED_STRING : QUOTED_STRING -> type(QUOTED_STRING);
FROM_QUOTED_STRING : QUOTED_STRING -> type(QUOTED_STRING);

@@ -99,7 +99,20 @@ field
;

fromCommand
: FROM fromIdentifier (COMMA fromIdentifier)* metadata?
: FROM fromIdentifier (COMMA fromIdentifier)* fromOptions? metadata?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized, it will not be possible to write FROM index METADATA ... OPTIONS ... because we only allow the fixed order FROM->OPTIONS->METADATA.

Probably okay for now, but we might want to generalize that later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had that thought too (it's just a grammar line more). But had a look at SQL (which doesn't allow this swapping where it could) and thought of how we now request quotes (i.e. no flexibility) and thought I'd keep it inline.

Copy link
Member

@costin costin Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this - I'm fine with the order being fixed however I would keep the metadata? first, e.g.:
FROM index METADATA ... OPTIONS... since in general I expect OPTIONS to be the last item in a command mainly because there can be a long list of parameters and because METADATA and others could be considered specialized OPTIONS and thus should take precedence.

Comment on lines 548 to 549
convertFromDatetimeWithOptions#[skip:-8.13.99, reason:OPTIONS added in 8.14]
FROM employees OPTIONS "allow_no_indices"="false","preference"="_shards:0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if floats.csv-spec is the best place for this; we're adding the test to test the OPTIONS, no? Maybe it's time for a from.csv-spec? This could be consolidated with id.csv-spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, thanks.

List<String> matches = StringUtils.findSimilar(name, List.of(ALLOW_NO_INDICES, IGNORE_UNAVAILABLE, OPTION_PREFERENCE));
if (matches.isEmpty() == false) {
String suggestions = matches.size() == 1 ? "[" + matches.get(0) + "]" : "any of " + matches;
message += ", did you mean " + suggestions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message += ", did you mean " + suggestions;
message += ", did you mean " + suggestions + "?";

@bpintea bpintea merged commit aeeb597 into elastic:main Mar 28, 2024
14 checks passed
@bpintea bpintea deleted the esql/from_options branch March 28, 2024 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants