New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Introduce mode setting for ENRICH #103949
Conversation
Extend grammar to allow setting a mode for ENRICH to indicate the policy resolution when running against remote clusters/CCQ.
Pinging @elastic/es-analytics-geo (Team:Analytics) |
new Literal(EMPTY, "countries", KEYWORD), | ||
new UnresolvedAttribute(EMPTY, "country_code"), | ||
null, | ||
List.of() | ||
), | ||
processingCommand("enrich countries ON country_code") | ||
processingCommand("enrich [mode=" + mode.name() + "] countries ON country_code") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really happy with the naming. Mode is fairly abstract - I would have preferred resolution, resolved, policy-location but it doesn't fit well with any
, relative
or originator
.
Naming is hard. /cc @nik9000 for more suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment around BWC, but this looks great. Thanks Costin!
@@ -246,9 +246,13 @@ showCommand | |||
; | |||
|
|||
enrichCommand | |||
: ENRICH policyName=fromIdentifier (ON matchField=qualifiedNamePattern)? (WITH enrichWithClause (COMMA enrichWithClause)*)? | |||
: ENRICH (enrichMode=enrichPolicyMode)? policyName=fromIdentifier (ON matchField=qualifiedNamePattern)? (WITH enrichWithClause (COMMA enrichWithClause)*)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the mode
be after the policy name? I am fine with either option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the first option I went for however I wanted to be consistent with FROM namely have the options for a command specified after its declaration not per parameter, e.g.:
ENRICH policy[mode=any] // it's clear mode affects the policy however is fragile as it could be interpreted as being
// part of the policy name. Right now the index doesn't include[] but
// if this gets changed, the declaration will break
ENRICH "policy[with brackets]"[mode=any] // not as pretty
ENRICH [mode= any] "policyWithBracket[]" // longer but safer to evolve
Another idea is to namespace the parameter to indicate what this mode is about since not everyone would be using CCQ and use something like ccq.mode, ccq.policy, ccq.policy.mode, ccq.policy.resolution:
ENRICH [ccq.policy.mode=any] policyName ON field WITH ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining. I am fine with either option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENRICH policy
ENRICH FROM ANY policy
ENRICH FROM COORDINATING policy
ENRICH FROM REMOTE policy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @nik9000's suggestion and I am kind of torn between the two approaches:
- one is consistent with
FROM index [metadata _source]
approach where we put stuff between square brackets in the form of settings, or additional instructions. Whenever we have a similar setting in the future for commands we should probably follow the same style. ENRICH
is kind of special because the command itself is a series of natural language words put together to form a sentence. For exampleENRICH languages_policy ON language WITH name = language_name
would become:
ENRICH FROM COORDINATING languages_policy ON language WITH name = language_name
the drawback, though, is that if we add any additional elements to ENRICH in the future we need to do it the same natural language way, whereas with a square brackets approach the mechanism allows unlimited extensibility (just a made up setting name): [policy.mode=coordinating, node.affinity=hot]
Now, that I listed both options, I tend to favor the consistent square brackets option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'd also find it more fluent keeping the consistency with the ENRICH
command itself (rather than FROM
command) and add a FROM
clause after the policy name.
OTOH, FROM
command is an outlier to receive parameters in square brackets, so these one here would bring some balance (but I'd still like a query language with fewer [
, ]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FROM flows better however it's not a full property of ENRICH since it only makes sense when running a query in a CCQ environment against a CCQ index.
- FROM would have to apply only to the policy (which is not an index)
- without a remote index, command:
ENRICH FROM COORDINATING language
is not only meaningless but incorrect - we're hoping the setting is an advance one - namely by default folks would use the
any
mode.
For these reasons I think we should give it less importance than a regular property.
Essentially this introduces a slightly separate mode:
FROM [medata _index, _source]
ENRICH [ccq.mode=any]
metadata does NOT use =
and has a list of values
ccq.mode uses =
and has only one value.
We should align the two so either:
FROM [metadata=_index,_source]
OR ENRICH [ccq.mode any]
In both cases, we need to figure out how to handle enumeration of other options that might be added in the future.
A quick proposal is to add another square bracket
ENRICH [ccq.mode any] [another.param value], etc...
A bit more verbose but easier to navigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without a remote index, command: ENRICH FROM COORDINATING language is not only meaningless but incorrect
I guess ENRICH [ccq.mode=remote]
would still err if run locally. However, if we want this to be an "advanced" setting, it might make sense to use [
, ]
then.
OR ENRICH [ccq.mode any]
👍
ENRICH [ccq.mode any] [another.param value], etc...
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess ENRICH [ccq.mode=remote] would still err if run locally. However, if we want this to be an "advanced" setting, it might make sense to use [, ] then.
We haven't decided yet if we want to lenient or not:
- declaring any ccq property without a ccq index could be considered an error to force the user to remove the setting
- instead of throwing an error, we could just throw a warning and ignore the setting. this way the same setting can be
used across difference environments; the downside though is the user might end up ignoring the warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three options should work whether remote indices are present or not.
ANY
means that enrich takes place on any cluster.COORDINATING
means that enrich takes place on the coordinating cluster receiving an ESQL.REMOTE
means that enrich takes place on the cluster hosting the target index.
For local indices, all options have the same semantics. These options affect only the remote indices.
@@ -727,6 +728,7 @@ static Enrich readEnrich(PlanStreamInput in) throws IOException { | |||
static void writeEnrich(PlanStreamOutput out, Enrich enrich) throws IOException { | |||
out.writeNoSource(); | |||
out.writeLogicalPlanNode(enrich.child()); | |||
out.writeEnum(enrich.mode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to handle BWC here.
diff --git a/server/src/main/java/org/elasticsearch/TransportVersions.java b/server/src/main/java/org/elasticsearch/TransportVersions.java
index ce1f6d9ecaad..7d7adc4156de 100644
--- a/server/src/main/java/org/elasticsearch/TransportVersions.java
+++ b/server/src/main/java/org/elasticsearch/TransportVersions.java
@@ -178,6 +178,7 @@ public class TransportVersions {
public static final TransportVersion SNAPSHOTS_IN_PROGRESS_TRACKING_REMOVING_NODES_ADDED = def(8_566_00_0);
public static final TransportVersion SMALLER_RELOAD_SECURE_SETTINGS_REQUEST = def(8_567_00_0);
public static final TransportVersion UPDATE_API_KEY_EXPIRATION_TIME_ADDED = def(8_568_00_0);
+ public static final TransportVersion ESQL_ENRICH_MODE = def(8_569_00_0);
/*
* STOP! READ THIS FIRST! No, really,
diff --git a/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java
index 7e81207ac919..e8c427870190 100644
--- a/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java
+++ b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java
@@ -7,6 +7,7 @@
package org.elasticsearch.xpack.esql.io.stream;
+import org.elasticsearch.TransportVersions;
import org.elasticsearch.common.TriFunction;
import org.elasticsearch.common.io.stream.NamedWriteable;
import org.elasticsearch.common.io.stream.StreamInput;
@@ -717,7 +718,7 @@ public final class PlanNamedTypes {
return new Enrich(
in.readSource(),
in.readLogicalPlanNode(),
- in.readEnum(Enrich.Mode.class),
+ in.getTransportVersion().onOrAfter(TransportVersions.ESQL_ENRICH_MODE) ? in.readEnum(Enrich.Mode.class) : Enrich.Mode.ANY,
in.readExpression(),
in.readNamedExpression(),
new EnrichPolicyResolution(in.readString(), new EnrichPolicy(in), IndexResolution.valid(readEsIndex(in))),
@@ -728,7 +729,11 @@ public final class PlanNamedTypes {
static void writeEnrich(PlanStreamOutput out, Enrich enrich) throws IOException {
out.writeNoSource();
out.writeLogicalPlanNode(enrich.child());
- out.writeEnum(enrich.mode());
+ if (out.getTransportVersion().onOrAfter(TransportVersions.ESQL_ENRICH_MODE)) {
+ out.writeEnum(enrich.mode());
+ } else {
+ throw new IllegalArgumentException("cluster is not ready to handle " + enrich.mode().name());
+ }
out.writeExpression(enrich.policyName());
out.writeNamedExpression(enrich.matchField());
out.writeString(enrich.policy().policyName());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming, I've added a similar fix.
@@ -714,9 +715,14 @@ static void writeEval(PlanStreamOutput out, Eval eval) throws IOException { | |||
} | |||
|
|||
static Enrich readEnrich(PlanStreamInput in) throws IOException { | |||
Enrich.Mode m = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is mode nullable? If so, then we need to handle null when writing/reading it from the wire. I think it should not be null, default to ANY for old clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mode gets initialized to ANY if passed as null, inside the Enrich constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
FROM_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET), pushMode(FROM_MODE), pushMode(FROM_MODE); | ||
FROM_CLOSING_BRACKET : CLOSING_BRACKET -> type(CLOSING_BRACKET), popMode, popMode; | ||
FROM_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET); | ||
FROM_CLOSING_BRACKET : CLOSING_BRACKET -> type(CLOSING_BRACKET); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
Hi @costin, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment about the naming preference and approach. Irrespective of that, LGTM
Since most of the discussions are around the syntax, I've used @bpintea suggestions for now and used |
This fails since : is used inside the index for cross-cluster declaration. The grammar needs to be adjusted either to make it aware of the cluster declaration OR use a different tokenizer inside ENRICH that ignores |
Extend grammar to allow setting a mode for ENRICH to indicate the policy resolution when running against remote clusters/CCQ.
## Summary Sync up with elastic/elasticsearch#103949 grammar changes. Command definitions now have a new `mode` list of argument, with related `ESQLCommandMode` at AST level. Validation now accepts settings and validate the specific ENRICH one (error message is taken from the ES codebase). Autocomplete doesn't trigger by default on settings, but it only trigger when user starts to type a setting with the `[` trigger char: ![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d) Note that multiple settings are supported, but shadowing will trigger a warning. `ccq.mode` value name and descriptions have been taken from the linked ES PR. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios --------- Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
## Summary Sync up with elastic/elasticsearch#103949 grammar changes. Command definitions now have a new `mode` list of argument, with related `ESQLCommandMode` at AST level. Validation now accepts settings and validate the specific ENRICH one (error message is taken from the ES codebase). Autocomplete doesn't trigger by default on settings, but it only trigger when user starts to type a setting with the `[` trigger char: ![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d) Note that multiple settings are supported, but shadowing will trigger a warning. `ccq.mode` value name and descriptions have been taken from the linked ES PR. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios --------- Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
## Summary Sync up with elastic/elasticsearch#103949 grammar changes. Command definitions now have a new `mode` list of argument, with related `ESQLCommandMode` at AST level. Validation now accepts settings and validate the specific ENRICH one (error message is taken from the ES codebase). Autocomplete doesn't trigger by default on settings, but it only trigger when user starts to type a setting with the `[` trigger char: ![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d) Note that multiple settings are supported, but shadowing will trigger a warning. `ccq.mode` value name and descriptions have been taken from the linked ES PR. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios --------- Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
## Summary Sync up with elastic/elasticsearch#103949 grammar changes. Command definitions now have a new `mode` list of argument, with related `ESQLCommandMode` at AST level. Validation now accepts settings and validate the specific ENRICH one (error message is taken from the ES codebase). Autocomplete doesn't trigger by default on settings, but it only trigger when user starts to type a setting with the `[` trigger char: ![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d) Note that multiple settings are supported, but shadowing will trigger a warning. `ccq.mode` value name and descriptions have been taken from the linked ES PR. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios --------- Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
Extend grammar to allow setting a mode for ENRICH to indicate the policy
resolution when running against remote clusters/CCQ.