Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Introduce mode setting for ENRICH #103949

Merged
merged 14 commits into from Jan 11, 2024

Conversation

costin
Copy link
Member

@costin costin commented Jan 5, 2024

Extend grammar to allow setting a mode for ENRICH to indicate the policy
resolution when running against remote clusters/CCQ.

Extend grammar to allow setting a mode for ENRICH to indicate the policy
 resolution when running against remote clusters/CCQ.
@costin costin added :Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI v8.13.0 labels Jan 5, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

new Literal(EMPTY, "countries", KEYWORD),
new UnresolvedAttribute(EMPTY, "country_code"),
null,
List.of()
),
processingCommand("enrich countries ON country_code")
processingCommand("enrich [mode=" + mode.name() + "] countries ON country_code")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really happy with the naming. Mode is fairly abstract - I would have preferred resolution, resolved, policy-location but it doesn't fit well with any, relative or originator.
Naming is hard. /cc @nik9000 for more suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from?

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment around BWC, but this looks great. Thanks Costin!

@@ -246,9 +246,13 @@ showCommand
;

enrichCommand
: ENRICH policyName=fromIdentifier (ON matchField=qualifiedNamePattern)? (WITH enrichWithClause (COMMA enrichWithClause)*)?
: ENRICH (enrichMode=enrichPolicyMode)? policyName=fromIdentifier (ON matchField=qualifiedNamePattern)? (WITH enrichWithClause (COMMA enrichWithClause)*)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the mode be after the policy name? I am fine with either option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the first option I went for however I wanted to be consistent with FROM namely have the options for a command specified after its declaration not per parameter, e.g.:

ENRICH policy[mode=any]   // it's clear mode affects the policy however is fragile as it could be interpreted as being
							// part of the policy name. Right now the index doesn't include[] but 
							// if this gets changed, the declaration will break
ENRICH "policy[with brackets]"[mode=any]   // not as pretty
ENRICH [mode= any] "policyWithBracket[]" 	// longer but safer to evolve

Another idea is to namespace the parameter to indicate what this mode is about since not everyone would be using CCQ and use something like ccq.mode, ccq.policy, ccq.policy.mode, ccq.policy.resolution:

ENRICH [ccq.policy.mode=any] policyName ON field WITH ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. I am fine with either option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENRICH policy
ENRICH FROM ANY policy
ENRICH FROM COORDINATING policy
ENRICH FROM REMOTE policy
?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like @nik9000's suggestion and I am kind of torn between the two approaches:

  • one is consistent with FROM index [metadata _source] approach where we put stuff between square brackets in the form of settings, or additional instructions. Whenever we have a similar setting in the future for commands we should probably follow the same style.
  • ENRICH is kind of special because the command itself is a series of natural language words put together to form a sentence. For example ENRICH languages_policy ON language WITH name = language_name would become:
ENRICH FROM COORDINATING languages_policy ON language WITH name = language_name

the drawback, though, is that if we add any additional elements to ENRICH in the future we need to do it the same natural language way, whereas with a square brackets approach the mechanism allows unlimited extensibility (just a made up setting name): [policy.mode=coordinating, node.affinity=hot]

Now, that I listed both options, I tend to favor the consistent square brackets option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'd also find it more fluent keeping the consistency with the ENRICH command itself (rather than FROM command) and add a FROM clause after the policy name.
OTOH, FROM command is an outlier to receive parameters in square brackets, so these one here would bring some balance (but I'd still like a query language with fewer [, ]).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FROM flows better however it's not a full property of ENRICH since it only makes sense when running a query in a CCQ environment against a CCQ index.

  1. FROM would have to apply only to the policy (which is not an index)
  2. without a remote index, command: ENRICH FROM COORDINATING language is not only meaningless but incorrect
  3. we're hoping the setting is an advance one - namely by default folks would use the any mode.

For these reasons I think we should give it less importance than a regular property.

Essentially this introduces a slightly separate mode:
FROM [medata _index, _source]
ENRICH [ccq.mode=any]

metadata does NOT use = and has a list of values
ccq.mode uses = and has only one value.

We should align the two so either:
FROM [metadata=_index,_source] OR ENRICH [ccq.mode any]
In both cases, we need to figure out how to handle enumeration of other options that might be added in the future.
A quick proposal is to add another square bracket
ENRICH [ccq.mode any] [another.param value], etc...
A bit more verbose but easier to navigate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without a remote index, command: ENRICH FROM COORDINATING language is not only meaningless but incorrect

I guess ENRICH [ccq.mode=remote] would still err if run locally. However, if we want this to be an "advanced" setting, it might make sense to use [, ] then.

OR ENRICH [ccq.mode any]

👍

ENRICH [ccq.mode any] [another.param value], etc...

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess ENRICH [ccq.mode=remote] would still err if run locally. However, if we want this to be an "advanced" setting, it might make sense to use [, ] then.

We haven't decided yet if we want to lenient or not:

  1. declaring any ccq property without a ccq index could be considered an error to force the user to remove the setting
  2. instead of throwing an error, we could just throw a warning and ignore the setting. this way the same setting can be
    used across difference environments; the downside though is the user might end up ignoring the warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three options should work whether remote indices are present or not.

  • ANY means that enrich takes place on any cluster.
  • COORDINATING means that enrich takes place on the coordinating cluster receiving an ESQL.
  • REMOTE means that enrich takes place on the cluster hosting the target index.

For local indices, all options have the same semantics. These options affect only the remote indices.

@@ -727,6 +728,7 @@ static Enrich readEnrich(PlanStreamInput in) throws IOException {
static void writeEnrich(PlanStreamOutput out, Enrich enrich) throws IOException {
out.writeNoSource();
out.writeLogicalPlanNode(enrich.child());
out.writeEnum(enrich.mode());
Copy link
Member

@dnhatn dnhatn Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to handle BWC here.

diff --git a/server/src/main/java/org/elasticsearch/TransportVersions.java b/server/src/main/java/org/elasticsearch/TransportVersions.java
index ce1f6d9ecaad..7d7adc4156de 100644
--- a/server/src/main/java/org/elasticsearch/TransportVersions.java
+++ b/server/src/main/java/org/elasticsearch/TransportVersions.java
@@ -178,6 +178,7 @@ public class TransportVersions {
     public static final TransportVersion SNAPSHOTS_IN_PROGRESS_TRACKING_REMOVING_NODES_ADDED = def(8_566_00_0);
     public static final TransportVersion SMALLER_RELOAD_SECURE_SETTINGS_REQUEST = def(8_567_00_0);
     public static final TransportVersion UPDATE_API_KEY_EXPIRATION_TIME_ADDED = def(8_568_00_0);
+    public static final TransportVersion ESQL_ENRICH_MODE = def(8_569_00_0);
 
     /*
      * STOP! READ THIS FIRST! No, really,
diff --git a/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java
index 7e81207ac919..e8c427870190 100644
--- a/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java
+++ b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java
@@ -7,6 +7,7 @@
 
 package org.elasticsearch.xpack.esql.io.stream;
 
+import org.elasticsearch.TransportVersions;
 import org.elasticsearch.common.TriFunction;
 import org.elasticsearch.common.io.stream.NamedWriteable;
 import org.elasticsearch.common.io.stream.StreamInput;
@@ -717,7 +718,7 @@ public final class PlanNamedTypes {
         return new Enrich(
             in.readSource(),
             in.readLogicalPlanNode(),
-            in.readEnum(Enrich.Mode.class),
+            in.getTransportVersion().onOrAfter(TransportVersions.ESQL_ENRICH_MODE) ? in.readEnum(Enrich.Mode.class) : Enrich.Mode.ANY,
             in.readExpression(),
             in.readNamedExpression(),
             new EnrichPolicyResolution(in.readString(), new EnrichPolicy(in), IndexResolution.valid(readEsIndex(in))),
@@ -728,7 +729,11 @@ public final class PlanNamedTypes {
     static void writeEnrich(PlanStreamOutput out, Enrich enrich) throws IOException {
         out.writeNoSource();
         out.writeLogicalPlanNode(enrich.child());
-        out.writeEnum(enrich.mode());
+        if (out.getTransportVersion().onOrAfter(TransportVersions.ESQL_ENRICH_MODE)) {
+            out.writeEnum(enrich.mode());
+        } else {
+            throw new IllegalArgumentException("cluster is not ready to handle " + enrich.mode().name());
+        }
         out.writeExpression(enrich.policyName());
         out.writeNamedExpression(enrich.matchField());
         out.writeString(enrich.policy().policyName());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming, I've added a similar fix.

@@ -714,9 +715,14 @@ static void writeEval(PlanStreamOutput out, Eval eval) throws IOException {
}

static Enrich readEnrich(PlanStreamInput in) throws IOException {
Enrich.Mode m = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is mode nullable? If so, then we need to handle null when writing/reading it from the wire. I think it should not be null, default to ANY for old clusters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mode gets initialized to ANY if passed as null, inside the Enrich constructor.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

FROM_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET), pushMode(FROM_MODE), pushMode(FROM_MODE);
FROM_CLOSING_BRACKET : CLOSING_BRACKET -> type(CLOSING_BRACKET), popMode, popMode;
FROM_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET);
FROM_CLOSING_BRACKET : CLOSING_BRACKET -> type(CLOSING_BRACKET);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@costin costin added the >feature label Jan 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment about the naming preference and approach. Irrespective of that, LGTM

@costin costin added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 10, 2024
@costin
Copy link
Member Author

costin commented Jan 10, 2024

Since most of the discussions are around the syntax, I've used @bpintea suggestions for now and used : to separate key values and kept using []. Will revisit this based on the internal discussions.
P.S. I've updated the grammar to exclude : from an index name - this character was deprecated in 7.x and no longer supported.
There are more characters we could exclude (see this section in the docs).

@costin
Copy link
Member Author

costin commented Jan 10, 2024

I've updated the grammar to exclude : from an index name

This fails since : is used inside the index for cross-cluster declaration. The grammar needs to be adjusted either to make it aware of the cluster declaration OR use a different tokenizer inside ENRICH that ignores :. I'll look into that tomorrow.

@elasticsearchmachine elasticsearchmachine merged commit 8e3efae into elastic:main Jan 11, 2024
15 checks passed
@costin costin deleted the esql/ccq-enrich branch January 11, 2024 04:57
jedrazb pushed a commit to jedrazb/elasticsearch that referenced this pull request Jan 17, 2024
Extend grammar to allow setting a mode for ENRICH to indicate the policy
resolution when running against remote clusters/CCQ.
dej611 added a commit to elastic/kibana that referenced this pull request Jan 23, 2024
## Summary

Sync up with elastic/elasticsearch#103949
grammar changes.

Command definitions now have a new `mode` list of argument, with related
`ESQLCommandMode` at AST level.

Validation now accepts settings and validate the specific ENRICH one
(error message is taken from the ES codebase).
Autocomplete doesn't trigger by default on settings, but it only trigger
when user starts to type a setting with the `[` trigger char:


![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d)

Note that multiple settings are supported, but shadowing will trigger a
warning.

`ccq.mode` value name and descriptions have been taken from the linked
ES PR.

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
lcawl pushed a commit to lcawl/kibana that referenced this pull request Jan 26, 2024
## Summary

Sync up with elastic/elasticsearch#103949
grammar changes.

Command definitions now have a new `mode` list of argument, with related
`ESQLCommandMode` at AST level.

Validation now accepts settings and validate the specific ENRICH one
(error message is taken from the ES codebase).
Autocomplete doesn't trigger by default on settings, but it only trigger
when user starts to type a setting with the `[` trigger char:


![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d)

Note that multiple settings are supported, but shadowing will trigger a
warning.

`ccq.mode` value name and descriptions have been taken from the linked
ES PR.

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
## Summary

Sync up with elastic/elasticsearch#103949
grammar changes.

Command definitions now have a new `mode` list of argument, with related
`ESQLCommandMode` at AST level.

Validation now accepts settings and validate the specific ENRICH one
(error message is taken from the ES codebase).
Autocomplete doesn't trigger by default on settings, but it only trigger
when user starts to type a setting with the `[` trigger char:


![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d)

Note that multiple settings are supported, but shadowing will trigger a
warning.

`ccq.mode` value name and descriptions have been taken from the linked
ES PR.

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
## Summary

Sync up with elastic/elasticsearch#103949
grammar changes.

Command definitions now have a new `mode` list of argument, with related
`ESQLCommandMode` at AST level.

Validation now accepts settings and validate the specific ENRICH one
(error message is taken from the ES codebase).
Autocomplete doesn't trigger by default on settings, but it only trigger
when user starts to type a setting with the `[` trigger char:


![enrich_modes](https://github.com/elastic/kibana/assets/924948/8882361d-2fc7-44ab-bc8e-3994fc3e733d)

Note that multiple settings are supported, but shadowing will trigger a
warning.

`ccq.mode` value name and descriptions have been taken from the linked
ES PR.

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) ES|QL-ui Impacts ES|QL UI >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants