Conversation

@nik9000 (Member) commented Sep 22, 2025

Works around a problem with ESQL planning around new types. Without this, if you have either aggregate_metric_double or dense_vector fields and any node running < 9.2.0, ESQL will detect those fields for queries like FROM * and request them. The old nodes don't know about the data type, so they'll claim they can't parse the request and fail. That's bad because in a pure 9.1.x cluster FROM * would have returned null for those fields rather than failing. And, come on, FROM * shouldn't fail during an upgrade.

And it's not just during an upgrade: a remote cluster that's on 9.1 will hit the same problem when a 9.2 coordinating node sends it FROM remote:* over cross-cluster search.

The fix we'd like to make is to get the version of every node that will receive the request and only enable the type if all of those versions support it. That's complex to implement, but it's easy to think about, explain, and handle in the planner.

We thought about a different fix - downgrading these fields to unsupported when serializing to an old version - but that's difficult to reason about and quite likely to fail in a long tail of weird ways.

This is a simpler version of the "get all the versions and disable unsupported fields" fix that uses a cute hack to avoid fetching the versions, which is the complex part. Instead, it scrapes the query for the TS command or for functions like KNN or TO_AGGREGATE_METRIC_DOUBLE and enables these field types only if those are present. This works because folks who want dense_vector will always be using KNN in 9.2.0, and folks who want aggregate_metric_double will use TS. Those constructs fail outright unless the entire cluster and all remote nodes are on 9.2.0, because they don't exist on earlier versions. So users tell us, by writing things that require 9.2.0, that all the nodes are on 9.2.0.
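
A minimal sketch of that heuristic, assuming a plain string scan (illustrative only - the real implementation works on the parsed query, and these names are not the actual planner API):

```java
import java.util.Locale;

// Illustrative only: scan the query text for constructs that exist solely in
// 9.2.0. If one is present, the query would fail on older nodes anyway, so it
// is safe to enable the new field types.
final class NewTypeHeuristic {
    static boolean queryRequiresAllNodesOn920(String esql) {
        String q = esql.toUpperCase(Locale.ROOT);
        return q.startsWith("TS ")                     // TS source command
            || q.contains("KNN(")                      // implies dense_vector support
            || q.contains("TO_AGGREGATE_METRIC_DOUBLE(");
    }
}
```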

We'll implement the transport-version-based approach in a follow-up. But, for now, we'll use this trick.

Fix #135193

@nik9000 added the >test (Issues or PRs that are addressing/adding tests), :Analytics/ES|QL (AKA ESQL), and v9.2.0 labels Sep 22, 2025
@elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Sep 22, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000 marked this pull request as draft September 23, 2025 01:51

@nik9000 (Member, Author) commented Sep 23, 2025

Flipping back to draft because this now contains a potential fix.

It also needs to be expanded to the multi-cluster tests.

```java
import org.junit.ClassRule;

@ThreadLeakFilters(filters = TestClustersThreadFilter.class)
public class AllSupportedFieldsIT extends AllSupportedFieldsTestCase {
```
Contributor:

Same question for single-node. This is not going to serialize at all.

@nik9000 (Member, Author):

It's useful here because it serves as a smoke test for the baseline "it works" state. Kind of a self-test for the test.


```java
nodeToInfo = new TreeMap<>();
Request getIds = new Request("GET", "_cat/nodes");
```
Contributor:

Table output is a bit tricky to parse. We should use the JSON alternative instead. Here's an existing example that gets node names; the same objects have the id and version as well:

```java
private List<String> getNodeNames() throws Exception {
    final var request = new Request("GET", "/_nodes");
    final var response = client().performRequest(request);
    Map<String, Object> responseMap = responseAsMap(response);
    Map<String, Map<String, Object>> nodes = extractValue(responseMap, "nodes");
    final List<String> nodeNames = new ArrayList<>();
    for (Map.Entry<String, Map<String, Object>> nodeInfoEntry : nodes.entrySet()) {
        final String nodeName = extractValue(nodeInfoEntry.getValue(), "name");
        nodeNames.add(nodeName);
    }
    return nodeNames;
}
```
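
A rough sketch of what that suggestion could look like for the nodeToInfo map above (hedged: responseAsMap and extractValue are the test helpers from the example, and mapping node id to version is an assumption about what the test needs):

```java
// Sketch only: build nodeToInfo from the JSON /_nodes response instead of
// parsing the _cat/nodes table. The node id is the key of each entry in the
// "nodes" object, and "version" is a field of the same object.
Map<String, Object> responseMap = responseAsMap(client().performRequest(new Request("GET", "/_nodes")));
Map<String, Map<String, Object>> nodes = extractValue(responseMap, "nodes");
Map<String, String> nodeToInfo = new TreeMap<>();
for (Map.Entry<String, Map<String, Object>> entry : nodes.entrySet()) {
    String nodeId = entry.getKey();
    String version = extractValue(entry.getValue(), "version");
    nodeToInfo.put(nodeId, version);
}
```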

@nik9000 (Member, Author):

I'm not sure that's better, but I can do it.

@alex-spies (Contributor) left a comment:

Hm, I think the fix could work for some queries, but we'll have to work through all kinds of different scenarios and add tests for them.

This only addresses the actual data type during serialization. It is not the same as planning the query with the knowledge that any field attribute may actually have an unsupported data type on a remote node.

Things that come to mind:

  • What if a previously unsupported type is used in union types?
  • What if a previously unsupported type is used in a lookup join? In particular, now we can have the fun situation where the lookup happens on the data node, but coordinator + lookup nodes already know the new data type while the data node doesn't.
  • Same goes for ENRICH.
  • Are all our commands actually okay with the pages they receive? This is the first time that in some of the pages, we have null blocks, and in others, we have blocks of the expected data type.
  • When old data nodes get sent their local plan, we may not touch a field of a previously unsupported type there - but we may touch it further downstream, on the coordinator, where it's supposed to be supported. The "no touchy unsupported field" validation is also done entirely on the coordinator, which doesn't know that some of the data nodes are unaware of one of the data types used.

@nik9000 (Member, Author) commented Sep 23, 2025:

> Are all our commands actually okay with the pages they receive? This is the first time that in some of the pages, we have null blocks, and in others, we have blocks of the expected data type.

It's possible some things don't handle all-null blocks, but they should. All-null is valid for every block type and is commonly produced for unmapped fields: if you query two indices and only one maps the field, the other will produce all-null blocks. It's possible a trick like this brings the null block to more places, but I don't think so.
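
A toy model of the invariant being described (hypothetical types, not the actual compute-engine API): every block must be able to report nulls per position, so an all-null block is substitutable wherever a typed block is expected.

```java
// Hypothetical illustration, not Elasticsearch's real Block API: an all-null
// block satisfies the same contract as any typed block, which is why unmapped
// fields can always be represented this way.
interface Block {
    int positionCount();
    boolean isNull(int position);
}

record ConstantNullBlock(int positionCount) implements Block {
    @Override
    public boolean isNull(int position) {
        return true; // every position is null, regardless of the expected type
    }
}
```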

@nik9000 (Member, Author) commented Sep 23, 2025:

> When old data nodes get sent their local plan, we may not touch a field of a previously unsupported type there - but we may touch it further downstream, on the coordinator, where it's supposed to be supported. The "no touchy unsupported field" validation is also done entirely on the coordinator, which doesn't know that some of the data nodes are unaware of one of the data types used.

Does that mean that, with this hack, you can do stuff like:

```esql
FROM test
| WHERE unsupported IS NULL
```

and the data nodes running on an old version will run the IS NULL check?

@alex-spies (Contributor) commented Sep 23, 2025:

> and the data nodes running on an old version will run the IS NULL check?

I'd have to confirm to say for sure, but they may just attempt to run the check. The verifier only runs after the analyzer, and both run only on the coordinator. So the verifier will not pick up the unsupported attributes and notice that we're trying to use them in places where we aren't allowed to.

If you look at ComputeService#runCompute, you'll see that we take the received physical plan as-is and throw it straight into the local optimizers. In any case, we'd be trying to run a query that would previously have been considered invalid.
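
Restating that flow as a toy sketch (all names besides ComputeService#runCompute are hypothetical, not the real code): the data node only applies local optimizers to whatever plan it receives; analysis and verification happened solely on the coordinator.

```java
// Toy restatement of the flow described above; not the actual ComputeService.
final class DataNodeComputeSketch {
    interface PhysicalPlan {}
    interface LocalOptimizer {
        PhysicalPlan optimize(PhysicalPlan plan);
    }

    static PhysicalPlan runCompute(PhysicalPlan received, LocalOptimizer localOptimizer) {
        // No analyzer and no verifier run here: the received plan is trusted
        // as-is, so a plan the coordinator considered valid executes even if
        // this node would have rejected the same query itself.
        return localOptimizer.optimize(received);
    }
}
```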

@alex-spies (Contributor) left a comment:

I think we need to slap the test-release label on and see what we get. I think the test expectations are currently wrongly built on the SNAPSHOT behavior - see below.

I think this is a great improvement and can be merged, because it saves us from the problem that old nodes will just error out on existing queries due to serialization problems.

That said, I think it's incorrect to allow 9.1 nodes to participate in knn-using queries. Yes, they can deserialize the dense_vector data type, but the type is still disabled in 9.1 release builds. This implicitly adds support for 9.2/9.1 mixed/remote cluster situations. That may be fine! Or it may not be. In general, IMO a query should error out if we try to have a remote node use an unsupported type, and 9.1 doesn't support dense_vector yet. The same holds, analogously, for aggregate_metric_double. And _tsid.

But we can tackle this in a separate PR, as long as it's before 9.2.0's release.

```java
case GEO_SHAPE -> equalTo("POINT (-71.34 41.12)");
case NULL -> nullValue();
case AGGREGATE_METRIC_DOUBLE -> {
    if (minVersion.onOrAfter(Version.V_9_2_0)) {
```
Contributor:

I think by default 9.1.5 nodes shouldn't send aggregate_metric_double data; the data type was still under construction then. The same goes for dense_vector.

If we know that it's safe to have that data sent over, this is fine. But I don't know that, and it means we're letting 9.1 nodes silently participate in queries that they may never have been tested for.

This goes back to my other #135215 (comment); in general, I think it's unsafe to treat a type as supported as soon as it can be deserialized; it's only safe to treat it as supported once it leaves the "under construction" stage.
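
One way to picture the distinction being drawn here (a sketch under assumed names - only the Version/onOrAfter usage is taken from the test snippets in this thread): being deserializable alone shouldn't make a type supported.

```java
// Sketch: a type is only "supported" once it has left the under-construction
// stage AND the node is on a version at or after the type's release version.
// Merely having serialization code for the type is not enough.
enum TypeStage { UNDER_CONSTRUCTION, RELEASED }

static boolean isSupported(TypeStage stage, Version nodeVersion, Version releasedIn) {
    return stage == TypeStage.RELEASED && nodeVersion.onOrAfter(releasedIn);
}
```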

```java
// Currently unsupported without TS command or KNN function
case AGGREGATE_METRIC_DOUBLE, DENSE_VECTOR -> {
    if (false == minVersion.onOrAfter(Version.V_9_2_0)) {
        yield either(equalTo("unsupported")).or(equalTo(type.esType()));
```
Contributor:

Oh! Hold on. With this change, the output should be stable overall when we have, e.g., 9.1/9.2 mixed clusters in release builds. It shouldn't depend on which node we send a query to - if the query isn't a knn or TS query (and so on), the expected output is dictated by the output of the old nodes.

Both should tell you "we don't support this type" and fill the column with nulls.

Are we looking at the right build? The new types have been enabled in SNAPSHOT builds since 9.0/9.1 (resp. 8.18/8.19), but they're disabled in release builds.

I went and ran `from k8s-downsampled | keep network.eth0.rx | limit 1` against a local SNAPSHOT build and a local release build. For the former, you get values and the right type in the column. For the latter, you get:

```json
{
  "took" : 184,
  "is_partial" : false,
  "documents_found" : 1,
  "values_loaded" : 0,
  "columns" : [
    {
      "name" : "network.eth0.rx",
      "type" : "unsupported",
      "original_types" : [
        "aggregate_metric_double"
      ],
      "suggested_cast" : "aggregate_metric_double"
    }
  ],
  "values" : [
    [
      null
    ]
  ]
}
```

This test's behavior is important mostly for release builds. Maybe this should be a release-build-only test? Including SNAPSHOT builds will muddy the waters and make this hard to reason about.

```java
if (false == minVersion.onOrAfter(Version.V_9_2_0)) {
    yield either(equalTo("unsupported")).or(equalTo(type.esType()));
}
yield equalTo("unsupported");
```
Contributor:

Should we also test that the originalType is correctly reported in the output? That's something we could mess up when tinkering with the under-construction types.

@nik9000 added the test-release (Trigger CI checks against release build) and test-full-bwc (Trigger full BWC version matrix tests) labels Sep 25, 2025
```groovy
task.skipTest("ml/sparse_vector_search/Search on a sparse_vector field with dots in the field names", "Vectors are no longer returned by default")
task.skipTest("ml/sparse_vector_search/Search on a nested sparse_vector field with dots in the field names and conflicting child fields", "Vectors are no longer returned by default")
task.skipTest("esql/190_lookup_join/lookup-no-key-only-key", "Requires the fix")
task.skipTest("esql/40_tsdb/aggregate_metric_double unsortable", "Extra function required to enable the field type")
```
Contributor:

Hm, if we don't rename the now-skipped tests, does that mean the FWC tests will skip them forever? Would it be safer to update the names of all the changed YAML tests, e.g. by adding a v2 suffix?

@nik9000 (Member, Author):

Oh shoot. Uh. I don't know.

It won't be needed after I fix it to use the version. We can just un-skip it. I think.

But I'm not sure.

@nik9000 merged commit 94c1ba4 into elastic:main Sep 26, 2025 - 30 checks passed
alex-spies added a commit that referenced this pull request Oct 1, 2025
- Fix the test expectations for our AllSupportedFieldsITs according to #135215 (comment).
- Clean up our mechanism and docs for "under construction" data types:
  - For SNAPSHOT builds, go back to the old status quo where all types are considered supported and we just skip tests based on capabilities.
  - Re-brand the CreatedVersion for data types to SupportedVersion. For release builds, it doesn't matter when serialization code for a type was introduced; it only matters when the type was actually released, which is generally later than when the serialization code was written.
  - Remove the old feature-flag based system for marking types as "under construction", since we plan to use minimum transport versions, not feature flags, to determine whether a remote node/cluster supports a type.
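
A hedged sketch of that minimum-version idea (helper names assumed; the real code works on transport versions rather than the Version class from the test snippets): a type counts as supported for a query only if even the oldest participating node released it.

```java
import java.util.Collection;
import java.util.Collections;

final class MinVersionSupport {
    // Sketch only: enable a data type for a query when the minimum version
    // across all participating nodes (local and remote) is at or after the
    // version in which the type was released. Assumes Version is Comparable.
    static boolean typeSupportedEverywhere(Version releasedIn, Collection<Version> nodeVersions) {
        Version oldest = Collections.min(nodeVersions);
        return oldest.onOrAfter(releasedIn);
    }
}
```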