From 62e972f575177bc2b5dfe2389f08f2bb30c74750 Mon Sep 17 00:00:00 2001 From: Simon Cooper Date: Thu, 16 Jan 2025 16:35:43 +0000 Subject: [PATCH 1/4] Start on feature javadocs --- .../elasticsearch/features/package-info.java | 94 +++++++++++++++++++ 1 file changed, 94 insertions(+) create mode 100644 server/src/main/java/org/elasticsearch/features/package-info.java diff --git a/server/src/main/java/org/elasticsearch/features/package-info.java b/server/src/main/java/org/elasticsearch/features/package-info.java new file mode 100644 index 0000000000000..cf097c336f281 --- /dev/null +++ b/server/src/main/java/org/elasticsearch/features/package-info.java @@ -0,0 +1,94 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the "Elastic License + * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side + * Public License v 1"; you may not use this file except in compliance with, at + * your election, the "Elastic License 2.0", the "GNU Affero General Public + * License v3.0 only", or the "Server Side Public License, v 1". + */ + +/** + * The features infrastructure in Elasticsearch is responsible for two things: + *
  1. + * Determining when all nodes in a cluster have been upgraded to support some new functionality. + * This is used to only engage new behavior when all nodes in the cluster support it.
  2. + * Ensuring nodes only join a cluster if they support all features already present on that cluster. + * This is to ensure that once a cluster supports a feature, it then never loses support. + * Conversely, when a feature is defined, it can then never be removed (but see Assumed features below).
+ * + *

Functionality

+ * This functionality starts with {@link org.elasticsearch.features.NodeFeature}. This is a single id representing + * new or a change in functionality - exactly what that functionality is is up to the user. These are expected + * to be {@code public static final} variables on a relevant class. Each area of code then exposes their features + * through an implementation of {@link org.elasticsearch.features.FeatureSpecification#getFeatures}, registered as an SPI implementation. + *

+ * All the features exposed by a node are included in the {@link org.elasticsearch.cluster.coordination.JoinTask.NodeJoinTask} + * processed by {@link org.elasticsearch.cluster.coordination.NodeJoinExecutor} when a node joins a cluster. This checks + * the joining node has all the features already present on the cluster, and then records the set of features against that node + * in cluster state (in the {@link org.elasticsearch.cluster.ClusterFeatures} object). + *

+ * Informally, the features supported by a particular node are 'node features'; when all nodes in a cluster support a particular + * feature, that is then a 'cluster feature'. + *

+ * Node features can then be checked by code to determine if all nodes in the cluster support that particular feature. + * This is done using {@link org.elasticsearch.features.FeatureService#clusterHasFeature}. This is a fast operation - the first + * time this method is called on a particular cluster state, the cluster features for a cluster are calculated from all the + * node feature information, and cached in the {@link org.elasticsearch.cluster.ClusterFeatures} object. + * Henceforth, all cluster feature checks are fast hash set lookups. + * + *
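The lazy calculation described above can be pictured with a small self-contained model. This is only a sketch under assumptions: `ClusterFeaturesSketch` is a hypothetical stand-in, not the real `ClusterFeatures` class, and features are plain strings rather than `NodeFeature` instances. It shows the two ideas from the paragraph: a cluster feature is one that every node publishes, and the intersection is computed once and then served from a hash set.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the idea behind ClusterFeatures; not the real Elasticsearch class.
final class ClusterFeaturesSketch {
    // node id -> feature ids published by that node (this is the part kept in cluster state)
    private final Map<String, Set<String>> nodeFeatures;
    // calculated lazily on the first lookup, never persisted
    private Set<String> cachedClusterFeatures;

    ClusterFeaturesSketch(Map<String, Set<String>> nodeFeatures) {
        this.nodeFeatures = Map.copyOf(nodeFeatures);
    }

    // A feature is a cluster feature only if every node publishes it (set intersection).
    private static Set<String> intersect(Map<String, Set<String>> nodeFeatures) {
        Set<String> result = null;
        for (Set<String> features : nodeFeatures.values()) {
            if (result == null) {
                result = new HashSet<>(features);
            } else {
                result.retainAll(features);
            }
        }
        return result == null ? Set.of() : Set.copyOf(result);
    }

    // The first call pays for the intersection; later calls are plain hash set lookups.
    synchronized boolean clusterHasFeature(String featureId) {
        if (cachedClusterFeatures == null) {
            cachedClusterFeatures = intersect(nodeFeatures);
        }
        return cachedClusterFeatures.contains(featureId);
    }
}
```

In this model a new cluster state would mean a new `ClusterFeaturesSketch` instance, which is one simple way to get the "cached per cluster state" behaviour without any invalidation logic.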

Features test infrastructure

+ * Features can be specified as conditions in YAML tests, as well as checks and conditions in code-defined rolling upgrade tests + * (see the Elasticsearch development documentation for more information). + * These checks are performed by the {@code TestFeatureService} interface, and its standard implementation {@code ESRestTestFeatureService}. + * + *

Test features

+ * Sometimes, you want to define a feature for nodes, but the only checks you need to do are as part of a test. In this case, + * the feature doesn't need to be included in the production codebase, it only needs to be present for automated tests. + * So alongside {@link org.elasticsearch.features.FeatureSpecification#getFeatures}, there is + * {@link org.elasticsearch.features.FeatureSpecification#getTestFeatures}. This can be used to expose node features, + * but only for automated tests. It is ignored in production uses. This is determined by the {@link org.elasticsearch.features.FeatureData} + * class, which uses a system property (set by the test infrastructure) to decide whether to include test features or not, + * when gathering all the registered {@code FeatureSpecification} instances. + * + *
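The gating of test features might look roughly like the sketch below. Everything here is illustrative: `Spec` is a cut-down stand-in for `FeatureSpecification`, and the property name `sketch.include_test_features` is invented for this example (the patch does not name the real property). The point is only that test features are merged into the gathered set when the flag is on and ignored otherwise.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TestFeatureGatingSketch {
    // Hypothetical property name, invented for this sketch only.
    static final String INCLUDE_TEST_FEATURES_PROPERTY = "sketch.include_test_features";

    // Stand-in for FeatureSpecification, reduced to the two methods discussed above.
    interface Spec {
        Set<String> getFeatures();

        default Set<String> getTestFeatures() {
            return Set.of();
        }
    }

    // Gathers features from all specifications; test features only when asked to.
    static Set<String> gather(List<Spec> specs, boolean includeTestFeatures) {
        Set<String> all = new HashSet<>();
        for (Spec spec : specs) {
            all.addAll(spec.getFeatures());
            if (includeTestFeatures) {
                all.addAll(spec.getTestFeatures());
            }
        }
        return all;
    }

    // Production-style entry point: the decision comes from a system property.
    static Set<String> gather(List<Spec> specs) {
        return gather(specs, Boolean.getBoolean(INCLUDE_TEST_FEATURES_PROPERTY));
    }
}
```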

Synthetic version features

+ * Cluster functionality checks performed on code built from the {@code main} branch can only use features, but we also have packaged releases + * with a longer release cadence. Sometimes tests need to be conditional on older versions (where there isn't a feature already defined + * for you), determined after the fact. This is where synthetic version features comes in. These can be used in tests where + * it is sensible to use a release version number (eg 8.12.3). The presence of these features is determined solely by the minimum + * node version present in the test cluster; no actual cluster features are defined nor checked. + * This is done by {@code ESRestTestFeatureService}, matching on features of the form {@code gte_v8.12.3}. + * For more information on their use, see the Elasticsearch developer documentation. + * + *
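As an illustration of the matching described above, a synthetic version feature check could be modelled as follows. This is a sketch under assumptions, not the code in `ESRestTestFeatureService`: the class name, the regular expression, and the `int[]` version representation are all invented here. It only shows that a feature id of the form `gte_v8.12.3` is answered purely from the minimum node version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SyntheticVersionFeatureSketch {
    // Matches ids such as gte_v8.12.3 and captures major, minor and revision.
    private static final Pattern GTE = Pattern.compile("gte_v(\\d+)\\.(\\d+)\\.(\\d+)");

    // minNodeVersion is {major, minor, revision} of the oldest node in the test cluster.
    // Returns true only for ids of the gte_vX.Y.Z form where minNodeVersion >= X.Y.Z.
    static boolean hasSyntheticFeature(String featureId, int[] minNodeVersion) {
        Matcher m = GTE.matcher(featureId);
        if (!m.matches()) {
            return false; // not a synthetic version feature
        }
        for (int i = 0; i < 3; i++) {
            int required = Integer.parseInt(m.group(i + 1));
            if (minNodeVersion[i] != required) {
                return minNodeVersion[i] > required;
            }
        }
        return true; // versions are equal
    }
}
```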

Assumed features

+ * Once a feature is defined on a cluster, it can never be removed - this is to ensure that functionality that is available + * on a cluster then never stops being available. However, this can lead to the list of features in cluster state growing ever larger. + * It is possible to remove defined cluster features, but only on a compatibility boundary (normally a new major release). + * To see how this can be so, it may be helpful to start with the compatibility guarantees we provide: + * + * So, starting up a fresh v10 cluster, it does not need to have any knowledge of features added before 9.16, as the cluster + * will always be using the new functionality. + *

+ * + */ +package org.elasticsearch.features; From 7b6861afaa650a6d01aee42cae245df718b5434d Mon Sep 17 00:00:00 2001 From: Simon Cooper Date: Fri, 17 Jan 2025 09:56:40 +0000 Subject: [PATCH 2/4] Add package javadocs for features functionality --- .../elasticsearch/features/package-info.java | 95 +++++++++++++++---- 1 file changed, 76 insertions(+), 19 deletions(-) diff --git a/server/src/main/java/org/elasticsearch/features/package-info.java b/server/src/main/java/org/elasticsearch/features/package-info.java index cf097c336f281..d6b0a0be40bdf 100644 --- a/server/src/main/java/org/elasticsearch/features/package-info.java +++ b/server/src/main/java/org/elasticsearch/features/package-info.java @@ -12,25 +12,26 @@ *

  1. * Determining when all nodes in a cluster have been upgraded to support some new functionality. - * This is used to only engage new behavior when all nodes in the cluster support it. + * This is used to only utilise new behavior when all nodes in the cluster support it.
  2. * Ensuring nodes only join a cluster if they support all features already present on that cluster. - * This is to ensure that once a cluster supports a feature, it then never loses support. + * This is to ensure that once a cluster supports a feature, it then never drops support. * Conversely, when a feature is defined, it can then never be removed (but see Assumed features below).
* *

Functionality

* This functionality starts with {@link org.elasticsearch.features.NodeFeature}. This is a single id representing - * new or a change in functionality - exactly what that functionality is is up to the user. These are expected + * new or a change in functionality - exactly what functionality that feature represents is up to the developer. These are expected * to be {@code public static final} variables on a relevant class. Each area of code then exposes their features * through an implementation of {@link org.elasticsearch.features.FeatureSpecification#getFeatures}, registered as an SPI implementation. *

- * All the features exposed by a node are included in the {@link org.elasticsearch.cluster.coordination.JoinTask.NodeJoinTask} - * processed by {@link org.elasticsearch.cluster.coordination.NodeJoinExecutor} when a node joins a cluster. This checks + * All the features exposed by a node are included in the {@link org.elasticsearch.cluster.coordination.JoinTask.NodeJoinTask} information + * processed by {@link org.elasticsearch.cluster.coordination.NodeJoinExecutor}, when a node attempts to join a cluster. This checks * the joining node has all the features already present on the cluster, and then records the set of features against that node * in cluster state (in the {@link org.elasticsearch.cluster.ClusterFeatures} object). + * The calculated effective cluster features are not persisted, only the per-node feature set. *

* Informally, the features supported by a particular node are 'node features'; when all nodes in a cluster support a particular * feature, that is then a 'cluster feature'. @@ -39,7 +40,7 @@ * This is done using {@link org.elasticsearch.features.FeatureService#clusterHasFeature}. This is a fast operation - the first * time this method is called on a particular cluster state, the cluster features for a cluster are calculated from all the * node feature information, and cached in the {@link org.elasticsearch.cluster.ClusterFeatures} object. - * Henceforth, all cluster feature checks are fast hash set lookups. + * Henceforth, all cluster feature checks are fast hash set lookups, at least until the nodes or master changes. * *

Features test infrastructure

* Features can be specified as conditions in YAML tests, as well as checks and conditions in code-defined rolling upgrade tests @@ -48,18 +49,22 @@ * *

Test features

* Sometimes, you want to define a feature for nodes, but the only checks you need to do are as part of a test. In this case, - * the feature doesn't need to be included in the production codebase, it only needs to be present for automated tests. + * the feature doesn't need to be included in the production feature set, it only needs to be present for automated tests. * So alongside {@link org.elasticsearch.features.FeatureSpecification#getFeatures}, there is * {@link org.elasticsearch.features.FeatureSpecification#getTestFeatures}. This can be used to expose node features, * but only for automated tests. It is ignored in production uses. This is determined by the {@link org.elasticsearch.features.FeatureData} * class, which uses a system property (set by the test infrastructure) to decide whether to include test features or not, * when gathering all the registered {@code FeatureSpecification} instances. + *

+ * Test features can be removed at will (with appropriate backports), + * as there are no long-term upgrade guarantees required for clusters in automated tests. * *

Synthetic version features

- * Cluster functionality checks performed on code built from the {@code main} branch can only use features, but we also have packaged releases - * with a longer release cadence. Sometimes tests need to be conditional on older versions (where there isn't a feature already defined - * for you), determined after the fact. This is where synthetic version features comes in. These can be used in tests where - * it is sensible to use a release version number (eg 8.12.3). The presence of these features is determined solely by the minimum + * Cluster functionality checks performed on code built from the {@code main} branch can only use features to check functionality, + * but we also have branch releases with a longer release cadence. Sometimes tests need to be conditional on older versions + * (where there isn't a feature already defined in the right place), determined at some point after the release has been finalized. + * This is where synthetic version features come in. These can be used in tests where it is sensible to use + * a release version number (e.g. 8.12.3). The presence of these features is determined solely by the minimum * node version present in the test cluster; no actual cluster features are defined nor checked. * This is done by {@code ESRestTestFeatureService}, matching on features of the form {@code gte_v8.12.3}. * For more information on their use, see the Elasticsearch developer documentation. @@ -71,24 +76,76 @@ * To see how this can be so, it may be helpful to start with the compatibility guarantees we provide: * - * So, starting up a fresh v10 cluster, it does not need to have any knowledge of features added before 9.16, as the cluster - * will always be using the new functionality. + * So, starting up a fresh v9 cluster, it does not need to have any knowledge of features added before 8.18, as the cluster + * will always have the new functionality. + *

+ * So then how do we do a rolling upgrade from 8.18 to 9.0, if features have been removed? Normally, that would prevent a 9.0 + * node from joining an 8.18 cluster, as it will not have all the unnecessary features. However, we can make use + * of the major version difference to allow the rolling upgrade to proceed. + *

+ * This is where the {@link org.elasticsearch.features.NodeFeature#assumedAfterNextCompatibilityBoundary()} field comes in. On 8.18, + * we can mark all the features that will be removed in 9.0 as assumed. This means that when the features infrastructure sees a + * 9.x node, it will deem that node to have all the assumed features, even if the 9.0 node doesn't actually have those features + * in its published set. It will allow 9.0 nodes to join the cluster missing assumed features, + * and it will say the cluster supports a particular assumed feature even if it is missing from any 9.0 nodes in the cluster. + *

+ * Essentially, 8.18 nodes (or any other version that can form a cluster with 8.x or 9.x nodes) can mediate + * between the 8.x and 9.x feature sets, using {@code assumedAfterNextCompatibilityBoundary} + * to mark features that have been removed from 9.x, and know that 9.x nodes still meet the requirements for those features. + * These assumed features need to be defined before 8.18 and 9.0 are released. *
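The join decision described above can be sketched in a few lines. This is a simplified model, not the logic in `NodeJoinExecutor`: the class and method names are hypothetical, features are plain strings, the assumed flag is carried in a map rather than on `NodeFeature`, and "node is on the other side of the compatibility boundary" is reduced to a single boolean supplied by the caller.

```java
import java.util.Map;
import java.util.Set;

final class AssumedFeatureJoinSketch {
    /**
     * Decides, from the master's point of view, whether a node may join.
     *
     * clusterFeatureAssumed:  cluster feature id -> whether the master marks it as assumed
     * joiningNodeFeatures:    feature ids published by the joining node
     * joiningNodeIsNextMajor: true if the joining node is past the compatibility boundary
     */
    static boolean canJoin(
        Map<String, Boolean> clusterFeatureAssumed,
        Set<String> joiningNodeFeatures,
        boolean joiningNodeIsNextMajor
    ) {
        for (Map.Entry<String, Boolean> feature : clusterFeatureAssumed.entrySet()) {
            if (joiningNodeFeatures.contains(feature.getKey())) {
                continue; // the node publishes the feature itself
            }
            if (joiningNodeIsNextMajor && feature.getValue()) {
                continue; // missing, but assumed to be met by nodes of the next major version
            }
            return false; // a required feature is genuinely missing
        }
        return true; // extra features on the joining node never block a join
    }
}
```

A node that publishes more features than the cluster has is accepted here without any special handling, which mirrors the point that only missing features can block a join.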

+ * In more detail, this is what happens during a rolling upgrade: + *

  1. + * Start with a homogenous 8.18 cluster, with an 8.18 cluster feature set (including assumed features)
  2. + * The first 9.0 node joins the cluster. Even though it is missing the features marked as assumed in 8.18, + * the 8.18 master lets the 9.0 node join because all the missing features are marked as assumed, + * and it is of the next major version.
  3. + * At this point, any feature checks that happen on 8.18 nodes for assumed features pass, despite the 9.0 node + * not publishing those features, as the 9.0 node is assumed to meet the requirements for that feature. + * 9.0 nodes do not have those checks at all, and the corresponding code running on 9.0 uses the new behaviour without checking.
  4. + * More 8.18 nodes get swapped for 9.0 nodes
  5. + * At some point, the master will change from an 8.18 node to a 9.0 node. The 9.0 node does not have the assumed + * features at all, so the new cluster feature set as calculated by the 9.0 master will only contain the features + * that 9.0 knows about (the calculated feature set is not persisted anywhere). + * The cluster has effectively dropped all the 8.18 features assumed in 9.0, whilst maintaining all behaviour. + * The upgrade carries on.
  6. + * If an 8.18 node were to quit and re-join the cluster still as 8.18 at this point + * (and there are other 8.18 nodes not yet upgraded), it will be able to join the cluster despite the master being 9.0. + * The 8.18 node publishes all the assumed features that 9.0 does not have - but that doesn't matter, because nodes can join + * with more features than are present in the cluster as a whole. The additional features are not added + * to the cluster feature set because not all the nodes in the cluster have those features + * (as there is at least one 9.0 node in the cluster - itself).
  7. + * At some point, the last 8.18 node leaves the cluster, and the cluster is a homogenous 9.0 cluster + * with only the cluster features known about by 9.0.
* + * For any dynamic releases that occur from main, the cadence is much quicker - once a feature is present in a cluster, + * you then only need one completed release to mark a feature as assumed, and a subsequent release to remove it from the codebase + * and elide the corresponding check. */ package org.elasticsearch.features; From 2d761afa951bb2becc88f79db1dbe52724c7603c Mon Sep 17 00:00:00 2001 From: Simon Cooper Date: Fri, 17 Jan 2025 10:11:24 +0000 Subject: [PATCH 3/4] Javadoc is very specific about headers --- .../java/org/elasticsearch/features/package-info.java | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/server/src/main/java/org/elasticsearch/features/package-info.java b/server/src/main/java/org/elasticsearch/features/package-info.java index d6b0a0be40bdf..35707033629cb 100644 --- a/server/src/main/java/org/elasticsearch/features/package-info.java +++ b/server/src/main/java/org/elasticsearch/features/package-info.java @@ -21,7 +21,7 @@ * * * - *

Functionality

+ *

Functionality

* This functionality starts with {@link org.elasticsearch.features.NodeFeature}. This is a single id representing * new or a change in functionality - exactly what functionality that feature represents is up to the developer. These are expected * to be {@code public static final} variables on a relevant class. Each area of code then exposes their features @@ -42,12 +42,12 @@ * node feature information, and cached in the {@link org.elasticsearch.cluster.ClusterFeatures} object. * Henceforth, all cluster feature checks are fast hash set lookups, at least until the nodes or master changes. * - *

Features test infrastructure

+ *

Features test infrastructure

* Features can be specified as conditions in YAML tests, as well as checks and conditions in code-defined rolling upgrade tests * (see the Elasticsearch development documentation for more information). * These checks are performed by the {@code TestFeatureService} interface, and its standard implementation {@code ESRestTestFeatureService}. * - *

Test features

+ *

Test features

* Sometimes, you want to define a feature for nodes, but the only checks you need to do are as part of a test. In this case, * the feature doesn't need to be included in the production feature set, it only needs to be present for automated tests. * So alongside {@link org.elasticsearch.features.FeatureSpecification#getFeatures}, there is @@ -59,7 +59,7 @@ * Test features can be removed at will (with appropriate backports), * as there are no long-term upgrade guarantees required for clusters in automated tests. * - *

Synthetic version features

+ *

Synthetic version features

* Cluster functionality checks performed on code built from the {@code main} branch can only use features to check functionality, * but we also have branch releases with a longer release cadence. Sometimes tests need to be conditional on older versions * (where there isn't a feature already defined in the right place), determined at some point after the release has been finalized. @@ -69,7 +69,7 @@ * This is done by {@code ESRestTestFeatureService}, matching on features of the form {@code gte_v8.12.3}. * For more information on their use, see the Elasticsearch developer documentation. * - *

Assumed features

+ *

Assumed features

* Once a feature is defined on a cluster, it can never be removed - this is to ensure that functionality that is available * on a cluster then never stops being available. However, this can lead to the list of features in cluster state growing ever larger. * It is possible to remove defined cluster features, but only on a compatibility boundary (normally a new major release). From ac6d919e2b23015af6b0f00b24349874d786308a Mon Sep 17 00:00:00 2001 From: Simon Cooper Date: Fri, 17 Jan 2025 13:24:17 +0000 Subject: [PATCH 4/4] Tweak wording --- .../src/main/java/org/elasticsearch/features/package-info.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/server/src/main/java/org/elasticsearch/features/package-info.java b/server/src/main/java/org/elasticsearch/features/package-info.java index 35707033629cb..94b17648814af 100644 --- a/server/src/main/java/org/elasticsearch/features/package-info.java +++ b/server/src/main/java/org/elasticsearch/features/package-info.java @@ -95,7 +95,7 @@ * will always have the new functionality. *

* So then how do we do a rolling upgrade from 8.18 to 9.0, if features have been removed? Normally, that would prevent a 9.0 - * node from joining an 8.18 cluster, as it will not have all the unnecessary features. However, we can make use + * node from joining an 8.18 cluster, as it will not have all the required features published. However, we can make use * of the major version difference to allow the rolling upgrade to proceed. *

* This is where the {@link org.elasticsearch.features.NodeFeature#assumedAfterNextCompatibilityBoundary()} field comes in. On 8.18,