[7.16] [ML] Model snapshot upgrade needs a stats endpoint (#81706)

* [7.16] [ML] Model snapshot upgrade needs a stats endpoint

  Previously the ML model snapshot upgrade endpoint did not provide a way to reliably monitor progress. This could lead to the upgrade assistant UI thinking that a model snapshot upgrade had finished when it actually hadn't.

  This change adds a new "stats" API that allows external interested parties to find out the status of each model snapshot upgrade and which node (if any) each is running on.

  Backport of #81641

* Fixing compilation
elastic · Dec 14, 2021 · 0d40336
1 parent f8fa41b
commit 0d40336
Showing 16 changed files with 1,062 additions and 116 deletions.
diff --git a/...ference/ml/anomaly-detection/apis/get-job-model-snapshot-upgrade-stats.asciidoc b/...ference/ml/anomaly-detection/apis/get-job-model-snapshot-upgrade-stats.asciidoc
@@ -0,0 +1,156 @@
+[role="xpack"]
+[[ml-get-job-model-snapshot-upgrade-stats]]
+= Get {anomaly-job} model snapshot upgrade statistics API
+
+[subs="attributes"]
+++++
+<titleabbrev>Get model snapshot upgrade statistics</titleabbrev>
+++++
+
+Retrieves usage information for {anomaly-job} model snapshot upgrades.
+
+[[ml-get-job-model-snapshot-upgrade-stats-request]]
+== {api-request-title}
+
+`GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_upgrade/_stats` +
+
+`GET _ml/anomaly_detectors/<job_id>,<job_id>/model_snapshots/_all/_upgrade/_stats` +
+
+`GET _ml/anomaly_detectors/_all/model_snapshots/_all/_upgrade/_stats`
+
+[[ml-get-job-model-snapshot-upgrade-stats-prereqs]]
+== {api-prereq-title}
+
+Requires the `monitor_ml` cluster privilege. This privilege is included in the
+`machine_learning_user` built-in role.
+
+[[ml-get-job-model-snapshot-upgrade-stats-desc]]
+== {api-description-title}
+
+{anomaly-detect-cap} job model snapshot upgrades are ephemeral. Only
+upgrades that are in progress at the time this API is called will be
+returned.
+
+[[ml-get-job-model-snapshot-upgrade-stats-path-parms]]
+== {api-path-parms-title}
+
+`<job_id>`::
+(string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection-wildcard]
+
+`<snapshot_id>`::
+(string)
+Identifier for the model snapshot.
++
+You can get statistics for multiple {anomaly-job} model snapshot upgrades in a
+single API request by using a comma-separated list of snapshot IDs. You can also
+use wildcard expressions or `_all`.
+
+[[ml-get-job-model-snapshot-upgrade-stats-query-parms]]
+== {api-query-parms-title}
+
+`allow_no_match`::
+(Optional, Boolean)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=allow-no-match-jobs]
+
+[role="child_attributes"]
+[[ml-get-job-model-snapshot-upgrade-stats-results]]
+== {api-response-body-title}
+
+The API returns an array of {anomaly-job} model snapshot upgrade status objects.
+All of these properties are informational; you cannot update their values.
+
+`assignment_explanation`::
+(string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=assignment-explanation-datafeeds]
+
+`job_id`::
+(string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
+
+`node`::
+(object)
+Contains properties for the node that runs the upgrade task. This information is
+available only for upgrade tasks that are assigned to a node.
++
+--
+[%collapsible%open]
+====
+`attributes`:::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-attributes]
+
+`ephemeral_id`:::
+(string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-ephemeral-id]
+
+`id`:::
+(string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-id]
+
+`name`:::
+(string)
+The node name. For example, `0-o0tOo`.
+
+`transport_address`:::
+(string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-transport-address]
+====
+--
+
+`snapshot_id`::
+(string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-snapshot-id]
+
+`state`::
+(string)
+One of `loading_old_state`, `saving_new_state`, `stopped` or `failed`.
+
+
+[[ml-get-job-model-snapshot-upgrade-stats-response-codes]]
+== {api-response-codes-title}
+
+`404` (Missing resources)::
+  If `allow_no_match` is `false`, this code indicates that there are no
+  resources that match the request or only partial matches for the request.
+
+[[ml-get-job-model-snapshot-upgrade-stats-example]]
+== {api-examples-title}
+
+[source,console]
+--------------------------------------------------
+GET _ml/anomaly_detectors/low_request_rate/model_snapshots/_all/_upgrade/_stats
+--------------------------------------------------
+// TEST[skip:it will be too difficult to get a reliable response in docs tests]
+
+The API returns the following results:
+
+[source,console-result]
+----
+{
+  "count" : 1,
+  "model_snapshot_upgrades" : [
+    {
+      "job_id" : "low_request_rate",
+      "snapshot_id" : "1828371",
+      "state" : "saving_new_state",
+      "node" : {
+        "id" : "7bmMXyWCRs-TuPfGJJ_yMw",
+        "name" : "node-0",
+        "ephemeral_id" : "hoXMLZB0RWKfR9UPPUCxXX",
+        "transport_address" : "127.0.0.1:9300",
+        "attributes" : {
+          "ml.machine_memory" : "17179869184",
+          "ml.max_open_jobs" : "512"
+        }
+      },
+      "assignment_explanation" : ""
+    }
+  ]
+}
+----
+// TESTRESPONSE[s/"7bmMXyWCRs-TuPfGJJ_yMw"/$body.$_path/]
+// TESTRESPONSE[s/"node-0"/$body.$_path/]
+// TESTRESPONSE[s/"hoXMLZB0RWKfR9UPPUCxXX"/$body.$_path/]
+// TESTRESPONSE[s/"127.0.0.1:9300"/$body.$_path/]
+// TESTRESPONSE[s/"17179869184"/$body.model_snapshot_upgrades.0.node.attributes.ml\\.machine_memory/]
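
Note on reading the response: the documentation above says upgrades are ephemeral and that `state` is one of `loading_old_state`, `saving_new_state`, `stopped` or `failed`. A caller such as a monitoring UI might interpret those values roughly as sketched below. This is only an illustration and not code from this commit: the class `UpgradeStatsSketch`, the `Progress` enum, and the snapshot ID `1234567` are hypothetical, the input map is assumed to have been extracted from the `model_snapshot_upgrades` array by some JSON parser, and treating `stopped` as "not running" is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the commit.
class UpgradeStatsSketch {

    enum Progress { IN_PROGRESS, FAILED, NOT_RUNNING }

    // stateBySnapshotId maps "snapshot_id" to "state", as extracted from the
    // "model_snapshot_upgrades" array of a stats response.
    static Progress classify(Map<String, String> stateBySnapshotId, String snapshotId) {
        String state = stateBySnapshotId.get(snapshotId);
        if (state == null) {
            // Upgrades are ephemeral: no entry means no upgrade is reported for this snapshot.
            return Progress.NOT_RUNNING;
        }
        switch (state) {
            case "loading_old_state":
            case "saving_new_state":
                return Progress.IN_PROGRESS;
            case "failed":
                return Progress.FAILED;
            default:
                // Assumption: "stopped" (and any unrecognized value) counts as not running.
                return Progress.NOT_RUNNING;
        }
    }

    public static void main(String[] args) {
        Map<String, String> states = new LinkedHashMap<>();
        states.put("1828371", "saving_new_state"); // values from the example response
        System.out.println(classify(states, "1828371"));
        System.out.println(classify(states, "1234567")); // hypothetical ID with no upgrade
    }
}
```

Because only in-progress upgrades are returned, an absent entry cannot by itself distinguish "finished" from "never started"; a caller would need to combine it with other information.
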
diff --git a/docs/reference/ml/anomaly-detection/apis/index.asciidoc b/docs/reference/ml/anomaly-detection/apis/index.asciidoc
@@ -38,6 +38,7 @@ include::get-job.asciidoc[leveloffset=+2]
 include::get-job-stats.asciidoc[leveloffset=+2]
 include::get-ml-info.asciidoc[leveloffset=+2]
 include::get-snapshot.asciidoc[leveloffset=+2]
+include::get-job-model-snapshot-upgrade-stats.asciidoc[leveloffset=+2]
 include::get-overall-buckets.asciidoc[leveloffset=+2]
 include::get-calendar-event.asciidoc[leveloffset=+2]
 include::get-filter.asciidoc[leveloffset=+2]

diff --git a/docs/reference/ml/anomaly-detection/apis/ml-apis.asciidoc b/docs/reference/ml/anomaly-detection/apis/ml-apis.asciidoc
@@ -55,6 +55,7 @@ See also <<ml-df-analytics-apis>>.
 
 * <<ml-delete-snapshot,Delete model snapshot>>
 * <<ml-get-snapshot,Get model snapshot info>>
+* <<ml-get-job-model-snapshot-upgrade-stats,Get model snapshot upgrade statistics>>
 * <<ml-revert-snapshot,Revert model snapshot>>
 * <<ml-update-snapshot,Update model snapshot>>
 * <<ml-upgrade-job-model-snapshot,Upgrade model snapshot>>

diff --git a/rest-api-spec/src/main/resources/rest-api-spec/api/ml.get_model_snapshot_upgrade_stats.json b/rest-api-spec/src/main/resources/rest-api-spec/api/ml.get_model_snapshot_upgrade_stats.json
@@ -0,0 +1,40 @@
+{
+  "ml.get_model_snapshot_upgrade_stats":{
+    "documentation":{
+      "url":"https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-get-job-model-snapshot-upgrade-stats.html",
+      "description":"Gets stats for anomaly detection job model snapshot upgrades that are in progress."
+    },
+    "stability":"stable",
+    "visibility":"public",
+    "headers":{
+      "accept": [ "application/json"]
+    },
+    "url":{
+      "paths":[
+        {
+          "path":"/_ml/anomaly_detectors/{job_id}/model_snapshots/{snapshot_id}/_upgrade/_stats",
+          "methods":[
+            "GET"
+          ],
+          "parts":{
+            "job_id":{
+              "type":"string",
+              "description":"The ID of the job. May be a wildcard, comma separated list or `_all`."
+            },
+            "snapshot_id":{
+              "type":"string",
+              "description":"The ID of the snapshot. May be a wildcard, comma separated list or `_all`."
+            }
+          }
+        }
+      ]
+    },
+    "params":{
+      "allow_no_match":{
+        "type":"boolean",
+        "required":false,
+        "description":"Whether to ignore if a wildcard expression matches no jobs or no snapshots. (This includes the `_all` string.)"
+      }
+    }
+  }
+}
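
Note on the URL template: the spec above defines a single path with two parts, each of which may be a wildcard, a comma-separated list or `_all`, plus an optional `allow_no_match` parameter. A minimal sketch of assembling such a request path follows. The class `StatsPathSketch` and its method are hypothetical, substituting `_all` for an empty list is a convenience chosen for the example, and the IDs are assumed to need no URL escaping.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the commit or of any client library.
class StatsPathSketch {

    // Builds the request path from the template in the REST spec.
    // An empty list is mapped to "_all" (a choice made for this sketch).
    static String statsPath(List<String> jobIds, List<String> snapshotIds, Boolean allowNoMatch) {
        String jobs = jobIds.isEmpty() ? "_all" : String.join(",", jobIds);
        String snapshots = snapshotIds.isEmpty() ? "_all" : String.join(",", snapshotIds);
        String path = "/_ml/anomaly_detectors/" + jobs
            + "/model_snapshots/" + snapshots + "/_upgrade/_stats";
        // allow_no_match is optional, so it is appended only when explicitly set.
        return allowNoMatch == null ? path : path + "?allow_no_match=" + allowNoMatch;
    }

    public static void main(String[] args) {
        System.out.println(statsPath(List.of("low_request_rate"), List.of(), null));
        System.out.println(statsPath(List.of(), List.of(), Boolean.FALSE));
    }
}
```
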
diff --git a/...lugin/core/src/main/java/org/elasticsearch/xpack/core/action/util/ExpandedIdsMatcher.java b/...lugin/core/src/main/java/org/elasticsearch/xpack/core/action/util/ExpandedIdsMatcher.java
@@ -9,7 +9,9 @@
 import org.elasticsearch.common.Strings;
 import org.elasticsearch.common.regex.Regex;
 
+import java.util.ArrayList;
 import java.util.Collection;
+import java.util.Collections;
 import java.util.Iterator;
 import java.util.LinkedList;
 import java.util.List;
@@ -43,7 +45,8 @@ public static String[] tokenizeExpression(String expression) {
         return Strings.tokenizeToStringArray(expression, ",");
     }
 
-    private final LinkedList<IdMatcher> requiredMatches;
+    private final List<IdMatcher> allMatchers;
+    private final List<IdMatcher> requiredMatches;
     private final boolean onlyExact;
 
     /**
@@ -57,15 +60,18 @@ public static String[] tokenizeExpression(String expression) {
      */
     public ExpandedIdsMatcher(String[] tokens, boolean allowNoMatchForWildcards) {
         requiredMatches = new LinkedList<>();
+        List<IdMatcher> allMatchers = new ArrayList<>();
 
         if (Strings.isAllOrWildcard(tokens)) {
             // if allowNoJobForWildcards == true then any number
             // of jobs with any id is ok. Therefore no matches
             // are required
 
+            IdMatcher matcher = new WildcardMatcher("*");
+            this.allMatchers = Collections.singletonList(matcher);
             if (allowNoMatchForWildcards == false) {
                 // require something, anything to match
-                requiredMatches.add(new WildcardMatcher("*"));
+                requiredMatches.add(matcher);
             }
             onlyExact = false;
             return;
@@ -78,23 +84,55 @@ public ExpandedIdsMatcher(String[] tokens, boolean allowNoMatchForWildcards) {
             // specific job Ids are
             for (String token : tokens) {
                 if (Regex.isSimpleMatchPattern(token)) {
+                    allMatchers.add(new WildcardMatcher(token));
                     atLeastOneWildcard = true;
                 } else {
-                    requiredMatches.add(new EqualsIdMatcher(token));
+                    IdMatcher matcher = new EqualsIdMatcher(token);
+                    allMatchers.add(matcher);
+                    requiredMatches.add(matcher);
                 }
             }
         } else {
             // Matches are required for wildcards
             for (String token : tokens) {
                 if (Regex.isSimpleMatchPattern(token)) {
-                    requiredMatches.add(new WildcardMatcher(token));
+                    IdMatcher matcher = new WildcardMatcher(token);
+                    allMatchers.add(matcher);
+                    requiredMatches.add(matcher);
                     atLeastOneWildcard = true;
                 } else {
-                    requiredMatches.add(new EqualsIdMatcher(token));
+                    IdMatcher matcher = new EqualsIdMatcher(token);
+                    allMatchers.add(matcher);
+                    requiredMatches.add(matcher);
                 }
             }
         }
         onlyExact = atLeastOneWildcard == false;
+        this.allMatchers = Collections.unmodifiableList(allMatchers);
+    }
+
+    /**
+     * Generate the list of required matches from the {@code expression}
+     * and initialize.
+     *
+     * @param expression Expression that will be tokenized into a set of wildcards or full Ids
+     * @param allowNoMatchForWildcards If true then it is not required for wildcard
+     *                                 expressions to match an Id meaning they are
+     *                                 not returned in the list of required matches
+     */
+    public ExpandedIdsMatcher(String expression, boolean allowNoMatchForWildcards) {
+        this(tokenizeExpression(expression), allowNoMatchForWildcards);
+    }
+
+    /**
+     * Test whether an ID matches any of the expressions.
+     * Unlike {@link #filterMatchedIds} this does not modify the state of
+     * the matcher.
+     * @param id ID to test.
+     * @return Does the ID match one or more of the patterns in the expression?
+     */
+    public boolean idMatches(String id) {
+        return allMatchers.stream().anyMatch(idMatcher -> idMatcher.matches(id));
     }
 
     /**
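
Note on `idMatches`: the new method tests an ID against every matcher built from the expression without changing the matcher's state. The standalone sketch below imitates that behaviour outside Elasticsearch so it can be run on its own. It is a simplification under stated assumptions: `IdMatchSketch` is a hypothetical class, `java.util.regex` stands in for `Regex.simpleMatch`, only `*` is treated as a wildcard, and the required-match bookkeeping (`requiredMatches`, `allowNoMatchForWildcards`) is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the stateless part of ExpandedIdsMatcher.
class IdMatchSketch {

    private final List<Pattern> allMatchers = new ArrayList<>();

    IdMatchSketch(String expression) {
        // Split on commas, trimming tokens and dropping empty ones.
        List<String> tokens = Arrays.stream(expression.split(","))
            .map(String::trim)
            .filter(t -> !t.isEmpty())
            .collect(Collectors.toList());
        // An empty expression or a lone "_all" means "match everything".
        if (tokens.isEmpty() || (tokens.size() == 1 && tokens.get(0).equals("_all"))) {
            tokens = List.of("*");
        }
        for (String token : tokens) {
            // Quote the literal pieces and let each '*' match any run of characters.
            String regex = Arrays.stream(token.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
            allMatchers.add(Pattern.compile(regex, Pattern.DOTALL));
        }
    }

    // Stateless: repeated calls with the same ID always give the same answer.
    boolean idMatches(String id) {
        return allMatchers.stream().anyMatch(p -> p.matcher(id).matches());
    }

    public static void main(String[] args) {
        IdMatchSketch matcher = new IdMatchSketch("foo*,bar");
        System.out.println(matcher.idMatches("foo-1"));
        System.out.println(matcher.idMatches("baz"));
    }
}
```

Keeping a separate, immutable list of all matchers is what allows this kind of query to be repeated safely, whereas the Javadoc in the diff notes that `filterMatchedIds` modifies the matcher's state.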

diff --git a/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/MlTasks.java b/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/MlTasks.java
@@ -305,6 +305,16 @@ public static Collection<PersistentTasksCustomMetadata.PersistentTask<?>> nonFai
         });
     }
 
+    public static Collection<PersistentTasksCustomMetadata.PersistentTask<?>> snapshotUpgradeTasks(
+        @Nullable PersistentTasksCustomMetadata tasks
+    ) {
+        if (tasks == null) {
+            return Collections.emptyList();
+        }
+
+        return tasks.findTasks(JOB_SNAPSHOT_UPGRADE_TASK_NAME, task -> true);
+    }
+
     public static Collection<PersistentTasksCustomMetadata.PersistentTask<?>> snapshotUpgradeTasksOnNode(
         @Nullable PersistentTasksCustomMetadata tasks,
         String nodeId