
[ML] Parse and report memory usage for DF Analytics #52778

Conversation

dimitris-athanasiou (Contributor):
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.

    createIndexResponse -> listener.onResponse(true),
    error -> {
        if (ExceptionsHelper.unwrapCause(error) instanceof ResourceAlreadyExistsException) {
            listener.onResponse(true);
Contributor:
There's an inconsistency here: if the index and alias already existed when the method was called, the response is false; but if the method is called twice concurrently, so that one call creates the index and the other gets a ResourceAlreadyExistsException, then both return true.

If the boolean is intended to mean "did this call create the index and alias", then this case should return false.

But it's not actually documented what the returned boolean is supposed to mean. Documenting that would be good too.
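A minimal in-memory sketch of the suggested contract, with hypothetical names standing in for the real Elasticsearch code: the boolean means "did this call create the index", so both a pre-existing index and a lost creation race return false.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for the real index-creation path. The returned boolean is
// documented as "did THIS call create the index": a pre-existing index and
// a concurrent loser (the in-memory analogue of catching
// ResourceAlreadyExistsException) both yield false.
public class CreateIfNecessarySketch {
    private final Set<String> indices = ConcurrentHashMap.newKeySet();

    /** Returns true only if this call created the index. */
    public boolean createIndexIfNecessary(String indexName) {
        if (indices.contains(indexName)) {
            return false; // already existed before the call
        }
        // Set.add returns false if another thread won the race.
        return indices.add(indexName);
    }
}
```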

    CreateIndexRequest createIndexRequest = client.admin()
        .indices()
        .prepareCreate(TEMPLATE_NAME + "-000001")
        .addAlias(new Alias(writeAlias()).writeIndex(true))
Contributor:
This doesn't cover the edge case where the index exists but the write alias doesn't (presumably because a user accidentally deleted it, but maybe also due to a bug in ILM).

I think this method should cover that case like AnomalyDetectorsIndex.createStateIndexAndAliasIfNecessary() does. It will avoid support cases if the system can be self-healing in this situation. At present it will return true, giving the impression that everything is good, when the post-conditions are that the index exists but not the alias.

Contributor (Author):
Good point. I thought the main reason we had that code for anomaly detection was because originally we were not using aliases. But I can see how being able to self-heal would help.
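An illustrative in-memory sketch (hypothetical names, not the actual AnomalyDetectorsIndex code) of the self-healing behaviour being discussed: create whichever of the index and the write alias is missing, so an accidentally deleted alias is restored rather than silently ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy cluster model: index name -> set of aliases pointing at it.
public class SelfHealingSketch {
    final Map<String, Set<String>> cluster = new HashMap<>();

    /** Simulates an index that exists without its write alias. */
    public void createIndexOnly(String index) {
        cluster.computeIfAbsent(index, k -> new HashSet<>());
    }

    /** Creates the index and/or the alias, whichever is missing. */
    public void createIndexAndAliasIfNecessary(String index, String alias) {
        Set<String> aliases = cluster.computeIfAbsent(index, k -> new HashSet<>());
        // If the index existed but the alias was deleted, this re-adds it.
        aliases.add(alias);
    }
}
```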

    parser.declareString((bucket, s) -> {}, TYPE);
    parser.declareString(ConstructingObjectParser.constructorArg(), JOB_ID);
    parser.declareField(ConstructingObjectParser.constructorArg(),
        p -> TimeUtils.parseTimeFieldToInstant(p, TIMESTAMP.getPreferredName()),
Contributor:
I don't think it's good that we're propagating into new code the old Prelert time parsing behaviour, which is completely non-standard in the Elastic stack:

            if (date.trim().length() <= 10) { // seconds
                return epoch * 1000;
            } else {
                return epoch;
            }

It would have been best if we'd removed this years ago.

Maybe now is a good opportunity to rename `TimeUtils.parseTimeField()` to `TimeUtils.parseTimeFieldDeprecated()` and `TimeUtils.parseTimeFieldToInstant()` to `TimeUtils.parseTimeFieldToInstantDeprecated()`, annotate both with `@Deprecated`, and introduce a new method `TimeUtils.parseTimeFieldToInstant()` that can be used here, replacing `return Instant.ofEpochMilli(dateStringToEpoch(parser.text()));` with `return Instant.from(DateFieldMapper.DEFAULT_DATE_TIME_FORMATTER.parse(parser.text()));`.
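A self-contained sketch of the proposed split, using the method names from the comment. The java.time ISO parser here stands in for `DateFieldMapper.DEFAULT_DATE_TIME_FORMATTER` (which accepts `strict_date_optional_time` or epoch milliseconds); the deprecated method reproduces the Prelert "10 digits means seconds" heuristic so the behavioural difference is visible.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

public class TimeParsingSketch {

    /** Legacy Prelert behaviour: <= 10 characters means seconds, otherwise millis. */
    @Deprecated
    public static long dateStringToEpochDeprecated(String date) {
        long epoch = Long.parseLong(date.trim());
        return date.trim().length() <= 10 ? epoch * 1000 : epoch;
    }

    /** Stack-standard parsing: epoch milliseconds or ISO-8601 date-time. */
    public static Instant parseTimeFieldToInstant(String text) {
        try {
            return Instant.ofEpochMilli(Long.parseLong(text)); // epoch_millis
        } catch (NumberFormatException e) {
            TemporalAccessor ta = DateTimeFormatter.ISO_INSTANT.parse(text);
            return Instant.from(ta);
        }
    }
}
```

Note that the two methods disagree on a 10-digit number: the legacy method multiplies it by 1000, the standard one treats it as milliseconds, which is exactly why a deprecation boundary between them is useful.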

        builder.field(TYPE.getPreferredName(), TYPE_VALUE);
        builder.field(JOB_ID.getPreferredName(), jobId);
    }
    builder.timeField(TIMESTAMP.getPreferredName(), TIMESTAMP.getPreferredName() + "_string", timestamp.toEpochMilli());
Contributor:
It would be good to add a comment explaining that we round to millisecond accuracy because the XContent representation does, and debugging is harder when the internal accuracy is greater than what is persisted.
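A small illustration of the point, using plain java.time (hypothetical helper name): `Instant` carries nanosecond precision, but `toEpochMilli()` (what the timestamp field persists) drops everything below the millisecond, so keeping sub-millisecond precision in memory means stored and in-memory values diverge.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class MillisRoundingSketch {
    // Truncate to the precision that actually survives serialization,
    // so the in-memory value equals the round-tripped value.
    public static Instant truncateToStoredPrecision(Instant t) {
        return t.truncatedTo(ChronoUnit.MILLIS);
    }
}
```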

@benwtrent self-requested a review on February 26, 2020 at 19:50.
    ConstructingObjectParser<MemoryUsage, Void> parser = new ConstructingObjectParser<>(TYPE_VALUE,
        ignoreUnknownFields, a -> new MemoryUsage((String) a[0], (Instant) a[1], (long) a[2]));

    parser.declareString((bucket, s) -> {}, TYPE);
Member:
Why is this necessary? Presumably, the only time it is parsing TYPE is when it is reading from the index. In that case, it should ignore unknown fields.

Contributor:

I don't think it's that bad to include it, even though it's technically redundant. It partly serves as documentation: somebody reading the parser definition to find out which fields the document contains in the current product version can see that we expect this field to exist.

Contributor (Author):
Good point!


    @Override
    protected ToXContent.Params getToXContentParams() {
        return new ToXContent.MapParams(Collections.singletonMap(ToXContentParams.FOR_INTERNAL_STORAGE, "true"));
Member:

Suggested change:

    - return new ToXContent.MapParams(Collections.singletonMap(ToXContentParams.FOR_INTERNAL_STORAGE, "true"));
    + return new ToXContent.MapParams(Collections.singletonMap(ToXContentParams.FOR_INTERNAL_STORAGE, Boolean.toString(lenient)));

I think this will work, so that the empty parsing declaration for TYPE can go away.

Contributor (Author):
The problem is that we also only write out `job_id` for internal storage, so then we can't test for equality. Given that, I think I'd rather keep parsing TYPE.

));
AtomicInteger counter = new AtomicInteger(stoppedTasksIds.size());
AtomicArray<Stats> jobStats = new AtomicArray<>(stoppedTasksIds.size());
for (int i = 0; i < stoppedTasksIds.size(); i++) {
Member:
I wonder if we will hit scaling issues if there are 100s of stopped tasks.

Seems like we could be making 100s of unbatched search requests.

Contributor (Author):

This is why I changed the code to make a multi-search per job. If we batch everything up, then we have jobs * stats_fields searches in a single multi-search, and that has its own problems. We do it this way for anomaly detection jobs too. Not sure there's a better way.
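A toy illustration of the trade-off (hypothetical names, strings standing in for real search requests): batching per job caps each multi-search at the number of stats fields, instead of issuing one giant multi-search of jobs * stats_fields searches.

```java
import java.util.ArrayList;
import java.util.List;

public class PerJobBatchingSketch {
    /**
     * Returns one batch per job, each containing one entry per stats field.
     * Each inner list stands in for a MultiSearchRequest; each string for
     * one SearchRequest within it.
     */
    public static List<List<String>> perJobBatches(List<String> jobIds, List<String> statsFields) {
        List<List<String>> batches = new ArrayList<>();
        for (String jobId : jobIds) {
            List<String> batch = new ArrayList<>();
            for (String field : statsFields) {
                batch.add(jobId + "/" + field);
            }
            batches.add(batch);
        }
        return batches;
    }
}
```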

import java.io.InputStream;
import java.util.function.BiFunction;

public final class MlParserUtils {
Member:
❤️

@droberts195 (Contributor) left a comment:
LGTM

@dimitris-athanasiou dimitris-athanasiou merged commit dd33193 into elastic:master Feb 28, 2020
@dimitris-athanasiou dimitris-athanasiou deleted the df-analytics-memory-usage branch February 28, 2020 15:35
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Feb 28, 2020
dimitris-athanasiou added a commit that referenced this pull request Feb 29, 2020
Backport of #52778 and #52958
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Feb 29, 2020