-
Notifications
You must be signed in to change notification settings - Fork 24.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Parse and report memory usage for DF Analytics #52778
[ML] Parse and report memory usage for DF Analytics #52778
Conversation
createIndexResponse -> listener.onResponse(true), | ||
error -> { | ||
if (ExceptionsHelper.unwrapCause(error) instanceof ResourceAlreadyExistsException) { | ||
listener.onResponse(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an inconsistency here, because if the index and alias already existed when the method was called the response is false
, but if the method is called twice concurrently such that one concurrent call creates it and the other gets a ResourceAlreadyExistsException
then both return true
.
If the boolean is intended to be "did this call create the index and alias" then one should return false
in this case.
But it's not actually documented what the returned boolean is supposed to mean. Doing that would be good too.
CreateIndexRequest createIndexRequest = client.admin() | ||
.indices() | ||
.prepareCreate(TEMPLATE_NAME + "-000001") | ||
.addAlias(new Alias(writeAlias()).writeIndex(true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't cover the edge case where the index exists but the write alias doesn't (presumably because a user accidentally deleted it, but maybe also due to a bug in ILM).
I think this method should cover that case like AnomalyDetectorsIndex.createStateIndexAndAliasIfNecessary()
does. It will avoid support cases if the system can be self healing in this situation. At present it will return true
giving the impression that everything is good when the post conditions are that the index exists but not the alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I thought the main reason we had that code for anomaly detection was because originally we were not using aliases. But I can see how being able to self-heal would help.
parser.declareString((bucket, s) -> {}, TYPE); | ||
parser.declareString(ConstructingObjectParser.constructorArg(), JOB_ID); | ||
parser.declareField(ConstructingObjectParser.constructorArg(), | ||
p -> TimeUtils.parseTimeFieldToInstant(p, TIMESTAMP.getPreferredName()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's good that we're propagating the behaviour of the old Prelert time parsing that is completely non-standard in the Elastic stack into new code:
if (date.trim().length() <= 10) { // seconds
return epoch * 1000;
} else {
return epoch;
}
It would have been best if we'd removed this years ago.
Maybe now is a good opportunity to rename TimeUtils.parseTimeField()
to TimeUtils.parseTimeFieldDeprecated()
and TimeUtils.parseTimeFieldToInstant()
to TimeUtils.parseTimeFieldToInstantDeprecated()
, annotate both with @Deprecated
and introduce a new method TimeUtils.parseTimeFieldToInstant()
that can be used here that replaces return Instant.ofEpochMilli(dateStringToEpoch(parser.text()));
with return Instant.from(DateFieldMapper.DEFAULT_DATE_TIME_FORMATTER.parse(parser.text()));
.
builder.field(TYPE.getPreferredName(), TYPE_VALUE); | ||
builder.field(JOB_ID.getPreferredName(), jobId); | ||
} | ||
builder.timeField(TIMESTAMP.getPreferredName(), TIMESTAMP.getPreferredName() + "_string", timestamp.toEpochMilli()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add a comment that the reason for rounding to millisecond accuracy is that the XContent representation rounds to millisecond accuracy and it makes debugging hard if the internal accuracy is greater.
ConstructingObjectParser<MemoryUsage, Void> parser = new ConstructingObjectParser<>(TYPE_VALUE, | ||
ignoreUnknownFields, a -> new MemoryUsage((String) a[0], (Instant) a[1], (long) a[2])); | ||
|
||
parser.declareString((bucket, s) -> {}, TYPE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? Presumably, the only time it is parsing TYPE
is when it is reading from the index. In that case, it should ignore unknown fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's that bad to include it even though it's technically redundant. It serves partly as documentation that we expect the field to exist if somebody is looking at the parser definition to find out which fields are expected in the document in the current product version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
|
||
@Override | ||
protected ToXContent.Params getToXContentParams() { | ||
return new ToXContent.MapParams(Collections.singletonMap(ToXContentParams.FOR_INTERNAL_STORAGE, "true")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return new ToXContent.MapParams(Collections.singletonMap(ToXContentParams.FOR_INTERNAL_STORAGE, "true")); | |
return new ToXContent.MapParams(Collections.singletonMap(ToXContentParams.FOR_INTERNAL_STORAGE, Boolean.toString(lenient)); |
I think will work so that the empty parsing declaration for TYPE
can go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is we also only write out job_id
for internal storage. Then we can't test for equality. Given that, I think I'd rather keep parsing TYPE
.
)); | ||
AtomicInteger counter = new AtomicInteger(stoppedTasksIds.size()); | ||
AtomicArray<Stats> jobStats = new AtomicArray<>(stoppedTasksIds.size()); | ||
for (int i = 0; i < stoppedTasksIds.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we will hit scaling issues if there are 100s of stopped tasks.
Seems like we could be making 100s of unbatched, search requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why I changed the code to make a multi-search per job. If we batch everything up, then we have jobs * stats_fields searches in a single multi-search and that to have its own problems. We do it this way for anomaly detection jobs too. Not sure there's a better way.
import java.io.InputStream; | ||
import java.util.function.BiFunction; | ||
|
||
public final class MlParserUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adds reporting of memory usage for data frame analytics jobs. This commit introduces a new index pattern `.ml-stats-*` whose first concrete index will be `.ml-stats-000001`. This index serves to store instrumentation information for those jobs.
788b4f7
to
3aba967
Compare
Until elastic#52778 is backported to 7.x
Until #52778 is backported to 7.x
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern
.ml-stats-*
whosefirst concrete index will be
.ml-stats-000001
. This index servesto store instrumentation information for those jobs.