
[ML] Use a new ML endpoint to estimate a model memory #60376

Merged
merged 18 commits into from Mar 19, 2020

Conversation

darnautov
Contributor

@darnautov darnautov commented Mar 17, 2020

Summary

Part of #60386.

  • Update calculate_model_memory_limit kibana endpoint to use the ML _estimate_model_memory API to retrieve a model memory estimation.
  • Update validate_model_memory_limit to work with new calculate_model_memory_limit
  • TS refactoring
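As a rough sketch of what the updated route sends, the request body for the ES `_ml/anomaly_detectors/_estimate_model_memory` API combines the job's analysis config with overall and per-bucket cardinality stats. The field names follow the documented ES API; the helper itself is illustrative, not the PR's actual code.

```typescript
// Hypothetical helper (illustrative, not the PR's code): assemble the body
// for the ES _ml/anomaly_detectors/_estimate_model_memory API.
interface AnalysisConfig {
  bucket_span: string;
  detectors: Array<{
    function: string;
    by_field_name?: string;
    partition_field_name?: string;
  }>;
  influencers: string[];
}

function buildEstimateModelMemoryBody(
  analysisConfig: AnalysisConfig,
  overallCardinality: Record<string, number>,
  maxBucketCardinality: Record<string, number>
) {
  return {
    analysis_config: analysisConfig,
    overall_cardinality: overallCardinality,
    max_bucket_cardinality: maxBucketCardinality,
  };
}
```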

Checklist

@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

Contributor

@droberts195 droberts195 left a comment


The overall structure looks fine to me.

I left a few comments about how the ES endpoint is being called. (Since I don't know anything about JavaScript or TypeScript, someone else needs to review from that perspective.)

Contributor

@droberts195 droberts195 left a comment


Thanks for the updates Dima. I cannot see anything wrong with this now, but please wait for a review from someone who knows the Kibana code well before merging.

```ts
indexPattern: string,
query: any,
timeFieldName: string,
earliestMs: number,
```
Contributor


Just a note that in the follow-up bit of work, where the modules setup endpoint calls this calculateModelMemoryLimit function, we need to handle the case where earliest and latest times are not supplied. In this case, the setup endpoint should find the start/end times of the latest 3 months of data and pass those times to this function.
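A minimal sketch of that fallback, assuming the data's newest timestamp has already been resolved elsewhere (the helper and the 30-day month approximation are both hypothetical, not part of this PR):

```typescript
// Hypothetical helper: when earliest/latest are not supplied, default the
// range to roughly the latest 3 months of data, anchored at the newest doc.
const THREE_MONTHS_MS = 3 * 30 * 24 * 60 * 60 * 1000; // ~3 months

function resolveTimeRange(
  latestDataMs: number,
  earliestMs?: number,
  latestMs?: number
): { earliestMs: number; latestMs: number } {
  const latest = latestMs ?? latestDataMs;
  const earliest = earliestMs ?? latest - THREE_MONTHS_MS;
  return { earliestMs: earliest, latestMs: latest };
}
```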

Contributor Author


Thanks for the heads-up. Yes, I have planned these changes for the follow-up PR dedicated to the setup endpoint.

@peteharverson
Contributor

I tested the multi metric wizard with the time range set to one where there is no data in the index, and the call to the calculate_model_memory_limit endpoint is generating an HTTP 500 internal server error:

{"statusCode":500,"error":"Internal Server Error","message":"Unable to retrieve model memory estimation"}

This looks like it happens when you add an influencer which isn't the partitioning field.

@darnautov
Contributor Author

> I tested the multi metric wizard with the time range set to one where there is no data in the index, and the call to the calculate_model_memory_limit endpoint is generating an HTTP 500 internal server error:
>
> {"statusCode":500,"error":"Internal Server Error","message":"Unable to retrieve model memory estimation"}
>
> This looks like it happens when you add an influencer which isn't the partitioning field.

Good catch, fixed in 7bdf515.

@jgowdyelastic
Member

I'm seeing errors when trying to calculate the mml when I have a max mml set in the ES config.
I have set it to 128mb:

xpack.ml.max_model_memory_limit: 128mb

@darnautov
Contributor Author

> I'm seeing errors when trying to calculate the mml when I have a max mml set in the ES config.
> I have set it to 128mb:
>
> xpack.ml.max_model_memory_limit: 128mb

Thanks for checking this. I used ts-ignore to silence a type issue from numeral, and it masked the actual error left after refactoring. Fixed in a85a4d6.
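That comparison against xpack.ml.max_model_memory_limit needs the limit string parsed into bytes. A hypothetical sketch of such a parser (not the PR's actual code, which uses numeral):

```typescript
// Hypothetical helper (illustrative): parse an ML model memory limit string
// like "128mb" into bytes so an estimate can be compared against the max.
function mmlToBytes(mml: string): number {
  const match = /^(\d+)(b|kb|mb|gb)$/i.exec(mml.trim());
  if (match === null) {
    throw new Error(`Invalid model memory limit: ${mml}`);
  }
  const units: Record<string, number> = { b: 1, kb: 1024, mb: 1024 ** 2, gb: 1024 ** 3 };
  return Number(match[1]) * units[match[2].toLowerCase()];
}
```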

Member

@jgowdyelastic jgowdyelastic left a comment


LGTM

```ts
 * Service for carrying out queries to obtain data
 * specific to fields in Elasticsearch indices.
 */
export function fieldsServiceProvider(callAsCurrentUser: APICaller) {
```
Contributor


Building a multi-metric job against the gallery data set, using the full time range, and a small bucket span, I am seeing circuit_breaking_exceptions in the network tab of the dev console as I add extra influencers.

{"statusCode":429,"error":"Too Many Requests","message":"[circuit_breaking_exception] [parent] Data too large, data for [<reused_arrays>] would be [988102840/942.3mb], which is larger than the limit of [986932838/941.2mb], real usage: [988102768/942.3mb], new bytes reserved: [72/72b], usages [request=147475312/140.6mb, fielddata=228074/222.7kb, accounting=1282468/1.2mb, inflight_requests=784/784b], with { bytes_wanted=988102840 & bytes_limit=986932838 & durability=\"TRANSIENT\" } (and) [circuit_breaking_exception]...

We need to try and optimize the queries being done in here. There are probably two factors that are making it use a lot of memory:

  1. It’s doing all the fields in one query
  2. It’s using the full time range

If the bucket span is small and the time range long, then that’s a lot of time buckets e.g. 15 minute bucket span, 12 month time range.

So we probably need to be pragmatic about capping the number of buckets that we take the max over, say the last 1000 buckets. That should be easy to calculate knowing the bucket span and time range provided: just adjust earliestMs to be max(earliestMs, latestMs - 1000 * interval) (the interval obviously needs to be converted to ms).
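The suggested cap can be sketched as a one-line helper (names are illustrative, not the PR's actual code):

```typescript
// Illustrative sketch of the bucket cap suggested above: never aggregate
// over more than the last MAX_BUCKETS buckets of the requested range.
const MAX_BUCKETS = 1000;

function capEarliestMs(earliestMs: number, latestMs: number, bucketSpanMs: number): number {
  return Math.max(earliestMs, latestMs - MAX_BUCKETS * bucketSpanMs);
}
```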

As well as being defensive in the queries, you should probably also expose to the user that an error has occurred calling the estimate model memory limit endpoint e.g. in a toast notification. Currently you only see the error in the Kibana server console, or by looking in the browser network tab.

Contributor Author


Thanks for testing! For now, I added capping of the time range and replaced the must part of the query with filter to take advantage of caching; see 7ee91e7.
In a follow-up PR, I'll try to split the current ES query into multiple queries (in case many influencers are provided) and check how it performs.

@darnautov
Contributor Author

@elasticmachine merge upstream

Contributor

@peteharverson peteharverson left a comment


LGTM. Capping to a max of 1000 buckets has cleared the circuit_breaker_exceptions I was seeing on the gallery data before. Would be good to expose any errors from the endpoint to the user in the follow-up.

@kibanamachine
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@darnautov darnautov merged commit 7aa4651 into elastic:master Mar 19, 2020
@darnautov darnautov deleted the ML-48510-mml-enhancements branch March 19, 2020 15:45
darnautov added a commit to darnautov/kibana that referenced this pull request Mar 19, 2020
* [ML] refactor calculate_model_memory_limit route, use estimateModelMemory endpoint

* [ML] refactor validate_model_memory_limit, migrate tests to jest

* [ML] fix typing issue

* [ML] start estimateModelMemory url with /

* [ML] fix typo, filter mlcategory

* [ML] extract getCardinalities function

* [ML] fields_service.ts

* [ML] wip getMaxBucketCardinality

* [ML] refactor and comments

* [ML] fix aggs keys with special characters, fix integration tests

* [ML] use pre-defined job types

* [ML] fallback to 0 in case max bucket cardinality receives null

* [ML] calculateModelMemoryLimit on influencers change

* [ML] fix maxModelMemoryLimit

* [ML] cap aggregation to max 1000 buckets

* [ML] rename intervalDuration
darnautov added a commit that referenced this pull request Mar 20, 2020
* [ML] refactor calculate_model_memory_limit route, use estimateModelMemory endpoint

* [ML] refactor validate_model_memory_limit, migrate tests to jest

* [ML] fix typing issue

* [ML] start estimateModelMemory url with /

* [ML] fix typo, filter mlcategory

* [ML] extract getCardinalities function

* [ML] fields_service.ts

* [ML] wip getMaxBucketCardinality

* [ML] refactor and comments

* [ML] fix aggs keys with special characters, fix integration tests

* [ML] use pre-defined job types

* [ML] fallback to 0 in case max bucket cardinality receives null

* [ML] calculateModelMemoryLimit on influencers change

* [ML] fix maxModelMemoryLimit

* [ML] cap aggregation to max 1000 buckets

* [ML] rename intervalDuration

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Mar 20, 2020
* master:
  [ML] Use a new ML endpoint to estimate a model memory (elastic#60376)
  [Logs UI] Correctly update the expanded log rate table rows (elastic#60306)
  fixes drag and drop flakiness (elastic#60625)
  Removing isEmptyState from embeddable input (elastic#60511)
  [Cross Cluster Replication] NP Shim (elastic#60121)
  Clear changes when canceling an edit to an alert (elastic#60518)
  Update workflow syntax (elastic#60626)
  Updating project assigner workflows to v2.0.0 of the action and back to default tokens (elastic#60577)
  migrate saved objects management edition view to react/typescript/eui (elastic#59490)