
[ML] Add _cat/ml/anomaly_detectors API #51364

Merged
merged 2 commits into from
Jan 24, 2020

Conversation

benwtrent
Member

@benwtrent benwtrent commented Jan 23, 2020

Adds new _cat/ml/anomaly_detectors and _cat/ml/anomaly_detectors/{job_id} endpoints

Example output:

GET /_cat/ml/anomaly_detectors?v
>
id                    state data.processed_records model.bytes model.memory_status forecast.total bucket.count
high_sum_total_sales closed 4674                   1.5mb                        ok 0              743
low_request_rate     closed 1216                   40.5kb                       ok 0              1457
response_code_rates  closed 14073                  132.7kb                      ok 0              1460
url_scanning         closed 14073                  501.7kb                      ok 0              1460

Same call but sorted

GET /_cat/ml/anomaly_detectors?v&s=dpr:desc
>
id                    state data.processed_records model.bytes model.memory_status forecast.total bucket.count
response_code_rates  closed 14073                  132.7kb                      ok 0              1460
url_scanning         closed 14073                  501.7kb                      ok 0              1460
high_sum_total_sales closed 4674                   1.5mb                        ok 0              743
low_request_rate     closed 1216                   40.5kb                       ok 0              1457

For specific jobs and only specific fields

GET /_cat/ml/anomaly_detectors/*rate*?v&h=id,dpr,mb
>
id                  dpr   mb
low_request_rate    1216  40.5kb
response_code_rates 28146 132.7kb

All other typical settings are supported as well.
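For example (requests only; output elided), and assuming this endpoint behaves like the other _cat APIs, the standard switches would look like:

GET /_cat/ml/anomaly_detectors?help
GET /_cat/ml/anomaly_detectors?v&format=json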

Help output

id                               |                                    | the job_id                                                       
state                            | s                                  | the job state                                                    
opened_time                      | ot                                 | the amount of time the job has been opened                       
assignment_explanation           | ae                                 | why the job is or is not assigned to a node                      
data.processed_records           | dpr,dataProcessedRecords           | number of processed records                                      
data.processed_fields            | dpr,dataProcessedFields            | number of processed fields                                       
data.input_bytes                 | dib,dataInputBytes                 | total input bytes                                                
data.input_records               | dir,dataInputRecords               | total record count                                               
data.input_fields                | dif,dataInputFields                | total field count                                                
data.invalid_dates               | did,dataInvalidDates               | number of records with invalid dates                             
data.missing_fields              | dmf,dataMissingFields              | number of records with missing fields                            
data.out_of_order_timestamps     | doot,dataOutOfOrderTimestamps      | number of records handled out of order                           
data.empty_buckets               | deb,dataEmptyBuckes                | number of empty buckets                                          
data.sparse_buckets              | dsb,dataSparseBuckets              | number of sparse buckets                                         
data.buckets                     | db,dataBuckes                      | total bucket count                                               
data.earliest_record             | der,dataEarliestRecord             | earliest record time                                             
data.latest_record               | dlr,dataLatestRecord               | latest record time                                               
data.last                        | dl,dataLast                        | last time data was seen                                          
data.last_empty_bucket           | dleb,dataLastEmptyBucket           | last time an empty bucket occurred                               
data.last_sparse_bucket          | dlsb,dataLastSparseBucket          | last time a sparse bucket occurred                               
model.bytes                      | mb,modelBytes                      | model size                                                       
model.memory_status              | mms,modelMemoryStatus              | current memory status                                            
model.bytes_exceeded             | mbe,modelBytesExceeded             | how much the model has exceeded the limit                        
model.memory_limit               | mml,modelMemoryLimit               | model memory limit                                               
model.by_fields                  | mbf,modelByFields                  | count of 'by' fields                                             
model.over_fields                | mof,modelOverFields                | count of 'over' fields                                           
model.partition_fields           | mpf,modelPartitionFields           | count of 'partition' fields                                      
model.bucket_allocation_failures | mbaf,modelBucketAllocationFailures | number of bucket allocation failures                             
model.log_time                   | mlt,modelLogTime                   | when the model stats were gathered                               
model.timestamp                  | mt,modelTimestamp                  | the time of the last record when the model stats were gathered   
forecast.total                   | ft,forecastTotal                   | total number of forecasts                                        
forecast.memory.min              | fmmin,forecastMemoryMin            | minimum memory used by forecasts                                 
forecast.memory.max              | fmmax,forecastsMemoryMax           | maximum memory used by forecasts                                 
forecast.memory.avg              | fmavg,forecastMemoryAvg            | average memory used by forecasts                                 
forecast.memory.total            | fmt,forecastMemoryTotal            | total memory used by all forecasts                               
forecast.records.min             | frmin,forecastRecordsMin           | minimum record count for forecasts                               
forecast.records.max             | frmax,forecastRecordsMax           | maximum record count for forecasts                               
forecast.records.avg             | fravg,forecastRecordsAvg           | average record count for forecasts                               
forecast.records.total           | frt,forecastRecordsTotal           | total record count for all forecasts                             
forecast.time.min                | ftmin,forecastTimeMin              | minimum runtime for forecasts                                    
forecast.time.max                | ftmax,forecastTimeMax              | maximum run time for forecasts                                   
forecast.time.avg                | ftavg,forecastTimeAvg              | average runtime for all forecasts (milliseconds)                 
forecast.time.total              | ftt,forecastTimeTotal              | total runtime for all forecasts                                  
node.id                          | ni,nodeId                          | id of the assigned node                                          
node.name                        | nn,nodeName                        | name of the assigned node                                        
node.ephemeral_id                | ne,nodeEphemeralId                 | ephemeral id of the assigned node                                
node.address                     | na,nodeAddress                     | network address of the assigned node                             
bucket.count                     | bc,bucketCount                     | bucket count                                                     
bucket.time.total                | btt,bucketTimeTotal                | total bucket processing time                                     
bucket.time.min                  | btmin,bucketTimeMin                | minimum bucket processing time                                   
bucket.time.max                  | btmax,bucketTimeMax                | maximum bucket processing time                                   
bucket.time.exp_avg              | btea,bucketTimeExpAvg              | exponential average bucket processing time (milliseconds)        
bucket.time.exp_avg_hour         | bteah,bucketTimeExpAvgHour         | exponential average bucket processing time by hour (milliseconds)

@benwtrent benwtrent added :ml Machine learning v7.7.0 v8.0.0 labels Jan 23, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent benwtrent changed the title [ML] Add _cat/ml/anomaly_detectors/_stats [ML] Add _cat/ml/anomaly_detectors/_stats API Jan 23, 2020
@hendrikmuhs
Contributor

LGTM, 2 typos and a nit

.build());
table.addCell("data.buckets",
TableColumnAttributeBuilder.builder("total bucket count", false)
.setAliases("db", "dataBuckes")

dataBuckes: missing "t"

.build());
table.addCell("data.empty_buckets",
TableColumnAttributeBuilder.builder("number of empty buckets", false)
.setAliases("deb", "dataEmptyBuckes")

dataBuckes: missing "t"

table.addCell(modelSizeStats == null ? null : modelSizeStats.getTimestamp());

ForecastStats forecastStats = job.getForecastStats();
table.addCell(forecastStats == null ? null : forecastStats.getTotal());

nit: a bool to avoid repeating forecastStats == null || forecastStats.getTotal() <= 0L for every cell?
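A minimal sketch of that suggestion: hoist the repeated `forecastStats == null || forecastStats.getTotal() <= 0L` test into a single boolean computed once and reused by every cell. `ForecastStats` here is a tiny stand-in, not the real Elasticsearch class, and `forecastCells` is a hypothetical helper for illustration only.

```java
// Sketch of the reviewer's nit: compute the "do we have any forecasts"
// check once instead of repeating it for every addCell call.
// ForecastStats is a minimal stand-in, not the real Elasticsearch class.
public class ForecastCells {

    static class ForecastStats {
        private final long total;
        private final long memoryTotal;

        ForecastStats(long total, long memoryTotal) {
            this.total = total;
            this.memoryTotal = memoryTotal;
        }

        long getTotal() { return total; }
        long getMemoryTotal() { return memoryTotal; }
    }

    static Long[] forecastCells(ForecastStats forecastStats) {
        // One check up front, reused by each cell below.
        boolean hasForecasts = forecastStats != null && forecastStats.getTotal() > 0L;
        Long total = hasForecasts ? forecastStats.getTotal() : null;
        Long memoryTotal = hasForecasts ? forecastStats.getMemoryTotal() : null;
        // ... same pattern for the remaining forecast.* cells ...
        return new Long[] { total, memoryTotal };
    }
}
```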

@benwtrent
Member Author

For sure :). I was in fast editing mode (yy 7p) and made tons of line duplications.

@sophiec20
Contributor

I'm a great fan of using the _cat lists to grab quick info on indices, snapshots etc. With that in mind, I would propose we simplify the call and remove the _stats appendage.

In the same way that GET _cat/indices draws down info from GET index/_stats, GET index/_settings, and wherever index health lives, it seems more in keeping that the list of jobs (as pasted above) could be called using a simpler command, i.e. GET _cat/ml/anomaly_detectors.

The column selection seems good for the anomaly detection job list.

As we extend this to other ML components, there is great potential here to simplify tasks in Dev Tools when working with transforms -> data frame analytics -> trained models -> inference ... answering quick questions such as "What trained models do I have?".

Will this be available in Dev Tools auto complete?

@benwtrent
Member Author

> Will this be available in Dev Tools auto complete?

It can be. I am not sure how automatic that process is.

@hendrikmuhs
Contributor

> Will this be available in Dev Tools auto complete?

> It can be. I am not sure how automatic that process is.

Best to my knowledge: completions are generated by a script that runs over the REST specs. I do not know how often this is done. AFAIK it is not fully automated, but it is executed manually on a regular basis.

That means that if you do nothing, completions will eventually be there (given the REST specs are in place).[*] To speed up the process you can open an issue, but that means extra work for both sides.

[*] This happened to me for one feature: I did not explicitly request autocomplete and was happy to see it appear without me doing anything.

@benwtrent benwtrent changed the title [ML] Add _cat/ml/anomaly_detectors/_stats API [ML] Add _cat/ml/anomaly_detectors API Jan 24, 2020
@benwtrent benwtrent merged commit a25f922 into elastic:master Jan 24, 2020
@benwtrent benwtrent deleted the feature/ml-_cat-jobs branch January 24, 2020 13:20
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 24, 2020
* [ML] Add _cat/ml/anomaly_detectors/_stats

* addressing PR feedback
benwtrent added a commit that referenced this pull request Jan 24, 2020