Create GET _cat/transforms API Issue #53643

Merged
merged 5 commits into from
Mar 18, 2020

Conversation

Contributor

@zacharymorn zacharymorn commented Mar 17, 2020

Adds new _cat/transform and _cat/transform/{transform_id} endpoints.

Testing Done:

CAT single transform

curl -XGET localhost:9200/_cat/transform/airline-transform-stats-1 -H 'Content-Type: application/json' -u elastic-admin:elastic-password
airline-transform-stats-1 2020-03-17T02:42:37.418Z 8.0.0 airline-data airline-data-by-airline-1  batch one-time STOPPED

CAT all transforms

curl -XGET localhost:9200/_cat/transform -H 'Content-Type: application/json' -u elastic-admin:elastic-password
airline-transform-stats-1 2020-03-17T02:42:37.418Z 8.0.0 airline-data airline-data-by-airline-1  batch one-time STOPPED
airline-transform-stats-2 2020-03-17T02:42:44.083Z 8.0.0 airline-data airline-data-by-airline-2  batch one-time STOPPED

Help Output

curl -XGET localhost:9200/_cat/transform?help -H 'Content-Type: application/json' -u elastic-admin:elastic-password
id                               |                            | the id                                                       
create_time                      | ct,createTime              | transform creation time                                      
version                          | v                          | the version of Elasticsearch when the transform was created  
source_index                     | si,sourceIndex             | source index                                                 
dest_index                       | di,destIndex               | destination index                                            
description                      | d                          | description                                                  
transform_type                   | sc                         | batch or continues transform                                 
frequency                        | f                          | frequency of transform                                       
state                            | s                          | transform state                                              
reason                           | r,reason                   | reason                                                       
changes_last_detection_time      | cldt                       | changes last detected time                                   
search_total                     | st                         | total number of searches                                     
search_failure                   | sf                         | total number of search failures                              
search_time                      | stime                      | search time                                                  
index_total                      | it                         | total number of indices                                      
index_failure                    | if                         | total number of index failures                               
index_time                       | itime                      | index time                                                   
document_total                   | dt                         | total number of documents                                    
invocation_total                 | itotal                     | total number of invocations                                  
page_total                       | pt                         | total number of pages                                        
checkpoint_duration_time_exp_avg | cdtea,checkpointTimeExpAvg | exponential average checkpoint processing time (milliseconds)
indexed_documents_exp_avg        | idea                       | exponential average number of documents indexed              
processed_documents_exp_avg      | pdea                       | exponential average number of documents processed            

Column selection

curl -XGET localhost:9200/_cat/transform?h=id,version -H 'Content-Type: application/json' -u elastic-admin:elastic-password
airline-transform-stats-1 8.0.0
airline-transform-stats-2 8.0.0

Display results in JSON / YAML formats

JSON

curl -XGET localhost:9200/_cat/transform?format=json -H 'Content-Type: application/json' -u elastic-admin:elastic-password
[{"id":"airline-transform-stats-1","create_time":"2020-03-17T02:42:37.418Z","version":"8.0.0","source_index":"airline-data","dest_index":"airline-data-by-airline-1","description":null,"transform_type":"batch","frequency":"one-time","state":"STOPPED"},{"id":"airline-transform-stats-2","create_time":"2020-03-17T02:42:44.083Z","version":"8.0.0","source_index":"airline-data","dest_index":"airline-data-by-airline-2","description":null,"transform_type":"batch","frequency":"one-time","state":"STOPPED"}]

YAML

curl -XGET localhost:9200/_cat/transform?format=yaml -H 'Content-Type: application/json' -u elastic-admin:elastic-password
---
- id: "airline-transform-stats-1"
  create_time: "2020-03-17T02:42:37.418Z"
  version: "8.0.0"
  source_index: "airline-data"
  dest_index: "airline-data-by-airline-1"
  description: null
  transform_type: "batch"
  frequency: "one-time"
  state: "STOPPED"
- id: "airline-transform-stats-2"
  create_time: "2020-03-17T02:42:44.083Z"
  version: "8.0.0"
  source_index: "airline-data"
  dest_index: "airline-data-by-airline-2"
  description: null
  transform_type: "batch"
  frequency: "one-time"
  state: "STOPPED"

Closes #51412

@zacharymorn zacharymorn changed the title from "Create GET _cat/transforms API Issue #51412" to "Create GET _cat/transforms API Issue" Mar 17, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@benwtrent benwtrent self-requested a review March 17, 2020 11:10
Member

@benwtrent benwtrent left a comment

Thanks @zacharymorn for taking this on :).



import org.elasticsearch.common.Strings;
package org.elasticsearch.common;
Member

Since only xpack things use this class, I think keeping it in xpack makes sense for now.

Probably in org.elasticsearch.xpack.core.common.

If things outside of xpack need this code, it can probably be moved then.

Contributor Author

Makes sense. I've moved it back there.

}

GetTransformAction.Request request = new GetTransformAction.Request(id);
request.setAllowNoResources(restRequest.paramAsBoolean(ALLOW_NO_MATCH.getPreferredName(), true));
Member

I think we should also allow the page params to be set. Without size or from, there is a limit on how many transforms this API can return.

Both GetTransformAction.Request and GetTransformStatsAction.Request allow paging parameters to be set.
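
To make the paging request concrete, here is a minimal, self-contained sketch of how from and size could select a window of transform ids. It does not use the Elasticsearch classes: CatTransformPaging, paramAsInt, and page are hypothetical names, and the default values 0 and 100 are assumptions for illustration, not taken from this PR.

```java
import java.util.List;
import java.util.Map;

final class CatTransformPaging {
    // Assumed defaults for illustration only; the PR does not state them.
    static final int DEFAULT_FROM = 0;
    static final int DEFAULT_SIZE = 100;

    /** Reads an integer query parameter, falling back to a default when it is absent. */
    static int paramAsInt(Map<String, String> params, String name, int defaultValue) {
        String raw = params.get(name);
        return raw == null ? defaultValue : Integer.parseInt(raw);
    }

    /** Returns the window of ids selected by the optional "from" and "size" parameters. */
    static List<String> page(List<String> ids, Map<String, String> params) {
        int from = Math.max(0, paramAsInt(params, "from", DEFAULT_FROM));
        int size = Math.max(0, paramAsInt(params, "size", DEFAULT_SIZE));
        // Clamp both ends so an out-of-range request yields an empty page instead of an exception.
        int start = Math.min(from, ids.size());
        int end = (int) Math.min((long) start + size, ids.size());
        return ids.subList(start, end);
    }

    public static void main(String[] args) {
        List<String> ids = List.of("t-1", "t-2", "t-3", "t-4", "t-5");
        System.out.println(page(ids, Map.of("from", "1"))); // [t-2, t-3, t-4, t-5]
        System.out.println(page(ids, Map.of("size", "3"))); // [t-1, t-2, t-3]
    }
}
```

With five ids, from=1 skips the first entry and size=3 keeps the first three, which is the behaviour one would expect from the paging parameters discussed here.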

Contributor Author

Done.

@Override
protected void documentation(StringBuilder sb) {
sb.append("/_cat/transform\n");
sb.append("_cat/transform/{" + TransformField.TRANSFORM_ID + "}\n");
Member

Suggested change
sb.append("_cat/transform/{" + TransformField.TRANSFORM_ID + "}\n");
sb.append("/_cat/transform/{" + TransformField.TRANSFORM_ID + "}\n");

Contributor Author

Done.

.setAliases("d")
.build())
.addCell("transform_type",
TableColumnAttributeBuilder.builder("batch or continues transform")
Member

Suggested change
TableColumnAttributeBuilder.builder("batch or continues transform")
TableColumnAttributeBuilder.builder("batch or continuous transform")

Contributor Author

Oops. Fixed.

.build())
.addCell("transform_type",
TableColumnAttributeBuilder.builder("batch or continues transform")
.setAliases("sc")
Member

Suggested change
.setAliases("sc")
.setAliases("tt")

Contributor Author

Fixed.

.addCell(config.getDestination().getIndex())
.addCell(config.getDescription())
.addCell(config.getSyncConfig() == null ? "batch" : "continuous")
.addCell(config.getFrequency() == null ? "one-time" : config.getFrequency())
Member

This is not strictly true.

If the transform runs into problems for some reason (intermittent cluster issues), it will retry from its last known position at this given frequency.

If the frequency is null, the default value is: TimeValue.timeValueMillis(60000)
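
A small sketch of the fallback described here, written with java.time.Duration rather than the Elasticsearch TimeValue class so that it runs on its own. TransformFrequency and effectiveFrequency are hypothetical names. The 60000 ms default is the value quoted in this comment.

```java
import java.time.Duration;

final class TransformFrequency {
    // Default quoted in the review comment: TimeValue.timeValueMillis(60000).
    static final Duration DEFAULT_FREQUENCY = Duration.ofMillis(60_000);

    /** Returns the configured frequency, or the default when none was set. */
    static Duration effectiveFrequency(Duration configured) {
        return configured == null ? DEFAULT_FREQUENCY : configured;
    }

    public static void main(String[] args) {
        // An unset frequency falls back to the default of 60 seconds.
        System.out.println(effectiveFrequency(null).getSeconds());                  // 60
        // An explicit frequency is returned unchanged.
        System.out.println(effectiveFrequency(Duration.ofSeconds(5)).getSeconds()); // 5
    }
}
```

Reporting the effective value instead of a label such as "one-time" keeps the column truthful for a transform that retries after intermittent failures.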

Contributor Author

Done. I assume that when the user does not provide a frequency setting and the batch transform runs to completion successfully, we should still output TimeValue.timeValueMillis(60000) as the frequency value, right?

.addCell("frequency",
TableColumnAttributeBuilder.builder("frequency of transform")
.setAliases("f")
.build())
Member

We should also add PivotConfig#getMaxPageSearchSize(). This should be accessible in the transform config under TransformConfig#getPivotConfig.

Its default value is 500.

Contributor Author

Done.

.addCell(checkpointingInfo == null ? null : checkpointingInfo.getChangesLastDetectedAt())
.addCell(transformIndexerStats == null ? null : transformIndexerStats.getSearchTotal())
.addCell(transformIndexerStats == null ? null : transformIndexerStats.getSearchFailures())
.addCell(transformIndexerStats == null ? null : transformIndexerStats.getSearchTime())
Member

Suggested change
.addCell(transformIndexerStats == null ? null : transformIndexerStats.getSearchTime())
.addCell(transformIndexerStats == null ? null : TimeValue.timeValueMillis(transformIndexerStats.getSearchTime()))

It will be helpful to make these time related statistics TimeValue objects. That way they can take advantage of the formatting options.
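
To illustrate why a duration type helps here, the sketch below contrasts a raw millisecond count with a unit-aware rendering. It is not the TimeValue implementation: HumanReadableMillis, format, and scaled are hypothetical names, and the rounding to one decimal place is an assumption made for this example.

```java
import java.util.Locale;

final class HumanReadableMillis {
    /** Renders a millisecond count using the largest unit (h, m, s, ms) that fits. */
    static String format(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("millis must be non-negative");
        }
        if (millis >= 3_600_000L) return scaled(millis, 3_600_000L, "h");
        if (millis >= 60_000L) return scaled(millis, 60_000L, "m");
        if (millis >= 1_000L) return scaled(millis, 1_000L, "s");
        return millis + "ms";
    }

    /** Prints whole multiples without a fraction, otherwise with one decimal place. */
    static String scaled(long millis, long unitMillis, String suffix) {
        if (millis % unitMillis == 0) {
            return (millis / unitMillis) + suffix;
        }
        return String.format(Locale.ROOT, "%.1f%s", (double) millis / unitMillis, suffix);
    }

    public static void main(String[] args) {
        long searchTimeMillis = 60_000L;
        System.out.println(searchTimeMillis);         // 60000
        System.out.println(format(searchTimeMillis)); // 1m
        System.out.println(format(1_500L));           // 1.5s
        System.out.println(format(250L));             // 250ms
    }
}
```

A bare long carries no unit, so the table can only print the number. A value that knows it is a duration can be rendered in whichever unit suits the reader.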

Contributor Author

Done.


.addCell(transformIndexerStats == null ? null : transformIndexerStats.getIndexTotal())
.addCell(transformIndexerStats == null ? null : transformIndexerStats.getIndexFailures())
.addCell(transformIndexerStats == null ? null : transformIndexerStats.getIndexTime())
Member

Suggested change
.addCell(transformIndexerStats == null ? null : transformIndexerStats.getIndexTime())
.addCell(transformIndexerStats == null ? null : TimeValue.timeValueMillis(transformIndexerStats.getIndexTime()))

Contributor Author

Done.

I see that transformIndexerStats.getExpAvgCheckpointDurationMs() below returns a double, but TimeValue#timeValueMillis only takes a long. Should I update it as well, at the cost of losing precision by converting double to long, or is it fine to leave it as is?

Member

I would leave the double as is. Just like you did :).

It might be interesting to have fractional TimeValue objects in the future. So we could get human-friendly output like 10.8ms :D.

.addCell("dest_index",
TableColumnAttributeBuilder.builder("destination index")
.setAliases("di", "destIndex")
.build())
Member

The pipeline that the transform references should also be included. DestConfig#getPipeline().

It's nullable with no default value.

Contributor Author

Done.

@zacharymorn
Contributor Author

Updated testing with 5 transforms:

CAT all transforms

curl -XGET localhost:9200/_cat/transform -H 'Content-Type: application/json' -u elastic-admin:elastic-password
airline-transform-stats-1  2020-03-18T03:48:55.519Z 8.0.0 airline-data airline-data-by-airline-1   batch 1m 500 STOPPED
airline-transform-stats-2  2020-03-18T03:49:25.045Z 8.0.0 airline-data airline-data-by-airline-2   batch 1m 500 STOPPED
airline-transform-stats-3  2020-03-18T03:49:59.024Z 8.0.0 airline-data airline-data-by-airline-3   batch 1m 500 STOPPED
airline-transform-stats-4  2020-03-18T03:50:03.722Z 8.0.0 airline-data airline-data-by-airline-4   batch 1m 500 STOPPED
airline-transform-stats-5 2020-03-18T03:50:23.558Z 8.0.0 airline-data airline-data-by-airline-5   batch 1m 500 STOPPED

CAT with from and size params

curl -XGET localhost:9200/_cat/transform?from=1 -H 'Content-Type: application/json' -u elastic-admin:elastic-password
airline-transform-stats-2  2020-03-18T03:49:25.045Z 8.0.0 airline-data airline-data-by-airline-2   batch 1m 500 STOPPED
airline-transform-stats-3  2020-03-18T03:49:59.024Z 8.0.0 airline-data airline-data-by-airline-3   batch 1m 500 STOPPED
airline-transform-stats-4  2020-03-18T03:50:03.722Z 8.0.0 airline-data airline-data-by-airline-4   batch 1m 500 STOPPED
airline-transform-stats-5 2020-03-18T03:50:23.558Z 8.0.0 airline-data airline-data-by-airline-5   batch 1m 500 STOPPED
curl -XGET localhost:9200/_cat/transform?size=3 -H 'Content-Type: application/json' -u elastic-admin:elastic-password
airline-transform-stats-1  2020-03-18T03:48:55.519Z 8.0.0 airline-data airline-data-by-airline-1   batch 1m 500 STOPPED
airline-transform-stats-2  2020-03-18T03:49:25.045Z 8.0.0 airline-data airline-data-by-airline-2   batch 1m 500 STOPPED
airline-transform-stats-3  2020-03-18T03:49:59.024Z 8.0.0 airline-data airline-data-by-airline-3   batch 1m 500 STOPPED

Help Output

curl -XGET localhost:9200/_cat/transform?help -H 'Content-Type: application/json' -u elastic-admin:elastic-password
id                               |                            | the id                                                       
create_time                      | ct,createTime              | transform creation time                                      
version                          | v                          | the version of Elasticsearch when the transform was created  
source_index                     | si,sourceIndex             | source index                                                 
dest_index                       | di,destIndex               | destination index                                            
pipeline                         | p                          | transform pipeline                                           
description                      | d                          | description                                                  
transform_type                   | tt                         | batch or continuous transform                                
frequency                        | f                          | frequency of transform                                       
max_page_search_size             | mpsz                       | max page search size                                         
state                            | s                          | transform state                                              
reason                           | r,reason                   | reason                                                       
changes_last_detection_time      | cldt                       | changes last detected time                                   
search_total                     | st                         | total number of searches                                     
search_failure                   | sf                         | total number of search failures                              
search_time                      | stime                      | search time                                                  
index_total                      | it                         | total number of indices                                      
index_failure                    | if                         | total number of index failures                               
index_time                       | itime                      | index time                                                   
document_total                   | dt                         | total number of documents                                    
invocation_total                 | itotal                     | total number of invocations                                  
page_total                       | pt                         | total number of pages                                        
checkpoint_duration_time_exp_avg | cdtea,checkpointTimeExpAvg | exponential average checkpoint processing time (milliseconds)
indexed_documents_exp_avg        | idea                       | exponential average number of documents indexed              
processed_documents_exp_avg      | pdea                       | exponential average number of documents processed        

@benwtrent
Member

jenkins test this please

Member

@benwtrent benwtrent left a comment

Once CI passes, we will be good to merge.

Thanks so much for getting this done very quickly :)

@benwtrent
Member

@elasticmachine update branch

@benwtrent
Member

jenkins test this please

@benwtrent benwtrent merged commit 110ff6c into elastic:master Mar 18, 2020
benwtrent pushed a commit to benwtrent/elasticsearch that referenced this pull request Mar 18, 2020
Adds new `_cat/transform` and `_cat/transform/{transform_id}` endpoints.
benwtrent added a commit that referenced this pull request Mar 18, 2020
Adds new `_cat/transform` and `_cat/transform/{transform_id}` endpoints.
@zacharymorn
Contributor Author

> Once CI passes, we will be good to merge.
>
> Thanks so much for getting this done very quickly :)

Sure, no problem. My pleasure, and I learnt a lot from this!

Development

Successfully merging this pull request may close these issues.

Create GET _cat/transforms API
5 participants