-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parameter to prefix aggs name with type in search responses #22965
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot @tlrx I left a few comments
* determine the internal type of the aggregation. | ||
*/ | ||
protected String getPrefixedName() { | ||
return String.join(":", getWriteableName(), getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use #
as a separator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
* identifies its type. This prefix can later be used by REST clients to | ||
* determine the internal type of the aggregation. | ||
*/ | ||
protected String getPrefixedName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather define a protected method that only returns the prefix (getType
?), whose default return value is getWriteableName()
. Otherwise subclasses can potentially override the separator used (or even whether they include the name of the agg or not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense!
"prefixed_name": { | ||
"type" : "boolean", | ||
"description" : "Specify whether aggregation names should be prefixed by their respective types in the response" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should also multi_search accept the param? also the corresponding template apis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense that msearch also has this parameter so I updated the code.
For search template APIs, I don't know. I guess it makes sense, but I'd like to keep this PR short - could we deal with this when we'll add parsing methods for Search Template APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should address search templates too in this PR, it's only two more lines that make the param accepted if I am not missing anything. SearchTemplateResponse
embeds SearchResponse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits"/] | ||
|
||
<1> The name `tweets_over_time` now contains the `date_histogram` prefix. | ||
<2> The name `top_users` now contains the `top_hits` prefix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we document that in certain cases the returned type is not exactly the type that was provided in the request? I think this happens only for terms and significant terms.
Also, I wonder if we should avoid returning umterms and umsigterms, as whether the field is unmapped should not be relevant, rather return sterms and ssigterms for both. We could also potentially share the toXContet and parsing code in UnmappedTerms
and UnmappedSignificantTerms
with StringTerms
and SignificantStringTerms
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we document that in certain cases the returned type is not exactly the type that was provided in the request? I think this happens only for terms and significant terms.
I also found InternalSampler and MappedSampler, as well as percentile aggregations. I added a note in the document, let me know what you think.
@clintongormley can you help us with naming here please? |
Given that this parameter is almost always going to be used by a client rather than by a user, I think it can be quite specific and doesn't need to be short, eg |
@clintongormley we will reuse this parameter also for suggestions, so we would like to make its name reflect that and not include any reference to aggregations. That is the biggest naming challenge I think, do you have any idea? :) |
|
I like |
Thanks @javanna for your comments. I updated the PR and added typed_keys support for both multi search, search template and multi search template APIs. I also change the returned type for some aggregations. I think this is getting close! |
@Override | ||
public final XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException { | ||
// Concatenates the type and the name of the aggregation (ex: top_hits#foo) | ||
String name = params.paramAsBoolean("typed_keys", false) ? String.join("#", getType(), getName()) : getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to acces the typed_keys
constant from somewhere? If no, can we make it a publicly accessible constant somewhere so we can reuse it when rendering suggestion types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it would be good if the separator was a constant somewhere so we can refer to it when splitting the type/name pair while parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we usually repeat the parameter name instead of using a public constant?
Will do for the delimiter since it will be used by subclasses to parse the XContent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a constant somewhere for # makes sense. It may make sense to also add a constant for typed_keys, but I don't have a strong opinion on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a constant for the delimiter directly in InternalAggregation class, we'll see where to move when we'll use it for suggester. For typed_keys, I'm not sure we want to plumb a constant in aggregation implementation and in the REST layer... maybe it could be in TXContentParams itself, I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we use the key string in RestMultiSearchAction and RestSearchAction and its use is somewhat related to the search API, maybe we can add it with some explanation in one of those (e.g. RestSearchAction)? Just my 2 cents though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a comment on docs, LGTM otherwise, thanks a lot @tlrx
<1> The name `tweets_over_time` now contains the `date_histogram` prefix. | ||
<2> The name `top_users` now contains the `top_hits` prefix. | ||
|
||
NOTE: For some aggregations, it is possible that the returned type in the response does not exactly match type from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the returned type is not the same as the one provided with the request. This is the case for..... where the returned type also contains information about the type of the targeted field:
s/PErcentiles/Percentiles
I think that we listed them all, can we remove the etc. at the end of the sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks!
@Override | ||
public final XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException { | ||
// Concatenates the type and the name of the aggregation (ex: top_hits#foo) | ||
String name = params.paramAsBoolean("typed_keys", false) ? String.join("#", getType(), getName()) : getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a constant somewhere for # makes sense. It may make sense to also add a constant for typed_keys, but I don't have a strong opinion on that.
This pull request adds a new parameter to the REST Search API named `prefixed_name`. When set to true, the aggregation names in the search response will be prefixed with a prefix that reflects the internal type of the aggregation. Here is a simple example: ``` GET /_search?prefixed_name { "aggs": { "tweets_per_user": { "terms": { "field": "user" } } }, "size": 0 } ``` And the response: ``` { "aggs": { "sterms:tweets_per_user": { ... } } } ``` This parameter is intended to make life easier for REST clients that could parse back the prefix and could detect the type of the aggregation to parse. It could also be implemented for suggesters. Name suggestion for `prefixed_name`, as well as feedback on test coverage.
b293db4
to
e0a53ed
Compare
Thanks @javanna @cbuescher ! |
This pull request adds a new parameter to the REST Search API named `typed_keys`. When set to true, the aggregation names in the search response will be prefixed with a prefix that reflects the internal type of the aggregation. Here is a simple example: ``` GET /_search?typed_keys { "aggs": { "tweets_per_user": { "terms": { "field": "user" } } }, "size": 0 } ``` And the response: ``` { "aggs": { "sterms:tweets_per_user": { ... } } } ``` This parameter is intended to make life easier for REST clients that could parse back the prefix and could detect the type of the aggregation to parse. It could also be implemented for suggesters.
I found it's a fixed value with "true" for RestSearchAction.TYPED_KEYS_PARAM |
@boGitt yes the high-level client sets it to true and that is the only way that it can parse a search response properly. The prefix is used to identify which class aggregation and suggestion responses need to be parsed into. This is harmless and should be transparent to users when the high-level client is used as the name of the aggregation that is retrieved should not contain the prefix. Are you experiencing problems with this? |
yeah,i have got a aggregation response with prefix and some is "sterms", some is "lterms", or "filter",but it's "false" by default in curl response. So i have to reset the name before passing it to others, it's so unfriendly. |
h @boGitt I am afraid this has to do with how you are accessing the search response, most likely you are retrieving the search response using the high-level rest client and then printing it out as json. Can you confirm please? In that case I wonder why you need to use the high-level REST client, you can use the low-level one that gives you back the json, without having to set that parameter. Thoughts? |
You are right , i'm trying get a json string with it, maybe i was in a mistake of the useage for high-level rest client. Thanks. I will change it to low-level, and i can user bucket_script at the same time. |
I don't think it makes a lot of sense to customize it on the request side, because the high-level REST client does not work without. We could potentially allow to not print it out when calling toXContent , but that tends to hide problems, meaning users that have this problem are most likely using the high-level REST client but could use the low-level one instead. Thoughts @nik9000 @hub-cap ? |
Sorry, i totally missed this. I agree wo @javanna as this is something that can be accomplished with only the low level client. |
This pull request adds a new parameter to the REST Search API named
prefixed_name
. When set to true, the aggregation names in the search response are prefixed with a prefix that reflects the internal type of the aggregation.Here is a simple example with a
terms
aggregation called "tweets_per_user":In, the response, the aggregation appears with the name "sterms:tweets_per_user":
This parameter is intended to make life easier for REST clients that could parse back the prefix and detect the type of the aggregation to parse. It could also be implemented for suggesters.
Note that the name is changed when the XContent is printed out, it should be an issue for derivative aggregations that use bucket_paths. Also, name suggestion for
prefixed_name
as well as feedback on test coverage are warmly welcome.