Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parameter to prefix aggs name with type in search responses #22965

Merged
merged 7 commits into from
Feb 9, 2017

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Feb 3, 2017

This pull request adds a new parameter to the REST Search API named prefixed_name. When set to true, the aggregation names in the search response are prefixed with a prefix that reflects the internal type of the aggregation.

Here is a simple example with a terms aggregation called "tweets_per_user":

GET /_search?prefixed_name
{
    "aggs": {
        "tweets_per_user": {
            "terms": {
                "field": "user"
            }
        }
    },
    "size": 0
}

In, the response, the aggregation appears with the name "sterms:tweets_per_user":

{
    "aggs": {
        "sterms:tweets_per_user": {
            ...
        }
    }
}

This parameter is intended to make life easier for REST clients that could parse back the prefix and detect the type of the aggregation to parse. It could also be implemented for suggesters.

Note that the name is changed when the XContent is printed out, it should be an issue for derivative aggregations that use bucket_paths. Also, name suggestion for prefixed_name as well as feedback on test coverage are warmly welcome.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @tlrx I left a few comments

* determine the internal type of the aggregation.
*/
protected String getPrefixedName() {
return String.join(":", getWriteableName(), getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use # as a separator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

* identifies its type. This prefix can later be used by REST clients to
* determine the internal type of the aggregation.
*/
protected String getPrefixedName() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather define a protected method that only returns the prefix (getType ?), whose default return value is getWriteableName(). Otherwise subclasses can potentially override the separator used (or even whether they include the name of the agg or not).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

"prefixed_name": {
"type" : "boolean",
"description" : "Specify whether aggregation names should be prefixed by their respective types in the response"
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also multi_search accept the param? also the corresponding template apis?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense that msearch also has this parameter so I updated the code.

For search template APIs, I don't know. I guess it makes sense, but I'd like to keep this PR short - could we deal with this when we'll add parsing methods for Search Template APIs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should address search templates too in this PR, it's only two more lines that make the param accepted if I am not missing anything. SearchTemplateResponse embeds SearchResponse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!

// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits"/]

<1> The name `tweets_over_time` now contains the `date_histogram` prefix.
<2> The name `top_users` now contains the `top_hits` prefix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we document that in certain cases the returned type is not exactly the type that was provided in the request? I think this happens only for terms and significant terms.

Also, I wonder if we should avoid returning umterms and umsigterms, as whether the field is unmapped should not be relevant, rather return sterms and ssigterms for both. We could also potentially share the toXContet and parsing code in UnmappedTerms and UnmappedSignificantTerms with StringTerms and SignificantStringTerms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we document that in certain cases the returned type is not exactly the type that was provided in the request? I think this happens only for terms and significant terms.

I also found InternalSampler and MappedSampler, as well as percentile aggregations. I added a note in the document, let me know what you think.

@javanna javanna changed the title Add parameter to prefix aggs name by type in search responses Add parameter to prefix aggs name with type in search responses Feb 4, 2017
@javanna
Copy link
Member

javanna commented Feb 4, 2017

@clintongormley can you help us with naming here please?

@clintongormley
Copy link

Given that this parameter is almost always going to be used by a client rather than by a user, I think it can be quite specific and doesn't need to be short, eg include_agg_type, or name_aggs_with_type

@javanna
Copy link
Member

javanna commented Feb 5, 2017

@clintongormley we will reuse this parameter also for suggestions, so we would like to make its name reflect that and not include any reference to aggregations. That is the biggest naming challenge I think, do you have any idea? :)

@clintongormley
Copy link

typed_responses or typed_keys?

@tlrx
Copy link
Member Author

tlrx commented Feb 6, 2017

I like typed_keys!

@tlrx
Copy link
Member Author

tlrx commented Feb 6, 2017

Thanks @javanna for your comments. I updated the PR and added typed_keys support for both multi search, search template and multi search template APIs. I also change the returned type for some aggregations. I think this is getting close!

@Override
public final XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
// Concatenates the type and the name of the aggregation (ex: top_hits#foo)
String name = params.paramAsBoolean("typed_keys", false) ? String.join("#", getType(), getName()) : getName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to acces the typed_keys constant from somewhere? If no, can we make it a publicly accessible constant somewhere so we can reuse it when rendering suggestion types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it would be good if the separator was a constant somewhere so we can refer to it when splitting the type/name pair while parsing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we usually repeat the parameter name instead of using a public constant?

Will do for the delimiter since it will be used by subclasses to parse the XContent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a constant somewhere for # makes sense. It may make sense to also add a constant for typed_keys, but I don't have a strong opinion on that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a constant for the delimiter directly in InternalAggregation class, we'll see where to move when we'll use it for suggester. For typed_keys, I'm not sure we want to plumb a constant in aggregation implementation and in the REST layer... maybe it could be in TXContentParams itself, I don't know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we use the key string in RestMultiSearchAction and RestSearchAction and its use is somewhat related to the search API, maybe we can add it with some explanation in one of those (e.g. RestSearchAction)? Just my 2 cents though.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a comment on docs, LGTM otherwise, thanks a lot @tlrx

<1> The name `tweets_over_time` now contains the `date_histogram` prefix.
<2> The name `top_users` now contains the `top_hits` prefix.

NOTE: For some aggregations, it is possible that the returned type in the response does not exactly match type from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the returned type is not the same as the one provided with the request. This is the case for..... where the returned type also contains information about the type of the targeted field:

s/PErcentiles/Percentiles

I think that we listed them all, can we remove the etc. at the end of the sentence?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks!

@Override
public final XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
// Concatenates the type and the name of the aggregation (ex: top_hits#foo)
String name = params.paramAsBoolean("typed_keys", false) ? String.join("#", getType(), getName()) : getName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a constant somewhere for # makes sense. It may make sense to also add a constant for typed_keys, but I don't have a strong opinion on that.

This pull request adds a new parameter to the REST Search API named `prefixed_name`. When set to true, the aggregation names in the search response will be prefixed with a prefix that reflects the internal type of the aggregation.

Here is a simple example:
```
GET /_search?prefixed_name
{
    "aggs": {
        "tweets_per_user": {
            "terms": {
                "field": "user"
            }
        }
    },
    "size": 0
}
```

And the response:

```
{
    "aggs": {
        "sterms:tweets_per_user": {
            ...
        }
    }
}
```

This parameter is intended to make life easier for REST clients that could parse back the prefix and could detect the type of the aggregation to parse. It could also be implemented for suggesters.

Name suggestion for `prefixed_name`, as well as feedback on test coverage.
@tlrx tlrx force-pushed the prefix-aggregations-names-by-types branch from b293db4 to e0a53ed Compare February 8, 2017 14:35
@tlrx tlrx merged commit 3553522 into elastic:master Feb 9, 2017
@tlrx
Copy link
Member Author

tlrx commented Feb 9, 2017

Thanks @javanna @cbuescher !

tlrx added a commit that referenced this pull request Feb 9, 2017
This pull request adds a new parameter to the REST Search API named `typed_keys`. When set to true, the aggregation names in the search response will be prefixed with a prefix that reflects the internal type of the aggregation.

Here is a simple example:
```
GET /_search?typed_keys
{
    "aggs": {
        "tweets_per_user": {
            "terms": {
                "field": "user"
            }
        }
    },
    "size": 0
}
```

And the response:

```
{
    "aggs": {
        "sterms:tweets_per_user": {
            ...
        }
    }
}
```

This parameter is intended to make life easier for REST clients that could parse back the prefix and could detect the type of the aggregation to parse. It could also be implemented for suggesters.
@tlrx tlrx deleted the prefix-aggregations-names-by-types branch February 9, 2017 11:38
tlrx added a commit that referenced this pull request Feb 10, 2017
…h responses (#23080)

This pull request reuses the typed_keys parameter added in #22965, but this time it applies it to suggesters. When set to true, the suggester names in the search response will be prefixed with a prefix that reflects their type.
tlrx added a commit that referenced this pull request Feb 10, 2017
…h responses (#23080)

Thiscommit reuses the typed_keys parameter added in #22965, but this time it applies it to suggesters. When set to true, the suggester names in the search response will be prefixed with a prefix that reflects their type.
@coding-career
Copy link

coding-career commented Aug 16, 2018

I found it's a fixed value with "true" for RestSearchAction.TYPED_KEYS_PARAM
in org.elasticsearch.client.Request.search ,so i have to use it with "true". Is it a good idea?

@javanna
Copy link
Member

javanna commented Aug 16, 2018

@boGitt yes the high-level client sets it to true and that is the only way that it can parse a search response properly. The prefix is used to identify which class aggregation and suggestion responses need to be parsed into. This is harmless and should be transparent to users when the high-level client is used as the name of the aggregation that is retrieved should not contain the prefix. Are you experiencing problems with this?

@coding-career
Copy link

yeah,i have got a aggregation response with prefix and some is "sterms", some is "lterms", or "filter",but it's "false" by default in curl response. So i have to reset the name before passing it to others, it's so unfriendly.

@javanna
Copy link
Member

javanna commented Aug 16, 2018

h @boGitt I am afraid this has to do with how you are accessing the search response, most likely you are retrieving the search response using the high-level rest client and then printing it out as json. Can you confirm please? In that case I wonder why you need to use the high-level REST client, you can use the low-level one that gives you back the json, without having to set that parameter. Thoughts?

@coding-career
Copy link

You are right , i'm trying get a json string with it, maybe i was in a mistake of the useage for high-level rest client. Thanks. I will change it to low-level, and i can user bucket_script at the same time.
By the way, to provide a way to change it customizable will be a good idea?

@javanna
Copy link
Member

javanna commented Aug 16, 2018

I don't think it makes a lot of sense to customize it on the request side, because the high-level REST client does not work without. We could potentially allow to not print it out when calling toXContent , but that tends to hide problems, meaning users that have this problem are most likely using the high-level REST client but could use the low-level one instead. Thoughts @nik9000 @hub-cap ?

@hub-cap
Copy link
Contributor

hub-cap commented Sep 17, 2018

Sorry, i totally missed this. I agree wo @javanna as this is something that can be accomplished with only the low level client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants