Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOCS] add docs for async search #53675

Merged
merged 11 commits into from
Mar 20, 2020
Merged

Conversation

javanna
Copy link
Member

@javanna javanna commented Mar 17, 2020

Relates to #49091

@javanna javanna added >docs General docs changes :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.7.0 labels Mar 17, 2020
@javanna javanna requested review from jimczi and jrodewig March 17, 2020 15:48
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (>docs)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

path = path.replace('<', '%3C').replace('>', '%3E')
path = path.replace('{', '%7B').replace('}', '%7D')
path = path.replace('|', '%7C')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping this would work but it does cause problems. I need to see if this escaping is needed, to my mind it isn't but I may be wrong. In that case I need to only escape unless curly brackets are part of an expression like ${expression}, which I don't look forward to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out this is easy to address, there were only a couple of tests where we were using unescaped | which we can manually escape instead.

@@ -41,6 +45,15 @@
"description":"The number of shard results that should be reduced at once on the coordinating node. This value should be used as the granularity at which progress results will be made available.",
"default":5
},
"pre_filter_shard_size":{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi I was wondering if you left this out on purpose. It is settable hence I added it, let me know what you think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably prevent users to set this and assume that the can_match phase always runs on async_search. That means that we should throw an error if the value is set in the rest API but that's out of scope for this pr. +1 to remove it from the spec here though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a look at disallowing to set this parameter in a followup

@jrodewig
Copy link
Contributor

I left some comments to help fix wording, capitalization, and a few other things. Aside from those, this looks pretty good.

I do wonder if the API docs might be easier to parse if they were formatted similar to the API reference here:
https://github.com/elastic/docs/blob/master/shared/api-ref-ex.asciidoc

Some of the sections regarding parameters could probably benefit from that structure. However, that would likely involve duplicating a lot of content from the current search API docs, which need work themselves. I'm okay with this for now.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @javanna , I left some comments regarding the documentation of the different options.

<2> Partial aggregations results, coming from the shards that have already
completed the execution of the query.

NOTE: When results are sorted by a numeric field, shards get sorted based on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this note near to the submit query in order to explain why the example uses a sort ?

available before the timeout expires, otherwise the currently available results
will be returned once the timeout expires.

The `keep_alive` parameter, which defaults to `5d` (five days), specifies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keep_alive for the get api is a bit different since it allows to extend/change the expiration time for an existing id. Maybe good to make the distinction here since the default for get is to keep the current expiration time.

`1` second.

The submit async search API supports the same <<search-search-api-query-params,parameters>>
as the search API, though some have different default values:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can list the ones that can be changed and omit the other. batched_reduce_size, request_cache and max_concurrent_shard_requests + the specific options (wait_for_completion, keep_alive) should be enough for this API. I think we said that we want to remove pre_filter_shard_size so I'd remove it from here, same for ccs_minimize_roundtrips which sounds like it could be changed but cannot.
So to be clear, a concrete list that presents all these options and then we can add a link to the search source documentation that should be unchanged from normal search request ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batched_reduce_size is also important to document since that's the granularity at which partial results will be made available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I was planning on looking back to this once we disallow setting pre_filter_shard_size and maybe others, but I can address the docs in the meantime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add a link to the search source documentation

there is already a link. Why document max_concurrent_shard_request here? The default is the same as for search?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I thought we would have the exhaustive list of options here rather than linking to the entire search request option. We only accept a subset of it so better to be explicit ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the thing is that there's a ton of options that can be set to the search request. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-search-api-query-params . I thought it's better to just link to search rather than duplicate its docs, and mention the special cases or important aspects only in the async search docs. Would you rather list all of the search options also here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for a simple link but we should at least document batched_reduce_size explicitly here for the reason exposed above. We can also add the list of options that are not accepted here (scroll, ccs_minimize_roundtrips and pre_filter_shard_size) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that's what I had in mind. batched_reduce_size makes a lot of sense to explain, I already did, I will push shortly

@javanna
Copy link
Member Author

javanna commented Mar 18, 2020

thanks a lot for all the help @jrodewig ! I suspect you will have suggestions on my latest update, see bbebf3c specifically. thanks again!

@javanna
Copy link
Member Author

javanna commented Mar 18, 2020

@jimczi I pushed some updates, would you like to have another look?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two minor comments but the change looks good to me.

==== Get async search

The get async search API retrieves the results of a previously submitted
async search request given its id. The access to the results of a specific
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's only when security is enabled that we restrict the access. We should make the distinction here.

and their partial results are also included in the response.
<2> Partial aggregations results, coming from the shards that have already
completed the execution of the query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explanation for expiration_time_in_millis, is_partial and is_running ? The last two are important since they determine how to interpret the search response (partial or not) and if the search is still running.

@javanna javanna merged commit 0a93a93 into elastic:master Mar 20, 2020
javanna added a commit that referenced this pull request Mar 20, 2020
Relates to #49091

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>docs General docs changes :Search/Search Search-related issues that do not fall into other categories v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants