
Bound the number of search results returned by elasticsearch #4026

Closed
nicktgr15 opened this issue Oct 31, 2013 · 7 comments
@nicktgr15

Hello,

When making a search request (POST) like the following to Elasticsearch from Kibana, I get a Java heap space error and my Elasticsearch node can't recover.

{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"match_all":{}},{"bool":{"must":[{"match_all":{}}]}}]}}}},"highlight":{"fields":{},"fragment_size":2147483647,"pre_tags":["@start-highlight@"],"post_tags":["@end-highlight@"]},"size":1000000,"sort":[{"_id":{"order":"desc"}}]}

Question: Is it possible to somehow restrict the maximum value of "size" that someone can use? More generally, is it possible to bound the number of results returned by Elasticsearch to avoid out-of-memory errors?

I don't want to increase the heap size (currently 1 GB) as this would not solve the problem.

Regards,
Nick
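
For illustration only, a bounded version of the request above might page through results with a modest size and an increasing from (the page size of 100 is an arbitrary choice, and very deep from values run into the same memory problem discussed later in this thread):

POST _search
{
  "query": {"query_string": {"query": "*"}},
  "size": 100,
  "from": 0,
  "sort": [{"_id": {"order": "desc"}}]
}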

@nicktgr15
Author

It looks like the nodes were unable to recover because the cluster was getting into a split-brain state (multiple master nodes).

In general, I don't think there is a way to limit the number of results returned by a query.

@javanna
Member

javanna commented Nov 4, 2013

We plan to add something called a "circuit breaker" that prevents queries from bringing down a node when there is not enough memory. The related issue is #2929.
In your case the size is way too high, though, so I would suggest just lowering it to a reasonable number of documents.
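
If all matching documents really are needed, a scroll request retrieves them in bounded batches instead of one huge page. This is only a sketch: the index name and batch size are placeholders, and the scroll id comes from each previous response.

# open a scroll context, fetching 500 hits per batch
GET myindex/_search?scroll=1m&size=500
{
  "query": {"match_all": {}}
}

# repeat until a batch comes back empty, passing the _scroll_id returned by the previous call
GET _search/scroll?scroll=1m&scroll_id=<scroll_id from previous response>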

@ghost ghost assigned javanna Nov 4, 2013
@javanna javanna removed their assignment Aug 1, 2014
@javanna javanna added the discuss label Aug 1, 2014
@clintongormley

Rather than adding a setting specifically to limit the size of the priority queue, we should aim to limit the amount of memory used by a request, and how long a request can run. This potentially allows admins to specify different policies for different users.

First step is to add the priority queue to the circuit breaker.
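
For reference, the memory-based circuit breakers in the 1.x line are configured with settings roughly like the following (names and percentages are the documented defaults from the 1.4-era settings, shown here only for illustration; none of them yet account for the priority queue, which is the gap described above):

indices.breaker.total.limit: 70%
indices.breaker.fielddata.limit: 60%
indices.breaker.request.limit: 40%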

@bobbyhubbard

Just to be clear, #5466 addresses a bug related to specifying a size above 999999 that causes significant performance degradation. The size of the index and the memory it consumes seem to have absolutely nothing to do with the issue. I can reproduce this bug even with an index containing one tiny document... so it can't be related to loading a huge result set into memory. For example, in our production ES cluster (v1.1.1):

PUT sizebugtest/nada/1
{
  "key":"value"
}
PUT sizebugtest/nada/2
{
  "key":"value2"
}

#returns both documents in 2-3ms
GET sizebugtest/nada/_search?   
#returns both documents in 3-5ms
GET sizebugtest/nada/_search?size=999999
#returns both documents in 8-25ms
GET sizebugtest/nada/_search?size=9999999
#returns both documents in 50-100ms
GET sizebugtest/nada/_search?size=99999999
#returns both documents in 7000-30,000ms!! Sometimes times out. Same 2 documents!
GET sizebugtest/nada/_search?size=999999999
#400 - Awesome...no longer an int...!
GET sizebugtest/nada/_search?size=9999999999

Why the significant difference in response time for the same index, simply by specifying a different size? That seems like a different issue from what #4026 addresses, IMO.

@clintongormley

@bobbyhubbard No, they are related. Specifying a large size (or a high from offset) means creating a large priority queue. By adding the size of the priority queue to the circuit breaker, we can abort the search if too much memory would be required to service the request. That's a good generic solution, instead of having a separate setting for each little part of the request.
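
A rough back-of-envelope, assuming the collector pre-allocates the priority queue up front at a few tens of bytes per slot (which matches how Lucene behaved in these versions), shows why the requested size alone matters:

size=999999      ->  ~10^6 slots  ->  tens of MB of heap
size=999999999   ->  ~10^9 slots  ->  tens of GB of heap

So even an index holding two tiny documents gets slower, or runs out of heap, purely because of the requested size.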

@bobbyhubbard

Ah, OK. BTW, I just upgraded our dev environment to 1.3.2 to confirm whether this was still an issue. In production, running 1.1.1, I can reproduce it all day long using the test case above. However, I cannot reproduce it against 1.3.2.

@clintongormley

Closing in favour of #9311
