Delete By Query under heavy indexing load causes OOM errors #6025

xyu · 2014-05-02T18:26:10Z

Hello,

We are running an ES 1.1.1 cluster. When we bulk index into it under a high write load, we have found that running Delete By Query calls can trigger OOM errors. The symptoms appear to be the following:

1. Index into the cluster with a heavy write load; this causes merges to pile up. During the pileup the merge thread pool queue goes up and stays at ~120 per node. (Maybe related to "Merges might not be picked up when they are ready" #5779?)
2. Merges are no longer able to keep up with the new segments created by indexing operations, and the segment count per node shoots up. (We have seen over 30k segments per node.)
3. Long GC cycles are triggered, which may or may not cause the node to drop out of the cluster.
4. When a Delete By Query call is issued, an internal refresh is triggered ("Delete by query should not silently refresh index" #3593). The refresh fails with an out of memory error, and the node is removed from the cluster for not responding. (See gist for logs. In this example es13.iad sends a call to es13.sat, which experiences the OOM, causing es13.iad to remove it from the cluster.)

Please let me know if there is any other info I can provide that may be helpful.

xyu · 2014-05-12T15:37:51Z

It appears that we are experiencing segment explosion during these times and are seeing queues build up in the merge thread pool. For now we are working around this by having our indexing jobs back off on bulk indexing when merges fall behind. It looks like this issue is also being worked on in #6066.
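
For anyone wanting to try the same workaround, here is a minimal sketch of that kind of back-off. It is not the code from our jobs. `send_bulk` and `pending_merges` are hypothetical callables you would supply yourself (for example, one that posts a batch to the cluster and one that reports the current merge backlog), and the threshold and wait time are illustrative assumptions only.

```python
import time


def bulk_index_with_backoff(batches, send_bulk, pending_merges,
                            max_pending=50, wait_seconds=5.0,
                            sleep=time.sleep):
    """Send each batch, pausing while the merge backlog is too deep.

    send_bulk and pending_merges are hypothetical hooks supplied by the
    caller; max_pending and wait_seconds are illustrative, not tuned values.
    Returns the number of times the loop had to wait.
    """
    waits = 0
    for batch in batches:
        # Hold this batch back until the reported backlog is at or
        # below the threshold, polling once per wait interval.
        while pending_merges() > max_pending:
            sleep(wait_seconds)
            waits += 1
        send_bulk(batch)
    return waits
```

The `sleep` parameter is only there so the loop can be exercised without real delays. Note that this waits indefinitely if the backlog never drains, so a real job would probably want a timeout as well.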

clintongormley · 2014-07-28T09:27:19Z

Fixed by #6066

mikemccand · 2015-03-04T19:25:22Z

Alas, deleteByQuery is not throttled when merges are falling behind.

mikemccand · 2015-03-04T19:38:11Z

I'll make this operation throttled like we do for index/create ops.

However I'm doubtful this is "enough" to always prevent segment explosion, i.e. #7052 is a better (trickier) long term solution.

s1monw · 2015-05-28T16:01:21Z

@mikemccand can we close this?

mikemccand · 2015-05-28T16:59:42Z

Yeah, I'll close it ... I think you can still easily provoke an OOME on 1.6, but in 2.0 we are switching to a plugin that runs a scan/scroll query and then bulk-deletes the resulting IDs, which does fix it.
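
To illustrate the second half of that approach, here is a rough sketch, not the plugin's actual implementation, of turning a page of search hits into a bulk request body made up of delete actions. It assumes each hit is a dict carrying `_index`, `_type` and `_id`, as search hits do in these versions. The function name is hypothetical, and the scroll loop and the HTTP call are left out.

```python
import json


def build_bulk_delete_body(hits):
    """Build a newline-delimited bulk body with one delete action per hit.

    Assumes each hit is a dict with "_index", "_type" and "_id" keys;
    any other keys (such as "_source") are ignored.
    """
    lines = []
    for hit in hits:
        action = {
            "delete": {
                "_index": hit["_index"],
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action, sort_keys=True))
    # Every action line in a bulk body, including the last one,
    # must be terminated by a newline.
    return "".join(line + "\n" for line in lines)
```

Each scrolled page would be passed through a function like this and sent to the bulk endpoint, so the deletes go through the same path as ordinary bulk operations.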

clintongormley closed this as completed Jul 28, 2014

clintongormley mentioned this issue Jul 28, 2014

Reimplement delete-by-query as a bulk request #7052

Closed

mikemccand reopened this Mar 4, 2015

mikemccand self-assigned this Mar 4, 2015

mikemccand added v2.0.0-beta1 v1.5.0 v1.4.5 >bug labels Mar 4, 2015

mikemccand mentioned this issue Mar 11, 2015

Core: remove delete-by-query #10067

Closed

s1monw added v1.6.0 and removed v1.5.0 labels Mar 17, 2015

clintongormley removed the v1.4.5 label Apr 11, 2015

mikemccand closed this as completed May 28, 2015

clintongormley removed v1.6.0 v2.0.0-beta1 labels May 29, 2015
