Delete By Query under heavy indexing load causes OOM errors #6025

Closed
xyu opened this issue May 2, 2014 · 6 comments

@xyu
Contributor

xyu commented May 2, 2014

Hello,

We are running an ES 1.1.1 cluster. When we bulk index into it under a high write load, running Delete By Query calls can trigger OOM errors. The symptoms appear to be the following:

  1. Indexing into the cluster under a heavy write load causes merges to pile up. During the pileup, the merge thread pool queue climbs to ~120 per node and stays there. (Possibly related to #5779, "Merges might not be picked up when they are ready".)
  2. Merges can no longer keep up with the new segments created by indexing operations, and the number of segments per node shoots up. (We have seen over 30k segments per node.)
  3. Long GC cycles are triggered, which may or may not cause the node to drop out of the cluster.
  4. When a Delete By Query call is issued, an internal refresh is triggered (see #3593, "Delete by query should not silently refresh index"). The refresh fails with an out of memory error, and the node is removed from the cluster for not responding. (See gist for logs. In this example, es13.iad sends a call to es13.sat, which experiences the OOM, causing es13.iad to remove it from the cluster.)
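
For anyone who wants to spot the pileup described in steps 1 and 2 before issuing a Delete By Query, here is a rough sketch of a check. It assumes a dict shaped like a node stats response, with a segment count under `indices.segments.count` and a merge queue length under `thread_pool.merge.queue`; that shape, the function name `nodes_at_risk`, and both thresholds are assumptions for illustration, not something taken from the logs.

```python
def nodes_at_risk(node_stats, max_segments=1000, max_merge_queue=50):
    """Return the names of nodes whose segment count or merge queue looks too high.

    `node_stats` is assumed to look like:
    {"nodes": {"<id>": {"name": ..., "indices": {"segments": {"count": N}},
                        "thread_pool": {"merge": {"queue": Q}}}}}
    The thresholds are arbitrary placeholders; tune them for your cluster.
    """
    at_risk = []
    for node in node_stats.get("nodes", {}).values():
        # Missing fields are treated as 0 so a partial response does not raise.
        segments = node.get("indices", {}).get("segments", {}).get("count", 0)
        queue = node.get("thread_pool", {}).get("merge", {}).get("queue", 0)
        if segments > max_segments or queue > max_merge_queue:
            at_risk.append(node.get("name", "unknown"))
    return sorted(at_risk)
```

A caller could fetch the stats however it likes and skip or delay the Delete By Query while this returns a non-empty list.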

Please let me know if there is any other info I can provide that may be helpful.

@xyu
Contributor Author

xyu commented May 12, 2014

It appears that we are experiencing segment explosion during these times and are seeing queues build up in the merge thread pool. For now we are working around the issue by having our indexing jobs back off on bulk indexing when merges fall behind. It looks like this issue is also being worked on in #6066.
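
A minimal sketch of this kind of backoff, for illustration only. The thread does not say how the indexing jobs measure "merges fall behind", so `merge_backlog` is a hypothetical callable that returns some backlog number (for example a merge queue length), and `send_bulk` is a hypothetical callable that submits one bulk request. The thresholds and wait times are placeholders.

```python
import time


def bulk_index_with_backoff(batches, send_bulk, merge_backlog,
                            max_backlog=50, wait_seconds=5.0,
                            max_waits=60, sleep=time.sleep):
    """Send each batch only once the merge backlog is at or below max_backlog.

    batches       -- iterable of bulk payloads (any shape; passed through untouched)
    send_bulk     -- hypothetical callable that submits one batch
    merge_backlog -- hypothetical callable returning the current backlog as a number
    Returns the number of batches sent.
    """
    sent = 0
    for batch in batches:
        waits = 0
        # Pause while merges are behind, but do not wait forever.
        while merge_backlog() > max_backlog:
            if waits >= max_waits:
                raise RuntimeError("merges did not catch up; giving up")
            sleep(wait_seconds)
            waits += 1
        send_bulk(batch)
        sent += 1
    return sent
```

Passing `sleep` as a parameter keeps the loop easy to test; a real job would likely also want jitter or an increasing wait.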

@clintongormley

Fixed by #6066

@mikemccand
Contributor

Alas, deleteByQuery is not throttled when merges are falling behind.

@mikemccand mikemccand reopened this Mar 4, 2015
@mikemccand mikemccand self-assigned this Mar 4, 2015
@mikemccand
Contributor

I'll make this operation throttled like we do for index/create ops.

However I'm doubtful this is "enough" to always prevent segment explosion, i.e. #7052 is a better (trickier) long term solution.

@s1monw
Contributor

s1monw commented May 28, 2015

@mikemccand can we close this?

@mikemccand
Contributor

Yeah, I'll close it ... I think you can still easily provoke an OOME on 1.6, but in 2.0 we are switching to a plugin that runs a scan/scroll query and then bulk deletes the resulting IDs, which does fix it.
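
The scroll-then-bulk-delete idea can be sketched roughly as follows. This is not the plugin's code, just an illustration of the pattern: `scroll_pages` is a hypothetical iterable yielding pages of matching document IDs (however the scan/scroll is actually performed), and `send_bulk` is a hypothetical callable that posts a newline-delimited bulk body. The `delete` action lines follow the bulk API's action format, with `_type` included as in the 1.x/2.x era.

```python
import json


def bulk_delete_body(index, doc_type, ids):
    """Build a newline-delimited bulk body with one delete action per ID."""
    lines = [
        json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}})
        for doc_id in ids
    ]
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


def delete_matching(scroll_pages, send_bulk, index, doc_type, batch_size=1000):
    """Collect IDs from scroll pages and delete them in fixed-size bulk batches.

    Returns the number of delete actions submitted (not confirmed deletions).
    """
    submitted = 0
    batch = []
    for page in scroll_pages:
        for doc_id in page:
            batch.append(doc_id)
            if len(batch) >= batch_size:
                send_bulk(bulk_delete_body(index, doc_type, batch))
                submitted += len(batch)
                batch = []
    if batch:
        # Flush whatever is left after the last page.
        send_bulk(bulk_delete_body(index, doc_type, batch))
        submitted += len(batch)
    return submitted
```

Because the deletes go through ordinary bulk requests in bounded batches, only `batch_size` IDs are held in memory at a time.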
