Possible memory leak in index query cache #18161
What happens when you remove the ridiculously high search thread pool?
We keep track of thread pool usage; raising it to 1000 was an attempt to observe the behavior. The number of threads in use hovers around 5-15. It only reaches 100 when a heap dump is taken, so that is a side effect.
@wfelipe yes, but what happens when you use the default setting for the search thread pool size, which is (number of processors * 3) / 2 + 1? You don't mention how many processors you have, but simply unsetting this setting will give you the default. With a high size, if search is struggling for whatever reason, it will just use one of the many threads you have allowed it, which will bring the system to its knees. With a reasonable thread pool size, search requests will instead be queued or rejected, keeping the system healthy. That's why I want to see what happens to memory usage when the threads setting is at the default.
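As a rough illustration of the sizing rule quoted above (this is a sketch of the formula, not Elasticsearch's source code), the default search thread pool size can be computed like so:

```python
# Sketch of the default search thread pool sizing rule quoted above:
# size = (number of processors * 3) / 2 + 1, using integer division.
# Illustration only -- not Elasticsearch's actual implementation.
import os


def default_search_pool_size(num_processors: int) -> int:
    """Default search thread pool size for a node with num_processors cores."""
    return (num_processors * 3) // 2 + 1


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"{cores} cores -> search pool size {default_search_pool_size(cores)}")
```

For example, an 8-core node would get (8 * 3) // 2 + 1 = 13 search threads, versus the 1000 configured in this issue.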
Everything you are describing is a side effect of having too many threads * segments per node. Lucene keeps state in a thread-local per segment, which is why you are seeing so many instances of SegmentCoreReaders and CompressingStoredFieldsReader. You should try to reduce the size of the search/get pools and have fewer (larger) segments per node.
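A hedged sketch of the first half of that advice in elasticsearch.yml, using the 2.x-era flat `threadpool.*` setting names (the values here are illustrative, not a recommendation for this cluster; verify the names against your version's documentation):

```yaml
# elasticsearch.yml -- illustrative values only; tune for your hardware.
# 2.x-style flat setting names; e.g. 13 = (8 cores * 3) / 2 + 1.
threadpool.search.size: 13
threadpool.search.queue_size: 1000
threadpool.get.size: 8
threadpool.get.queue_size: 1000
```

With bounded pools like these, excess search load shows up as queued or rejected requests rather than as hundreds of live threads each pinning per-segment state.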
Elasticsearch version: 2.2.2 and 2.3.2
JVM version: 1.8_65 and 1.8_92
OS version: CentOS 7 (kernel 3.10.0-327.13.1.el7.x86_64)
Description of the problem including expected versus actual behavior:
This is a testing system; writing has been disabled and only search is active. Here is the environment:
configuration:
Steps to reproduce:
The cluster stays fine without any queries (heap around 7 GB on 2.3.2 and 4 GB on 2.2.2). Once we start sending queries (200-300 req/s), the cluster eats up the heap, and after the old-gen GC starts running it never frees enough memory.
After a couple of hours the cluster becomes unresponsive and a restart is required.
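One way to watch whether old-gen collections are actually reclaiming memory during such a test is to poll JVM heap usage from the nodes stats API. A minimal sketch, assuming a node reachable at localhost:9200 and the stock `_nodes/stats` JSON layout:

```python
# Poll per-node heap usage from the _nodes/stats API while load-testing.
# Sketch only: assumes a node at http://localhost:9200 with the standard
# response layout {"nodes": {<id>: {"name": ..., "jvm": {"mem": {...}}}}}.
import json
import time
import urllib.request


def heap_used_percents(stats: dict) -> dict:
    """Map node name -> JVM heap_used_percent from a _nodes/stats response."""
    return {
        node["name"]: node["jvm"]["mem"]["heap_used_percent"]
        for node in stats["nodes"].values()
    }


def poll(url: str = "http://localhost:9200/_nodes/stats/jvm", every: int = 30):
    """Print heap usage every `every` seconds; a steadily rising floor after
    old-gen GC cycles is the leak-like pattern described in this issue."""
    while True:
        with urllib.request.urlopen(url) as resp:
            stats = json.load(resp)
        print(time.strftime("%H:%M:%S"), heap_used_percents(stats))
        time.sleep(every)
```

If the printed percentages keep climbing after each old-gen collection under a steady 200-300 req/s load, that matches the behavior reported here.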
Provide logs (if relevant):
Two memory dumps were taken, and both reported the same leak suspects; here is one taken from one of the dumps:
Attached: memory reports.