[ElasticsearchSink] OutOfMemory exception #293
Hi - this is caused by the sink falling behind. There's definitely an improvement that can be made here, but figuring out what the hard queue size limit should be is a tricky call to make. The likely cause of the perf problem in the first place is the default batch size of 50 events, which is chosen with network traversal in mind, and to a lesser extent the buffering period of 2 seconds. On localhost, try for example:
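Something along these lines - `BatchPostingLimit` and `Period` are the relevant `ElasticsearchSinkOptions` settings; the exact values here are just illustrative:

```csharp
Log.Logger = new LoggerConfiguration()
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
    {
        // Larger batches amortize per-request overhead when ES is local;
        // a shorter period drains the in-memory queue more often.
        BatchPostingLimit = 1000,
        Period = TimeSpan.FromSeconds(1)
    })
    .CreateLogger();
```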
Let me know if this helps, and if you're interested in pursuing the "ceiling" let me know also - I'm not able to attempt it right now, but opening a PR/discussion might move things forward on that front. Cheers,
Thanks! I've tried your suggestion, and also with larger and smaller batch sizes and periods, but unfortunately it still throws the same OutOfMemory exception at around iteration 7,000,000. I'm happy to look into it further, but I'm afraid I don't know where to start looking for the cause. Could you give me some pointers, or do you have any other ideas about what could be causing it? Cheers,
Hi again Enrique - any luck with this one (e.g. via the changes in #302)? Regards,
No, unfortunately the issue still repros. I've tried with the latest ES and Serilog versions, with the original config from the issue and the ones you proposed, as well as different values for both "BatchPostingLimit" and "Period"; in all cases it still ends up throwing an OutOfMemoryException at around iteration 7,000,000. Cheers,
I can reproduce the rapid increase in memory now - on Mono, however, but it shows the same problem and will hit the ceiling at some point. I have no idea what to do about this. I looked for places where we might not be disposing things, but it all looks okay. Maybe @Mpdreamz has some ideas?
@mivano That's correct. IIRC, last time I looked into this, it wasn't a memory leak or anything of that nature. It was simply a matter of events queuing faster than Elasticsearch/Serilog could process them, eventually leading to an OOM exception. Short of queuing events to disk rather than memory, I'm not sure what else can be done?
So either we optimize the Elasticsearch sink or use another mechanism. However, I have no clue how to further optimize the ES sink; async does not help, it already uses bulk indexing, and I do not think it has very heavy dependencies. So playing around with the timeout and batch size might help a little. You might run into similar issues with other sinks that are slow in submitting their data to their service. Another mechanism is to send the data to an external queue system like RabbitMQ, as in the sketch below. This offloads the internal queue, can store to disk if needed, and can run on another machine. You can fill ES from RabbitMQ using ES rivers or Logstash.
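A minimal sketch of that idea, assuming the pre-6 `RabbitMQ.Client` API and Serilog's `ILogEventSink`; the class name, queue name, and connection details are placeholders, not part of any sink package:

```csharp
using System;
using System.IO;
using System.Text;
using RabbitMQ.Client;
using Serilog.Core;
using Serilog.Events;
using Serilog.Formatting.Json;

// Publishes each log event to a durable RabbitMQ queue; Logstash (or a
// consumer of your own) can then index the events into Elasticsearch.
class RabbitMqLogSink : ILogEventSink, IDisposable
{
    readonly IConnection _connection;
    readonly IModel _channel;
    readonly string _queue;
    readonly JsonFormatter _formatter = new JsonFormatter();

    public RabbitMqLogSink(string hostName = "localhost", string queue = "logs")
    {
        _queue = queue;
        _connection = new ConnectionFactory { HostName = hostName }.CreateConnection();
        _channel = _connection.CreateModel();
        _channel.QueueDeclare(_queue, durable: true, exclusive: false,
                              autoDelete: false, arguments: null);
    }

    public void Emit(LogEvent logEvent)
    {
        var writer = new StringWriter();
        _formatter.Format(logEvent, writer);
        var props = _channel.CreateBasicProperties();
        props.Persistent = true; // let the broker store the message on disk
        _channel.BasicPublish(exchange: "", routingKey: _queue,
                              basicProperties: props,
                              body: Encoding.UTF8.GetBytes(writer.ToString()));
    }

    public void Dispose()
    {
        _channel.Dispose();
        _connection.Dispose();
    }
}
```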
It is always possible that the application, for some reason, generates logs faster than the Elasticsearch sink (or any sink, for that matter) can handle them, regardless of how well you optimize it. One important principle should hold in most cases: problems with logging should not affect the availability of the application (read: crash it). Ideally the logging subsystem should affect app performance (CPU and memory) as little as possible, but it definitely should not crash the app. So let me formulate the problem differently - what can we do to the Elasticsearch sink to prevent it from excessive memory consumption, even if the price is dropped log events?
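A minimal sketch of that approach - wrap any inner sink in a bounded in-memory queue and drop events once it fills (the class name and capacity are illustrative, not part of the sink):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Serilog.Core;
using Serilog.Events;

// Caps memory usage by bounding the queue; when the application logs
// faster than the inner sink can drain, new events are silently dropped.
class BoundedQueueSink : ILogEventSink, IDisposable
{
    readonly BlockingCollection<LogEvent> _queue;
    readonly ILogEventSink _inner;
    readonly Task _worker;

    public BoundedQueueSink(ILogEventSink inner, int capacity = 100000)
    {
        _inner = inner;
        _queue = new BlockingCollection<LogEvent>(capacity);
        _worker = Task.Run(() =>
        {
            foreach (var evt in _queue.GetConsumingEnumerable())
                _inner.Emit(evt);
        });
    }

    public void Emit(LogEvent logEvent)
    {
        // TryAdd returns false immediately when the queue is at capacity,
        // so the event is dropped rather than growing memory without bound.
        _queue.TryAdd(logEvent);
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Wait();
        (_inner as IDisposable)?.Dispose();
    }
}
```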
I think https://github.com/serilog/serilog/blob/dev/src/Serilog.FullNetFx/Sinks/PeriodicBatching/PeriodicBatchingSink.cs#L171 handles dropping the queue after a certain number of delivery failures. Some improvement is obviously possible here, and it would be appreciated as a PR if anyone can formulate a reliable strategy that doesn't just limit the buffer size (which would artificially break the sink for successful high-throughput logging). Ideas?
Just for context, of course, in case it is lost in the depths of this thread: here we are talking about a test case that does nothing else but pump 7,000,000 events through a log sink - imaginable perhaps, but not really representative of the behavior of even broken apps (an app doing this unexpectedly would most likely die for other reasons ahead of Serilog!) Cheers :-)
While it's pretty extreme, I have seen this problem happen in a production system with very high volumes of logs. In all fairness there were way too many, and reducing them to a (more) sensible amount made the issue go away. I'd agree with @konste that dropping logs is better than having the app crash, even if this volume is difficult to reach (yet still possible). Maybe #391 could be a good workaround for this, using a file as an intermediary in scenarios where there are just too many logs?
With #391 and a configuration like this:
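Something along these lines, assuming the durable-buffer options (`BufferBaseFilename` and friends) that #391 introduces; the paths and limits are illustrative:

```csharp
Log.Logger = new LoggerConfiguration()
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
    {
        // Setting BufferBaseFilename switches the sink into durable mode:
        // events are written to a buffer file on disk first and shipped
        // to Elasticsearch in the background.
        BufferBaseFilename = "./logs/es-buffer",
        BufferFileSizeLimitBytes = 100L * 1024 * 1024
    })
    .CreateLogger();
```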
the original Elasticsearch sink will be bypassed, and the logging will actually be done through a durable, file-backed sink. With this solution, I believe it's only disk space that can turn out to be a problem ;)
Let's close this one, as durability support is now in the new Elasticsearch sink as described in #394 and will be released soon.
👍
Sounds good!
Hi,
I have a system using Serilog with the Elasticsearch sink, generating a big volume of logs (~45 million log entries / 24h), sometimes at high frequency. The problem is that sometimes I get an OutOfMemory exception when I use the Elasticsearch sink; if I use the File sink, for instance, there's no problem. I don't have a clear repro from the code in my system, but I have observed that it usually happens when logs are being created at high frequency.
I've been able to replicate the issue by tweaking the Serilog.PerformanceTests to use the ElasticSearch sink and more iterations:
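A sketch of the kind of loop involved (the test code itself isn't reproduced here; the sink is configured with its defaults, and the URI is a placeholder):

```csharp
var log = new LoggerConfiguration()
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200")))
    .CreateLogger();

// Pump events as fast as possible; memory grows once the sink falls behind.
for (var i = 0; i < 10000000; i++)
{
    log.Information("Benchmark iteration {Iteration}", i);
}
```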
It throws the OutOfMemory exception for me a bit past iteration 7,000,000. It works fine with the File sink.
Is there any configuration of the ElasticSearch sink that I'm missing and could use to handle this scenario (high volume/frequency)? Is there any documented "ceiling" for how much the sink can take? Or is it just a bug/improvement needed in the code?
Right now I have to default to enabling both the ElasticSearch and File sinks (the latter just for error & up) just to make sure I don't miss anything... which is quite annoying! Hopefully there'll be a fix for this :)
Thanks a lot