
[ElasticsearchSink] OutOfMemory exception #293

Closed
chuwik opened this issue Nov 19, 2014 · 15 comments
@chuwik
Contributor

chuwik commented Nov 19, 2014

Hi,

I have a system using Serilog with the ElasticSearch sink, generating a big volume of logs (~45 million log entries / 24h) and sometimes at high frequency. The problem is that sometimes I get an OutOfMemory exception when I use the ElasticSearch sink; if I use the File sink, for instance, there's no problem. I don't have a clear repro from the code in my system, but I have observed it usually happens when there's a high frequency of logs being created.

I've been able to replicate the issue by tweaking the Serilog.PerformanceTests to use the ElasticSearch sink and more iterations:

static void Main()
{
    const int iterations = 10000000;
    var log = new LoggerConfiguration()
            .WriteTo.ElasticSearch(node: new Uri("http://localhost:9200"))
            .CreateLogger();

    for (var i = 0; i < iterations; i++)
    {
        log.Information("Running iteration {I:00.0} for {@J}!", i, new { Goal = "Speed" });
    }
}

It throws an OutOfMemoryException for me a bit past the 7,000,000th iteration. It works fine with the File sink.

Is there any configuration of the ElasticSearch sink that I'm missing and could use to handle this scenario (high volume/frequency)? Is there any documented "ceiling" for how much the sink can take? Or is it just a bug/improvement needed in the code?

Right now I have to default to enabling both the ElasticSearch and File sinks (the latter just for error & up) just to make sure I don't miss anything... which is quite annoying! Hopefully there'll be a fix for this :)

Thanks a lot

@nblumhardt
Member

Hi - this is caused by the sink falling behind. There's definitely an improvement that can be made here, but deciding what the hard queue size limit should be is a tricky call.

The cause of the perf problem is most likely the default batch size of 50 events, which was chosen with network traversal in mind, and to a lesser extent the 2-second buffering period. On localhost, try for example:

.WriteTo.ElasticSearch(
    node: new Uri("http://localhost:9200"),
    batchPostingLimit: 10000,
    period: TimeSpan.FromMilliseconds(1))

Let me know if this helps, and if you're interested in pursuing the "ceiling" let me know also - I'm not able to attempt it right now but opening a PR/discussion might move things forward on that front.

Cheers,
Nick

@chuwik
Contributor Author

chuwik commented Nov 19, 2014

Thanks!

I've tried your suggestion, and also larger and smaller batch sizes and periods, but unfortunately it still gives the same OutOfMemoryException around the 7,000,000th iteration. I'm happy to look into it more, but I'm afraid I don't know where to start looking for the cause. Could you give me some pointers, or do you have any other ideas about what could be causing it?

Cheers,
Enrique

@nblumhardt
Member

Hi again Enrique - any luck with this one (e.g. via the changes in #302)?

Regards,
Nick

@chuwik
Contributor Author

chuwik commented Jan 19, 2015

No, unfortunately the issue still repros. I've tried the latest ES and Serilog versions, with both the original config from this issue and the ones you proposed, as well as various values for "BatchPostingLimit" and "Period"; in all cases it still ends up throwing an OutOfMemoryException at around iteration 7,000,000.

Cheers,
Enrique

@mivano
Member

mivano commented Jan 19, 2015

I can reproduce the rapid increase in memory now - on Mono, admittedly, but it shows the same problem and will hit the ceiling at some point. I have no idea what to do about it; I looked for places where we might not be disposing things, but it all looks okay. Maybe @Mpdreamz has some ideas?
I also have no way to measure where the memory is being used inside the Xamarin tools - I'd have to use Visual Studio for that, I'm afraid. I can see that the queue in the PeriodicBatchingSink grows very quickly, so it looks like the sink simply can't write to Elasticsearch fast enough?

@gmarz
Contributor

gmarz commented Jan 20, 2015

@mivano That's correct. IIRC, last time I looked into this, it wasn't a memory leak or anything of that nature. It was simply a matter of events queuing faster than Elasticsearch/Serilog could process them, eventually leading to an OOM exception.

Short of queuing events to disk rather than memory, I'm not sure what else can be done?

@mivano
Member

mivano commented Jan 20, 2015

So either we optimize the Elasticsearch sink or we use another mechanism. However, I have no clue how to further optimize the ES sink: async doesn't help, it already uses bulk indexing, and I don't think it has very heavy dependencies. Playing around with the timeout and batch size might help a little. Note that you could run into similar issues with any other sink that is slow to submit its data to its service.

Another mechanism is to send the data to an external queue system like RabbitMQ. This offloads the internal queue, can store on disk if needed and be on another machine. You can fill ES from RabbitMQ using ES rivers or using Logstash.
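The offloading idea above could be sketched roughly like this. This is purely an illustrative sketch assuming the RabbitMQ.Client NuGet package (the "logs" queue name and localhost host are made up for the example); it is not code from any Serilog sink:

```csharp
// Hedged sketch: hand a formatted log event off to RabbitMQ instead of
// buffering it in-process, so a separate consumer (e.g. Logstash) can
// index it into Elasticsearch at its own pace.
using System.Text;
using RabbitMQ.Client;

class RabbitLogForwarder
{
    public static void Publish(string jsonEvent)
    {
        // Assumed broker location; in practice the connection and channel
        // would be long-lived rather than created per event.
        var factory = new ConnectionFactory { HostName = "localhost" };
        using (var connection = factory.CreateConnection())
        using (var channel = connection.CreateModel())
        {
            // Durable queue: events survive a broker restart on disk.
            channel.QueueDeclare(queue: "logs", durable: true,
                                 exclusive: false, autoDelete: false,
                                 arguments: null);

            var body = Encoding.UTF8.GetBytes(jsonEvent);
            channel.BasicPublish(exchange: "", routingKey: "logs",
                                 basicProperties: null, body: body);
        }
    }
}
```

The point of the design is that the broker, not the application, absorbs bursts: its queue can spill to disk and live on another machine, as described above.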

@konste

konste commented Feb 9, 2015

It is always possible that an application generates logs faster than the Elasticsearch sink (or any sink, for that matter) can handle them, regardless of how well it is optimized. One important principle should hold in most cases: problems with logging should not affect the availability of the application (read: crash it). Ideally the logging subsystem should affect app performance (CPU and memory) as little as possible, but it definitely should not crash the app.

So let me formulate the problem differently: what can we do to the Elasticsearch sink to prevent excessive memory consumption, even if the price is dropped log events?
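One possible shape for such a safeguard - an illustrative sketch only, not part of Serilog or the Elasticsearch sink, with all names invented for the example - is a capacity-bounded buffer that drops events instead of growing without limit:

```csharp
// Hedged sketch: a bounded in-memory buffer for log events. When the
// consumer (the sink's shipping loop) falls behind, new events are
// counted and discarded rather than queued, keeping memory bounded.
using System.Collections.Concurrent;
using System.Threading;

class BoundedLogBuffer<T>
{
    readonly ConcurrentQueue<T> _queue = new ConcurrentQueue<T>();
    readonly int _capacity;
    long _dropped;

    public BoundedLogBuffer(int capacity)
    {
        _capacity = capacity;
    }

    // Returns false and records a drop instead of growing without bound.
    // Count is approximate under concurrency, which is acceptable here:
    // the goal is a rough ceiling, not an exact one.
    public bool TryEnqueue(T item)
    {
        if (_queue.Count >= _capacity)
        {
            Interlocked.Increment(ref _dropped);
            return false;
        }
        _queue.Enqueue(item);
        return true;
    }

    public bool TryDequeue(out T item)
    {
        return _queue.TryDequeue(out item);
    }

    // Exposing the drop count lets the sink report data loss via
    // SelfLog instead of failing silently.
    public long Dropped
    {
        get { return Interlocked.Read(ref _dropped); }
    }
}
```

The trade-off is exactly the one stated above: under sustained overload some events are lost, but the process stays alive.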

@nblumhardt
Member

I think PeriodicBatchingSink will already handle this in most (non-pathological?) cases; @gmarz helped drive out that behaviour a little while ago.

https://github.com/serilog/serilog/blob/dev/src/Serilog.FullNetFx/Sinks/PeriodicBatching/PeriodicBatchingSink.cs#L171 handles dropping the queue after a certain number of delivery failures.

Some improvement is obviously possible here, and a PR would be appreciated if anyone can formulate a reliable strategy that doesn't just limit the buffer size (which would artificially break the sink for successful high-throughput logging). Ideas?

@nblumhardt
Member

Just for context, in case it gets lost in the depths of this thread: we're talking about a test case that does nothing but pump 7,000,000 events through a log sink - imaginable, perhaps, but not really representative of the behavior of even broken apps (an app doing this unexpectedly would most likely die for other reasons before Serilog did!). Cheers :-)

@chuwik
Contributor Author

chuwik commented Feb 18, 2015

While it's a pretty extreme case, I have seen this problem happen in a production system with very high volumes of logs. In all fairness there were way too many, and reducing them to a (more) sensible amount made the issue go away. I'd agree with @konste that dropping logs is better than having the app crash, even if this volume is difficult to reach (yet still possible).

Maybe #391 could be a good workaround for this, using a file as an intermediary in scenarios where there are just too many logs?

@mookid8000
Contributor

With #391 and a configuration like this:

var node = new Uri("http://my-elastic-url:9200/");
var options = new ElasticsearchSinkOptions(node)
{
    BufferBaseFilename = @"c:\somedir\somewhere\logs",
};
config.WriteTo.Elasticsearch(options);

the original Elasticsearch sink will be bypassed, and the logging will actually be done by a RollingFileSink with an ElasticSearchJsonFormatter. In the background, an ElasticSearchLogShipper will then send batches of 100 log events to Elasticsearch as fast as it can.

With this solution, I believe it's only disk space that can turn out to be a problem ;)

@mivano
Member

mivano commented Apr 1, 2015

Let's close this one, as durability support is now in the new Elasticsearch sink as described in #394 and will be released soon.

@mivano mivano closed this as completed Apr 1, 2015
@mookid8000
Contributor

👍

@chuwik
Contributor Author

chuwik commented Apr 1, 2015

Sounds good!
