
ElasticSearch sink durability mode #391

Merged
5 commits merged into serilog:dev on Feb 28, 2015

Conversation

5 participants
@mookid8000
Contributor

mookid8000 commented Feb 18, 2015

No description provided.

@mookid8000

Contributor

mookid8000 commented Feb 18, 2015

(I have no idea why this PR seems to be unmergeable - I've been pretty cautious and tried not to change too much of the existing code)

@mivano

Member

mivano commented Feb 18, 2015

Nice functionality! Looking forward to having this included.

@madstt


madstt commented Feb 18, 2015

👍

@konste


konste commented Feb 18, 2015

Looking forward to this feature! It would make Elasticsearch cluster upgrades so much less stressful!

@konste


konste commented Feb 18, 2015

While at it, would it be possible to add an option to override ElasticsearchJsonFormatter with a custom one? The lack of this simple option forces us to use a completely custom Elasticsearch sink, and I would really like to get back to the official one.

@mookid8000

Contributor

mookid8000 commented Feb 18, 2015

sure - I'll take a look at it tomorrow

@konste


konste commented Feb 18, 2015

Thanks a bunch!


@mookid8000

Contributor

mookid8000 commented Feb 19, 2015

I've added the CustomFormatter property to the ElasticSearchSinkOptions now, allowing you to override the formatter. It still needs to output JSON though ;)

@konste


konste commented Feb 19, 2015

Sure thing! We just derive from the standard Elasticsearch JSON Formatter and override a few methods.

Another thing that looks like an omission in the standard ES sink is that it totally ignores the ES response to the bulk index operation. It may contain a ton of useful details! I will share my implementation shortly.


@konste


konste commented Feb 19, 2015

Here is the link to my mod of the Elasticsearch sink: http://1drv.ms/1AVHkVN

Everything after the line
ElasticsearchResponse elasticsearchResponse = this.client.Bulk(payload);
is dedicated to receiving, interpreting, and reporting the ES response.

See if you can adapt it to your implementation.
The lack of error reporting in the official sink is very worrisome.
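The sink itself is C#, but the bookkeeping being described here can be sketched language-neutrally. Below is a minimal Python illustration of walking a _bulk response for per-item failures; the response body is hand-written sample data following the documented _bulk response shape (the index name, ids, and error string are invented, not captured from a real cluster):

```python
import json

# Hand-written sample of an Elasticsearch _bulk response: a top-level
# "errors" flag plus one entry in "items" per sub-request, each with
# its own HTTP-style status. (Index name, ids and error are invented.)
response_body = """
{
  "took": 30,
  "errors": true,
  "items": [
    {"index": {"_index": "logstash-2015.02.19", "_id": "1", "status": 201}},
    {"index": {"_index": "logstash-2015.02.19", "_id": "2", "status": 400,
               "error": "MapperParsingException[failed to parse [field.1]]"}}
  ]
}
"""

def failed_items(response):
    """Return the per-item results whose status indicates failure."""
    if not response.get("errors"):
        return []  # fast path: the whole batch succeeded
    failures = []
    for item in response.get("items", []):
        # Each item is {"<action>": {...}}, e.g. {"index": {...}}.
        (result,) = item.values()
        if result.get("status", 500) >= 300:
            failures.append(result)
    return failures

for failure in failed_items(json.loads(response_body)):
    print(failure["_id"], failure["status"])
```

The key point, as the sample shows, is that success is reported per item, not per request: a 200-level HTTP response to the bulk call itself says nothing about whether every document was indexed.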


@mookid8000

Contributor

mookid8000 commented Feb 19, 2015

oh, I didn't realize that errors would not (necessarily) bubble up as exceptions...

...can you be so unfortunate that only the first 50 out of 100 batch operations complete successfully? And maybe even so ill-fated that only the middle 50, or every 3rd log event, succeeded?

I'm thinking how we can make this into a bulletproof log forwarding solution that will never lose a log event.

@mivano

Member

mivano commented Feb 19, 2015

What we run into now and then is that we try to save a log event that does not fit the mapping. For example, the sink tries to save field.1 as a string, but ES received an integer at some point, decided the field is an int, and now returns parsing errors (we need to fix this with a correct mapping, but it's just an example). So the bulk operation completes successfully, yet some of the instances in the batch of 50 have failed.

@mookid8000

Contributor

mookid8000 commented Feb 19, 2015

If I understand the bulk API docs correctly, the index command has add/update semantics.

What do you think of a solution where I assign an _id to each log event, and then have the ES log shipper never continue until each batch has completed 100% successfully?
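That idea (deterministic ids plus resend-until-clean) can be sketched as below. This is an illustrative Python sketch, not the sink's actual code; `send_bulk` is a hypothetical transport that takes (id, event) pairs and returns the set of ids that failed:

```python
import hashlib
import json

def event_id(event):
    """Deterministic _id: retrying an 'index' command with the same id
    overwrites rather than duplicates the event, so resending an event
    that actually succeeded is harmless (add/update semantics)."""
    canonical = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()

def ship_batch(events, send_bulk, max_attempts=5):
    """Resend until every event in the batch has been indexed.

    `send_bulk` is a hypothetical stand-in for the real HTTP call to
    _bulk: it takes a list of (doc_id, event) pairs and returns the
    set of doc_ids that failed.
    """
    pending = {event_id(e): e for e in events}
    for _ in range(max_attempts):
        if not pending:
            return True  # every event confirmed indexed
        failed = send_bulk(list(pending.items()))
        # Keep only the events that still need to be shipped.
        pending = {doc_id: pending[doc_id] for doc_id in failed}
    return not pending
```

The deterministic id is what makes "never continue until 100% complete" viable: a batch that was actually indexed but whose response was lost can be resent without creating duplicates.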

@konste


konste commented Feb 19, 2015

@mivano - we had to fight exactly the same problem. I would imagine it to be rather widespread among those using ES for logging, and I am surprised it is not discussed / addressed all over.

My own workaround for broken dynamic mapping is twofold.
The first part is a custom ES JSON serializer, which appends suffixes to property names according to their type. So when logs have properties with the same names but incompatible types, the auto-added type suffixes keep the ES mapping from choking on them. You can see it in the same location I posted before: http://1drv.ms/1AVHkVN
The second part is a safety net: if any property turns out to be unparsable by the rules of its assigned mapping, that still must not prevent the rest of the document from being indexed. There is a setting, index.mapping.ignore_malformed, which can be set per index or per mapping, and it achieves exactly that.

This is specifically why I asked @mookid8000 to enable a custom JSON formatter.
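The type-suffix idea can be sketched as follows. This is an illustrative Python sketch; the suffix scheme ("_s", "_i", etc.) is invented here and is not necessarily the scheme used in the linked archive:

```python
def suffix_properties(props):
    """Append a type suffix to each property name so that a property
    logged as a string in one event and as a number in another becomes
    two distinct fields (e.g. 'Duration_s' and 'Duration_i'), meaning
    ES dynamic mapping never sees conflicting types for one field name.
    The suffix scheme is invented for illustration."""
    suffix_for = {bool: "_b", int: "_i", float: "_f", str: "_s"}
    renamed = {}
    for name, value in props.items():
        suffix = suffix_for.get(type(value), "_o")  # "_o": other/object
        renamed[name + suffix] = value
    return renamed
```

Together with index.mapping.ignore_malformed as the safety net, conflicting property types from different log sources land in separate fields instead of producing mapper parsing errors.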

@konste


konste commented Feb 19, 2015

@mookid8000, bulk operations in ES are implemented in a very unusual way. Normally I would expect "one command, multiple data": the command would be "insert into this particular index" and the data would be the documents to index. For good or bad, it is done MIMD style: each document is prepended with an individual command telling ES exactly what to do with it.
The unfortunate consequence is that any number of requests from a batch may fail.
That is why I added all that code which goes through the ES bulk operation response and tries to report failures using some other logging facility (we use Seq for that).

Without that, those errors only show up in ES's own log file, and it is not even clear on which machine within the cluster that resides.
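The "MIMD" shape of the request body is easiest to see in a small sketch that builds the newline-delimited payload. Index and type names below are examples; _type reflects the ES 1.x API current at the time:

```python
import json

def build_bulk_payload(index, doc_type, events):
    """Build an Elasticsearch _bulk request body: newline-delimited
    JSON in which every document line is preceded by its own action
    line, so each document carries its own command. The bulk API
    requires a trailing newline."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"

payload = build_bulk_payload(
    "logstash-2015.02.19", "logevent",
    [{"message": "first"}, {"message": "second"}],
)
```

Because every document travels with its own command, the response mirrors this structure: one result per action line, each of which can fail independently.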

@nblumhardt

Member

nblumhardt commented Feb 19, 2015

It's fantastic to see this work happening - thanks all!

As part of #344 we plan to move this sink to a new repository, serilog-sinks-elasticsearch - I'm thinking the way we do it for now is: merge this PR to dev when it's ready, but move the whole thing before merging to master (we've used this tactic already for several other sinks, and we're working through all of them).

@Mpdreamz Mpdreamz referenced this pull request Feb 24, 2015

Closed

ElasticsearchSink 2.0 #394

5 of 7 tasks complete
@nblumhardt

Member

nblumhardt commented Feb 28, 2015

Merging now and moving to https://github.com/serilog/serilog-sinks-elasticsearch (@mivano I've made you an admin of it), will post back here once it's across. Slow and steady :)

@nblumhardt

Member

nblumhardt commented Feb 28, 2015

(I'm assuming that the latter discussion in this thread is in relation to further improvements, let me know if there's any reason to hold fire on this.)

@nblumhardt nblumhardt merged commit 3538486 into serilog:dev Feb 28, 2015

1 check failed

continuous-integration/appveyor AppVeyor was unable to build non-mergeable pull request

@Mpdreamz Mpdreamz referenced this pull request Mar 2, 2015

Merged

First stab at taking this puppy to 2.0 #1

6 of 6 tasks complete