Clarify event.original vs log.original? #127

Closed
jcallaha opened this issue Sep 27, 2018 · 7 comments

Comments

@jcallaha

Can someone explain the need for log.original given event.original? They look identical to me.

@ruflin
Member

ruflin commented Sep 27, 2018

The important difference is here:

It can have already some modifications applied like encoding or new lines removed to clean up the log message.

@jcallaha
Author

So event.original would be the true original version that matches byte for byte against the underlying log file and log.original might be slightly cleaned up?

If the use of both is to demonstrate log integrity, and they aren't searchable, wouldn't the event.original version be more valuable and obviate the need for log.original, especially if event.hash were also implemented?
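
A minimal sketch of the integrity argument above, assuming `event.hash` would hold a SHA-256 digest of the raw bytes. The choice of algorithm and the helper name `event_hash` are illustrative assumptions, not something the schema prescribes:

```python
import hashlib


def event_hash(raw: bytes) -> str:
    """Hex SHA-256 digest of the bytes exactly as read from the source."""
    return hashlib.sha256(raw).hexdigest()


raw_line = b"GET /index.html 200\r\n"     # bytes as they sit in the log file
cleaned_line = raw_line.rstrip(b"\r\n")   # what a shipper might keep after cleanup

digest_at_source = event_hash(raw_line)

# Only the untouched bytes reproduce the digest taken at the source.
matches_raw = event_hash(raw_line) == digest_at_source
matches_cleaned = event_hash(cleaned_line) == digest_at_source
```

Under these assumptions, a byte-for-byte copy can be verified against the hash, while a cleaned-up copy cannot.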

@webmat
Contributor

webmat commented Sep 27, 2018

Looking at the definitions of both, they seem to be flipped around vs what I would expect.

In my mind, log.original should be the least processed of the two. For me, "log" is closer to the raw source, whereas "event" would be a higher level construct that should be more readily consumable.

But the descriptions seem to say that event.original is the unmodified one, and log.original is a bit more refined: its encoding could have been changed, and perhaps newlines cleaned up a bit.

Do we really need the two?

If we get rid of one of the two, perhaps the one we keep should be event.original, as the more canonical of the two (since not all events come from log files)?

@ruflin
Member

ruflin commented Sep 28, 2018

event.original is intended to show integrity, but also to keep the original around whether or not the event comes from a log. log.original is intended to make log reprocessing possible. Some things can only be done with knowledge available on the edge node, so some processing must already have happened there.

We could merge the two, but we would then have to agree that, in the case of logs, event.original would no longer be untouched.

My personal preference is to keep both, to make it very clear that if log.original is there, it's for log reprocessing.

@webmat
Contributor

webmat commented Sep 28, 2018

@ruflin I'm actually unclear on the two examples you give about log.original.

  • Encoding: Most logs are UTF-8 or ASCII (a subset of UTF-8), and get sent along as UTF-8 as soon as Beats reads them, no? I can see cases where the encoding is different. E.g. latin1 on Windows (not sure if they changed that): when Beats reads these logs and events, what does it do? Isn't it converted to UTF-8 prior to sending along the event?
  • Newline cleanup: I don't actually have a good sense of what you mean by this.
    • If it's related to the typical "multiline" work of forming one event out of multiple log lines, I would expect the .original field to contain the full multiline payload in one event. So I'm not sure what the newline cleanup is referring to.
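
To make the encoding question concrete, here is a small sketch of what a latin1-to-UTF-8 conversion does to the bytes. The helper name `to_utf8` is hypothetical and this is not how Beats implements it, only an illustration of why a converted copy no longer matches the source byte for byte:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# "café" as it would be stored in a latin1-encoded log file: 4 bytes.
raw = "caf\u00e9".encode("latin-1")

# The same text after conversion: 5 bytes, because "é" takes two bytes in UTF-8.
shipped = to_utf8(raw, "latin-1")

same_text = shipped.decode("utf-8") == raw.decode("latin-1")
same_bytes = shipped == raw
```

The text is preserved but the bytes are not, which is the sense in which a re-encoded message is no longer the "true original".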

@ruflin
Member

ruflin commented Oct 1, 2018

For the encoding: Yes, it's converted to UTF-8. Unfortunately, there are quite a few more formats out there than the ones you mentioned.

Each line ends with one or more newline characters (\r, \n), and these are stripped away. Newlines inside a multiline event are not touched. In the Docker case, a multiline is different. It consists of two JSON log lines, and the first one contains a partial flag. What happens in this case is that the two lines are merged together into one JSON object. The original cannot be restored.

If we loosen up event.original a bit to also allow such things, the two could potentially be merged.
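
The two cleanup steps described in this comment can be sketched roughly as follows. The record shape is an assumption made for illustration (a JSON object with a `log` string and an optional boolean `partial` key), and the function names are hypothetical; the real Docker and Beats formats may differ:

```python
import json


def strip_line_ending(line: str) -> str:
    """Drop trailing carriage-return and line-feed characters only."""
    return line.rstrip("\r\n")


def merge_partials(json_lines):
    """Concatenate consecutive records until a non-partial record is seen.

    Assumes each input line is a JSON object with a "log" string and an
    optional boolean "partial" key (a hypothetical field name).
    """
    merged = []
    buffer = ""
    for raw in json_lines:
        record = json.loads(raw)
        buffer += record["log"]
        if not record.get("partial", False):
            merged.append({"log": buffer})
            buffer = ""
    if buffer:
        # Trailing partial records with no terminating record.
        merged.append({"log": buffer})
    return merged


docker_lines = [
    '{"log": "first half, ", "partial": true}',
    '{"log": "second half\\n"}',
]
events = merge_partials(docker_lines)
message = strip_line_ending(events[0]["log"])
```

After the merge there is a single record, and nothing in it says where the first line ended, so the two original JSON lines cannot be rebuilt from it.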

@ebeahan
Member

ebeahan commented May 28, 2021

Discussion continued on this topic in #841.

log.original will soon be deprecated and later removed from the schema in the next major ECS release. The full details are covered in RFC 0017.

@ebeahan ebeahan closed this as completed May 28, 2021

4 participants