Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log.message #3

Merged
merged 1 commit into from
Jun 11, 2018
Merged

Add log.message #3

merged 1 commit into from
Jun 11, 2018

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented May 28, 2018

The field log.message contains the full log message before splitting it up in multiple parts.

In contrast to the message field which can contain an extracted part of the log message, this field contains the original, full log message. It can have already some modifications applied like encoding or new lines removed to clean up the log message.

This field is not indexed and doc_values are disabled so it can't be
queried but the value can be retrieved from _source.

@webmat
Copy link
Contributor

webmat commented May 29, 2018

I would prefer event.raw over log.message. I have built an operational monitoring system in the past, that was being fed by SendGrid's Event Webhook (just an example). This is operational data about email delivery and engagement. Having these original events be put in log.message would sound contrived.

I think event.raw sounds more general. Yes, you can stick logs in there, but also any other operational events.

Just my 2¢ :-)

@ruflin ruflin added the discuss label May 30, 2018
@ruflin
Copy link
Member Author

ruflin commented May 30, 2018

Thanks a lot for the feedback, that is super useful. Currently the way I differentiate between logs and metrics is that metrics are pulled on a predefined period and logs are pushed, but this line is very blurry. Based on this the above would also be logs I'm thinking. But I see that a user is in this case not going to look for the "raw event" under log.

The initial reason I wanted to get it out of event prefix was that it felt like the only object in there that does not contain meta information but actual data. I'm also ok with leaving where it is.

@webmat Out of curiosity: Would you categorise operational events under logs, metrics or it's own category?

@ruflin
Copy link
Member Author

ruflin commented May 30, 2018

@webmat Working on elastic/beats#7207 I realised event.raw and log.message might be even two different fields with different purpose. In elastic/beats#7207 I do some processing on the field like encoding, multiline and strip new lines as the target is to make it possible to reprocess the event. In event.raw I would expect a completely unprocessed event. Perhaps we ned both?

@webmat
Copy link
Contributor

webmat commented May 30, 2018

? @webmat Out of curiosity: Would you categorise operational events under logs, metrics or it's own category?

For the type of data I was dealing with, I would definitely say "events", in the sense that it was not just numerical data. I was processing email addresses, IP addresses, email headers and email categories. So full on events with lots of juicy data ;-)

On the other hand, I don't have a strong attachment to the "event" part of event.raw. It could totally be something else equally generic, like raw_payload or the like. The part I was reacting to was losing the "generic"-ness of "event" and going to "log", when not all operational events come from logs*.

* Well, depending on how broad you define log. But I think most people think of text files, when they hear "log". And "most people" is our audience for this ;-)

@ave19
Copy link

ave19 commented May 30, 2018

We have some data sources that aren't logs. Some are data base scrapes and some are API calls we make to a service and then index what comes out. Not sure log.message would apply as it was never in a log and was really never a message either. They do describe things that happened, or at least facts that were true at a particular time, though, so event seems ok.

Indexing the original data (before it is parsed and mutilated) as *.raw would allow us to re-parse events later if we find a bug or gain some other new insight. We wouldn't have to go back to the original source, which may or may not have the information anymore.

@ruflin
Copy link
Member Author

ruflin commented May 31, 2018

Based on the feedback above I reverted the change to event.raw. The idea for this one is to really contain the raw initial event.

For log.message I think it fits the logging purpose and is potentially already modified a bit like encoding applied, new lines removed, multiline applied so it does not full fit event.raw.

My suggestion is now to only introduce log.message.

@webmat
Copy link
Contributor

webmat commented May 31, 2018

Yes that part I agree with. Makes total sense, especially wrt reconstructed messages (like multiline).

schemas/log.yml Outdated

In contrast to the `message` field which can contain an extracted part
of the log message, this field contains the raw log message and should
not be processed. It can have already some modifications like encoding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a contractiction: 'raw log message' and 'not be processed' vs. 'can have already some modifications'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Any suggestion for a better description instead of raw we could use here?

CHANGELOG.md Outdated
@@ -6,6 +6,8 @@ All notable changes to this project will be documented in this file based on the

### Breaking changes

* Rename `event.raw` to `log.message`. #3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be removed now that event.raw stays as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks for the review.

@ruflin ruflin changed the title Rename event.raw to log.message Add log.message Jun 4, 2018
@ruflin ruflin removed the discuss label Jun 6, 2018
@ruflin
Copy link
Member Author

ruflin commented Jun 6, 2018

PR updated with disabling doc_values and setting index: false to disable indexing. This should reduce the storage needed in Elasticsearch.

README.md Outdated
@@ -275,6 +275,7 @@ Fields which are specific to log events.
| <a name="log.level"></a>`log.level` | Log level of the log event.<br/>Some examples are `WARN`, `ERR`, `INFO`. | keyword | | `ERR` |
| <a name="log.line"></a>`log.line` | Line number the log event was collected from. | long | | `18` |
| <a name="log.offset"></a>`log.offset` | Offset of the beginning of the log event. | long | | `12` |
| <a name="log.message"></a>`log.message` | This is the log message and contains the full log message before splitting it up in multiple parts.<br/>In contrast to the `message` field which can contain an extracted part of the log message, this field contains the raw log message and should not be processed. It can have already some modifications like encoding applied or new lines removed to clean up the log message.<br/>This field is not index and doc_values are disabled so it can't be queried but the value can be retrieved from `_source`. | keyword | | `Sep 19 08:26:10 localhost My log` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"not index" -> "not indexed"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, w/ regards to my previous comment: "[..] this field contains the raw log message and should not be processed." -> "[..] this field contains the original, full log message."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applyed

The field `log.message` contains the full log message before splitting it up in multiple parts.

In contrast to the `message` field which can contain an extracted part of the log message, this field contains the original, full log message. It can have already some modifications applied like encoding or new lines removed to clean up the log message.

This field is not indexed and doc_values are disabled so it can't be
queried but the value can be retrieved from `_source`.
@ruflin
Copy link
Member Author

ruflin commented Jun 11, 2018

PR rebased, commit and PR message updated, fixes applied. Read for an other review.

@@ -459,6 +459,12 @@
"line": {
"type": "long"
},
"message": {
"doc_values": false,
"ignore_above": 1024,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For logs, ignore_above: 1024 will be too small.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the docs for this and run some tests: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html

It seems like ignore_above does not play any role here as it's not index anyway. _source is not affected by ignore_above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Thanks for testing.

Maybe we shouldn't even set an ignore_above when index: false? I imagine others having a similar reaction to mine.

@andrewkroh andrewkroh merged commit b058a31 into elastic:master Jun 11, 2018
@ruflin ruflin deleted the log-message branch June 11, 2018 13:01
@Aqualie Aqualie mentioned this pull request Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants