Is ECS confusing to you? #516

Closed
anhlqn opened this issue Aug 9, 2019 · 7 comments


anhlqn commented Aug 9, 2019

I've been using the Elastic Stack for log analytics for a few years, since v1.4.0, with more than 30 types of logs. I think most of us know the challenge of not having common fields across log types, so we came up with our own internal core and standard fields to enable log correlation. When I heard about the release of ECS, I thought I could just look at the standard document and easily map our current fields to ECS fields.

That's not really the case, at least for me. I guess some new Elastic users may have the same frustration when looking to adopt ECS.

The goal of ECS is to enable and encourage users of Elasticsearch to normalize their event data, so that they can better analyze, visualize, and correlate the data represented in their events.

ECS fields follow a series of guidelines, to ensure a consistent and predictable feel, across various use cases.

If Elastic expects the community to adopt ECS, then ECS should be as simple as possible, not confusing. I've read and re-read the ECS documentation multiple times and ended up having to search through issues in the GitHub repo to learn how to properly map a field to ECS. We almost have to pull a few people into a meeting whenever we have a new field.

I like the idea of grouping related fields into fieldsets, but I think there is room for improvement.

source/destination vs. client/server
source/client and server/destination have the same nested fields, so why don't we just choose one pair and get rid of the other? The source/destination pair is more generic and will fit most use cases.

Client / server representations can add semantic context to an exchange, which is helpful to visualize the data in certain situations. If your context falls in that category, you should still ensure that source and destination are filled appropriately.

I think this approach is unnecessarily complicated. It confuses users and, at the scale of billions of events, can also significantly increase storage size. Netflow, firewall, web access, and IPS/IDS logs can all use the source/destination pair. With two pairs, the questions users face are:

  • When to populate source/destination?
  • When to populate client/server?
  • When to populate two pairs?
  • Which pair should I use when searching by field:value syntax?
  • Should correlation be done on one pair or the other?

These questions aren't answered in the current ECS document.

log vs. event
I have the same confusion with these two field sets. I believe ECS is designed for the log analytics use case and aims to support the new Kibana SIEM app. Usually, any log message that enters the SIEM is treated as an event, so the event field set is a great one. Why add log and confuse users? If I want to map the attributes of a message to ECS, do I have to jump between the log and event field sets?

log.level vs. event.severity? And someone has proposed an event.level here: #129

log.original vs. event.original? Would most users know which field to use by reading the ECS document? @ruflin clarified them in #127, but I don't think that's enough. Why don't we just stick with the event field set and add event.raw, event.original, event.normalized/transformed, or similar nested fields to support the need?

There are many nested fields under the event field set to describe an event. Will the log field set continue to add the same nested fields and end up in the same situation as source/destination and client/server?

I propose getting rid of the log field set and keeping only the event field set.

event field set
Has anyone looked at the nested fields under the event field set and not found them confusing?

  • event.category
  • event.dataset
  • event.kind
  • event.module
  • event.type

event.dataset vs. event.module?
event.category vs. event.kind?

Some of these fields come with the following warning:

Warning: In future versions of ECS, we plan to provide a list of acceptable values for this field, please use with caution.

I read the warning as "Don't use these nested fields yet".

The release of ECS was a bit late, but it is a great step for log analytics use cases. I'm looking forward to migrating to ECS once the standard is less confusing.

anhlqn (Author) commented Aug 9, 2019

Oh, also the base fields labels vs. tags. I like the labels field, but the examples for labels and tags confused me.

labels
example: {'application': 'foo-bar', 'env': 'production'}

tags
example: ["production", "env2"]

Should I use labels.env or add environment values as tags? It depends? Since ECS is a specification, the interpretation and implementation should be as consistent as possible.

DeathsPirate commented Aug 10, 2019 via email

@Randy-312

Count me in on the working group with Dave. I'm aligned with ECS, for multiple reasons.

@MikePaquette (Contributor)

@anhlqn Thank you for your feedback and willingness to help make ECS simpler to understand. We are sorry that you are feeling frustrated and confused. I think I can speak for all contributors in saying that we share a desire to make ECS simple to adopt, implement, understand, and use. We definitely have some work still to do. Please allow me to try to address some of your concerns:

source/destination vs. client/server
source/client and server/destination have the same nested fields, so why don't we just choose one pair and get rid of the other? The source/destination pair is more generic and will fit most use cases.

The reason for two sets of fields is that the scope of ECS includes network monitoring use cases, where it is sometimes important to distinguish between the source sending a network packet and the client, which initiated the connection or transaction. As a simple example, in a TCP exchange, when an ACK packet is sent from the server to the client, the source IP of the ACK packet will be the server's address, and the destination IP will be the client's. To support this use case, it is necessary to have both sets of fields.

With that said, I think we can do a better job of documenting the rules for populating these field sets. I'd propose the following simple rule for consideration:

If you can identify the client and server roles associated with the event, fill both field sets (client/server and source/destination); otherwise, just fill source/destination.

I think this would help answer your questions:

  • When to populate source/destination? Always
  • When to populate client/server? Whenever you can
  • When to populate two pairs? Whenever you populate client/server, also populate source/destination
  • Which pair should I use when searching by field:value syntax? Use source/destination unless you know that your data set has client/server fields populated
  • Should correlation be done on one pair or the other? Almost always use source/destination for correlation.
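
To make that concrete, here is a rough sketch (not official ECS guidance; IPs, ports, and timestamps are invented) of how a single client/server exchange could be mapped so both pairs stay consistent:

```python
# Sketch only: invented values, standard ECS field names.
# Request packet: the client is also the sender, so client/server and
# source/destination line up.
request_doc = {
    "@timestamp": "2019-08-09T12:34:56.000Z",
    "client":      {"ip": "10.1.2.3",     "port": 54321},
    "server":      {"ip": "203.0.113.10", "port": 443},
    "source":      {"ip": "10.1.2.3",     "port": 54321},
    "destination": {"ip": "203.0.113.10", "port": 443},
}

# ACK/response packet: source/destination flip, client/server do not.
response_doc = {
    "@timestamp": "2019-08-09T12:34:56.050Z",
    "client":      {"ip": "10.1.2.3",     "port": 54321},
    "server":      {"ip": "203.0.113.10", "port": 443},
    "source":      {"ip": "203.0.113.10", "port": 443},
    "destination": {"ip": "10.1.2.3",     "port": 54321},
}
```

This way, correlation on source/destination works for every event, and client/server adds the role semantics whenever they are known.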

log vs. event
I have the same confusion with these two field sets. I believe ECS is designed for the log analytics use case and aims to support the new Kibana SIEM app. Usually, any log message that enters the SIEM is treated as an event, so the event field set is a great one. Why add log and confuse users? If I want to map the attributes of a message to ECS, do I have to jump between the log and event field sets?
log.level vs. event.severity? And someone has proposed an event.level here: #129
log.original vs. event.original? Would most users know which field to use by reading the ECS document? @ruflin clarified them in #127, but I don't think that's enough. Why don't we just stick with the event field set and add event.raw, event.original, event.normalized/transformed, or similar nested fields to support the need?
There are many nested fields under the event field set to describe an event. Will the log field set continue to add the same nested fields and end up in the same situation as source/destination and client/server?
I propose getting rid of the log field set and keeping only the event field set.

You raise a very good point about these fields. We need to do a much better job with this documentation. For now, a simple rule:
If you don't have a specific reason to use the log.* fields, use only the event.* fields.
I will work with contributors to improve this documentation to help avoid confusion.
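
As a rough illustration of that rule (a sketch with invented values, not an official mapping), a parsed application log line could be shaped like this, with most attributes under event.* and log.* used only where there is a specific reason (preserving the producer's own level string and raw line):

```python
# Sketch: invented values. Categorization values like event.category were not
# yet enumerated at the time of writing, so treat them as placeholders.
doc = {
    "@timestamp": "2019-08-09T12:00:00.000Z",
    "message": "User bob failed to authenticate",
    "event": {
        "kind": "event",
        "category": "authentication",
        "outcome": "failure",
        "severity": 4,  # numeric severity as reported by the source
    },
    # Only because we specifically want to keep the producer's level and raw text:
    "log": {
        "level": "warning",
        "original": "Aug  9 12:00:00 app1 auth[1234]: User bob failed to authenticate",
    },
}
```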

event field set
Has anyone looked at the nested fields under the event field set and not found them confusing?
event.category
event.dataset
event.kind
event.module
event.type
event.dataset vs. event.module?
event.category vs. event.kind?
Some of these fields come with the following warning:

Warning: In future versions of ECS, we plan to provide a list of acceptable values for this field, please use with caution.

I read the warning as "Don't use these nested fields yet".

You are reading this correctly, and this will get better soon. Please see the discussion at #447 (comment) and #290 (comment).

In a nutshell, event.module and event.dataset tell you where the event comes from, while the other categorization fields tell you how the event should be analyzed, visualized, and correlated. Yes, there is some overlap. If you are analyzing and correlating events, you will be using event.kind, event.category, event.action, event.type, and event.outcome (yes, I know, as soon as they are defined with enumerated values). But if you are trying to analyze your data sources, input rates, log continuity, etc., then you should use the event.module and event.dataset fields.
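
For example (a sketch with invented values, to make the split concrete):

```python
# event.module / event.dataset -> where the event comes from
#   (useful for monitoring data sources, ingest rates, log continuity)
# event.kind / category / action / outcome -> how the event should be
#   analyzed and correlated (values here are placeholders, not an official list)
doc = {
    "event": {
        "module": "nginx",
        "dataset": "nginx.access",
        "kind": "event",
        "category": "web",
        "action": "access",
        "outcome": "success",
    }
}
```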

Oh, also the base fields labels vs. tags. I like the labels field, but the examples for labels and tags confused me.
labels
example: {'application': 'foo-bar', 'env': 'production'}
tags
example: ["production", "env2"]
Should I use labels.env or add environment values as tags? It depends? Since ECS is a specification, the interpretation and implementation should be as consistent as possible.

Another good point. We need to provide better documentation to make it clear that these are for custom use, which means that you can use them for environment-specific purposes. If you need key/value pairs, use labels; if you need just keywords, use tags. ECS-compatible analysis content meant to work across all ECS environments should not consume or depend upon values in these fields.
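
A quick sketch of that distinction (values are invented):

```python
# labels: custom key/value pairs; tags: plain keywords.
# Both are free-form and environment-specific; shared ECS content should not
# depend on their values.
doc = {
    "labels": {"application": "foo-bar", "env": "production"},
    "tags": ["production", "pci", "external"],
}
```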

@DeathsPirate @Randy-312 I like the sound of a working group, but what specifically did you have in mind? In many ways this open source GitHub repo is indeed an (albeit asynchronous) working group. Were you thinking of live IRC-based sessions on a periodic basis? I'd want to be cautious and make sure any additional forum was open and available to all our contributors and community members.

Again, thank you all for your thoughtful input and offers to help make ECS simpler.

@davidhowell-tx

Any ideas on when some additional clarification will be available?

ansoni commented Jan 14, 2020

So I spent the last hour trying to map custom application API logs into ECS. It is extremely frustrating. The lack of concrete use cases hurts ECS, IMO. I would like to see some common references like:

  • lb-f5
  • lb-aws-alb
  • lb-aws-network
  • restful-api

This will make it easier to pick a "similar" use case when mapping in a new type.

So far, I haven't found fields for:

  • Cloud Account Alias - used cloud.account.alias (I opened a previous issue on this, but didn't have the bandwidth to follow up)
  • Customer Name - used customer.name
  • Customer Support Level - used customer.support_level
  • Session Id - used session.id
  • Environment - I used environment.name
  • API fields - use event?

As it stands right now, I am either overloading everything into event or I am creating custom fields since ECS hasn't given an opinion. It works fine if it is just me, but this becomes a problem if I ask other people to fix their datasets. With ECS being incomplete, I really can't point anyone to it and expect consistent, normalized results.
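
Roughly, the documents I end up with look like this (a sketch with invented values; the customer.*, session.id, and environment.name fields are my own custom extensions described above, not ECS):

```python
# Sketch only. Fields marked "custom" are NOT part of ECS; they are this
# application's own extensions.
doc = {
    "@timestamp": "2020-01-14T09:00:00.000Z",
    "cloud": {"account": {"id": "123456789012",        # ECS
                          "alias": "prod-payments"}},  # custom
    "customer": {"name": "Acme Corp",                  # custom
                 "support_level": "gold"},             # custom
    "session": {"id": "3f2a9c1e"},                     # custom
    "environment": {"name": "production"},             # custom
    "event": {"dataset": "myapp.api",                  # assumed values
              "action": "create-order"},
}
```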

djptek (Contributor) commented Jul 13, 2021

Thanks for all your input on this. The ECS schema and docs have evolved and matured, addressing many of these concerns. From this point on, we would consider specific fixes to individual fields rather than wider changes with the potential for backward compatibility issues.

Please feel free to re-open if you have new input.

djptek closed this as completed on Jul 13, 2021