Is ECS confusing to you? #516

Closed
anhlqn opened this issue Aug 9, 2019 · 7 comments


anhlqn commented Aug 9, 2019

I've been using the Elastic Stack for log analytics for a few years, since v1.4.0, with more than 30 types of logs. I think most of us know the challenge of not having common fields across log types, so we came up with our own internal core and standard fields to enable log correlation. When I heard about the release of ECS, I thought I could just look at the standard document and easily map our current fields to ECS fields.

That's not really the case, at least for me. I guess some new Elastic users may have the same frustration when looking to adopt ECS.

The goal of ECS is to enable and encourage users of Elasticsearch to normalize their event data, so that they can better analyze, visualize, and correlate the data represented in their events.

ECS fields follow a series of guidelines, to ensure a consistent and predictable feel, across various use cases.

If Elastic expects the community to adopt ECS, then ECS should be as simple as possible, not confusing. I've read and re-read the ECS documentation multiple times and ended up having to search through issues in the GitHub repo to learn how to properly map a field to ECS. We almost have to pull a few people into a meeting whenever we have a new field.

I like the idea of grouping related fields into fieldsets, but I think there is room for improvement.

source/destination vs. client/server
source/client and server/destination have the same nested fields, so why don't we just choose one pair and get rid of the other? The source/destination pair is more generic and will fit most use cases.

Client / server representations can add semantic context to an exchange, which is helpful to visualize the data in certain situations. If your context falls in that category, you should still ensure that source and destination are filled appropriately.

I think this approach is unnecessarily complicated. It confuses users and, at the scale of billions of events, can also significantly increase storage size. Netflow, firewall, web access, and IPS/IDS logs can all use the source/destination pair. With two pairs, the questions users face are:

  • When to populate source/destination?
  • When to populate client/server?
  • When to populate two pairs?
  • Which pair should I use when searching by field:value syntax?
  • Should correlation be done on one pair or the other?

These questions aren't answered in the current ECS document.

log vs. event
I have the same confusion with these two field sets. I believe ECS is designed for the log analytics use case and aims to support the new Kibana SIEM app. Usually, any log message that enters the SIEM is treated as an event, so the event field set is a great one. Why add log and confuse users? If I want to map the attributes of a message to ECS, do I have to jump between the log and event field sets?

log.level vs. event.severity? And someone has proposed an event.level here: #129

log.original vs. event.original? Would most users know which field to use by reading the ECS document? @ruflin clarified them in #127, but I don't think that's enough. Why don't we just stick with the event field set and add event.raw, event.original, event.normalized/transformed, or similar nested fields to support the need?

There are many nested fields under the event field set to describe an event. Will the log field set continue to add the same nested fields and end up in the same situation as source/destination and client/server?

I propose getting rid of the log field set and keeping only the event field set.

event field set
Has anyone looked at the nested fields under the event field set and not found them confusing?

  • event.category
  • event.dataset
  • event.kind
  • event.module
  • event.type

event.dataset vs. event.module?
event.category vs. event.kind?

Some of these fields come with the following warning:

Warning: In future versions of ECS, we plan to provide a list of acceptable values for this field, please use with caution.

I read the warning as "Don't use these nested fields yet".

The release of ECS was a bit late, but it is a great step for log analytics use cases. I'm looking forward to migrating to ECS once the standard is less confusing.

anhlqn (Author) commented Aug 9, 2019

Oh, also the base fields labels vs. tags. I like the labels field, but the examples for labels and tags confused me.

labels
example: {'application': 'foo-bar', 'env': 'production'}

tags
example: ["production", "env2"]

Should I use labels.env or add environment values as tags? It depends? Since ECS is a specification, the interpretation and implementation should be as consistent as possible.

DeathsPirate commented Aug 10, 2019 via email

@Randy-312

Count me in on the working group with Dave. I'm aligned with ECS, for multiple reasons.

@MikePaquette (Contributor)

@anhlqn Thank you for your feedback and willingness to help make ECS simpler to understand. We are sorry that you are feeling frustrated and confused. I think I can speak for all contributors in saying that we share a desire to make ECS simple to adopt, implement, understand, and use. We definitely have some work still to do. Please allow me to try to address some of your concerns:

source/destination vs. client/server
source/client and server/destination have the same nested fields, so why don't we just choose one pair and get rid of the other? The source/destination pair is more generic and will fit most use cases.

The reason for two sets of fields is that the scope of ECS includes network monitoring use cases, where it is sometimes important to distinguish between the source sending a network packet and the client, which initiated the connection or transaction. As a simple example, in a TCP exchange, when an ACK packet is sent from the server to the client, the source IP of the ACK packet will be the server's address, and the destination IP will be the client's. To support this use case, it is necessary to have both sets of fields.

With that said, I think we can do a better job of documenting the rules for populating these field sets. I'd propose the following simple rule for consideration:

If you can identify the client and server roles associated with the event, fill both field sets (client/server and source/destination); otherwise, just fill source/destination.

I think this would help answer your questions:

  • When to populate source/destination? Always
  • When to populate client/server? Whenever you can
  • When to populate two pairs? Whenever you populate client/server, also populate source/destination
  • Which pair should I use when searching by field:value syntax? Use source/destination unless you know that your data set has client/server fields populated
  • Should correlation be done on one pair or the other? Almost always use source/destination for correlation.
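
To make that concrete, here is a rough sketch (not official ECS guidance; IPs, ports, and timestamps are invented) of how a single client/server exchange could be mapped so both pairs stay consistent:

```python
# Sketch only: invented values, standard ECS field names.
# Request packet: the client is also the sender, so client/server and
# source/destination line up.
request_doc = {
    "@timestamp": "2019-08-09T12:34:56.000Z",
    "client":      {"ip": "10.1.2.3",     "port": 54321},
    "server":      {"ip": "203.0.113.10", "port": 443},
    "source":      {"ip": "10.1.2.3",     "port": 54321},
    "destination": {"ip": "203.0.113.10", "port": 443},
}

# ACK/response packet: source/destination flip, client/server do not.
response_doc = {
    "@timestamp": "2019-08-09T12:34:56.050Z",
    "client":      {"ip": "10.1.2.3",     "port": 54321},
    "server":      {"ip": "203.0.113.10", "port": 443},
    "source":      {"ip": "203.0.113.10", "port": 443},
    "destination": {"ip": "10.1.2.3",     "port": 54321},
}
```

This way, correlation on source/destination works for every event, and client/server adds the role semantics whenever they are known.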

log vs. event
I have the same confusion with these two field sets. I believe ECS is designed for the log analytics use case and aims to support the new Kibana SIEM app. Usually, any log message that enters the SIEM is treated as an event, so the event field set is a great one. Why add log and confuse users? If I want to map the attributes of a message to ECS, do I have to jump between the log and event field sets?
log.level vs. event.severity? And someone has proposed an event.level here: #129
log.original vs. event.original? Would most users know which field to use by reading the ECS document? @ruflin clarified them in #127, but I don't think that's enough. Why don't we just stick with the event field set and add event.raw, event.original, event.normalized/transformed, or similar nested fields to support the need?
There are many nested fields under the event field set to describe an event. Will the log field set continue to add the same nested fields and end up in the same situation as source/destination and client/server?
I propose getting rid of the log field set and keeping only the event field set.

You raise a very good point about these fields. We need to do a much better job with this documentation. For now, a simple rule:
If you don't have a specific reason to use the log.* fields, use only the event.* fields.
I will work with contributors to improve this documentation to help avoid confusion.
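
As a rough illustration of that rule (a sketch with invented values, not an official mapping), a parsed application log line could be shaped like this, with most attributes under event.* and log.* used only where there is a specific reason (preserving the producer's own level string and raw line):

```python
# Sketch: invented values. Categorization values like event.category were not
# yet enumerated at the time of writing, so treat them as placeholders.
doc = {
    "@timestamp": "2019-08-09T12:00:00.000Z",
    "message": "User bob failed to authenticate",
    "event": {
        "kind": "event",
        "category": "authentication",
        "outcome": "failure",
        "severity": 4,  # numeric severity as reported by the source
    },
    # Only because we specifically want to keep the producer's level and raw text:
    "log": {
        "level": "warning",
        "original": "Aug  9 12:00:00 app1 auth[1234]: User bob failed to authenticate",
    },
}
```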

event field set
Has anyone looked at the nested fields under the event field set and not found them confusing?
event.category
event.dataset
event.kind
event.module
event.type
event.dataset vs. event.module?
event.category vs. event.kind?
Some of these fields come with the following warning:

Warning: In future versions of ECS, we plan to provide a list of acceptable values for this field, please use with caution.

I read the warning as "Don't use these nested fields yet".

You are reading this correctly, and this will get better soon. Please see the discussion at #447 (comment) and #290 (comment).

In a nutshell, event.module and event.dataset tell you where the event comes from, while the other categorization fields tell you how the event should be analyzed, visualized, and correlated. Yes, there is some overlap. If you are analyzing and correlating events, you will be using event.kind, event.category, event.action, event.type, and event.outcome (yes, I know, as soon as they are defined with enumerated values). But if you are trying to analyze your data sources, input rates, log continuity, etc., then you should use the event.module and event.dataset fields.
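
For example (a sketch with invented values, to make the split concrete):

```python
# event.module / event.dataset -> where the event comes from
#   (useful for monitoring data sources, ingest rates, log continuity)
# event.kind / category / action / outcome -> how the event should be
#   analyzed and correlated (values here are placeholders, not an official list)
doc = {
    "event": {
        "module": "nginx",
        "dataset": "nginx.access",
        "kind": "event",
        "category": "web",
        "action": "access",
        "outcome": "success",
    }
}
```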

Oh, also the base fields labels vs. tags. I like the labels field, but the examples for labels and tags confused me.
labels
example: {'application': 'foo-bar', 'env': 'production'}
tags
example: ["production", "env2"]
Should I use labels.env or add environment values as tags? It depends? Since ECS is a specification, the interpretation and implementation should be as consistent as possible.

Another good point. We need to provide better documentation to make it clear that these are for custom use, which means that you can use them for environment-specific purposes. If you need key/value pairs, use labels; if you need just keywords, use tags. ECS-compatible analysis content meant to work across all ECS environments should not consume or depend upon values in these fields.
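
A quick sketch of that distinction (values are invented):

```python
# labels: custom key/value pairs; tags: plain keywords.
# Both are free-form and environment-specific; shared ECS content should not
# depend on their values.
doc = {
    "labels": {"application": "foo-bar", "env": "production"},
    "tags": ["production", "pci", "external"],
}
```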

@DeathsPirate @Randy-312 I like the sound of a working group, but what specifically did you have in mind? In many ways this open source GitHub repo is indeed an (albeit asynchronous) working group. Were you thinking of live IRC-based sessions on a periodic basis? I'd want to be cautious and make sure any additional forum was open and available to all our contributors and community members.

Again, thank you all for your thoughtful input and offers to help make ECS simpler.

@davidhowell-tx

Any ideas on when some additional clarification will be available?

ansoni commented Jan 14, 2020

So I spent the last hour trying to map custom application API logs into ECS. It is extremely frustrating. The lack of concrete use cases hurts ECS, IMO. I would like to see some common references like:

  • lb-f5
  • lb-aws-alb
  • lb-aws-network
  • restful-api

This will make it easier to pick a "similar" use case when mapping in a new type.

So far, I haven't found fields for:

  • Cloud Account Alias - used cloud.account.alias (I opened a previous issue on this, but didn't have the bandwidth to follow up)
  • Customer Name - used customer.name
  • Customer Support Level - used customer.support_level
  • Session Id - used session.id
  • Environment - I used environment.name
  • API fields - use event?

As it stands right now, I am either overloading everything into event or I am creating custom fields since ECS hasn't given an opinion. It works fine if it is just me, but this becomes a problem if I ask other people to fix their datasets. With ECS being incomplete, I really can't point anyone to it and expect consistent, normalized results.
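
Roughly, the documents I end up with look like this (a sketch with invented values; the customer.*, session.id, and environment.name fields are my own custom extensions described above, not ECS):

```python
# Sketch only. Fields marked "custom" are NOT part of ECS; they are this
# application's own extensions.
doc = {
    "@timestamp": "2020-01-14T09:00:00.000Z",
    "cloud": {"account": {"id": "123456789012",        # ECS
                          "alias": "prod-payments"}},  # custom
    "customer": {"name": "Acme Corp",                  # custom
                 "support_level": "gold"},             # custom
    "session": {"id": "3f2a9c1e"},                     # custom
    "environment": {"name": "production"},             # custom
    "event": {"dataset": "myapp.api",                  # assumed values
              "action": "create-order"},
}
```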

djptek (Contributor) commented Jul 13, 2021

Thanks for all your input on this. The ECS schema and docs have evolved and matured, addressing many of these concerns. From this point on, we would consider specific fixes to individual fields rather than wider changes with the potential for backward compatibility issues.

Please feel free to re-open if you have new input.

djptek closed this as completed on Jul 13, 2021