Create a section to explain some conventions used in ECS. Explain 2 conventions. (#89)

Explaining the multi-fields convention will help keep each field description more to the point. Right now, many multi-field descriptions include an inconsistent quick explainer, which is redundant. Having this section explain the convention properly will let us clean up these field descriptions later.

I also like that we're already using `keyword` for most IDs (I haven't checked
them all). It's something I would have pushed for if that hadn't
already been the case.
webmat authored and ruflin committed Aug 20, 2018
1 parent 0348352 commit e29b91c
Showing 2 changed files with 88 additions and 0 deletions.
44 changes: 44 additions & 0 deletions README.md
@@ -480,6 +480,50 @@ Contributions of additional use cases on top of ECS are welcome.
* *Use prefixes.* Fields must be prefixed except for the base fields. For example, all `host` fields are prefixed with `host.`. See `dot` notation in FAQ for more details.
* Do not use abbreviations. (A few exceptions like `ip` exist.)

## Understanding ECS conventions

### Multi-fields text indexing

Elasticsearch can index text in multiple ways:

* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, i.e. searching for arbitrary words that
are part of the field.
* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
[exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
and supports [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
(what Kibana visualizations are built on).
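As a rough illustration of the difference, the Python sketch below mimics what each indexing type stores for a single value. The `analyze_text` helper is a hypothetical simplification (lowercase, then split on non-word characters), not Elasticsearch's actual standard analyzer, and the sample message is made up.

```python
import re


def analyze_text(value: str) -> list[str]:
    """Rough stand-in for a `text` field's analyzer:
    lowercase the value and split it into word tokens."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value: str) -> list[str]:
    """A `keyword` field indexes the whole value as one unmodified term."""
    return [value]


message = "Connection refused by host db-01"

text_terms = analyze_text(message)
keyword_terms = analyze_keyword(message)

print(text_terms)     # ['connection', 'refused', 'by', 'host', 'db', '01']
print(keyword_terms)  # ['Connection refused by host db-01']

# Searching for a single word only matches the tokenized (text) terms.
print("refused" in text_terms, "refused" in keyword_terms)  # True False
# Matching the exact full value only works against the keyword term.
print(message in text_terms, message in keyword_terms)      # False True
```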

In some cases, only one type of indexing makes sense for a field.

However, there are cases where both types of indexing are useful, and we want
to index both ways.
For example, log messages are sometimes short enough that it makes sense
to sort them by frequency (that's an aggregation). They can also be long and
varied enough that full text search is useful on them.
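To see why frequency sorting needs the whole value, here is a plain-Python analogy rather than an Elasticsearch call: a terms aggregation roughly counts identical indexed terms, so counting whole messages and counting tokenized words give very different "top" results. The messages are hypothetical and the tokenizer is a simplification.

```python
import re
from collections import Counter

# Hypothetical log messages.
messages = [
    "Connection refused",
    "Connection refused",
    "Disk full on /var",
    "Connection reset by peer",
]

# Counting whole values (what a `keyword` field holds) counts messages.
by_message = Counter(messages)

# Counting analyzed tokens (roughly what a `text` field holds) counts words.
by_word = Counter(tok for m in messages for tok in re.findall(r"\w+", m.lower()))

print(by_message.most_common(1))  # [('Connection refused', 2)]
print(by_word.most_common(1))     # [('connection', 3)]
```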

Whenever both types of indexing are helpful, we use multi-fields indexing. The
convention used is the following:

* `foo`: `text` indexing.
The top level of the field (its plain name) is used for full text search.
* `foo.raw`: `keyword` indexing.
The nested field has suffix `.raw` and is what you will use for aggregations.
* Performance tip: when filtering your data in Kibana (or elsewhere) for an
exact match or with a prefix search, both the `text` and `keyword` fields can
be used, but querying the `keyword` field (the `.raw` one) will be much faster
and less memory intensive.
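A minimal sketch of this convention, written as Python dicts that mirror Elasticsearch request bodies. It assumes a recent Elasticsearch version (mappings without a type name) and uses `message` as an example field name; it only builds and prints the bodies, and sending them to a cluster is left out.

```python
import json

# `message` is `text`; `message.raw` is a `keyword` multi-field.
mapping = {
    "properties": {
        "message": {
            "type": "text",
            "fields": {"raw": {"type": "keyword"}},
        }
    }
}

# Full text search: query the plain field.
full_text_query = {"query": {"match": {"message": "connection refused"}}}

# Exact match and prefix search: query the `.raw` keyword sub-field.
exact_query = {"query": {"term": {"message.raw": "Connection refused"}}}
prefix_query = {"query": {"prefix": {"message.raw": "Connection"}}}

# Aggregation: most frequent messages, computed on the `.raw` sub-field.
top_messages = {
    "size": 0,
    "aggs": {"top_messages": {"terms": {"field": "message.raw", "size": 10}}},
}

for name, body in [
    ("mapping", mapping),
    ("full_text", full_text_query),
    ("exact", exact_query),
    ("prefix", prefix_query),
    ("aggregation", top_messages),
]:
    print(name, json.dumps(body))
```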

**Keyword-only fields**

Fields that only make sense as type `keyword` are not named `foo.raw`; the
plain field (`foo`) is of type `keyword`, with no nested field.
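For comparison, a keyword-only field might be sketched like this, again as Python dicts mirroring Elasticsearch request bodies and assuming a recent version without mapping types. `foo` is a placeholder name and the query value is made up.

```python
import json

# Keyword-only field: the plain name itself is `keyword`, with no `.raw` sub-field.
mapping = {"properties": {"foo": {"type": "keyword"}}}

# Exact match and aggregations target the plain field directly.
exact_query = {"query": {"term": {"foo": "some-exact-value"}}}
top_values = {"size": 0, "aggs": {"top_foo": {"terms": {"field": "foo"}}}}

print(json.dumps(mapping, indent=2))
```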

### IDs are keywords, not integers

IDs are often integers, but not in every system. Since we want to make it
possible to map as many data sources to ECS as possible, we default to using
the `keyword` type for IDs.
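A small sketch of why an integer type is too narrow: the IDs below are hypothetical examples of formats different sources might emit. The `fits_integer` helper is made up for illustration and checks whether a value survives a round trip through `int()` unchanged. With Elasticsearch's default numeric coercion, an integer field would typically reject the non-numeric IDs and turn `"0042"` into `42`.

```python
# Hypothetical IDs from different sources: some look numeric, many do not.
sample_ids = ["4242", "0042", "a3f9c2e1-7b1d-4c0e-9f5a-2d8e6b1c0f37", "S-1-5-21"]


def fits_integer(value: str) -> bool:
    """True if the value round-trips through int() unchanged."""
    try:
        return str(int(value)) == value
    except ValueError:
        return False


# Only "4242" fits; "0042" would lose its leading zeros, the rest are not numbers.
print({i: fits_integer(i) for i in sample_ids})

# A keyword mapping keeps every ID exactly as sent (`foo.id` is a placeholder).
id_mapping = {"properties": {"foo": {"properties": {"id": {"type": "keyword"}}}}}
```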

# <a name="about-ecs"></a>FAQ

44 changes: 44 additions & 0 deletions docs/implementing.md
