Create a section to explain some conventions used in ECS. Explain 2 conventions. (#89)

Explaining the multi-fields convention will help keep each field description more to the point. Right now there's an inconsistent quick explainer in many of the multi-fields, which is redundant. Having this section to explain it a bit better will let us clean these field descriptions up later. I also like that we're already using `keyword` for most IDs (I haven't checked them all out). But it's something I would have pushed for if that hadn't been the case already.
elastic · Aug 20, 2018 · e29b91c · e29b91c
1 parent 0348352
commit e29b91c
Showing 2 changed files with 88 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -480,6 +480,50 @@ Contributions of additional uses cases on top of ECS are welcome.
 * *Use prefixes.* Fields must be prefixed except for the base fields. For example all `host` fields are prefixed with `host.`. See `dot` notation in FAQ for more details.
 * Do not use abbreviations. (A few exceptions like `ip` exist.)
 
+## Understanding ECS conventions
+
+### Multi-fields text indexing
+
+Elasticsearch can index text in multiple ways:
+
+* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
+  are part of the field.
+* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
+  [exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
+  and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
+  and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
+  (what Kibana visualizations are built on).
+
+In some cases, only one type of indexing makes sense for a field.
+
+However, there are cases where both types of indexing can be useful, and we want
+to index both ways.
+As an example, log messages can sometimes be short enough that it makes sense
+to sort them by frequency (that's an aggregation). They can also be long and
+varied enough that full text search can be useful on them.
+
+Whenever both types of indexing are helpful, we use multi-fields indexing. The
+convention used is the following:
+
+* `foo`: `text` indexing.
+  The top level of the field (its plain name) is used for full text search.
+* `foo.raw`: `keyword` indexing.
+  The nested field has suffix `.raw` and is what you will use for aggregations.
+  * Performance tip: when filtering your stream in Kibana (or elsewhere), if you
+    are filtering for an exact match or doing a prefix search,
+    both `text` and `keyword` fields can be used, but doing so on the `keyword`
+    field (named `.raw`) will be much faster and less memory intensive.
+
+**Keyword only fields**
+
+Fields that only make sense as type `keyword` are not named `foo.raw`; the
+plain field (`foo`) is of type `keyword`, with no nested field.
+
+### IDs are keywords not integers
+
+Despite the fact that IDs are often integers in various systems, this is not
+always the case. Since we want to make it possible to map as many data sources
+to ECS as possible, we default to using the `keyword` type for IDs.
 
 # <a name="about-ecs"></a>FAQ
 

diff --git a/docs/implementing.md b/docs/implementing.md
@@ -22,3 +22,47 @@
 * *Use prefixes.* Fields must be prefixed except for the base fields. For example all `host` fields are prefixed with `host.`. See `dot` notation in FAQ for more details.
 * Do not use abbreviations. (A few exceptions like `ip` exist.)
 
+## Understanding ECS conventions
+
+### Multi-fields text indexing
+
+Elasticsearch can index text in multiple ways:
+
+* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
+  are part of the field.
+* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
+  [exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
+  and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
+  and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
+  (what Kibana visualizations are built on).
+
+In some cases, only one type of indexing makes sense for a field.
+
+However, there are cases where both types of indexing can be useful, and we want
+to index both ways.
+As an example, log messages can sometimes be short enough that it makes sense
+to sort them by frequency (that's an aggregation). They can also be long and
+varied enough that full text search can be useful on them.
+
+Whenever both types of indexing are helpful, we use multi-fields indexing. The
+convention used is the following:
+
+* `foo`: `text` indexing.
+  The top level of the field (its plain name) is used for full text search.
+* `foo.raw`: `keyword` indexing.
+  The nested field has suffix `.raw` and is what you will use for aggregations.
+  * Performance tip: when filtering your stream in Kibana (or elsewhere), if you
+    are filtering for an exact match or doing a prefix search,
+    both `text` and `keyword` fields can be used, but doing so on the `keyword`
+    field (named `.raw`) will be much faster and less memory intensive.
+
+**Keyword only fields**
+
+Fields that only make sense as type `keyword` are not named `foo.raw`; the
+plain field (`foo`) is of type `keyword`, with no nested field.
+
+### IDs are keywords not integers
+
+Despite the fact that IDs are often integers in various systems, this is not
+always the case. Since we want to make it possible to map as many data sources
+to ECS as possible, we default to using the `keyword` type for IDs.
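
For illustration only, the two conventions added in this commit could be sketched as an Elasticsearch mapping fragment built in Python. This is a minimal sketch, not part of the commit: the helper names `multi_field` and `keyword_field` and the field name `foo_id` are hypothetical, and only the `properties` portion of a mapping is shown because the surrounding wrapper differs between Elasticsearch versions.

```python
import json


def multi_field(name):
    """Mapping entry for a multi-field: `text` at the top level, `keyword` under `.raw`."""
    return {name: {"type": "text", "fields": {"raw": {"type": "keyword"}}}}


def keyword_field(name):
    """Mapping entry for a keyword-only field: plain name, no nested field."""
    return {name: {"type": "keyword"}}


properties = {}
properties.update(multi_field("foo"))       # `foo` is the placeholder name used in the text
properties.update(keyword_field("foo_id"))  # hypothetical ID field, mapped as `keyword`

# Print the fragment so it can be inspected or pasted into a mapping request body.
print(json.dumps({"properties": properties}, indent=2))
```

With a mapping like this, full text search would target `foo`, while aggregations and exact-match or prefix filters would target `foo.raw`; the ID field is queried directly under its plain name.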