Move to the Elasticsearch convention of using ".keyword" to denote nested fields indexed with type "keyword" #103

Closed
5 tasks done
webmat opened this issue Aug 23, 2018 · 2 comments

@webmat
Contributor

webmat commented Aug 23, 2018

The list of fields at the time of writing this:

  • file.path.raw => file.path.keyword
  • file.target_path.raw => file.target_path.keyword
  • url.href.raw => url.href.keyword
  • url.path.raw => url.path.keyword
  • url.query.raw => url.query.keyword
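
For anyone who has to adapt existing documents to these renames, here is a minimal sketch. It assumes documents are flat dictionaries with dotted keys, and the helper name `rename_raw_fields` is hypothetical, not something that exists in ECS. The mapping itself is copied from the list above.

```python
# Old name -> new name, copied from the list in this issue.
RENAMES = {
    "file.path.raw": "file.path.keyword",
    "file.target_path.raw": "file.target_path.keyword",
    "url.href.raw": "url.href.keyword",
    "url.path.raw": "url.path.keyword",
    "url.query.raw": "url.query.keyword",
}


def rename_raw_fields(doc: dict) -> dict:
    """Return a copy of a flat, dotted-key document with the .raw keys renamed.

    Hypothetical helper: keys not listed in RENAMES are passed through untouched.
    """
    return {RENAMES.get(key, key): value for key, value in doc.items()}


if __name__ == "__main__":
    old_doc = {"url.path": "/search", "url.path.raw": "/search"}
    print(rename_raw_fields(old_doc))
```

Only the five listed fields are touched, so a field that uses raw in the "untouched/original" sense is left alone.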
@ruflin
Member

ruflin commented Aug 24, 2018

Could you perhaps add a bit more detail to the issue description on why we decided on keyword, so that if the discussion pops up again in the future we can point to this issue?

@webmat webmat mentioned this issue Sep 18, 2018
26 tasks
@webmat
Contributor Author

webmat commented Sep 19, 2018

Sorry, I missed your comment while on vacation :-)

Here's the background on why we decided to do these renames. We had a discussion on two fronts:

  • Do we keep the older convention of naming the nested field used for keyword indexing .raw in ECS, or do we go with the newer convention, which is also Elasticsearch's current default, of naming it .keyword?
  • Issue #87 (Inconsistent usage of .raw, and discussion about .raw vs .keyword naming) also brought up a few fields that were named .raw in the sense of "untouched/original", unrelated to the keyword indexing convention (one happened to be keyword, the other one was text).

We decided on the .keyword convention since it is the newer convention and Elasticsearch's default. This means less friction in the future, since most training and documentation material now uses it.

We decided to avoid clashes between the old raw convention and common parlance by using the word original, instead of raw, to describe untouched fields (a few fields were addressed this way in #107 and #106).

Also related to this issue is doing a pass on the other current textual fields, to see if any others should be made multi-fields (#104). The current approach is to index textual fields with keyword indexing only, so moving a field to multi-field is a breaking change, since the top-level field changes from the keyword to the text datatype. Hence the importance of doing this pass before ECS goes GA.
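
To make the breaking change concrete, here is a rough sketch of the two mapping shapes written as Python dictionaries. The variable and function names are made up for illustration, and the field these mappings would apply to is left unspecified. The shapes follow the standard Elasticsearch multi-field syntax, where `fields` declares sub-fields such as `.keyword`.

```python
# Current approach: the field itself is indexed as keyword only.
KEYWORD_ONLY = {"type": "keyword"}

# Multi-field approach: the top-level field becomes text, and the
# keyword-indexed variant moves to a nested ".keyword" sub-field.
MULTI_FIELD = {
    "type": "text",
    "fields": {
        "keyword": {"type": "keyword"},
    },
}


def top_level_type_changes(before: dict, after: dict) -> bool:
    """Hypothetical check: True when the top-level datatype differs."""
    return before["type"] != after["type"]


if __name__ == "__main__":
    # Exact-match queries against the top-level field behave differently
    # once it is analyzed as text, hence "breaking".
    print(top_level_type_changes(KEYWORD_ONLY, MULTI_FIELD))
```

After such a change, anything that relied on exact matching or aggregations on the top-level field would need to target the `.keyword` sub-field instead.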

webmat pushed a commit to webmat/ecs that referenced this issue Sep 19, 2018
webmat pushed a commit to webmat/ecs that referenced this issue Sep 19, 2018
The field that actually started this whole discussion had been forgotten from elastic#87 and elastic#103 :-)
webmat pushed a commit to webmat/ecs that referenced this issue Sep 26, 2018