Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark which fields should be arrays. #727

Merged
merged 23 commits into from Feb 6, 2020
Merged

Conversation

webmat
Copy link
Contributor

@webmat webmat commented Jan 17, 2020

This PR builds on top of #661 and identifies many fields that should be arrays.

Which field should be an array has been determined by observing usage in Beats
or elsewhere, and also by extending the same thinking to additional fields.
Feedback on which fields this applies to is welcome.

The implementation is pretty simple. The field definition can optionally contain
an array attribute called "normalize", e.g.

    - name: tags
      level: core
      type: keyword
      example: "[\"production\", \"env2\"]"
      description: >
        List of keywords used to tag each event.
      normalize:
        - array

Eventually "normalize" could contain additional kinds of desired normalizations
that should be applied to the field (e.g. uppercase, lowercase, etc.) but for now,
this only adds support for "array".

This attribute is passed along to the intermediary YAML files, and is displayed in asciidoc as a "Note:" below the type information. Look at the tags field in the docs preview, for an example. We can improve on this later, when we tackle a redesign of the field details page (details).

Closes #661

cc @codebrain, @ruflin, @rw-access, @MikePaquette

@webmat webmat self-assigned this Jan 17, 2020
@webmat
Copy link
Contributor Author

webmat commented Jan 17, 2020

I'm adding a column to the CSV in order to make it easier to see where the normalization is being added. Although the diff is still difficult to review, because each line is being modified (with the additional comma). The best way to review them is to view the csv rendering in the branch here, and filter for the word "array".

So far, they are:

  • tags
  • container.image.tag
  • dns.answers
  • dns.header_flags
  • dns.resolved_ip
  • event.category
  • event.type
  • file.attributes
  • host.ip
  • host.mac
  • observer.ip
  • observer.mac
  • process.args
  • process.parent.args
  • related.hash
  • related.ip
  • related.user
  • threat.tactic.id
  • threat.tactic.name
  • threat.tactic.reference
  • threat.technique.id
  • threat.technique.name
  • threat.technique.reference
  • tls.client.certificate_chain
  • tls.client.supported_ciphers
  • tls.server.certificate_chain
  • vulnerability.category

Note: in this PR I'm changing the definition of user.id. It used to refer to "one or multiple IDs" for the user, implicitly saying this should be an array. However I don't think it's common for a given system to have multiple IDs for the same user, am I mistaken here, @ruflin? If it was phrased this way in order to allow for correlation between the same person's user IDs in different systems, in my opinion it would be better to do this via other fields, or other events. Not by concatenating multiple IDs in this field.

Please let me know if you disagree with this change to user.id.

codebrain added a commit to elastic/ecs-dotnet that referenced this pull request Jan 29, 2020
codebrain added a commit to elastic/ecs-dotnet that referenced this pull request Jan 29, 2020
@webmat
Copy link
Contributor Author

webmat commented Jan 31, 2020

@MikePaquette Do you agree with marking the following fields as arrays?

@adriansr Is this in line with what Beats has been doing?

@rw-access or @andrewstucki Anything from the Endpoint side on marking these as arrays? And Ross, does that give you what you need?

Let's get this closed out beginning of next week.

@andrewstucki
Copy link
Contributor

@webmat this generally looks good to me, I believe the only fields that really affect Endpoint in the short term are some of the following and I believe we already had those documented as arrays (in at least the example sections) in our field schemas:

dns.answers
dns.resolved_ip
event.category
event.type
file.attributes
host.ip
host.mac

This should definitely help with some clarity around what we're expecting to have a single value v. multiple values.

Copy link
Contributor

@MikePaquette MikePaquette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat this is great.
The list of fields looks good.
My only question is whether "normalize" is the best name for the column header? It's not intuitive to me that it's where I'd look for whether a field is expected to be an array. Anyhow I am OK with it as is, but if there were a more intuitive column name, I'd be +1 for that.

@webmat
Copy link
Contributor Author

webmat commented Feb 5, 2020

@MikePaquette Yeah the name "normalize" is a bit generic. The reason for this is that I think it would eventually make sense to document more ways in which fields are expected to be normalized. E.g. forcing lowercase, coercing to an integer, to a float, etc. I think the word normalize makes sense in the YAML, as driving all of this.

However I don't mind adjusting the CSV column name. The single "Normalization" column would eventually contain a list of normalizations, which wouldn't be easy to use in a spreadsheet anyway. Perhaps we can address that in a subsequent PR, when we start listing those other ways of normalizing. WDYT?

@adriansr
Copy link
Contributor

adriansr commented Feb 6, 2020

This is the way to go for Beats, once the generator is updated to understand this normalize flag, the generated Go code will allow us to use arrays.

Currently we're either not using the generated ECS types at all, or doing some workaround and added some code that breaks the build once arrays are supported so that we're forced to update. See:

https://github.com/elastic/beats/blob/b1c6041fab4735563edc43f1cdbece7d6d3814de/packetbeat/protos/tls/tls.go#L448-L465

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants