Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured audit logging #31931

Merged

Conversation

albertzaharovits
Copy link
Contributor

@albertzaharovits albertzaharovits commented Jul 10, 2018

As it now stands logfile security auditing is almost structured. Most fields are of the form key=[value] and there is no text in between the entries (besides the comma and space delimiters). If all values where of the form key=[value] then such entries would count as 'structured'. Examples of such values, which are not prepended by the field name, are: date, host name, node name, layer: rest or transport.

This PR builds the log event as a Map, and lets log4j's internals (precisely the layout assigned to the appender) to handle the actual printing format. This is better because we could have many appenders for the same log each with a particular layout from: JSON, XML or syslog (all structured) to custom ones to get the desired human friendly templated format. Even better: we could change appenders and layouts at runtime, without touching the code, from the log4j conf file.

It is WIP because tests still need fixing.

Closes: #31046

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security

@ruflin
Copy link
Member

ruflin commented Sep 7, 2018

  • Port: +1 on removing port for now. Adding a field later is easy, removing it is kind of a breaking change.
  • Node.id: Having it by default SGTM as I assume that is what we will use for correlation.
  • Documented fields: This is really nice. Is this manually generated? If possible this should end up in the docs. It's one of the big issues with most logs that there are no docs on what will be in the docs itself.
  • origin: Not in ECS yet but I like the usage and I don't see a conflict coming here.
  • realm: Should not conflict as long as we don't have a realm object.
  • action: As this is very specific to Elasticsearch, I would put it under elasticsearch.action. But it's something we could also do during ingest time.
  • indices: What about elasticsearch.index.name: No need for plural as in ES it's always an array. Same as for action, we can also handle it during ingest time.
  • Same as for indices adn action also applies to rule and opaque_id but we can / should do that at ingest time.

@albertzaharovits Do you seen an issue that Filebeat would still do some potential transformations on the event during ingest time? We do similar things when we fetch metrics and prefix it with the service to prevent conflicts. It would not make sense in an Elasticsearch API to prefix everything and I think the same is true for a log.

@albertzaharovits
Copy link
Contributor Author

albertzaharovits commented Sep 13, 2018

Alrighty, based on the feedback from @ruflin I have:

  • removed the port in the "host.ip" field
  • kept the port on the "origin.address" field
  • added "node.id" which defaults to printed (audit.logfile.prefix.emit_node_id: true)
  • "node.name" is no longer printed implicitly (audit.logfile.prefix.emit_node_name: false)
  • removed "event.category": "elasticsearch-audit" because this is a constant field
    that should be added during ingest; and it's better for the human to keep it brief.

Documented fields: This is really nice. Is this manually generated? If possible this should end up in the docs. It's one of the big issues with most logs that there are no docs on what will be in the docs itself.

Yep, they're handcrafted. They will definitely be in the documentation (there is already documentation for it, but it is stale).

Do you seen an issue that Filebeat would still do some potential transformations on the event during ingest time? We do similar things when we fetch metrics and prefix it with the service to prevent conflicts. It would not make sense in an Elasticsearch API to prefix everything and I think the same is true for a log.

No, I don't take any issue with this.

indices: What about elasticsearch.index.name: No need for plural as in ES it's always an array. Same as for action, we can also handle it during ingest time.
Same as for indices adn action also applies to rule and opaque_id but we can / should do that at ingest time.

I think all top level fields which are not part of ECS have to/should be prepended during ingest with the appropriate namespace.

Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two minor comments. Otherwise LGTM!

@@ -1554,7 +1566,7 @@ public void testIndicesFilter() throws Exception {
}

private <T> List<T> randomListFromLengthBetween(List<T> l, int min, int max) {
assert (min >= 0) && (min <= max) && (max <= l.size());
assert min >= 0 && min <= max && max <= l.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep the parentheses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though I can spot when Eclipse does these check while validating the commit diff, but it looks like I can't. I have removed the check. I have long entertained the idea of using vim and things like this are only stoking it...

Copy link
Contributor

@bizybot bizybot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for iterations, LGTM.

@albertzaharovits
Copy link
Contributor Author

@ruflin I am going to merge this. There are no outstanding moot points, I have addressed them all.
If new ideas sprout on how to enhance the new log format I think they suit in new issues.

@albertzaharovits albertzaharovits merged commit c86e2d5 into elastic:master Sep 14, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 14, 2018
* master: (24 commits)
  Only notify ready global checkpoint listeners (elastic#33690)
  Don't count hits via the collector if the hit count can be computed from index stats. (elastic#33701)
  Expose retries for CCR fetch failures (elastic#33694)
  Test fix - Graph vertices could appear in different orders based on map insertion sequence (elastic#33709)
  Structured audit logging (elastic#31931)
  Core: Add DateFormatter interface for java time parsing (elastic#33467)
  [CCR] Check whether the rejected execution exception has the shutdown flag set (elastic#33703)
  Mute ClusterDisruptionIT#testSendingShardFailure
  Revert "Mute FullClusterRestartSettingsUpgradeIT"
  Adjust BWC version on settings upgrade test (elastic#33650)
  [ML] Allow overrides for some file structure detection decisions (elastic#33630)
  Adapt skip version for doc_values format deprecation
  [TEST] wait for no initializing shards
  [Docs] Minor fix in `has_child` javadoc comment (elastic#33674)
  Mute FullClusterRestartSettingsUpgradeIT
  [Kerberos] Add realm name & UPN to user metadata (elastic#33338)
  [TESTS] Disable specific locales for RestrictedTrustManagerTest (elastic#33299)
  SQL: Return functions in JDBC driver metadata (elastic#33672)
  SCRIPTING: Move terms_set Context to its Own Class (elastic#33602)
  AwaitsFix testRestoreMinmal
  ...
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. I think we will figure out the small nitty gritty details as soon as we try to add it to the Filebeat module.

@ycombinator FYI

@albertzaharovits albertzaharovits deleted the structured-audit-logging branch October 17, 2018 17:00
albertzaharovits added a commit that referenced this pull request Oct 26, 2018
Documents the new structured logfile format for auditing
that was introduced by #31931. Most changes herein
are for 6.x . In 7.0 the deprecated format is gone and a
follow-up PR is in order.
albertzaharovits added a commit that referenced this pull request Oct 26, 2018
Documents the new structured logfile format for auditing
that was introduced by #31931. Most changes herein
are for 6.x . In 7.0 the deprecated format is gone and a
follow-up PR is in order.
albertzaharovits added a commit that referenced this pull request Oct 26, 2018
Documents the new structured logfile format for auditing
that was introduced by #31931. Most changes herein
are for 6.x . In 7.0 the deprecated format is gone and a
follow-up PR is in order.
kcm pushed a commit that referenced this pull request Oct 30, 2018
Documents the new structured logfile format for auditing
that was introduced by #31931. Most changes herein
are for 6.x . In 7.0 the deprecated format is gone and a
follow-up PR is in order.
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Audit logging structured entries
7 participants