Add fileset for parsing linux auditd logs #3750

Merged
tsg merged 4 commits into elastic:master from andrewkroh:feature/filebeat-audit-fileset on Apr 3, 2017

4 participants
@andrewkroh
Member

commented Mar 15, 2017

Update: Module renamed in #3941.

The PR adds an audit fileset to the system module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data.

Features

  • Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow.
  • Applies geoip lookup to the system.audit.addr field which is present for some remote login events.
  • Decodes hex-encoded ASCII values that are sometimes used for the system.audit.exe and system.audit.cmd fields.
  • Removes key/value pairs where the value is ?.
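To make the parsing steps listed above concrete, here is a minimal Python sketch of splitting one auditd line into its header (event type, epoch time, counter) and the key/value pairs that follow. It is not the fileset's implementation, which uses an ingest pipeline; the regular expression, the function name `parse_audit_line`, and the output keys `record_type` and `sequence` (borrowed from the rename made later in this PR) are assumptions for illustration.

```python
import re
from datetime import datetime, timezone

# Illustrative pattern only; the fileset itself relies on a grok pattern.
AUDIT_LINE = re.compile(
    r"^type=(?P<record_type>\S+) "
    r"msg=audit\((?P<epoch>\d+\.\d+):(?P<sequence>\d+)\):\s*(?P<rest>.*)$"
)


def parse_audit_line(line):
    """Split one auditd log line into header fields and key/value pairs."""
    match = AUDIT_LINE.match(line.strip())
    if match is None:
        return None

    event = {
        "record_type": match.group("record_type"),
        "timestamp": datetime.fromtimestamp(float(match.group("epoch")), tz=timezone.utc),
        "sequence": int(match.group("sequence")),
    }

    # Naive whitespace split: quoted values containing spaces are not handled.
    for pair in match.group("rest").split():
        key, sep, value = pair.partition("=")
        if not sep or value == "?":
            continue  # skip malformed tokens and the "?" placeholder
        event.setdefault(key, value)  # header fields win over key/value data
    return event
```

The whitespace split is a deliberate simplification; real audit records can contain quoted values with spaces and a nested `msg='...'` section.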

Missing Features

  • Decoder for system.audit.saddr field present in SOCKADDR audit events. A recipe is described at https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log and could probably be ported to Painless. Sample value:
    type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F
  • Syscall number to name conversion. This would require an architecture specific mapping table.
  • Join data from events with the same auditd counter value. For example there are separate log lines for SYSCALL, CWD, and PATH audit types that all correspond to the same overall action.
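As a rough idea of what the missing saddr decoder could look like, here is a Python sketch rather than the Painless port suggested above. It handles only AF_UNIX and AF_INET and assumes Linux address-family numbers and a little-endian host for the family field; `decode_saddr` and its output keys are hypothetical names.

```python
import ipaddress

# Linux address-family numbers; assumed here, other platforms may differ.
AF_UNIX = 1
AF_INET = 2


def decode_saddr(hex_value):
    """Decode the hex saddr value of a SOCKADDR record (AF_UNIX and AF_INET only)."""
    raw = bytes.fromhex(hex_value)
    # sa_family is stored in host byte order; little-endian is assumed (e.g. x86_64).
    family = int.from_bytes(raw[:2], "little")

    if family == AF_UNIX:
        # sun_path is NUL-terminated; bytes after the terminator are padding.
        path = raw[2:].split(b"\x00", 1)[0]
        return {"family": "unix", "path": path.decode("ascii", errors="replace")}

    if family == AF_INET:
        # Port and address are in network byte order.
        port = int.from_bytes(raw[2:4], "big")
        addr = str(ipaddress.IPv4Address(raw[4:8]))
        return {"family": "inet", "addr": addr, "port": port}

    # Other families are returned undecoded.
    return {"family": family}
```

Applied to the sample value above, the leading bytes decode to a Unix socket with the path /var/run/nscd/socket; everything after the first NUL byte is padding.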

Filebeat Audit Dashboard

@andrewkroh andrewkroh force-pushed the andrewkroh:feature/filebeat-audit-fileset branch 2 times, most recently from 3e91ac1 to 8794516 Mar 16, 2017

@andrewkroh andrewkroh added review and removed in progress labels Mar 16, 2017

@andrewkroh andrewkroh force-pushed the andrewkroh:feature/filebeat-audit-fileset branch from 8794516 to a5ef7ae Mar 16, 2017

@andrewkroh andrewkroh requested a review from tsg Mar 16, 2017

@ruflin

ruflin approved these changes Mar 17, 2017

fcfg, err = applyOverrides(fcfg, mcfg.Module, filesetName, overrides)
if err != nil {
	return nil, fmt.Errorf("Error applying overrides on fileset %s/%s: %v", mcfg.Module, filesetName, err)
}

if fcfg.Enabled != nil && (*fcfg.Enabled) == false {

@ruflin

ruflin Mar 17, 2017

Collaborator

Is there a specific reason this was moved?

@andrewkroh

andrewkroh Mar 17, 2017

Author Member

This applies the overrides first, which allows an override to disable a fileset. It allows a CLI config like:

-M system.audit.enabled=false
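The ordering matters because the enabled check has to see the overridden value. A small Python sketch of the idea (the real code is Go; `apply_overrides`, `enabled_filesets`, and the nested-dict shape of the overrides are assumptions made for illustration):

```python
import copy


def apply_overrides(fileset_cfg, overrides, module, fileset):
    """Return a copy of fileset_cfg with any module.fileset overrides merged in."""
    merged = copy.deepcopy(fileset_cfg)
    merged.update(overrides.get(module, {}).get(fileset, {}))
    return merged


def enabled_filesets(module, filesets, overrides):
    """Apply overrides first, then drop filesets that end up disabled."""
    result = {}
    for name, cfg in filesets.items():
        cfg = apply_overrides(cfg, overrides, module, name)
        if cfg.get("enabled") is False:
            continue  # an override such as system.audit.enabled=false lands here
        result[name] = cfg
    return result
```

If the enabled check ran before the merge, an override that sets `enabled` to false would be applied too late to have any effect.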

{
"script": {
"lang": "painless",
"inline": " String trimQuotes(def v) {\n if (v.startsWith(\"'\") || v.startsWith('\"')) {\n v = v.substring(1, v.length());\n }\n if (v.endsWith(\"'\") || v.endsWith('\"')) {\n v = v.substring(0, v.length()-1);\n } \n return v;\n }\n \n boolean isHexAscii(String v) {\n def len = v.length();\n if (len == 0 || len % 2 != 0) {\n return false; \n }\n \n for (int i = 0 ; i < len ; i++) {\n if (Character.digit(v.charAt(i), 16) == -1) {\n return false;\n }\n }\n\n return true;\n }\n \n String convertHexToString(String hex) {\n\t StringBuilder sb = new StringBuilder();\n\n for (int i=0; i < hex.length() - 1; i+=2) {\n String output = hex.substring(i, (i + 2));\n int decimal = Integer.parseInt(output, 16);\n sb.append((char)decimal);\n }\n\n return sb.toString();\n }\n \n def possibleHexKeys = ['exe', 'cmd'];\n \n def audit = ctx.system.get(\"audit\");\n Iterator entries = audit.entrySet().iterator();\n while (entries.hasNext()) {\n def e = entries.next();\n def k = e.getKey();\n def v = e.getValue(); \n\n // Remove entries whose value is ?\n if (v == \"?\") {\n entries.remove();\n continue;\n }\n \n // Convert hex values to ASCII.\n if (possibleHexKeys.contains(k) && isHexAscii(v)) {\n v = convertHexToString(v);\n audit.put(k, v);\n }\n \n // Trim quotes.\n if (v instanceof String) {\n v = trimQuotes(v);\n audit.put(k, v);\n }\n \n // Convert arch.\n if (k == \"arch\" && v == \"c000003e\") {\n audit.put(k, \"x86_64\");\n }\n }\n \n // Fix inner msg parsing.\n def msg = audit.get(\"msg\");\n if (msg != null) {\n def i = msg.indexOf(\"=\");\n if (i >= 0) {\n def newKey = trimQuotes(msg.substring(0, i));\n def newValue = trimQuotes(msg.substring(i+1, msg.length()));\n audit.remove(\"msg\");\n audit.put(newKey, newValue);\n }\n }"

@ruflin

ruflin Mar 17, 2017

Collaborator

That is quite a script. I'm curious if that has a major impact on ingestion speed.

@andrewkroh

andrewkroh Mar 17, 2017

Author Member

I haven't measured in any way the impact on ingestion rates.

@andrewkroh

andrewkroh Mar 17, 2017

Author Member

I was able to remove a small part of the script by changing the grok pattern (just pushed it), but I think it is as simple as I can make it.
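For readers who find the escaped one-line Painless script hard to follow, here is a rough Python rendering of its main loop: dropping `?` values, decoding hex-encoded `exe` and `cmd` values, trimming quotes, and mapping the arch code. It is a readability aid, not the pipeline code; it omits the inner `msg` fix-up, and the function names are my own.

```python
def trim_quotes(value):
    """Strip one leading and one trailing single or double quote, if present."""
    if value.startswith(("'", '"')):
        value = value[1:]
    if value.endswith(("'", '"')):
        value = value[:-1]
    return value


def is_hex_ascii(value):
    """True when value is a non-empty, even-length string of hex digits."""
    if len(value) == 0 or len(value) % 2 != 0:
        return False
    return all(c in "0123456789abcdefABCDEF" for c in value)


def hex_to_string(value):
    """Decode each pair of hex digits into one character."""
    return "".join(chr(int(value[i:i + 2], 16)) for i in range(0, len(value), 2))


def clean_audit_fields(audit, hex_keys=("exe", "cmd")):
    """Return a cleaned copy of the key/value pairs, mirroring the script's loop."""
    cleaned = {}
    for key, value in audit.items():
        if value == "?":
            continue  # drop placeholder values
        if key in hex_keys and is_hex_ascii(value):
            value = hex_to_string(value)
        if isinstance(value, str):
            value = trim_quotes(value)
        if key == "arch" and value == "c000003e":
            value = "x86_64"
        cleaned[key] = value
    return cleaned
```

Each event costs one pass over its key/value pairs plus a few string checks per value, which may help frame the ingestion-speed question, though it says nothing about measured throughput.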

andrewkroh added some commits Mar 15, 2017

Add fileset for parsing linux auditd logs
Apply fileset overrides before checking enabled
Disable other filesets in a module while testing.

@andrewkroh andrewkroh force-pushed the andrewkroh:feature/filebeat-audit-fileset branch from a5ef7ae to c945fb7 Mar 17, 2017

@ruflin

Collaborator

commented Mar 20, 2017

LGTM. Waiting for @tsg to have a look.

assert "error" not in obj

if module != "system" and fileset != "audit":
    # There are dynamic fields in audit logs that are not documented.

@tsg

tsg Mar 23, 2017

Collaborator

Are the dynamic fields grouped under a sub-directory? Then we could document it as a dict type?

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

All of the fields are written under system.audit.* with no inner dictionaries. Is there anything I should do in this case?

@tsg

tsg Mar 23, 2017

Collaborator

Would it be possible / make sense to group them under a sub-dict? I think we don't have a precedent in any Beat so far that dynamic and static fields are mixed together.

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

It's possible, but I don't think using a sub-dict in this case provides any additional context to the user about the fields. We would be using sub-dict here due to the implementation details (like the fact that it uses a kv processor instead of a specific grok pattern for each audit record type). From a user perspective I don't think a sub-dict would add value (at least not with what I'm thinking -- system.audit.kv.*), but I don't like to be the one breaking precedent.

@ruflin

ruflin Mar 25, 2017

Collaborator

The main thing I worry about is that a dynamic field could overwrite a static field and conflict on the type. Could we perhaps just have one entry with all the fields as a string? Or use the raw namespace? I'm also not sure yet how to deal with it.

@andrewkroh

andrewkroh Mar 31, 2017

Author Member

If there were an exclude_keys option in the KV processor, it would be easy to prevent the dynamic fields from overwriting the well-known fields from the header. The Logstash KV filter has this option, so I'll ask about getting this into ES. I could enhance this module to use that once it's available.

There are only a few keys in the audit header that could be overwritten (record_type and sequence) by KV data and I haven't observed these being used in any audit events. I think the risk is minimal.
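To illustrate what an exclude-keys step would buy here, a minimal Python sketch that merges parsed key/value data into an event while refusing to overwrite the header fields. `merge_kv` and `HEADER_KEYS` are hypothetical names; this is not the KV processor's API.

```python
# Header fields named in the comment above as the only ones at risk.
HEADER_KEYS = frozenset({"record_type", "sequence"})


def merge_kv(event, kv_pairs, exclude_keys=HEADER_KEYS):
    """Copy kv_pairs into event, skipping keys reserved for the audit header."""
    skipped = []
    for key, value in kv_pairs.items():
        if key in exclude_keys:
            skipped.append(key)  # keep the header value, remember the clash
            continue
        event[key] = value
    return skipped
```

Returning the skipped keys makes a clash visible instead of silently dropping it, which could be useful for checking whether such collisions ever occur in practice.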

module, obj["fileset"]["module"])

if not (module == "mysql" and fileset == "slowlog") and not (module == "system" and fileset == "auth"):
    # TODO: There are errors parsing the test logs from these modules.

@tsg

tsg Mar 23, 2017

Collaborator

Yeah, this is not ideal, but I don't have an easy solution for now.

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

I agree, definitely not ideal, but I really wanted to enable error checking. Maybe as part of the manifest or some file in the test directory the fileset could self-describe its testing parameters (e.g. allow_errors: true, has_dynamic_fields: true).

@tsg

tsg Mar 23, 2017

Collaborator

Yep, we probably need something like that. I'm good with the current solution for the time being.
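A sketch of how the self-describing test parameters floated above might be consumed by the test runner. Only the parameter names `allow_errors` and `has_dynamic_fields` come from the discussion; the function, the flat dotted-key event shape, and the defaults are assumptions.

```python
DEFAULT_PARAMS = {"allow_errors": False, "has_dynamic_fields": False}


def check_event(flat_event, documented_fields, params=None):
    """Return a list of problems for one indexed event, honoring fileset test params."""
    params = {**DEFAULT_PARAMS, **(params or {})}
    problems = []

    if "error" in flat_event and not params["allow_errors"]:
        problems.append("unexpected error field")

    # Filesets with dynamic fields skip the documentation check entirely.
    if not params["has_dynamic_fields"]:
        for key in sorted(flat_event):
            if key not in documented_fields:
                problems.append("undocumented field: " + key)
    return problems
```

With parameters like these living next to each fileset's test logs, the module/fileset special cases in the test code could be replaced by data.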

=== Compatibility

This module was tested with logs from OSes like Ubuntu 12.04, Centos 7, macOS Sierra, and others.
This module was tested with logs from OSes like Ubuntu 12.04, Centos 7, macOS
Sierra, and others.

@tsg

tsg Mar 23, 2017

Collaborator

The "and others" might come back to us as an issue in support, maybe it's better to leave it out.

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

I agree, probably a good idea to drop that. Will do.

"field": "system.audit.sub_kv",
"ignore_failure": true
}
},

@tsg

tsg Mar 23, 2017

Collaborator

In the other filesets so far we remove the original message key, to avoid duplication. For consistency, we might want to do it here as well, unless there's a lot of value in keeping it?

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

One reason I like having the message field is that it gets analyzed and makes searching simpler when I don't know exactly what I'm looking for (helps with _all being deprecated). WDYT?

@tsg

tsg Mar 23, 2017

Collaborator

I was always on the fence (for each new module) about keeping the message in, but thought I'd try it without it first, since it's easier to add than remove. My understanding is that by default Kibana will still search in all fields, even without having an explicit _all.

What do you think if we leave it out here again for consistency, but reconsider before 6.0 whether we want to add it everywhere?

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

SGTM

@ruflin

ruflin Mar 25, 2017

Collaborator

I like not having the raw message in by default. But it can be useful in some cases, for example for debugging. We could introduce a config option that can be set per module to turn it on. The problem is that this would mean a change in the processor, which should not change based on config options.

@andrewkroh

andrewkroh Mar 31, 2017

Author Member

Removed the message field in ececd94.

},
{
"convert": {
"field" : "system.audit.item",

@tsg

tsg Mar 23, 2017

Collaborator

Interesting, are these conversions needed? In the other places I have assumed that if we set it as a number in the mapping template, that's enough for ES to do the conversion itself.

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

You're right, I don't think these are strictly necessary. ES should do the conversion.

The only difference that I can think of here is how errors would be handled. If the conversion fails in the ingest processor, the event is still written to ES, but it has an error field and is not completely enriched. If a failure occurs during automatic conversion in ES, then you get a mapping exception and the whole document gets dropped.

@tsg

tsg Mar 23, 2017

Collaborator

Good point, I agree it's worth keeping them in this case.
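The trade-off described in this thread can be modeled with a toy Python example. It only mimics the behavior as stated in the comments above (a failed conversion in the pipeline flags the event, a failed conversion at mapping time rejects the document); it does not call Elasticsearch, and both function names are made up.

```python
def convert_to_int(event, field):
    """Lenient conversion: on failure, flag the event instead of dropping it."""
    try:
        event[field] = int(event[field])
    except (KeyError, ValueError, TypeError):
        event["error"] = "failed to convert {} to integer".format(field)
    return event


def index_with_mapping(event, integer_fields):
    """Strict conversion: a bad value raises, so the whole document is rejected."""
    for field in integer_fields:
        if field in event:
            int(event[field])  # raises ValueError on a non-numeric string
    return event
```

In the lenient case the partially enriched event stays searchable and carries an error marker; in the strict case nothing is stored at all.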

"AUDIT_KEY_VALUES": "%{WORD}=%{GREEDYDATA}"
},
"patterns": [
"%{AUDIT_PREFIX} %{AUDIT_KEY_VALUES:system.audit.kv} old auid=%{NUMBER:system.audit.old_auid} new auid=%{NUMBER:system.audit.new_auid} old ses=%{NUMBER:system.audit.old_ses} new ses=%{NUMBER:system.audit.new_ses}",

@tsg

tsg Mar 23, 2017

Collaborator

Might be worth documenting old_auid, old_ses, etc?

@andrewkroh

andrewkroh Mar 23, 2017

Author Member

I'll add those.

@andrewkroh

andrewkroh Mar 31, 2017

Author Member

Added in 7cd7d52
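For anyone documenting or testing these fields, a plain-regex approximation of the part of the grok pattern that extracts them may help. The sketch below covers only the record body after the audit prefix, uses `\d+` where grok's NUMBER is more permissive, and the sample body in the usage note is made up to match the pattern.

```python
import re

# Approximates: %{AUDIT_KEY_VALUES:kv} old auid=... new auid=... old ses=... new ses=...
LOGIN_IDS = re.compile(
    r"(?P<kv>\w+=.*) "
    r"old auid=(?P<old_auid>\d+) new auid=(?P<new_auid>\d+) "
    r"old ses=(?P<old_ses>\d+) new ses=(?P<new_ses>\d+)"
)


def parse_login_ids(body):
    """Extract old/new auid and ses values from the body of an audit record."""
    match = LOGIN_IDS.search(body)
    if match is None:
        return None
    fields = {
        name: int(match.group(name))
        for name in ("old_auid", "new_auid", "old_ses", "new_ses")
    }
    fields["kv"] = match.group("kv")  # remaining key=value text, parsed separately
    return fields
```

For a body such as `pid=2055 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=1`, the four IDs come back as integers and `kv` holds `pid=2055 uid=0`.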

@tsg

tsg approved these changes Mar 23, 2017

Collaborator

left a comment

This is great, sorry for the late review. I left a few comments. This would also be worth a quick blog post like the one I did on the auth logs.

andrewkroh added some commits Mar 31, 2017

Addressing review comments
- Renamed system.audit.type to system.audit.record_type
- Renamed system.audit.counter to system.audit.sequence
- Dropped system.audit.epoch after parsing it into @timestamp
- Dropped message field.
- Removed “and others” from the supported OSes listed in docs.
- Updated dashboards after renaming and deleting fields.
- Updated script to drop fields where value was `(null)`.

@andrewkroh andrewkroh removed the in progress label Mar 31, 2017

@tsg tsg merged commit e5f5bef into elastic:master Apr 3, 2017

5 of 6 checks passed

codecov/patch 50% of diff hit (target 65.51%)
CLA Commit author has signed the CLA
codecov/project 65.84% (+0.32%) compared to 6529638
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
default Build finished.

andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 7, 2017

Add fileset for parsing linux auditd logs (elastic#3750)

* Apply fileset overrides before checking enabled

Disable other filesets in a module while testing.

* Addressing review comments

* Document old_auid, new_auid, old_ses, new_ses

andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 7, 2017

Add fileset for parsing linux auditd logs (elastic#3750) (elastic#3923)…
… (elastic#3941)

The PR adds an auditd module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data.

Features

- Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow.
- Applies geoip lookup to the `auditd.log.addr` field which is present for some remote login events.
- Decodes hex encoded ascii values that are sometimes used for the `auditd.log.exe` and `auditd.log.cmd` fields.
- Remove key/value pairs where the value is `?`.

Missing Features

- Decoder for `auditd.log.saddr` field present in SOCKADDR audit events. A recipe is described [here](https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log) and could probably be ported to painless. Sample value:
  `type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F`

andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 10, 2017

Add fileset for parsing linux auditd logs (elastic#3750) (elastic#3923)…
… (elastic#3941) (elastic#3962)

tsg added a commit that referenced this pull request Apr 11, 2017

Add fileset for parsing linux auditd logs (#3750) (#3923) (#3941) (#3962) (#3975)

@andrewkroh andrewkroh deleted the andrewkroh:feature/filebeat-audit-fileset branch Jul 5, 2017

athom added a commit to athom/beats that referenced this pull request Jan 25, 2018

Add fileset for parsing linux auditd logs (elastic#3750)