Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fileset for parsing linux auditd logs #3750

Merged
merged 4 commits into from Apr 3, 2017

Conversation

andrewkroh
Copy link
Member

@andrewkroh andrewkroh commented Mar 15, 2017

Update: Module renamed in #3941.

The PR adds and audit fileset to the system module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data.

Features

  • Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow.
  • Applies geoip lookup to the system.audit.addr field which is present for some remote login events.
  • Decodes hex encoded ascii values that are sometimes used for the system.audit.exe and system.audit.cmd fields.
  • Remove key/value pairs where the value is ?.

Missing Features

  • Decoder for system.audit.saddr field present in SOCKADDR audit events. A recipe is described here and could probably be ported to painless. Sample value:
    type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F
  • Syscall number to name conversion. This would require an architecture specific mapping table.
  • Join data from events with the same auditd counter value. For example there are separate log lines for SYSCALL, CWD, and PATH audit types that all correspond to the same overall action.

Filebeat Audit Dashboard

@andrewkroh andrewkroh added Filebeat Filebeat in progress Pull request is currently in progress. labels Mar 15, 2017
@andrewkroh andrewkroh force-pushed the feature/filebeat-audit-fileset branch 2 times, most recently from 3e91ac1 to 8794516 Compare March 16, 2017 21:19
@andrewkroh andrewkroh added review and removed in progress Pull request is currently in progress. labels Mar 16, 2017
@andrewkroh andrewkroh force-pushed the feature/filebeat-audit-fileset branch from 8794516 to a5ef7ae Compare March 16, 2017 21:25
@andrewkroh andrewkroh requested a review from tsg March 16, 2017 21:25
fcfg, err = applyOverrides(fcfg, mcfg.Module, filesetName, overrides)
if err != nil {
return nil, fmt.Errorf("Error applying overrides on fileset %s/%s: %v", mcfg.Module, filesetName, err)
}

if fcfg.Enabled != nil && (*fcfg.Enabled) == false {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason this was moved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies the overrides first, which allows an override to disable a fileset. It allows a CLI config like:

-M system.audit.enabled=false

{
"script": {
"lang": "painless",
"inline": " String trimQuotes(def v) {\n if (v.startsWith(\"'\") || v.startsWith('\"')) {\n v = v.substring(1, v.length());\n }\n if (v.endsWith(\"'\") || v.endsWith('\"')) {\n v = v.substring(0, v.length()-1);\n } \n return v;\n }\n \n boolean isHexAscii(String v) {\n def len = v.length();\n if (len == 0 || len % 2 != 0) {\n return false; \n }\n \n for (int i = 0 ; i < len ; i++) {\n if (Character.digit(v.charAt(i), 16) == -1) {\n return false;\n }\n }\n\n return true;\n }\n \n String convertHexToString(String hex) {\n\t StringBuilder sb = new StringBuilder();\n\n for (int i=0; i < hex.length() - 1; i+=2) {\n String output = hex.substring(i, (i + 2));\n int decimal = Integer.parseInt(output, 16);\n sb.append((char)decimal);\n }\n\n return sb.toString();\n }\n \n def possibleHexKeys = ['exe', 'cmd'];\n \n def audit = ctx.system.get(\"audit\");\n Iterator entries = audit.entrySet().iterator();\n while (entries.hasNext()) {\n def e = entries.next();\n def k = e.getKey();\n def v = e.getValue(); \n\n // Remove entries whose value is ?\n if (v == \"?\") {\n entries.remove();\n continue;\n }\n \n // Convert hex values to ASCII.\n if (possibleHexKeys.contains(k) && isHexAscii(v)) {\n v = convertHexToString(v);\n audit.put(k, v);\n }\n \n // Trim quotes.\n if (v instanceof String) {\n v = trimQuotes(v);\n audit.put(k, v);\n }\n \n // Convert arch.\n if (k == \"arch\" && v == \"c000003e\") {\n audit.put(k, \"x86_64\");\n }\n }\n \n // Fix inner msg parsing.\n def msg = audit.get(\"msg\");\n if (msg != null) {\n def i = msg.indexOf(\"=\");\n if (i >= 0) {\n def newKey = trimQuotes(msg.substring(0, i));\n def newValue = trimQuotes(msg.substring(i+1, msg.length()));\n audit.remove(\"msg\");\n audit.put(newKey, newValue);\n }\n }"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is quite a script. I'm curious if that has a major impact on ingestion speed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't measured in any way the impact on ingestion rates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove a small part of the script by changing the grok pattern (just pushed it), but I think it is as simple as I can make it.

The PR adds and audit fileset to the system module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data.

Features

- Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow.
- Applies geoip lookup to the `system.audit.addr` field which is present for some remote login events.
- Decodes hex encoded ascii values that are sometimes used for the `system.audit.exe` and `system.audit.cmd` fields.
- Remove key/value pairs where the value is `?`.

Missing Features

- Decoder for `system.audit.saddr` field present in SOCKADDR audit events. A recipe is described [here](https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log) and could probably be ported to painless. Sample value:
  `type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F`
Disable other filesets in a module while testing.
@andrewkroh andrewkroh force-pushed the feature/filebeat-audit-fileset branch from a5ef7ae to c945fb7 Compare March 17, 2017 16:38
@ruflin
Copy link
Member

ruflin commented Mar 20, 2017

LGTM. Waiting for @tsg to have a look.

assert "error" not in obj

if module != "system" and fileset != "audit":
# There are dynamic fields in audit logs that are not documented.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the dynamic fields grouped under a sub-directory? Then we could document it as a dict type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the fields are written under system.audit.* with no inner dictionaries. Is there anything I should do in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible / make sense to group them under a sub-dict? I think we don't have a precedent in any Beat so far that dynamic and static fields are mixed together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible, but I don't think using a sub-dict in this case provides any additional context to the user about the fields. We would be using sub-dict here due to the implementation details (like the fact that it uses a kv processor instead of a specific grok pattern for each audit record type). From a user perspective I don't think a sub-dict would add value (at least not with what I'm thinking -- system.audit.kv.*), but I don't like to be the one breaking precedent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I worry is that a dynamic field could overwrite a static field and conflict on the type. Could we perhaps just have one entry with all the fields as a string? Or use the raw namespace? Also not sure yet how to deal with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was an exclude_keys option in the KV processor then it would easy to prevent the dynamic fields from overwriting the well-known fields from the header. The Logstash KV filter has this option so I'll ask about getting this in ES. I could enhance this module to use that once it's available.

There are only a few keys in the audit header that could be overwritten (record_type and sequence) by KV data and I haven't observed these being used in any audit events. I think the risk is minimal.

module, obj["fileset"]["module"])

if not (module == "mysql" and fileset == "slowlog") and not (module == "system" and fileset == "auth"):
# TODO: There are errors parsing the test logs from these modules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is not ideal, but I don't have an easy solution for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, definitely not idea, but I really wanted to enable error checking. Maybe as part of the manifest or some file in the test directory the fileset could self-describe its testing parameters (e.g. allow_errors: true, has_dynamic_fields: true).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we probably need something like that. I'm good with the current solution for the time being.

=== Compatibility

This module was tested with logs from OSes like Ubuntu 12.04, Centos 7, macOS Sierra, and others.
This module was tested with logs from OSes like Ubuntu 12.04, Centos 7, macOS
Sierra, and others.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "and others" might come back to us as an issue in support, maybe it's better to leave it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, probably a good idea to drop that. Will do.

"field": "system.audit.sub_kv",
"ignore_failure": true
}
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other filesets so far we remove the original message key, to avoid duplication. For consistency, we might want to do it here as well, unless there's a lot of value in keeping it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason I like having the message field is because it gets analyzed and make searching simpler when I don't know exactly what I'm looking for (helps with _all being deprecated). WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was always on the fence (for each new module) about keeping the message in, but thought I'd try it without it first, since it's easier to add than remove. My understanding is that by default Kibana will still search in all fields, even without having an explicit _all.

What you think if we leave it out here again for consistency, but we reconsider before 6.0 if we want to add it everywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to not have the raw message in by default. But it can become useful in some case, for example for debugging. We could introduce a config option which can be set per module to turn it on. Problem is that this would mean a change in the processor which should not change based on config options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the message field in ececd94.

},
{
"convert": {
"field" : "system.audit.item",
Copy link
Contributor

@tsg tsg Mar 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, are these conversions needed? In the other places I have assumed that if we set it as a number in the mapping template, that's enough for ES to do the conversion itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I don't think these are strictly necessary. ES should do the conversion.

The only difference that I can possibly think of here is how errors would be handled. If the conversion fails in the ingest processor the event is still written to ES, but has an error field and is not completely enriched. If a failure occurs during automatic conversion in ES then you get an mapping exception and the whole document gets dropped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I agree it's worth keeping them in this case.

"AUDIT_KEY_VALUES": "%{WORD}=%{GREEDYDATA}"
},
"patterns": [
"%{AUDIT_PREFIX} %{AUDIT_KEY_VALUES:system.audit.kv} old auid=%{NUMBER:system.audit.old_auid} new auid=%{NUMBER:system.audit.new_auid} old ses=%{NUMBER:system.audit.old_ses} new ses=%{NUMBER:system.audit.new_ses}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth documenting old_auid, old_ses, etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add those.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 7cd7d52

Copy link
Contributor

@tsg tsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, sorry for the late review. I left a few comments. This would also be worth a quick blog post like the one I did on the auth logs.

@monicasarbu monicasarbu added v6.0.0-alpha1 in progress Pull request is currently in progress. labels Mar 30, 2017
- Renamed system.audit.type to system.audit.record_type
- Renamed system.audit.counter to system.audit.sequence
- Dropped system.audit.epoch after parsing it into @timestamp
- Dropped message field.
- Removed “and others” from the supported OSes listed in docs.
- Updated dashboards after renaming and deleting fields.
- Updated script to drop fields where value was `(null)`.
@andrewkroh andrewkroh removed the in progress Pull request is currently in progress. label Mar 31, 2017
@tsg tsg merged commit e5f5bef into elastic:master Apr 3, 2017
@tsg tsg added needs_backport PR is waiting to be backported to other branches. v5.4.0 labels Apr 5, 2017
andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 10, 2017
… (elastic#3941) (elastic#3962)

The PR adds an auditd module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data.

Features

- Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow.
- Applies geoip lookup to the `auditd.log.addr` field which is present for some remote login events.
- Decodes hex encoded ascii values that are sometimes used for the `auditd.log.exe` and `auditd.log.cmd` fields.
- Remove key/value pairs where the value is `?`.

Missing Features

- Decoder for `auditd.log.saddr` field present in SOCKADDR audit events. A recipe is described [here](https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log) and could probably be ported to painless. Sample value:
  `type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F`
@andrewkroh andrewkroh removed the needs_backport PR is waiting to be backported to other branches. label Apr 10, 2017
tsg pushed a commit that referenced this pull request Apr 11, 2017
) (#3975)

The PR adds an auditd module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data.

Features

- Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow.
- Applies geoip lookup to the `auditd.log.addr` field which is present for some remote login events.
- Decodes hex encoded ascii values that are sometimes used for the `auditd.log.exe` and `auditd.log.cmd` fields.
- Remove key/value pairs where the value is `?`.

Missing Features

- Decoder for `auditd.log.saddr` field present in SOCKADDR audit events. A recipe is described [here](https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log) and could probably be ported to painless. Sample value:
  `type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F`
@andrewkroh andrewkroh deleted the feature/filebeat-audit-fileset branch July 5, 2017 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants