[Auditbeat] Recover from errors in audit monitoring routine (elastic#22673)

The auditd module spawns a monitoring goroutine that fetches the auditd status
every 15s. Because this routine uses a single audit client, if an update
fails (because a netlink message is late, or for other reasons), the audit
client can get out of sync with the stream, causing all subsequent requests
to fail.

For reasons that aren't 100% clear to me at the moment, this error condition
leads to a lot of `[audit_send_repl]` (2.6.x) / `[audit_send_reply]` (3.x+)
kernel threads being created.

```
ERROR [auditd] auditd/audit_linux.go:183 get status request failed:failed to get audit status ack: unexpected sequence number for reply (expected 6286 but got 6285)
```
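
To illustrate how a single late reply can produce the error above on every later request, here is a minimal simulation. It is only a sketch: `seqClient`, `reply`, `request`, and the in-memory `inbox` are hypothetical stand-ins invented for this example, not the audit client's real types or its actual netlink handling.

```go
package main

import (
	"errors"
	"fmt"
)

// reply is a simplified stand-in for a netlink reply; only the sequence
// number matters for this illustration.
type reply struct {
	seq uint32
}

// seqClient is a hypothetical request/reply client that expects each reply
// to carry the sequence number of the request it has just sent.
type seqClient struct {
	nextSeq uint32
	inbox   []reply // unread replies, oldest first (models the socket buffer)
}

var errTimeout = errors.New("timed out waiting for reply")

// request sends one message and reads at most one reply from the inbox.
// replyIsLate models a reply that only arrives after the caller gave up.
func (c *seqClient) request(replyIsLate bool) error {
	c.nextSeq++
	want := c.nextSeq
	if !replyIsLate {
		c.inbox = append(c.inbox, reply{seq: want})
	}
	var err error
	if len(c.inbox) == 0 {
		err = errTimeout
	} else {
		got := c.inbox[0]
		c.inbox = c.inbox[1:]
		if got.seq != want {
			err = fmt.Errorf("unexpected sequence number for reply (expected %d but got %d)", want, got.seq)
		}
	}
	if replyIsLate {
		// The late reply lands in the buffer after this request has failed,
		// so the next request will read it instead of its own reply.
		c.inbox = append(c.inbox, reply{seq: want})
	}
	return err
}

func main() {
	c := &seqClient{nextSeq: 6284}
	fmt.Println(c.request(true))  // the reply to 6285 is late
	fmt.Println(c.request(false)) // reads the stale reply for 6285
	fmt.Println(c.request(false)) // still one reply behind

	// A fresh client starts with an empty buffer and is back in sync.
	fresh := &seqClient{nextSeq: 6287}
	fmt.Println(fresh.request(false))
}
```

In this model the first request times out, and every request after it on the same client is one reply behind, so it never succeeds again. Only a client with a fresh buffer recovers.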

```
$ ps -ef
[...]
root     27790     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27791     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27792     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27793     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27794     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27795     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27796     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27797     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27798     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27799     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27800     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27801     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27802     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27803     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27804     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27805     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27806     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27807     2  0 12:52 ?        00:00:00 [audit_send_repl]
root     27808     2  0 12:52 ?        00:00:00 [audit_send_repl]
[...]
```

This patch updates the error-handling logic to create a new audit client
when a status update fails, allowing the monitoring routine to recover and
preventing the proliferation of `audit_send_repl` kernel threads.
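
The recovery step can be sketched in isolation as follows. This is an illustrative sketch under assumptions, not the code from the patch: the `statusClient` interface, `fakeClient`, `pollOnce`, and the `newClient` factory are hypothetical names introduced here so the example runs without a real audit socket.

```go
package main

import (
	"errors"
	"fmt"
)

// statusClient is a hypothetical, minimal interface standing in for the
// audit client used by the monitoring routine.
type statusClient interface {
	GetStatus() (lost uint32, err error)
	Close() error
}

// fakeClient is a test double: once broken, every status request fails.
type fakeClient struct {
	lost   uint32
	broken bool
	closed bool
}

func (f *fakeClient) GetStatus() (uint32, error) {
	if f.broken {
		return 0, errors.New("unexpected sequence number for reply")
	}
	return f.lost, nil
}

func (f *fakeClient) Close() error {
	f.closed = true
	return nil
}

// pollOnce performs a single status update. On success it reports the lost
// counter and returns the same client. On failure it closes the failed
// client and returns a replacement obtained from newClient.
func pollOnce(c statusClient, newClient func() (statusClient, error), onLost func(uint32)) (statusClient, error) {
	lost, err := c.GetStatus()
	if err == nil {
		onLost(lost)
		return c, nil
	}
	fmt.Println("get status request failed:", err)
	if cerr := c.Close(); cerr != nil {
		fmt.Println("error closing client:", cerr)
	}
	return newClient()
}

func main() {
	created := 0
	newClient := func() (statusClient, error) {
		created++
		return &fakeClient{lost: 7}, nil
	}

	var c statusClient = &fakeClient{broken: true}
	for tick := 0; tick < 3; tick++ {
		var err error
		c, err = pollOnce(c, newClient, func(lost uint32) { fmt.Println("lost:", lost) })
		if err != nil {
			fmt.Println("giving up:", err)
			return
		}
	}
	fmt.Println("clients created:", created)
}
```

Here the broken client is replaced on the first tick and the two following ticks succeed with the replacement, so only one new client is created.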
adriansr committed Nov 24, 2020
1 parent 5bd19e0 commit ca9550f
Showing 2 changed files with 15 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -232,6 +232,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
 - system/socket: Fix kprobe grouping to allow running more than one instance. {pull}20325[20325]
 - system/socket: Fixed a crash due to concurrent map read and write. {issue}21192[21192] {pull}21690[21690]
 - file_integrity: stop monitoring excluded paths {issue}21278[21278] {pull}21282[21282]
+- auditd: Fix an error condition causing a lot of `audit_send_reply` kernel threads being created. {pull}22673[22673]

 *Filebeat*

15 changes: 14 additions & 1 deletion auditbeat/module/auditd/audit_linux.go
@@ -163,7 +163,11 @@ func (ms *MetricSet) Run(reporter mb.PushReporterV2) {
 		ms.log.Errorw("Failure creating audit monitoring client", "error", err)
 	}
 	go func() {
-		defer client.Close()
+		defer func() { // Close the most recently allocated "client" instance.
+			if client != nil {
+				client.Close()
+			}
+		}()
 		timer := time.NewTicker(lostEventsUpdateInterval)
 		defer timer.Stop()
 		for {
@@ -175,6 +179,15 @@ func (ms *MetricSet) Run(reporter mb.PushReporterV2) {
 					ms.updateKernelLostMetric(status.Lost)
 				} else {
 					ms.log.Error("get status request failed:", err)
+					if err = client.Close(); err != nil {
+						ms.log.Errorw("Error closing audit monitoring client", "error", err)
+					}
+					client, err = libaudit.NewAuditClient(nil)
+					if err != nil {
+						ms.log.Errorw("Failure creating audit monitoring client", "error", err)
+						reporter.Error(err)
+						return
+					}
 				}
 			}
 		}