Commit c8ed628

Added DLQ as first item in troubleshooting docs. Applies to #2207.
Signed-off-by: Eric D. Schabell <eric@schabell.org>
1 parent 43fa2e8 commit c8ed628

1 file changed: +55 -0 lines changed
administration/troubleshooting.md

Lines changed: 55 additions & 0 deletions
@@ -2,9 +2,64 @@
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=759ddb3d-b363-4ee6-91fa-21025259767a" />

- [Dead letter queue: preserve failed chunks](#dead-letter-queue)
- [Tap: generate events or records](#tap)
- [Dump internals signal](#dump-internals-and-signal)

## Dead Letter Queue

The Dead Letter Queue (DLQ) feature preserves chunks that fail to be delivered to output destinations. This is useful for troubleshooting delivery failures without losing data.

### Enable DLQ

To enable the DLQ, add the following settings to the `service` section of your configuration:

{% tabs %}
{% tab title="fluent-bit.yaml" %}

```yaml
service:
  storage.path: /var/log/flb-storage/
  storage.keep.rejected: on
  storage.rejected.path: rejected
```

{% endtab %}
{% tab title="fluent-bit.conf" %}

```text
[SERVICE]
    storage.path /var/log/flb-storage/
    storage.keep.rejected on
    storage.rejected.path rejected
```

{% endtab %}
{% endtabs %}
40+
### What gets stored
41+
42+
Chunks are copied to the DLQ when:
43+
44+
- An output plugin returns an unrecoverable error.
45+
- A chunk exhausts all configured retry attempts.
46+
- Retries are disabled (`retry_limit: no_retries`) and the flush fails.
47+
- The scheduler fails to schedule a retry.

### Examine DLQ files

DLQ files are stored in the configured path (for example, `/var/log/flb-storage/rejected/`). Each file name includes the tag, status code, and output plugin name, which helps identify which records failed and why.

For example, a file named `kube_var_log_containers_test_400_http_0x7f8b4c.flb` indicates a chunk with tag `kube.var.log.containers.test` that failed with status code `400` when sending to the `http` output.
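
The naming pattern in the example above can be split mechanically when triaging many files. The following is a minimal sketch, not part of Fluent Bit: the `parse_rejected_name` helper and the `RejectedChunkName` type are hypothetical, and the sketch assumes the layout `<tag>_<status>_<output>_<suffix>.flb` with no underscores in the output name or the trailing identifier. Because dots in the tag appear as underscores in the file name, the original tag can't be recovered exactly.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RejectedChunkName:
    """Segments of a DLQ file name (hypothetical helper type)."""
    tag: str     # tag as it appears in the file name (underscores, not dots)
    status: str  # status code segment, kept as text
    output: str  # output plugin name segment
    suffix: str  # trailing unique identifier


def parse_rejected_name(filename: str) -> RejectedChunkName:
    """Split a DLQ file name from the right.

    Assumption: the output name and the trailing identifier contain no
    underscores. The tag may contain underscores, so it takes whatever
    remains on the left.
    """
    stem = Path(filename).stem  # drop the .flb extension
    parts = stem.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected DLQ file name: {filename!r}")
    tag, status, output, suffix = parts
    return RejectedChunkName(tag, status, output, suffix)


info = parse_rejected_name("kube_var_log_containers_test_400_http_0x7f8b4c.flb")
print(info.tag, info.status, info.output)
```

Splitting from the right keeps the tag intact even when it contains many underscores. If your output names or aliases contain underscores, this split will misattribute segments and needs adjusting.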

### DLQ management

{% hint style="warning" %}
DLQ files remain on disk until manually removed. Monitor disk usage and implement a cleanup policy.
{% endhint %}

For more details on DLQ configuration, see [Buffering and Storage](buffering-and-storage.md#dead-letter-queue-dlq).
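
One possible cleanup policy is to delete rejected chunks older than a fixed age. The following is a minimal sketch under stated assumptions, not a Fluent Bit feature: the `find_stale_chunks` and `clean_dlq` helpers are hypothetical, the seven-day default is arbitrary, and file age is judged by modification time. It defaults to a dry run so you can review what would be removed.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_stale_chunks(dlq_dir: Path, max_age_days: float,
                      now: Optional[float] = None) -> List[Path]:
    """Return .flb files in dlq_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in dlq_dir.glob("*.flb")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def clean_dlq(dlq_dir: Path, max_age_days: float = 7,
              dry_run: bool = True) -> List[Path]:
    """Delete stale DLQ files; with dry_run=True, only report them."""
    stale = find_stale_chunks(dlq_dir, max_age_days)
    for path in stale:
        if not dry_run:
            path.unlink()
    return stale
```

For example, `clean_dlq(Path("/var/log/flb-storage/rejected"), max_age_days=7, dry_run=False)` could run from a scheduled job. Inspect or archive the chunks you still need before deleting them.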
62+
863
## Tap
964

1065
Tap can be used to generate events or records detailing what messages pass through Fluent Bit, at what time and what filters affect them.
