From a44d59c13ee2ed31dbd52fe0d79c3431a6e5bae5 Mon Sep 17 00:00:00 2001 From: "Eric D. Schabell" Date: Mon, 24 Nov 2025 09:43:45 +0100 Subject: [PATCH 1/2] Updated blob input plugin documentation with examples. Fixes #2185. Signed-off-by: Eric D. Schabell --- pipeline/inputs/blob.md | 227 +++++++++++++++++++++++++++++++++++----- 1 file changed, 203 insertions(+), 24 deletions(-) diff --git a/pipeline/inputs/blob.md b/pipeline/inputs/blob.md index 2410e856e..795bf6f88 100644 --- a/pipeline/inputs/blob.md +++ b/pipeline/inputs/blob.md @@ -1,6 +1,6 @@ # Blob -The _Blob_ input plugin accepts blob (binary) files. +The _Blob_ input plugin monitors a directory and processes binary (blob) files. It scans the specified path at regular intervals, reads binary files, and forwards them as records through the Fluent Bit pipeline. This plugin is useful for processing binary log files, artifacts, or any binary data that needs to be collected and forwarded to outputs. ## Configuration parameters @@ -8,28 +8,62 @@ The plugin supports the following configuration parameters: | Key | Description | Default | |:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------| -| `alias` | Sets an alias for multiple instances of the same output plugin. | _none_ | -| `database_file` | Database file. | _none_ | -| `exclude_pattern` | Pattern to exclude. | _none_ | -| `log_level` | Specifies the log level for output plugin. If not set here, plugin uses global log level in `service` section. | `info` | -| `log_supress_interval` | Suppresses log messages from output plugin that appear similar within a specified time interval. `0` no suppression. | `0` | -| `mem_buf_limit` | Set a memory buffer limit for the input plugin instance in bytes. 
If the limit is reached, the plugin will pause until the buffer is drained. If set to 0, the buffer limit is disabled. If the plugin has enabled filesystem buffering, this limit won't apply. | `0` | -| `path` | Path to scan for blob (binary) files. | _none_ | -| `routable` | If `true`, the data generated by the plugin can be forwarded to other plugins or outputs. If `false`, the data will be discarded. | `true` | -| `scan_refresh_interval` | Set the interval time to scan for new files. | `2s` | -| `storage.pause_on_chunks_overlimit` | Enable pausing on an input when they reach their chunks limit. | `false` | -| `storage.type` | Sets the storage type for this input, one of `filesystem`, `memory` or `memrb`. | `memory` | -| `tag` | Set a tag for the events generated by this input plugin. | _none_ | -| `threaded` | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). | `false` | -| | | | -| `threaded.ring_buffer.capacity` | Set custom ring buffer capacity when the input runs in threaded mode. | `1024` | -| `threaded.ring_buffer.window` | Set custom ring buffer window percentage for threaded inputs. | `5` | -| `upload_success_action` | Field is string for action on success. | _none_ | -| `upload_success_suffix` | Field is string for suffix on success. | _none_ | -| `upload_success_message` | Field is string for message on success. | _none_ | -| `upload_failure_action` | Field is string for action on failure. | _none_ | -| `upload_failure_suffix` | Field is string for suffix on failure. | _none_ | -| `upload_failure_message` | Field is string for message on failure. | _none_ | +| `alias` | Sets an alias for multiple instances of the same input plugin. Useful when you need to run multiple blob input instances with different configurations. | _none_ | +| `database_file` | Specify a database file to keep track of processed files and their state. 
This enables the plugin to resume processing from the last known position if Fluent Bit is restarted. The database is backed by SQLite3. Recommended to be unique per plugin instance. For more details, see [Database file](#database-file). | _none_ | +| `exclude_pattern` | Set one or multiple shell patterns separated by commas to exclude files matching certain criteria. For example, `exclude_pattern *.tmp,*.bak` will exclude temporary and backup files from processing. | _none_ | +| `log_level` | Specifies the log level for this input plugin. If not set here, the plugin uses the global log level specified in the `service` section. Valid values: `off`, `error`, `warn`, `info`, `debug`, `trace`. | `info` | +| `log_suppress_interval` | Suppresses log messages from this input plugin that appear similar within a specified time interval. Set to `0` to disable suppression. The value must be specified in seconds. This helps reduce log noise when the same error or warning occurs repeatedly. | `0` | +| `mem_buf_limit` | Set a memory buffer limit for the input plugin instance in bytes. If the limit is reached, the plugin will pause until the buffer is drained. If set to `0`, the buffer limit is disabled. If the plugin has enabled filesystem buffering, this limit won't apply. The value must be according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification. | `0` | +| `path` | Path to scan for blob (binary) files. Supports wildcards and glob patterns. For example, `/var/log/binaries/*.bin` or `/data/artifacts/**/*.dat`. This is a required parameter. | _none_ | +| `routable` | If `true`, the data generated by the plugin can be forwarded to other plugins or outputs. If `false`, the data will be discarded. Useful for testing or when you want to process data but not forward it. | `true` | +| `scan_refresh_interval` | Set the interval time to scan for new files. The plugin periodically scans the specified path for new or modified files. 
The value must be specified according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification (e.g., `2s`, `30m`, `1h`). | `2s` | +| `storage.pause_on_chunks_overlimit` | Enable pausing on an input when it reaches its chunks limit. When enabled, the plugin will pause processing if the number of chunks exceeds the limit, preventing memory issues during backpressure scenarios. | `false` | +| `storage.type` | Sets the storage type for this input. Options: `filesystem` (persists data to disk), `memory` (stores data in memory only), or `memrb` (memory ring buffer). For production environments with high data volumes, consider using `filesystem` to prevent data loss during restarts. | `memory` | +| `tag` | Set a tag for the events generated by this input plugin. Tags are used for routing records to specific outputs. Supports tag expansion with wildcards. | _none_ | +| `threaded` | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). When enabled, the plugin runs in a separate thread, which can improve performance for I/O-bound operations. | `false` | +| `threaded.ring_buffer.capacity` | Set custom ring buffer capacity when the input runs in threaded mode. This determines how many records can be buffered in the ring buffer before blocking. | `1024` | +| `threaded.ring_buffer.window` | Set custom ring buffer window percentage for threaded inputs. This controls when the ring buffer is considered "full" and triggers backpressure handling. | `5` | +| `upload_success_action` | Action to perform on the file after successful processing. Supported values: `delete` (delete the file), `move` (move the file), `none` (no action). When set to `move`, use `upload_success_suffix` to specify the destination. | _none_ | +| `upload_success_suffix` | Suffix to append to the filename when moving a file after successful processing. Only used when `upload_success_action` is set to `move`. 
For example, if set to `.processed`, a file named `data.bin` will be moved to `data.bin.processed`. | _none_ | +| `upload_success_message` | Custom message to include in the log when a file is successfully processed. This can be used for debugging or monitoring purposes. | _none_ | +| `upload_failure_action` | Action to perform on the file after processing failure. Supported values: `delete` (delete the file), `move` (move the file), `none` (no action). When set to `move`, use `upload_failure_suffix` to specify the destination. | _none_ | +| `upload_failure_suffix` | Suffix to append to the filename when moving a file after processing failure. Only used when `upload_failure_action` is set to `move`. For example, if set to `.failed`, a file named `data.bin` will be moved to `data.bin.failed` if processing fails. | _none_ | +| `upload_failure_message` | Custom message to include in the log when file processing fails. This can be used for debugging or monitoring purposes. | _none_ | + +## How it works + +The Blob input plugin periodically scans the specified directory path for binary files. When a new or modified file is detected, the plugin reads the file content and creates records that are forwarded through the Fluent Bit pipeline. The plugin can track processed files using a database file, allowing it to resume from the last known position after a restart. + +Binary file content is typically included in the output records, and the exact format depends on the output plugin configuration. The plugin generates one or more records per file, depending on the file size and configuration. + +## Database file + +The database file enables the plugin to track which files have been processed and maintain state across Fluent Bit restarts. This is similar to how the [Tail input plugin](../inputs/tail.md#database-file) uses a database file. 
+ +When a database file is specified: + +- The plugin stores information about processed files, including file paths and processing status +- On restart, the plugin can skip files that were already processed +- The database is backed by SQLite3 and will create additional files (`.db-shm` and `.db-wal`) when using write-ahead logging mode + +It's recommended to use a unique database file for each blob input instance to avoid conflicts. For example: + +```yaml +pipeline: + inputs: + - name: blob + path: /var/log/binaries/*.bin + database_file: /var/lib/fluent-bit/blob.db +``` + +## Use cases + +The Blob input plugin is useful for: + +- **Binary log files**: Processing binary-formatted log files that can't be read as text +- **Artifact collection**: Collecting binary artifacts or build outputs for analysis or archival +- **File monitoring**: Monitoring directories for new binary files and forwarding them to storage or analysis systems +- **Data pipeline integration**: Integrating binary data sources into your Fluent Bit data pipeline ## Get started @@ -63,12 +97,157 @@ In your main configuration file append the following: pipeline: inputs: - name: blob - path: '[PATH_TO_BINARY_FILES]' + path: '/path/to/binary/files/*.bin' outputs: - name: stdout match: '*' ``` +{% endtab %} +{% tab title="fluent-bit.conf" %} + +```text +[INPUT] + Name blob + Path /path/to/binary/files/*.bin + +[OUTPUT] + Name stdout + Match * +``` + +{% endtab %} +{% endtabs %} + +## Examples + +### Basic configuration with database tracking + +This example shows how to configure the blob plugin with a database file to track processed files: + +{% tabs %} +{% tab title="fluent-bit.yaml" %} + +```yaml +pipeline: + inputs: + - name: blob + path: /var/log/binaries/*.bin + database_file: /var/lib/fluent-bit/blob.db + scan_refresh_interval: 10s + tag: blob.files + + outputs: + - name: stdout + match: '*' +``` + +{% endtab %} +{% tab title="fluent-bit.conf" %} + +```text +[INPUT] + Name blob + Path 
/var/log/binaries/*.bin
+    Database_File /var/lib/fluent-bit/blob.db
+    Scan_Refresh_Interval 10s
+    Tag blob.files
+
+[OUTPUT]
+    Name stdout
+    Match *
+```
+
+{% endtab %}
+{% endtabs %}
+
+### Configuration with file exclusion and storage
+
+This example excludes certain file patterns and uses filesystem storage for better reliability:
+
+{% tabs %}
+{% tab title="fluent-bit.yaml" %}
+
+```yaml
+pipeline:
+  inputs:
+    - name: blob
+      path: /data/artifacts/**/*
+      exclude_pattern: '*.tmp,*.bak,*.old'
+      storage.type: filesystem
+      storage.pause_on_chunks_overlimit: true
+      mem_buf_limit: 50M
+      tag: artifacts
+
+  outputs:
+    - name: stdout
+      match: '*'
+```
+
+{% endtab %}
+{% tab title="fluent-bit.conf" %}
+
+```text
+[INPUT]
+    Name blob
+    Path /data/artifacts/**/*
+    Exclude_Pattern *.tmp,*.bak,*.old
+    Storage.Type filesystem
+    Storage.Pause_On_Chunks_Overlimit true
+    Mem_Buf_Limit 50M
+    Tag artifacts
+
+[OUTPUT]
+    Name stdout
+    Match *
+```
+
+{% endtab %}
+{% endtabs %}
+
+### Configuration with file actions after processing
+
+This example moves files after successful processing and handles failures:
+
+{% tabs %}
+{% tab title="fluent-bit.yaml" %}
+
+```yaml
+pipeline:
+  inputs:
+    - name: blob
+      path: /var/log/binaries/*.bin
+      database_file: /var/lib/fluent-bit/blob.db
+      upload_success_action: move
+      upload_success_suffix: .processed
+      upload_failure_action: move
+      upload_failure_suffix: .failed
+      tag: blob.data
+
+  outputs:
+    - name: stdout
+      match: '*'
+```
+
+{% endtab %}
+{% tab title="fluent-bit.conf" %}
+
+```text
+[INPUT]
+    Name blob
+    Path /var/log/binaries/*.bin
+    Database_File /var/lib/fluent-bit/blob.db
+    Upload_Success_Action move
+    Upload_Success_Suffix .processed
+    Upload_Failure_Action move
+    Upload_Failure_Suffix .failed
+    Tag blob.data
+
+[OUTPUT]
+    Name stdout
+    Match *
+```
+
 {% endtab %}
 {% endtabs %}

From 7648716800e90a2ddfc2aa23cf75d3916eb51cd4 Mon Sep 17 00:00:00 2001
From: "Eric D. 
Schabell" Date: Mon, 24 Nov 2025 09:47:30 +0100 Subject: [PATCH 2/2] Fixes to linting errors, applies to #2186. Signed-off-by: Eric D. Schabell --- pipeline/inputs/blob.md | 70 ++++++++++++++++++++--------------------- 1 file changed, 35 insertions(+), 35 deletions(-) diff --git a/pipeline/inputs/blob.md b/pipeline/inputs/blob.md index 795bf6f88..840debb41 100644 --- a/pipeline/inputs/blob.md +++ b/pipeline/inputs/blob.md @@ -9,14 +9,14 @@ The plugin supports the following configuration parameters: | Key | Description | Default | |:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------| | `alias` | Sets an alias for multiple instances of the same input plugin. Useful when you need to run multiple blob input instances with different configurations. | _none_ | -| `database_file` | Specify a database file to keep track of processed files and their state. This enables the plugin to resume processing from the last known position if Fluent Bit is restarted. The database is backed by SQLite3. Recommended to be unique per plugin instance. For more details, see [Database file](#database-file). | _none_ | +| `database_file` | Specify a database file to keep track of processed files and their state. This enables the plugin to resume processing from the last known position if Fluent Bit is restarted. | _none_ | | `exclude_pattern` | Set one or multiple shell patterns separated by commas to exclude files matching certain criteria. For example, `exclude_pattern *.tmp,*.bak` will exclude temporary and backup files from processing. | _none_ | | `log_level` | Specifies the log level for this input plugin. If not set here, the plugin uses the global log level specified in the `service` section. 
Valid values: `off`, `error`, `warn`, `info`, `debug`, `trace`. | `info` | | `log_suppress_interval` | Suppresses log messages from this input plugin that appear similar within a specified time interval. Set to `0` to disable suppression. The value must be specified in seconds. This helps reduce log noise when the same error or warning occurs repeatedly. | `0` | | `mem_buf_limit` | Set a memory buffer limit for the input plugin instance in bytes. If the limit is reached, the plugin will pause until the buffer is drained. If set to `0`, the buffer limit is disabled. If the plugin has enabled filesystem buffering, this limit won't apply. The value must be according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification. | `0` | | `path` | Path to scan for blob (binary) files. Supports wildcards and glob patterns. For example, `/var/log/binaries/*.bin` or `/data/artifacts/**/*.dat`. This is a required parameter. | _none_ | -| `routable` | If `true`, the data generated by the plugin can be forwarded to other plugins or outputs. If `false`, the data will be discarded. Useful for testing or when you want to process data but not forward it. | `true` | -| `scan_refresh_interval` | Set the interval time to scan for new files. The plugin periodically scans the specified path for new or modified files. The value must be specified according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification (e.g., `2s`, `30m`, `1h`). | `2s` | +| `routable` | If `true`, the data generated by the plugin can be forwarded to other plugins or outputs. If `false`, the data will be discarded. Use this for testing or when you want to process data but not forward it. | `true` | +| `scan_refresh_interval` | Set the interval time to scan for new files. The plugin periodically scans the specified path for new or modified files. 
The value must be specified according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification (`2s`, `30m`, `1h`). | `2s` |
 | `storage.pause_on_chunks_overlimit` | Enable pausing on an input when it reaches its chunks limit. When enabled, the plugin will pause processing if the number of chunks exceeds the limit, preventing memory issues during backpressure scenarios. | `false` |
 | `storage.type` | Sets the storage type for this input. Options: `filesystem` (persists data to disk), `memory` (stores data in memory only), or `memrb` (memory ring buffer). For production environments with high data volumes, consider using `filesystem` to prevent data loss during restarts. | `memory` |
 | `tag` | Set a tag for the events generated by this input plugin. Tags are used for routing records to specific outputs. Supports tag expansion with wildcards. | _none_ |
@@ -58,7 +58,7 @@
 
 ## Use cases
 
-The Blob input plugin is useful for:
+Common use cases for the Blob input plugin are:
 
 - **Binary log files**: Processing binary-formatted log files that can't be read as text
 - **Artifact collection**: Collecting binary artifacts or build outputs for analysis or archival
@@ -67,7 +67,7 @@
 
 ## Get started
 
-You can run the plugin from the command line or through the configuration file:
+You can run the plugin from the command line or through a configuration file.
### Command line @@ -109,12 +109,12 @@ pipeline: ```text [INPUT] - Name blob - Path /path/to/binary/files/*.bin + Name blob + Path /path/to/binary/files/*.bin [OUTPUT] - Name stdout - Match * + Name stdout + Match * ``` {% endtab %} @@ -148,15 +148,15 @@ pipeline: ```text [INPUT] - Name blob - Path /var/log/binaries/*.bin - Database_File /var/lib/fluent-bit/blob.db - Scan_Refresh_Interval 10s - Tag blob.files + Name blob + Path /var/log/binaries/*.bin + Database_File /var/lib/fluent-bit/blob.db + Scan_Refresh_Interval 10s + Tag blob.files [OUTPUT] - Name stdout - Match * + Name stdout + Match * ``` {% endtab %} @@ -190,17 +190,17 @@ pipeline: ```text [INPUT] - Name blob - Path /data/artifacts/**/* - Exclude_Pattern *.tmp,*.bak,*.old - Storage.Type filesystem - Storage.Pause_On_Chunks_Overlimit true - Mem_Buf_Limit 50M - Tag artifacts + Name blob + Path /data/artifacts/**/* + Exclude_Pattern *.tmp,*.bak,*.old + Storage.Type filesystem + Storage.Pause_On_Chunks_Overlimit true + Mem_Buf_Limit 50M + Tag artifacts [OUTPUT] - Name stdout - Match * + Name stdout + Match * ``` {% endtab %} @@ -235,18 +235,18 @@ pipeline: ```text [INPUT] - Name blob - Path /var/log/binaries/*.bin - Database_File /var/lib/fluent-bit/blob.db - Upload_Success_Action move - Upload_Success_Suffix .processed - Upload_Failure_Action move - Upload_Failure_Suffix .failed - Tag blob.data + Name blob + Path /var/log/binaries/*.bin + Database_File /var/lib/fluent-bit/blob.db + Upload_Success_Action move + Upload_Success_Suffix .processed + Upload_Failure_Action move + Upload_Failure_Suffix .failed + Tag blob.data [OUTPUT] - Name stdout - Match * + Name stdout + Match * ``` {% endtab %}